Text mining and text analysis are terms for analyzing documents (books, tweets, news reports, etc.) with the aid of software. Text analysis is a methodological approach and is discipline agnostic. It is performed on corpora: collections of machine-readable text assembled to answer specific kinds of questions.
Text and data mining is highly customized work, with varying timelines from start to conclusion. To carry out a successful project, you will need both access to data and the skills to interact with that data. The skills needed are determined by the nature of the data and what you want to do with it.
When starting a project, you need to consider:
Text mining is a "non-consumptive" use of the materials provided. That means that you are using the words from within the text but not necessarily the text as presented on a written page. Therefore, a researcher should not assume that when mining the text, a full-text article or book will also be available for reading or other consumption.
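As a rough illustration of what non-consumptive use looks like in practice, the sketch below reduces a passage to word counts. The function name `word_counts` and the sample sentence are invented for this example, and the tokenizer is deliberately simple. Real projects usually rely on a dedicated text analysis library.

```python
import re
from collections import Counter

def word_counts(text):
    """Return a count of each lowercase word, discarding word order and page layout."""
    # Hypothetical, simplified tokenizer: runs of letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Made-up sample text, standing in for a document from a corpus.
sample = "The whale, the white whale! The ship sailed on."
counts = word_counts(sample)

# The analysis works with the counts, not with the readable passage.
print(counts.most_common(3))
```

The result records how often each word appears but cannot be read as the original passage, which is why access for mining does not necessarily come with access for reading.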
Appropriate Use of Purchased or Licensed Resources
A library subscription DOES NOT imply that text mining is permitted. Some licenses have text mining language, and some will require permission.
Most of the library's electronic resources are governed by license agreements that limit use to the University of Arkansas, Fayetteville community or to individuals who are physically present at the Libraries' facilities.
Regardless of licensing permissions, some text mining techniques, such as sending many automated requests in a short period, can place a heavy load on a provider's servers. Make sure the methodology you plan to use follows the provider's preferences. Some preferred methods may also require assistance from the provider.
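Where a provider does allow automated retrieval, one common way to reduce server load is to pause between requests. The sketch below is only an illustration of that idea: `fetch_politely`, `fake_fetch`, the example URLs, and the two-second delay are all assumptions made for this example, not any provider's actual requirements. Many providers prefer an API or a bulk data delivery instead, so check with them first.

```python
import time

def fetch_politely(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each URL in turn, pausing between requests."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results[url] = fetch(url)
    return results

# Stand-in for a real download function, so the sketch runs without network access.
def fake_fetch(url):
    return f"contents of {url}"

pauses = []
docs = fetch_politely(
    ["https://example.org/a", "https://example.org/b", "https://example.org/c"],
    fake_fetch,
    delay_seconds=2.0,    # assumed value; use whatever the provider specifies
    sleep=pauses.append,  # record the pauses instead of actually waiting
)
print(len(docs), "documents retrieved with", len(pauses), "pauses")
```

In a real project, `fake_fetch` would be replaced by the access method the provider approves, and the delay would be set to the rate the provider requests.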
You may need to contact the service provider. Here are some details to communicate in your request:
Need advice on the permission letter or information about what our licenses permit? Contact the Data Services librarian or your subject librarian for assistance.
*Language adapted from similar guides at Yale and Emory.
Before you begin any data mining project, you should be aware of the limitations surrounding copyright and fair use, especially if you are dealing with data that may be under copyright. This area of copyright law is still developing, and support is growing for non-consumptive use of resources for computational analysis.
The Association of Research Libraries (ARL) and the International Federation of Library Associations (IFLA) both provide advice and statements on data and text mining, which you can find below.
In the news: