Text and Data Mining (TDM): Best Practice Guidelines

Text and Data Mining Best Practices

Follow these best practices at every stage of the Text and Data Mining (TDM) process.

Always be mindful of privacy, data security, source legitimacy, and copyright/license compliance throughout your research!

Best Practice tips

We can help you contact database publishers.

If you need information that you can’t easily obtain through a database, we can help you get in touch with the right people at the database publisher and identify your options. These may include different delivery methods and negotiated terms for authorized uses. We can also assist you throughout the process of working with the publisher.

This may take time. 

If you are considering a text or data mining project, contact us early in the process. Publishers may be slow to respond to requests for data, and negotiations can take time, particularly because not all publishers have considered this topic.

Every source is different.

Each source may have its own terms of use, designated tools for researchers, and permitted uses of retrieved data. Keep these differences in mind when investigating resources.

Open Access alternatives may exist.

Depending on the nature of your research, there may be Open Access journals, databases, or datasets that you can use. We can help you determine whether such a source would work for your project.

Always verify permission.

When accessing a resource for text and data mining purposes, always make sure you have adequate permission and are not breaching any licenses or terms of service. Unless otherwise indicated on the Database and Resources page, always consult the Copyright Specialists before beginning text and data mining on library-licensed resources. Accessing licensed resources without permission, or in an inappropriate way, may jeopardize not only your research and your own access to that resource, but also the entire campus's access to it.

Publishers may charge for access to their data.

Some publishers allow text and data mining only if the user pays an additional fee. Others may sell their data separately for this purpose. We can help you identify the most economical and efficient way to get the access you need.

We may be able to advise on data management and preservation during your project.

Once you’ve accessed the data and your project is underway, we may be able to advise you on data management and preservation needs on a case-by-case basis.

Be mindful of etiquette.

Even when a publisher or service allows TDM, some approaches, such as bulk scraping and querying without rate limits, can place a significant burden on the data provider's servers. Best practice is to check the data provider's requirements and comply with their preferences regarding mining activities.
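In practice, respecting a provider's preferences often comes down to pacing your requests. The sketch below shows one simple way to do that in Python, assuming the provider already permits programmatic access: it enforces a minimum delay between requests and sends a User-Agent header that identifies you. The `RateLimiter` and `polite_get` names, the two-second interval, and the User-Agent string are illustrative assumptions, not requirements of any particular publisher; always use the limits stated in the provider's own terms or API documentation.

```python
import time
import urllib.request
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum delay (in seconds) between consecutive requests.

    Hypothetical helper for illustration; the interval you pass in should
    come from the data provider's stated limits.
    """

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock  # monotonic clock is unaffected by system time changes
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> None:
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def polite_get(url: str, limiter: RateLimiter, contact: str) -> bytes:
    """Fetch a URL after waiting for the rate limiter.

    The User-Agent format is an assumption; some providers specify their own
    requirements for identifying clients.
    """
    limiter.wait()
    request = urllib.request.Request(
        url, headers={"User-Agent": f"tdm-research-script ({contact})"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

For example, you might create a single `RateLimiter(min_interval=2.0)` and pass it to every `polite_get` call in your download loop, so that requests are spaced at least two seconds apart no matter how quickly each response arrives. The clock and sleep functions are injectable so the pacing logic can be checked without making real network requests.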