When conducting text and data mining, keep ethical principles at the forefront of all research practices.
One can quite easily get content from a Google image search, scrape it from a website, or download it from an academic digital archive. The process is imagined as a neutral act—one of taking some thing that is already offered up for consumption. The term “data mining” offers a telling example of how colonial legacies of collecting physical materials from local places and peoples are grafted onto digital content. Content is imagined as open, reusable, and unhinged from communities, individuals, or families who may have intimate ties to the materials.
-Kimberly Christen (2018) in Digital Heritage and the Ethics of Sharing Indigenous Knowledge Online
Consider the history and context of the data you are mining before you embark on your project. Ensure you have the appropriate permissions and review any relevant researcher guides. The links below provide further information on doing research with materials related to Indigenous Peoples.
Consider the source, context, identities, and platforms from which you are mining text and data. It is your ethical responsibility to minimize or eliminate the risk of re-identifying individuals through Personally Identifying Information (PII). Even seemingly anonymized PII within a text corpus can be correlated with adjacent information, which may enable re-identification.
Without due diligence in de-identifying information and in seeking consent to use quotes and pictures, you risk making "...sensitive personal information identifiable beyond the context it was intended for, and under some conditions, the publication of these data may expose users to harm" (Williams, Burnap, & Sloan, 2017).
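As a rough illustration of what a first de-identification pass might look like, the Python sketch below replaces account names with keyed pseudonyms and masks email addresses and @-mentions in post text. The function names, the record fields ("user", "text"), and the regular expressions are hypothetical and chosen only for this example; they are not an exhaustive PII detector. A pass like this does not remove the risk described above, since quoted text and adjacent details can still be searched for and correlated.

```python
import hashlib
import hmac
import re

# Illustrative patterns only (assumptions): real corpora contain many other
# kinds of PII, such as names, phone numbers, and locations.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Map a user ID to a stable pseudonym using a keyed hash (HMAC-SHA256).

    The same ID and key always give the same pseudonym, so posts by one
    account can still be grouped. The key must be stored securely and
    never published alongside the data.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]


def redact_text(text: str) -> str:
    """Mask email addresses first, then @-mentions, with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return MENTION_RE.sub("[MENTION]", text)


def deidentify_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of a record with the author pseudonymized and text masked.

    Assumes a hypothetical record shape: {"user": ..., "text": ...}.
    """
    return {
        "user": pseudonymize(record["user"], secret_key),
        "text": redact_text(record["text"]),
    }


if __name__ == "__main__":
    key = b"replace-with-a-private-random-key"  # placeholder key for the demo
    post = {"user": "sam_lee", "text": "Thanks @ana, write to me at sam@example.org"}
    print(deidentify_record(post, key))
```

A keyed hash is used here rather than a plain hash so that someone without the key cannot simply hash a list of known account names and match them against the pseudonyms. Manual review and consent are still needed before quoting any individual post.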
Most platforms publish guidelines on data privacy and on your responsibilities when you use their data for text and data mining research. For example, the Twitter Developer Platform provides detailed policies and agreements that describe your responsibilities as a researcher.
When considering data privacy, researchers should not assume that all users have read and understood the Terms of Service on the host site (Beninger et al., 2014). Some social media platforms explicitly note that users' public posts are available for third-party use, but not all users fully understand this when they post. Sensitive personal information, information on associations, and communications that are overly personal or harmful in nature may or may not have been knowingly placed online.