Back in 2014, I contributed a short article to the book “Social Media in Social Research: Blogs on Blurring the Boundaries”. Recently, I found myself going through these same points with someone who is looking at using social media data for a research project. In case this is useful for others in a similar situation, here is the piece that I wrote all those years ago. It’s a bit dated, but the main idea is still relevant.
Social media are increasingly seen as promising sources of data in social sciences research. Sociologist Duncan J. Watts even compared the value of social media platforms for social scientists to the effect of Galileo’s telescope on the physical sciences. The level of enthusiasm is that high.
And there are signs that social science researchers are, indeed, embracing social media platforms as part of their approach to collecting data. For instance, research by Williams, Terras and Warwick shows an increasing number of academic papers focusing on Twitter messages and Twitter users.
Just like any other tool, however, social media have characteristics that may make them unsuitable for some research contexts. This article provides a brief overview of issues to take into account when assessing whether to use social media in your research.

Sampling
The profile of social media users differs from that of the general population. For instance, research by Hecht and Stephens (2014) suggests that there is a skew towards urban users and urban perspectives. Therefore, social media data may not be suitable for studies researching heterogeneous social groups.
It is, however, suitable for projects that investigate the behaviour of social media users, and for studies that focus on these users’ perceptions or attitudes, and which do not seek to generalise the findings to the overall population.
These platforms also give access to participants in dispersed geographic locations, in a much more cost-effective way than face-to-face or even telephone methods. However, it may be difficult to confirm the identity of the respondents.
Privacy
Social media platforms are, traditionally, public. Even if you choose a closed setting, the data will be held by a third party (for instance, Facebook) whose data protection policies you cannot control and, most likely, do not understand. The public nature of social media platforms means that:
- You may be unable to recruit research participants who do not wish to make their comments public
- You cannot ensure the confidentiality of the data collected
- You will not be able to prevent further uses of the data generated through your project
Topics
The public nature of social media conversations also means that these platforms are not suitable for exploring sensitive topics, whether personal or commercial. Nor are they suitable for discussing topics that require introspection.
Response rate
People use different social media platforms for different purposes, and this is likely to impact on their willingness to respond to invitations to participate in research. For instance, Twitter is a platform largely used for conversations, often between people who have never met each other. In contrast, Facebook and LinkedIn are largely used to build or maintain relationships. Therefore, it is possible that social media users feel more inclined to respond to a research invitation from someone they do not know on Twitter than on Facebook or LinkedIn.
Another issue to take into consideration is that, even though the use of social media platforms for data collection is an increasingly popular practice, it is, nonetheless, a nascent one. As such, it is very difficult to estimate response rates, which is a source of risk in your research plan. Likewise, it is difficult to benchmark the response rate obtained in your study and, therefore, to assess the quality and representativeness of your results.
Social desirability bias
Another consequence of the public nature of social media platforms is that the content shared is open to scrutiny. This means that social media users may be inclined to say things or behave in ways that are socially acceptable and/or that protect their ego.
Group interaction
If research participants’ responses and behaviours are visible to each other, this visibility may give rise to group dynamic behaviours, whereby one person’s actions trigger a reaction in another person, and so on.
Moreover, the awareness of other participants’ views may prevent the critical consideration of alternative perspectives. In turn, this may lead individuals to adopt views and behaviours that are consensual and compliant with the majority.
Interactivity
Early evidence suggests that it is difficult to sustain a conversation with research participants over social media platforms. The difficulty in establishing rapport with the respondent, the lack of visual cues and, in some cases, the asynchronous nature of communication are likely to result in reduced levels of interactivity between the researcher and the participants, limiting the number of questions that respondents are likely to answer before losing interest.
Length of responses
The characteristics of the specific social media platform that you use for data collection will influence the type of data that you get. For instance, Twitter limits messages to 140 characters, meaning that you will obtain fairly short answers. However, in my own experience, participants find inventive ways of overcoming Twitter’s 140-character limit. They may use abbreviated forms of words (e.g., FB instead of “Facebook”) or expressions (e.g., QT instead of “Quality Time”). In addition:
In some instances, users linked their responses to blog posts, articles, web pages, or other online content with additional information; in other cases, users offered to continue the conversation via e-mail so they could expand on their positive experiences. (Source: Canhoto and Clark, 2013)
Data analysis
Collecting written posts and updates means that you can avoid the cost and time required to record and transcribe an interview or observation, as well as avoid the problems of transcription accuracy.
However, because social media users write with varying levels of colloquial language and tend to use abbreviations and emoticons, it is very difficult to analyse the data with qualitative data analysis software. This difficulty is compounded by the fact that respondents are also likely to provide information in various formats, such as text, hyperlinks, pictures, videos and audio files.
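To make the point about abbreviations, emoticons and mixed formats more concrete, here is a minimal Python sketch of the kind of tidying up that a post might need before it goes into analysis software. Everything in it is an assumption for illustration: the lookup table only holds the two abbreviations mentioned earlier (FB and QT), the emoticon pattern covers just a handful of text emoticons, and the sample post is made up. A real project would need to build its own lists from the data.

```python
import re

# Hypothetical lookup table: FB and QT are the two examples mentioned in
# this article; a real project would compile its own list from the data.
ABBREVIATIONS = {"FB": "Facebook", "QT": "Quality Time"}

URL_PATTERN = re.compile(r"https?://\S+")
# A deliberately small set of text emoticons, for illustration only.
EMOTICON_PATTERN = re.compile(r"[:;]-?[()DP]")


def prepare_post(text):
    """Split a post into cleaned text, hyperlinks and emoticons."""
    # Pull out hyperlinks first so they can be reviewed separately.
    links = URL_PATTERN.findall(text)
    text = URL_PATTERN.sub(" ", text)
    # Keep emoticons as data in their own right rather than discarding them.
    emoticons = EMOTICON_PATTERN.findall(text)
    text = EMOTICON_PATTERN.sub(" ", text)
    # Expand known abbreviations word by word (exact matches only).
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return {"text": " ".join(words), "links": links, "emoticons": emoticons}


# Made-up post, not taken from any study.
post = "Love QT with family on FB :) http://example.com/photo"
result = prepare_post(post)
print(result)
```

Note that this only catches abbreviations that stand alone as a word; something like “FB,” with a trailing comma would slip through, which is a small illustration of why this type of data takes more effort to analyse than an interview transcript.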
Anonymity
The fact that data shared on the Internet is likely to remain available and accessible for a long time means that it is extremely difficult to maintain anonymity. In an interview transcript it is usually enough to replace the name of the interviewee and the organisation with an identifier such as “participant 1” or “organisation A”. With social media data, however, it is possible for readers to enter the segment of text that you used in your paper into a search engine, and find the identity of your respondent.
Some researchers try to disguise the source of a particular quote, for instance, by changing the wording. However, this sort of manipulation may introduce bias in the data. Moreover, it may not be suitable for studies where the focus is how people express themselves, or how vocabulary is used.
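If you do reword a quote, one rough check is whether any longer phrase from the original post has survived, since that is the sort of string a reader could paste into a search engine. The Python sketch below illustrates the idea under some loud assumptions: the two sentences are invented for this example, and the cut-off of four consecutive words is an arbitrary choice of mine rather than an established standard. An empty result does not guarantee anonymity.

```python
def shared_phrases(original, disguised, length=4):
    """Return runs of `length` consecutive words found in both texts."""
    def phrases(text):
        words = text.lower().split()
        # Every window of `length` consecutive words, joined back into a phrase.
        return {
            " ".join(words[i:i + length])
            for i in range(len(words) - length + 1)
        }
    return sorted(phrases(original) & phrases(disguised))


# Both sentences are invented for illustration, not real research data.
original = "I only trust reviews posted by people I know in real life"
disguised = "I mostly believe reviews posted by people I know offline"
overlap = shared_phrases(original, disguised)
print(overlap)
# ['by people i know', 'posted by people i', 'reviews posted by people']
```

In this made-up case, the reworded quote still shares several four-word phrases with the original, so it would need further changes, and each further change moves it away from what the participant actually wrote.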
In summary, social media have great potential for social science. But just as a telescope isn’t of much use with the wrong lighting or when pointing the wrong way, social media will not add much value to your project if used under the wrong conditions or assumptions.