Academic researchers are increasingly favoring non-traditional methods of data gathering, turning to social media sites like Facebook and Twitter for information. Computer scientists from McGill University and Carnegie Mellon University, however, warn of possible pitfalls when working with big data sets from social media.

According to Derek Ruths, an assistant professor in McGill University's School of Computer Science, erroneous results have big implications because thousands of studies use social media data each year. These studies are in turn used to inform and justify decisions in private and public organizations as well as in government, so there is no room for error.

Ruths worked with Jürgen Pfeffer of the Institute for Software Research at Carnegie Mellon, and the results of their study were published in the journal Science. According to their research, several issues in using social media data need attention.

The issues Ruths and Pfeffer identified in their paper include:

  • Different social media platforms have different users (so a sample drawn from one platform may misrepresent the wider population);
  • Publicly available data don't always represent a platform's overall data accurately (researchers don't know how and when social media sites filter information);
  • How social media platforms are designed can dictate user behavior and, in effect, the kind of behavior that can be measured (without a "dislike" button on Facebook, only positive responses can be detected through "likes");
  • Not all users are real people (data gathered then includes information "fed" by bots and spammers); and
  • Results are usually taken from easy-to-classify topics, events and users, making methods appear more accurate than they really are (inferred political orientation for typical Twitter users barely reaches 65 percent accuracy, yet studies on politically active users claim accuracy rates of up to 90 percent).

Fortunately, these issues have well-known solutions developed in other fields such as machine learning, statistics and epidemiology. Ruths adds that what the issues have in common is that researchers need to be more aware of the information they are gathering and analyzing, and of whether the data they have is reliable.

Back in 1948, the infamous "Dewey Defeats Truman" headline pushed social researchers to hone their standards and techniques, bringing the field to where it is today. Social media data poses challenges now, albeit different ones from those of 65 years ago, giving social researchers an opportunity yet again to set better standards.

"By tackling the issues we face, we'll be able to realize the tremendous potential for good promised by social media-based research," said Ruths.
