Social Media Data on Human Behavior May Not Be Accurate: Experts

Attempts to predict human behavior from social media may be seriously flawed, because faulty data is being fed into the systems that analyze it.

Governments, corporations and nongovernmental organizations (NGOs) monitor billions of postings placed on social media sites like Facebook and Twitter each day. Using this information, they attempt to take the "pulse" of the general public and keep track of what is popular.

Even sophisticated algorithms and carefully designed methods of conducting computer-based investigations are still subject to faulty raw data. Computer engineers have a term -- GIGO -- which stands for "Garbage In, Garbage Out."

One of the most famous examples of an organization misreading the public occurred long before the Internet age.

"On 3 November 1948, the day after Harry Truman won the United States presidential elections, the Chicago Tribune published one of the most famous erroneous headlines in newspaper history: 'Dewey Defeats Truman.' The headline was informed by telephone surveys, which had inadvertently undersampled Truman supporters," researchers wrote in an article detailing their study.

Rather than eliminating the practice of telephone polling, that historic error became the impetus for more accurate polling methods and statistical analysis.

"Now, we're poised at a similar technological inflection point. By tackling the issues we face, we'll be able to realize the tremendous potential for good promised by social media-based research," said Derek Ruths of the School of Computer Science at McGill University.

Researchers discovered several challenges that can face organizations attempting to draw conclusions about public behavior.

Different Web sites attract different demographics, which could affect the results of studies, according to researchers. Pinterest, for instance, is dominated by women aged 25 to 34. Most organizations do not correct for age and gender in their analysis, which could skew results.
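One standard statistical correction for this kind of skew is post-stratification: each group's average is weighted by that group's share of the wider population rather than its share of the site's users. The article does not say which correction the researchers recommend, so the Python sketch below is only an illustration. The function name, the group labels, the sample and the population shares are all invented for the example.

```python
from collections import defaultdict


def poststratified_mean(responses, population_shares):
    """Average the responses after reweighting groups to population shares.

    responses: list of (group, value) pairs from the platform sample.
    population_shares: dict mapping group -> share of the target population
        (hypothetical numbers here; every sampled group must have an entry).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in responses:
        totals[group] += value
        counts[group] += 1
    # Renormalize over the groups that actually appear in the sample.
    covered = sum(population_shares[g] for g in counts)
    weighted = sum(population_shares[g] * totals[g] / counts[g] for g in counts)
    return weighted / covered


# Invented sample: 1 means the user approved of something, 0 means not.
# Women aged 25 to 34 make up 80 of the 100 sampled users.
sample = (
    [("women_25_34", 1)] * 60 + [("women_25_34", 0)] * 20
    + [("everyone_else", 1)] * 5 + [("everyone_else", 0)] * 15
)
# Invented population shares: the same group is 10 percent of the public.
shares = {"women_25_34": 0.10, "everyone_else": 0.90}

raw = sum(value for _, value in sample) / len(sample)
adjusted = poststratified_mean(sample, shares)
print(f"raw approval: {raw:.2f}, adjusted approval: {adjusted:.2f}")
```

In this made-up case the uncorrected figure is 65 percent approval, while the reweighted figure is 30 percent, because the overrepresented group happens to approve far more often than everyone else.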

Just as language can affect the meaning behind communication, the layout of a Web site can also alter how people respond to posts. Facebook, for example, does not offer a "dislike" button, making interpretation of "likes" difficult.

The data that social media networks make public can potentially be altered by engineers at the Web site, and researchers usually have no way of accessing the raw data streams.

Fake profiles, spambots, and other fake "users" on social media sites are also incorporated into social analyses, further deteriorating the accuracy of data collected online.

Even determining political affiliation of users of social media has proven challenging, with just 65 percent accuracy in many studies.

Many of these problems have solutions from other fields such as epidemiology, statistics, and machine learning, write Ruths and Jürgen Pfeffer of Carnegie Mellon's Institute for Software Research. "The common thread in all these issues is the need for researchers to be more acutely aware of what they're actually analyzing when working with social media data," Ruths says.

The investigation of how the study of social media can lead to erroneous conclusions about public sentiment was published in the journal Science.