Twitter data may help predict HIV outbreak

Maryanne Moll, Tech Times 04 March 2014, 09:03 pm

Just before Ellen DeGeneres' group selfie from the Oscars was retweeted almost 2 million times in less than a day, temporarily causing technical problems for Twitter, a far quieter study added yet another angle to how the platform has evolved.

It can potentially predict an HIV outbreak.

The new study says that by tracking tweets and mapping where they come from, it may be possible to predict behaviors related to drug use and sexual risk and to identify problematic geographical areas, which could allow health authorities to prevent an outbreak before it happens.

The study was led by Sean Young, assistant professor of family medicine at the David Geffen School of Medicine at UCLA. Young also founded the Center for Digital Behavior at UCLA, for which he currently serves as a co-director.

The Center studies social media and mobile technologies and how they can be used to predict and change behavior. The Center achieves this by working with other academic researchers from different disciplines, as well as private sector companies.

For this study, about 550 million tweets were collected between May 26 and December 9, 2009. An algorithm was created to find words and phrases in the tweets that pertain to risky behaviors, such as "sex" or "get high." These tagged tweets were then plotted on a map to determine whether the areas with the highest number of HIV cases were the same ones from which the tagged tweets came.
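The tagging step described above can be illustrated with a minimal sketch. It is not the researchers' algorithm: the keyword lists contain only the two examples quoted in this article, and the function names, the (state, text) input format, and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword lists. The article names only "sex" and "get high";
# the study's actual lists are not given here.
SEXUAL_RISK_TERMS = ["sex"]
DRUG_USE_TERMS = ["get high"]


def compile_terms(terms):
    """Build a case-insensitive pattern that matches whole words or phrases."""
    pattern = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)


SEXUAL_RISK_RE = compile_terms(SEXUAL_RISK_TERMS)
DRUG_USE_RE = compile_terms(DRUG_USE_TERMS)


def tag_tweet(text):
    """Return the set of risk categories whose keywords appear in the tweet."""
    tags = set()
    if SEXUAL_RISK_RE.search(text):
        tags.add("sexual_risk")
    if DRUG_USE_RE.search(text):
        tags.add("drug_use")
    return tags


def count_tagged_by_state(tweets):
    """Count tweets with at least one tag, grouped by state.

    `tweets` is an iterable of (state, text) pairs; this input format
    is an assumption made for the example.
    """
    counts = Counter()
    for state, text in tweets:
        if tag_tweet(text):
            counts[state] += 1
    return counts


# Made-up sample tweets, for illustration only.
sample_tweets = [
    ("CA", "Going to get high tonight"),
    ("CA", "Great weather today"),
    ("TX", "Unprotected sex is risky"),
    ("NY", "Essex is a county in England"),
]

print(count_tagged_by_state(sample_tweets))
```

The word-boundary match keeps a term like "sex" from matching inside an unrelated word such as "Essex." The per-state counts are what would then be placed on a map and compared with HIV case data.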

Over 8,000 tweets were found to contain words that indicated sexually risky behavior, and 1,342 suggested use of stimulant drugs. The geographical origins of these tweets were compared with the 2009 geographical data on HIV cases on the interactive online map available at AIDSVu.org. The researchers found a significant relationship between the tweets and the data on the online map.

California led the list as the state with the largest raw proportion of HIV-risk-related tweets, at 9.4 percent. Texas came second with 9.0 percent, followed by New York with 5.7 percent and Florida with 5.4 percent. On a per capita basis, the District of Columbia had the highest rate of HIV-risk-related tweets. It was followed by Delaware, Louisiana, and South Carolina.
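The two rankings differ because a raw proportion favors populous states, while a per capita rate adjusts for population. A small sketch of the arithmetic follows; the counts, populations, and place names are made up for illustration and are not figures from the study.

```python
def raw_share(tagged_counts):
    """Each area's share of all tagged tweets."""
    total = sum(tagged_counts.values())
    return {area: count / total for area, count in tagged_counts.items()}


def per_capita_rate(tagged_counts, population, per=100_000):
    """Tagged tweets per `per` residents in each area."""
    return {
        area: tagged_counts[area] / population[area] * per
        for area in tagged_counts
    }


# Made-up numbers, not data from the study.
tagged = {"Large state": 900, "Small district": 100}
population = {"Large state": 30_000_000, "Small district": 600_000}

shares = raw_share(tagged)
rates = per_capita_rate(tagged, population)

# The area with the most tagged tweets overall
print(max(shares, key=shares.get))
# The area with the most tagged tweets relative to its population
print(max(rates, key=rates.get))
```

With these invented figures, the large state accounts for 90 percent of tagged tweets, but the small district has the higher rate per 100,000 residents.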

This study follows earlier research that used the same methodology to track flu outbreaks with Twitter. Researchers at Johns Hopkins University came up with a method for filtering tweets to monitor cases and the spread of flu. In 2011, a study published in PLOS ONE showed how Twitter was used to track levels of infection and public concern during the A(H1N1) pandemic. That study also refined the choice of keywords used to identify pandemic-related tweets.

Twitter handles an estimated 340 million tweets a day from almost 200 million registered users. This wealth of social media data, generated in real time and often called "big data," may have important new implications for the management of public health.

"Ultimately, these methods suggest that we can use 'big data' from social media for remote monitoring and surveillance of HIV risk behaviors and potential outbreaks," said Young.