Wikipedia searches are predicting flu trends


As we enter flu season, we tend to look up every single potential symptom for self-diagnosis. But all those searches are helping with more than just the sniffles. Researching flu symptoms on Wikipedia might be helping health experts track virus trends in the U.S.

By using Wikipedia traffic, health researchers can now forecast the spread of the infection much as meteorologists predict the weather.

The Centers for Disease Control and Prevention (CDC) in Atlanta launched a competition last year to find a better way to track the flu using Internet data. Led by Kyle Hickmann from the Los Alamos National Laboratory in New Mexico, researchers are now using an algorithm that links flu-related Wikipedia searches with data gathered by the CDC.

Hickmann and his team believe that flu-related Wikipedia searches can indicate the spread of the illness. The team used CDC flu data from earlier years to train a machine-learning algorithm to spot links between searches and reported influenza cases.
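As a rough illustration of what linking searches to reported cases can mean, the sketch below fits a straight line to a few past weeks and then estimates the illness rate for a new week from its page views. It is not the team's model, which the article does not describe in detail. The numbers are invented for the example, and the names `fit_line` and `predict` are hypothetical.

```python
# Illustrative only: the weekly numbers below are made up.
# They are not CDC or Wikipedia data, and this is not the researchers' model.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate y for a new x from a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical past weeks: flu-article page views (in thousands) and the
# percentage of doctor visits that were for flu-like illness.
past_views = [10.0, 20.0, 30.0, 40.0]
past_ili = [1.5, 2.5, 3.5, 4.5]

model = fit_line(past_views, past_ili)

# Page views are available right away, so an estimate for the current week
# can be made before the official figure is published.
estimate = predict(model, 50.0)
print(round(estimate, 2))
```

A real forecast would draw on many articles and several seasons of data, but the idea is the same: learn the relationship from past weeks, then apply it to traffic that is already available.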

Wikipedia is well suited as a source because its traffic data is transparent and free, which means the model can keep working in future seasons. Using this model, researchers predicted flu trends in real time for the 2013-2014 flu season. The team says the model could help make influenza forecasting a well-founded science.

"Wikipedia article access logs are shown to be highly correlated with historical influenza-like illness records and allow for accurate prediction of influenza-like illness data several weeks before it becomes available," says Hickmann.

Spotting rises in flu diagnoses early can help prevent some of the 3,000 to 49,000 flu-related deaths that occur in the U.S. each year. Until now, the CDC has not been able to track the trends of the infection in a timely way. It takes the agency two weeks to filter through data collected by healthcare providers nationwide, which includes the percentage of patients who have a cough and a temperature higher than 100 degrees Fahrenheit. This means the CDC can only see flu trends after the fact.

The CDC data also doesn't take into account unreported cases, since it comes only from patients who sought treatment and excludes those who self-diagnosed on the Internet.

The new flu-tracking model seems to solve the CDC's problems with predicting the severity of the disease. The only flaw in the forecasts is that trends could be underestimated at the end of the season. This is because people who become infected with another strain late in the flu season tend not to search the Wikipedia flu articles associated with earlier strains.

Still, the model can be used as a measure of prevention, helping Americans prepare for the epidemic.

Photo Credit: William Brawley

ⓒ 2018 All rights reserved. Do not reproduce without permission.