Scientists Use Wikipedia to Predict Disease Outbreaks


Scientists say it may be possible to predict the outbreak of infectious diseases by watching what information people search for on Wikipedia.

To test the idea, researchers at Los Alamos National Laboratory examined three years' worth of Wikipedia data, such as searches for symptoms or diagnoses, and produced very accurate forecasts of the spread of dengue fever in Brazil and influenza in the U.S., Japan, Poland and Thailand.

They were also able to make predictions, although less accurate ones, of outbreaks of tuberculosis in Thailand and China and of the spread of dengue fever in Thailand, the researchers reported in the journal PLOS Computational Biology.

Since the search data cannot indicate what country the search originated from, the researchers used the language of the search as a proxy for the country.

They were able to forecast some of the outbreaks 28 days beforehand, suggesting people begin their Wikipedia searches for disease-related information before they decide to seek medical attention.
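To illustrate how a lead time such as 28 days might be measured, here is a minimal Python sketch that shifts a page-view series against a case-count series and reports the shift with the strongest correlation. It is not the model from the study: the function names (`pearson`, `best_lead_days`) and the synthetic series, in which cases are built to follow page views by 7 days, are assumptions made up for illustration.

```python
from math import exp, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_lead_days(page_views, case_counts, max_lead=28):
    """Return (lead, r): the lead in days at which page views on day t
    correlate most strongly with case counts on day t + lead."""
    best = (0, -1.0)
    for lead in range(max_lead + 1):
        views = page_views[: len(page_views) - lead]  # days 0 .. n-lead-1
        cases = case_counts[lead:]                    # days lead .. n-1
        r = pearson(views, cases)
        if r > best[1]:
            best = (lead, r)
    return best


# Synthetic data (not real): one bump in page views, with reported
# cases constructed to follow the same curve 7 days later.
days = range(120)
page_views = [100 + 400 * exp(-((d - 40) / 10) ** 2) for d in days]
case_counts = [0.1 * page_views[max(d - 7, 0)] for d in days]

lead, r = best_lead_days(page_views, case_counts)
print(lead, round(r, 3))  # prints: 7 1.0
```

A real analysis would use observed case counts and would need to guard against spurious correlations, for example by testing on data held out from the fit.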

The prediction success wasn't the same for all diseases, they acknowledge, and efforts to anticipate cases of Ebola, cholera, plague or HIV had little success.

However, they said those results could improve with more sophisticated statistics and larger data sets.

Researchers at the Los Alamos lab's Defense Systems and Analysis Division say it may be possible to create a prediction model that can be "trained" using data from one region where it is available and then applied to other regions where data is less available or less reliable.

"A global disease-forecasting system will change the way we respond to epidemics," lead researcher Sara Del Valle says. "In the same way we check the weather each morning, individuals and public health officials can monitor disease incidence and plan for the future based on today's forecast."
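The idea of training a model in one region and applying it in another can be sketched with an ordinary least-squares line. This is only an illustration under stated assumptions, not the researchers' model: the helper names (`fit_line`, `predict`), the units, and all of the numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Region A (hypothetical): both page views and official case counts exist.
# Views are given as a share of total traffic so that wikis of different
# sizes are roughly comparable.
region_a_view_share = [1.0, 2.0, 3.0, 4.0]
region_a_cases = [3.0, 5.0, 7.0, 9.0]

model = fit_line(region_a_view_share, region_a_cases)

# Region B (hypothetical): only page views are available.
region_b_view_share = [2.5, 5.0]
print(predict(model, region_b_view_share))  # prints: [6.0, 11.0]
```

Whether a relationship learned in one region actually carries over to another is an empirical question; the sketch shows only the mechanics.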

The researchers say their goal is to build a working disease monitoring and forecasting system using open data and open source code.

"Our Wikipedia-based approach is sufficiently promising to explore in more detail," they say.

Wikipedia is an ideal source for the kind of data the researchers are interested in because it makes hourly traffic data for all of its pages publicly available.

"We don't do 'data grants' for selected individuals or research institutions, since our mandate is to make data openly available to anyone," says Dario Taraborelli, head of research and data at the Wikimedia Foundation.

His team receives "several requests a week" from researchers looking for data, he says.
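Hourly page traffic of the kind described above would usually be rolled up into daily totals before being compared with case reports. The Python sketch below shows one way to do that. The record layout (hour stamp, language code, article title, view count) and the sample values are assumptions for illustration, not the actual format of the Wikimedia data files.

```python
from collections import defaultdict
from datetime import datetime


def daily_views(hourly_records):
    """Sum hourly view counts into daily totals per (day, language, article)."""
    totals = defaultdict(int)
    for hour_stamp, language, article, views in hourly_records:
        day = datetime.strptime(hour_stamp, "%Y-%m-%d %H:00").date()
        totals[(day.isoformat(), language, article)] += views
    return dict(totals)


# Made-up sample records; the language code stands in for the country,
# as in the study.
sample = [
    ("2014-01-01 00:00", "pt", "Dengue", 12),
    ("2014-01-01 01:00", "pt", "Dengue", 8),
    ("2014-01-02 00:00", "pt", "Dengue", 15),
    ("2014-01-01 00:00", "pl", "Grypa", 5),
]

for key, total in sorted(daily_views(sample).items()):
    print(key, total)
```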

ⓒ 2018 All rights reserved. Do not reproduce without permission.