With so much misinformation spreading on social media, Rice University researchers led by computer scientist Anshumali Shrivastava have developed a machine learning (ML) method to help prevent the spread of misinformation online.
The new method was presented at the 2020 Conference on Neural Information Processing Systems (NeurIPS 2020), which was held online. The team improved on the 50-year-old Bloom filter technology for scanning social media, which could help social media companies prevent the spread of fake news on their platforms.
What are Bloom filters?
A Bloom filter is a data structure used to test whether an element is a member of a set. More precisely, it is a space-efficient probabilistic data structure.
To understand Bloom filters, according to Geeks for Geeks, we must first understand hashing. A hash function maps an input to a fixed-length value that is used to identify that input.
Compared with a standard hash table, a Bloom filter can represent a set with a huge number of elements in a fixed amount of space. Adding an element never fails, but the false positive rate steadily increases as elements are added, until all bits in the filter are set to 1, at which point every query gives a positive result. Bloom filters never yield false negatives. Deleting elements from the filter is not allowed, however, since clearing one element's bits may also delete other elements.
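The properties described above (additions that never fail, no false negatives, occasional false positives) can be seen in a minimal Python sketch. The class name, the `num_bits` and `num_hashes` parameters, the salted SHA-256 hashing scheme and the sample headlines are all assumptions chosen for illustration, not details from the study.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus several hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive one position per hash function by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Adding only ever sets bits to 1, so it cannot fail.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # Positive only if every position is set; other items may have set them.
        return all(self.bits[pos] for pos in self._positions(item))


known_fake = BloomFilter(num_bits=1024, num_hashes=3)
for headline in ["moon is made of cheese", "water is not wet"]:
    known_fake.add(headline)

print("moon is made of cheese" in known_fake)  # True: added items are always found
print("local team wins match" in known_fake)   # usually False, but may be a false positive
```

Because several items can share the same bits, clearing the bits of one item could silently remove others, which is why this sketch offers no delete method.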
The study stated that using a machine learning model as a binary classifier boosts Bloom filters' performance. The researchers propose new algorithms that give a lower false positive rate (FPR) and use as much as 50% less memory than existing learned Bloom filter approaches.
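To make the idea of a classifier working alongside a Bloom filter concrete, here is a rough sketch of the basic learned Bloom filter design that this line of work builds on. It is not the Ada-BF algorithm from the study. The `toy_score` function is a hypothetical stand-in for a trained model, and all names, thresholds and sizes are assumptions.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter used as the backup structure."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained classifier; returns a score in [0, 1]."""
    suspicious = {"miracle", "hoax", "shocking"}
    hits = sum(word in suspicious for word in text.lower().split())
    return min(1.0, hits / 2)


class LearnedBloomFilter:
    """Classifier first, backup Bloom filter for the keys the classifier misses."""

    def __init__(self, keys, score, threshold, num_bits, num_hashes):
        self.score = score
        self.threshold = threshold
        self.backup = BloomFilter(num_bits, num_hashes)
        # Only keys scored below the threshold need space in the backup filter,
        # which is where the memory saving comes from.
        for key in keys:
            if score(key) < threshold:
                self.backup.add(key)

    def __contains__(self, item: str) -> bool:
        if self.score(item) >= self.threshold:
            return True
        # The backup filter guarantees there are still no false negatives.
        return item in self.backup


keys = ["shocking miracle cure found", "city council renames bridge"]
lbf = LearnedBloomFilter(keys, toy_score, threshold=0.5, num_bits=512, num_hashes=3)
print(all(key in lbf for key in keys))  # True: every stored key is still found
```

In this design the classifier handles the keys it recognizes, so the backup filter can be smaller than a single Bloom filter holding every key.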
Researchers Use Machine-Learning Method for Fake News Detection
Shrivastava and his team used test databases of fake news stories and computer viruses to check the efficiency of their technology. Statistics graduate student Zhenwei Dai worked with Shrivastava to create the Adaptive Learned Bloom Filter (Ada-BF), which achieves a performance level similar to that of learned Bloom filters. Shrivastava told Eurekalert that Ada-BF required 50% less memory, which allows it to handle twice as much information using the same resources.
Shrivastava and Dai explained their filtering approach using Twitter data. According to Twitter, about 500 million tweets are sent per day, and they are typically published just one second after the user presses send. During the election, however, Twitter was receiving about 10,000 tweets per second, which would be equivalent to about six tweets per millisecond, considering the latency of one second.
"If you want to apply a filter that reads every tweet and flags the ones with information that's known to be fake, your flagging mechanism cannot be slower than six milliseconds or you will fall behind and never catch up," Shrivastava told Free Press Journal.
Researchers noted that a low false-positive rate is also important, since flagged tweets are sent for manual review and the number of genuine tweets mistakenly flagged should be kept to a minimum.
"If your false-positive rate is as low as 0.1%, even then you are mistakenly flagging 10 tweets per second, or more than 800,000 per day, for manual review," said Shrivastava, adding that this is the reason most "AI-only approaches are prohibitive" for regulating fake news.
Although Twitter has not revealed how it filters tweets, researchers believe the social media giant uses a Bloom filter, a technique developed in 1970. A Bloom filter can find all codes that match the database, but it also generates some false positives.
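The figures quoted in this section can be checked with simple arithmetic. The short Python sketch below uses only the numbers reported above; the variable names are assumptions. By this arithmetic, the daily average of 500 million tweets works out to roughly six tweets per millisecond, while the 10,000 tweets per second cited for the election period is 10 per millisecond.

```python
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 500_000_000      # daily volume reported above
peak_tweets_per_second = 10_000   # election-period rate reported above
false_positive_rate = 0.001       # the 0.1% from the quote

# Average and peak load expressed per millisecond.
avg_per_ms = tweets_per_day / SECONDS_PER_DAY / 1000
peak_per_ms = peak_tweets_per_second / 1000

# Genuine tweets wrongly flagged at the peak rate.
fp_per_second = peak_tweets_per_second * false_positive_rate
fp_per_day = fp_per_second * SECONDS_PER_DAY

print(f"average tweets per millisecond: {avg_per_ms:.1f}")    # 5.8
print(f"peak tweets per millisecond: {peak_per_ms:.0f}")      # 10
print(f"false positives per second: {fp_per_second:.0f}")     # 10
print(f"false positives per day: {fp_per_day:,.0f}")          # 864,000
```

The last two results match the "10 tweets per second" and "more than 800,000 per day" in the quote.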
Shrivastava noted that researchers have been proposing various methods using machine learning to improve Bloom filters' efficiency since 2017.
This is owned by Tech Times
Written by CJ Robles