To counter the fear that artificial intelligence (AI) could someday become a threat to humanity, a team of scientists last month revealed a new algorithm that will teach robots how to behave appropriately in social situations. The key is letting them read and understand children's stories.
Now, another group of experts has taken a broader approach. Instead of limiting the reading material to children's stories, the developers used fiction from Toronto-based Wattpad to help their knowledge-based program, called Augur, understand the world.
Computer science researchers from Stanford University gave Augur access to the Wattpad corpus, which contained about 2 billion words, or 600,000 chapters. Augur is also designed to be an open-source tool that other scientists can build on.
Ethan Fast, a fourth-year PhD student and a co-author of the study, said it is difficult to program computers to understand the vast range of activities that people do. With enough works of fiction, a computer can model those activities in more depth.
Fast and his colleagues used the Wattpad data to create a model with 54,075 human activities related to 13,843 objects and locations. It reminds us of what Tadashi Hamada did in "Big Hero 6" to program Baymax.
For instance, the activity "take picture" occurs 10,249 times and is linked to 5,250 objects such as Instagram or cameras. The activity "unfold letter" occurs 203 times but is connected to 1,072 objects such as envelopes and handwriting.
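To make the counting idea concrete, here is a minimal sketch of how an activity could be tallied and linked to the objects mentioned near it. It is not Augur's actual code: the `mentions` data, the `build_model` function, and the way activities and objects are paired are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each entry pairs an activity found in a passage
# with the objects mentioned near it. Real extraction from fiction would
# need natural language processing; this list is invented for illustration.
mentions = [
    ("take picture", ["camera", "phone"]),
    ("take picture", ["camera"]),
    ("unfold letter", ["envelope", "handwriting"]),
]


def build_model(mentions):
    """Count how often each activity occurs and which objects it is linked to."""
    activity_counts = Counter()
    linked_objects = defaultdict(Counter)
    for activity, objects in mentions:
        activity_counts[activity] += 1
        linked_objects[activity].update(objects)
    return activity_counts, linked_objects


activity_counts, linked_objects = build_model(mentions)
for activity, count in activity_counts.items():
    # Print the occurrence count and the number of distinct linked objects.
    print(activity, count, len(linked_objects[activity]))
```

In this toy version, an activity's occurrence count and its number of distinct linked objects are tracked separately, which is why a rare activity such as "unfold letter" can still be connected to many objects.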
During initial field tests, the research team combined wearable tech and natural language processing to read all the Wattpad fiction. The youth-friendly reading material from Wattpad was surprisingly good at describing the modern world in ways a computer can understand.
Augur attempts to predict what is happening by comparing objects with expected behaviors. It was able to figure out the context of situations with 71 percent precision.
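A rough sketch of that kind of prediction, and of how a precision figure could be computed, is shown here. The scoring rule (summing association counts), the `ASSOCIATIONS` table, and both function names are assumptions for illustration; the article does not describe how Augur scores activities or how the 71 percent figure was measured.

```python
from collections import Counter

# Hypothetical association counts between activities and objects.
# The numbers are invented; they are not taken from the Wattpad model.
ASSOCIATIONS = {
    "take picture": Counter({"camera": 8, "phone": 5}),
    "unfold letter": Counter({"envelope": 6, "handwriting": 3}),
    "make coffee": Counter({"mug": 7, "kettle": 4, "phone": 1}),
}


def predict_activity(observed_objects):
    """Return the activity whose linked objects best match what was observed."""
    scores = {
        activity: sum(counts[obj] for obj in observed_objects)
        for activity, counts in ASSOCIATIONS.items()
    }
    best = max(scores, key=scores.get)
    # Make no prediction when none of the observed objects are known.
    return best if scores[best] > 0 else None


def precision(predictions, truths):
    """Fraction of the predictions actually made that match the true activity."""
    made = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    if not made:
        return 0.0
    return sum(p == t for p, t in made) / len(made)


print(predict_activity(["camera", "phone"]))
```

Precision here counts only the cases where a prediction was made, so a system that stays silent on unfamiliar scenes is not penalized for those cases.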
The corpus has flaws, among them that most fiction is written to create drama or tension. One example is the act of punching someone in the face at the slightest provocation. It is something that a teen may write about, but it doesn't happen nearly as often in real life.
Still, this flaw isn't unique to the Wattpad corpus; it is also present in the Google Books corpus. The Google Books corpus contains classic and literary works of fiction, so it presents a different kind of world.
"It's less focused on the modern world, it doesn't know what a cellphone is, doesn't know what Facebook is," said Fast. "In some sense, the bias of super old stuff is worse than the dramatic bias of poorly crafted fan fiction."
Lastly, Fast said amateur writers focus more on mundane details, which the researchers find more useful than what great works of literature offer.