DARPA Trains Robots To Cook By Watching YouTube Videos: Why It's Significant

How do you teach a robot to cook? Have it watch YouTube videos, say government-funded researchers who have developed a mathematical model that lets a robot learn a task just by watching videos of it.

It's not really about cooking, of course. The Defense Advanced Research Projects Agency (DARPA) has been funding research into a mathematical language that would let advanced sensors discern which of the things they see or hear are important and discard input that is trivial.

The grants were issued under DARPA's Mathematics of Sensing, Exploitation, and Execution project.

"The MSEE program initially focused on sensing, which involves perception and understanding of what's happening in a visual scene, not simply recognizing and identifying objects," says program manager Reza Ghanadan in the agency's Defense Sciences Offices.

One initial success in the research is demonstrated by a robot at the University of Maryland capable of learning to handle kitchen tools after watching humans do it in YouTube videos.

The robot was equipped with two electronic equivalents of neural systems, one that could recognize objects and the other that could track movement and create a mathematical model of that movement that would allow the robot to reproduce it.

Using cameras, the robot can watch a person pick up a full pitcher and pour water -- or watch a video of the same action -- and break the action down into thousands of separate snapshots of arms, hands, pitchers and the water in successive positions.

The resulting mathematical model identifies the appearance of the water in the vessel it's being poured into as a goal the robot can imitate, the researchers say.
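
To make the idea of imitating a goal rather than a motion concrete, here is a toy sketch in Python. It is not the researchers' model: the `Frame` fields, the `extract_goal` and `plan` functions, and the action names are all hypothetical, chosen only to show how a sequence of snapshots could be reduced to an end state that a robot then reaches with its own actions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """One snapshot of the scene (hypothetical labels, not real detector output)."""
    held: Optional[str]   # name of the object in the hand, if any
    water_in_cup: float   # fraction of the cup that is full, 0.0 to 1.0


def extract_goal(frames):
    """Compare the first and last snapshots and keep only what changed in the scene.

    The hand's path through the intermediate frames is deliberately ignored.
    """
    first, last = frames[0], frames[-1]
    goal = {}
    if last.water_in_cup != first.water_in_cup:
        goal["water_in_cup"] = last.water_in_cup
    return goal


def plan(goal, tools_available):
    """Turn a goal into the robot's own actions, given the tools within reach."""
    steps = []
    if "water_in_cup" in goal:
        if "pitcher" not in tools_available:
            raise ValueError("goal needs a pitcher, but none is available")
        steps.append("grasp pitcher")
        steps.append(f"tilt pitcher until cup is {goal['water_in_cup']:.0%} full")
        steps.append("release pitcher")
    return steps


# A three-snapshot stand-in for the thousands of frames in a real video.
video = [
    Frame(held=None, water_in_cup=0.0),
    Frame(held="pitcher", water_in_cup=0.0),
    Frame(held="pitcher", water_in_cup=0.8),
]
goal = extract_goal(video)
steps = plan(goal, tools_available={"pitcher", "cup"})
```

In this sketch the plan depends only on the end state and on which tools are within reach, so the robot is free to move differently from the person in the video as long as the cup ends up equally full.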

"We are trying to create a technology so that robots eventually can interact with humans," says researcher Cornelia Fermüller from the university's Institute for Advanced Computer Studies.

"[Robots] need to understand what humans are doing. For that, we need tools so that the robots can pick up a human's actions and track them in real time," she says.

"How is an action performed by humans? How is it perceived by humans? What are the cognitive processes behind it?"

With no additional programming or human help, the Maryland robots managed to duplicate the tasks seen in the YouTube videos exactly -- provided they had access to the same implements used in the videos.

"Others have tried to copy the movements. Instead, we try to copy the goals. This is the breakthrough," says researcher leader Yiannis Aloimonos. "We chose cooking videos because everyone has done it and understands it. But cooking is complex in terms of manipulation, the steps involved and the tools you use."

The research is a significant step in robotics development, says Ghanadan.

"Instead of the long and expensive process of programming code to teach robots to do tasks, this research opens the potential for robots to learn much faster, at much lower cost and, to the extent they are authorized to do so, share that knowledge with other robots," he says.