Robots Can Be Trained to Do Human Chores via YouTube Tutorial Videos

Joseph Henry, Tech Times 22 June 2023, 02:06 pm

Carnegie Mellon University (CMU) researchers have achieved a significant breakthrough in robotics by enabling robots to learn household chores through video analysis. This advancement holds great potential for improving the functionality of robots in homes, facilitating assistance with tasks such as cooking and cleaning.

By watching YouTube videos of people performing daily activities, two robots successfully learned 12 different tasks, including opening drawers, oven doors, and lids, and picking up objects such as telephones, vegetables, and cans of soup.

YouTube Videos Can Teach Robots to Perform Basic Chores

(Photo: Thor Deichmann from Pixabay)
Robots can watch a YouTube video of a person doing a certain task and learn how to do it properly.

Deepak Pathak, an assistant professor at CMU's Robotics Institute, highlighted the importance of video analysis in teaching robots.

🤖 Robotics often faces a chicken and egg problem: no web-scale robot data for training (unlike CV or NLP) b/c robots aren't deployed yet & vice-versa.

Introducing VRB: Use large-scale human videos to train a *general-purpose* affordance model to jumpstart any robotics paradigm! pic.twitter.com/csbvsfswuG
— Deepak Pathak (@pathak2206) June 13, 2023

Through YouTube tutorial videos, deep learning helps robots imitate simple human activities. The researchers' approach is more advanced than traditional methods, such as manually demonstrating each task to the robot, which is both time-consuming and prone to errors.

The researchers' previous work on In-the-Wild Human Imitating Robot Learning (WHIRL) required humans to demonstrate tasks in the same environment as the robot, but their latest model, Vision-Robotics Bridge (VRB), eliminates this constraint.

"We were able to take robots around campus and do all sorts of tasks. Robots can use this model to curiously explore the world around them. Instead of just flailing its arms, a robot can be more direct with how it interacts," Robotics Ph.D. student Shikhar Bahl said.

What is the Vision-Robotics Bridge?

VRB, an improvement upon WHIRL, enables robots to learn without human demonstrations and without the need for an identical environment. While practice is still essential for mastering a task, the researchers demonstrated that the robot could learn a new task in just 25 minutes using VRB.

This innovation allows robots to adapt and learn in various environments, expanding their utility in real-world scenarios.

Understanding Affordances for Object Interaction

The concept of affordances plays a crucial role in teaching robots how to interact with objects. Affordances, rooted in psychology, refer to the opportunities an environment offers an individual.

In VRB, affordances define the ways in which robots can interact with objects based on human behavior. For example, by analyzing videos of humans opening drawers, the robot identifies contact points like handles and the direction of movement. By learning from numerous videos, the robot generalizes this knowledge and can open any drawer confidently.
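The idea of a contact point plus a direction of movement can be made concrete with a small sketch. This is only an illustration of the concept, not the researchers' method: VRB trains a learned model on video, whereas the hypothetical `Affordance` class and `summarize` function below simply average hand-labeled observations, and the coordinates are made-up normalized image positions.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Affordance:
    """Hypothetical record of one observed human interaction with an object."""
    contact: tuple[float, float]    # where the hand touched, in normalized image coords
    direction: tuple[float, float]  # direction the hand moved after contact


def summarize(observations: list[Affordance]) -> Affordance:
    """Average many observations into one typical contact point and unit direction.

    Plain averaging is an assumption made for illustration; it stands in for
    the learned model described in the article.
    """
    if not observations:
        raise ValueError("need at least one observation")
    n = len(observations)
    cx = sum(o.contact[0] for o in observations) / n
    cy = sum(o.contact[1] for o in observations) / n
    dx = sum(o.direction[0] for o in observations) / n
    dy = sum(o.direction[1] for o in observations) / n
    norm = hypot(dx, dy)
    if norm == 0:
        raise ValueError("observed directions cancel out; no dominant motion")
    # Rescale the mean direction to unit length.
    return Affordance((cx, cy), (dx / norm, dy / norm))


# Two made-up observations of people pulling a drawer handle toward the camera.
drawer_videos = [
    Affordance(contact=(0.4, 0.5), direction=(0.0, 1.0)),
    Affordance(contact=(0.6, 0.5), direction=(0.0, 1.0)),
]
typical = summarize(drawer_videos)
```

Here `typical` holds a single contact point and pull direction distilled from both clips, which loosely mirrors how many videos of the same action can yield one reusable hint about where to grasp and which way to move.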

Leveraging Extensive Video Datasets

The research team utilized large-scale video datasets such as Ego4D and Epic Kitchens to facilitate the learning process. Ego4D comprises nearly 4,000 hours of egocentric videos capturing everyday activities from around the world.

CMU researchers actively contributed to the collection of these videos. Epic Kitchens focuses on cooking, cleaning, and kitchen tasks, providing valuable data for training computer vision models. These datasets aid in training the robots to recognize and understand human interactions in real-world settings.

It is impressive how far robotics advances each year. As experts keep refining the technologies that guide these machines, only time will tell whether robots can catch up with humans.