
Facebook announced on Friday, Apr. 30, that it has developed, in collaboration with Inria, an algorithm called DINO that enables the training of transformers, a type of machine learning model, without labeled training data.

Facebook DINO

The company claims that DINO sets a new state of the art among methods that train on unlabeled data and leads to a model that can discover and segment objects in an image or video without a specific objective, according to TechCrunch.

Object segmentation is used in numerous tasks, from swapping out the background of a video chat to teaching robots to navigate through a factory.

But it is considered among the hardest challenges in computer vision because it requires an AI to understand what is in an image.

Also Read: Facebook Reportedly Developing an AI to Summarize News Articles

Segmentation is performed with supervised learning and requires a lot of annotated examples. In supervised learning, algorithms are trained on input data annotated for a particular output until they can detect the underlying relationships between the inputs and the outputs.

With DINO, which does not require supervised learning, the system teaches itself to classify unlabeled data, processing the unlabeled data to learn from its inherent structure, Tech Investor News reported.

Unsupervised transformers

Transformers enable AI models to selectively focus on parts of their input so they can reason more effectively. While they were initially applied to speech and natural language processing, transformers have since been adopted for computer vision problems such as image classification and object detection.

At the core of these so-called vision transformers are self-attention layers: each spatial location builds its representation by attending to other locations, as per VentureBeat.

That way, the transformer builds a rich, high-level understanding of the overall scene by attending to distant pieces of an image.
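To make the self-attention idea concrete, here is a minimal, illustrative sketch of a single-head self-attention layer over image patch embeddings, of the kind used inside vision transformers. The class name, shapes, and dimensions are assumptions for illustration, not Facebook's DINO code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchSelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Project each patch embedding to queries, keys, and values.
        self.to_qkv = nn.Linear(dim, dim * 3)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # Every patch attends to every other patch, near or distant.
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

patches = torch.randn(1, 196, 64)      # e.g. 14x14 patches of a 224x224 image
out = PatchSelfAttention(64)(patches)  # (1, 196, 64)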

DINO works by matching the output of a model over different views of the same image. In doing this, it can effectively discover object parts and shared characteristics across images.
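The following is a hedged sketch of that multi-view matching idea: a "student" network is trained so its output on one view agrees with a "teacher" network's output on another view of the same image, and the teacher is then updated as a moving average of the student. The toy networks, temperature values, and update coefficients are illustrative assumptions, not the exact DINO recipe.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256),
                        nn.ReLU(), nn.Linear(256, 64))
teacher = copy.deepcopy(student)      # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

view1 = torch.randn(8, 3, 32, 32)     # stand-ins for two augmented views
view2 = torch.randn(8, 3, 32, 32)

# The loss pushes the student's softmax distribution on one view
# toward the teacher's distribution on the other view.
s_out = F.log_softmax(student(view1) / 0.1, dim=-1)
with torch.no_grad():
    t_out = F.softmax(teacher(view2) / 0.04, dim=-1)
loss = -(t_out * s_out).sum(dim=-1).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()

# The teacher follows the student as an exponential moving average.
with torch.no_grad():
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(0.996).add_(ps, alpha=0.004)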

Furthermore, DINO can connect categories based on visual properties, such as separating species with a structure that resembles biological taxonomy.

Facebook claims that DINO is also among the best at identifying image copies, even though it was not designed for this. That means that in the future, DINO-based models could be used to identify misinformation or copyright infringement.

The social media giant wrote in a blog post that by using self-supervised learning with transformers, DINO opens the door to building machines that understand images and video much more deeply. The need for human annotation is usually a bottleneck in the development of computer vision systems.

Facebook added that by making its approaches more annotation-efficient, it allows models to be applied to a larger set of tasks and potentially scales up the number of concepts they can recognize.

Facebook PAWS

On Apr. 30, Facebook also detailed a new machine learning approach called PAWS that ostensibly achieves better classification accuracy than previous state-of-the-art semi-supervised approaches.

PAWS also needs an order of magnitude less training, making it a fit for domains where labeled images are scarce, such as medicine.

PAWS achieves its results by leveraging a portion of labeled data in conjunction with unlabeled data. Given an unlabeled training image, PAWS generates more views of the image using random data augmentations and transformations. It trains a model to make the representations of these views similar to one another.
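A minimal sketch of that view-generation step follows: random data augmentations produce multiple views of an unlabeled image, and an encoder is trained so the views' representations agree. The augmentation choices, the toy encoder, and the cosine-similarity loss are assumptions for illustration, not the exact PAWS objective, which also uses a small labeled support set to assign soft pseudo-labels.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

# Random augmentations that generate different views of the same image.
augment = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
])

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))

image = torch.rand(3, 32, 32)            # one unlabeled training image
view_a = augment(image).unsqueeze(0)     # two independently augmented views
view_b = augment(image).unsqueeze(0)

# Push the two views' normalized representations toward each other.
z_a = F.normalize(encoder(view_a), dim=-1)
z_b = F.normalize(encoder(view_b), dim=-1)
loss = 1 - (z_a * z_b).sum(dim=-1).mean()
loss.backward()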

Related Article: Facebook AI Project SEER Is Currently Trained to Scan Public Photos on Instagram, Raises Concern Over Privacy


Written by Sophie Webster
