Microsoft and Intel Create Images to Detect Malware

Microsoft and Intel are collaborating on a new approach to classifying malware: visualizing it.

They are currently working on STAMINA (Static Malware-as-Image Network Analysis), which turns rogue code into grayscale images so that a deep learning system can study them. The approach converts an input file into a simple picture whose dimensions vary with aspects such as file size.

According to a report by ZDNet, STAMINA uses a trained artificial intelligence (AI) model to determine whether a file is infected. The model is trained on a huge amount of data Microsoft has collected from Windows Defender installations. The technology does not require full-size, pixel-by-pixel recreations of viruses, which would translate large malware into gigantic pictures.

Currently, STAMINA works effectively with small files, classifying malware with an accuracy of about 99% and a false-positive rate of about 2.6%, according to a report in Engadget. However, it struggles with larger files, although further enhancement could make it very useful.

How STAMINA works

Most malware detection relies on extracting binary signatures, but the sheer number of signatures makes the method impractical. STAMINA could help anti-malware tools keep up and reduce the chances of security threats slipping past defenses.

The entire process is simple. First, an input file is taken and its binary form is converted into a stream of raw pixel data.

Researchers then convert this one-dimensional pixel stream into a 2D image so that normal image analysis algorithms can analyze it.

The width of the image is chosen based on the input file's size, while the height is dynamic, resulting from dividing the length of the raw pixel stream by the chosen width value.

After assembling the raw pixel stream into a normal-looking 2D image, researchers then resize the resulting image to a smaller dimension.

Resizing the raw image does not "negatively impact the classification result," and it is necessary so that computational resources do not have to work with billions of pixels, which would slow down the process.
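
The conversion steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' implementation: the size-to-width table (WIDTH_TABLE), the zero padding of the last row, and the nearest-neighbor resize are all hypothetical choices, since the article does not specify them.

```python
import math

# Hypothetical file-size-to-width table: (upper bound in bytes, width in pixels).
# These values are placeholders; the article does not list the real ones.
WIDTH_TABLE = [(10 * 1024, 32), (100 * 1024, 128), (1024 * 1024, 512)]
DEFAULT_WIDTH = 1024


def choose_width(size_in_bytes):
    """Pick an image width from the (assumed) table based on file size."""
    for limit, width in WIDTH_TABLE:
        if size_in_bytes <= limit:
            return width
    return DEFAULT_WIDTH


def bytes_to_image(data, width=None):
    """Reshape a byte string into rows of grayscale pixels (values 0-255)."""
    if not data:
        raise ValueError("input file is empty")
    if width is None:
        width = choose_width(len(data))
    # The height follows from the stream length and the chosen width.
    height = math.ceil(len(data) / width)
    # Assumption: pad the final row with zero bytes so every row is full.
    padded = data + bytes(width * height - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(image, new_width, new_height):
    """Shrink (or grow) an image with simple nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]
```

For a real file, the bytes could be read with open(path, "rb").read() and passed to bytes_to_image, then reduced with resize_nearest before being handed to a classifier.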

These images are then fed into a deep neural network (DNN) that is trained to scan the 2D representation of the malware strain and classify it as clean or infected. For the training, Microsoft provided 2.2 million samples of infected Portable Executable files as a research basis.

Researchers used 60% of the known malware samples to train the original DNN algorithm, 20% of the files to validate the DNN and the other 20% for the actual testing process.
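
A 60/20/20 split like the one described can be sketched as follows. The function name split_samples, the shuffling step, and the fixed seed are assumptions made for illustration; the article does not say how the researchers partitioned their samples.

```python
import random


def split_samples(samples, train_pct=60, validation_pct=20, seed=0):
    """Shuffle samples and split them into train, validation and test sets."""
    shuffled = list(samples)
    # Assumption: shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains (about 20%)
    return train, validation, test
```

Keeping the test portion separate from training and validation is what makes the reported accuracy a measure of performance on files the network has not seen.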

The research team said STAMINA achieved an accuracy of 99.07% in identifying and classifying malware samples, with a false-positive rate of 2.58%.
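
For readers unfamiliar with these two metrics, the sketch below shows how they are conventionally computed from classification counts. The function names are illustrative, and the standard definitions are assumed here; the article does not describe the researchers' evaluation code.

```python
def accuracy(true_pos, true_neg, false_pos, false_neg):
    """Share of all files that were classified correctly."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


def false_positive_rate(false_pos, true_neg):
    """Share of clean files that were wrongly flagged as malware."""
    return false_pos / (false_pos + true_neg)
```

The two numbers answer different questions: accuracy covers every file, while the false-positive rate looks only at clean files, which is why a highly accurate classifier can still flag a noticeable share of benign software.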

"The results certainly encourage the use of deep transfer learning for malware classification," said Jugal Parikh and Marc Marino, the two Microsoft Threat Protection Intelligence Team researchers who participated in the study.

The Microsoft advantage

Earlier this month, Tanmay Ganacharya, Director for Security Research of Microsoft Threat Protection, told ZDNet that the tech giant now counts on machine learning for detecting malware. This machine learning software is deployed on customer systems or on Microsoft servers.

Overall, STAMINA is one of the ML modules that could soon be implemented at Microsoft to spot malware.

Ganacharya said that while anybody can build a model, the quality and quantity of labeled data define how effective the model will be.

"[We], at Microsoft, have that as an advantage because we do have sensors that are bringing us lots of interesting signals through email, through identity, through the endpoint, and being able to combine them," Ganacharya said.