
Video surveillance is one of the key tools for ensuring security today, and the number of cameras installed in public places continues to grow. However, effective real-time monitoring across all cameras remains a highly challenging task. Even the most modern devices record events, but they cannot interpret what is happening. The operator sees dozens of screens but may miss an important moment—for example, a person falling, a conflict, an attempted attack, or an abandoned object.
Yevhen Petrov, CEO and lead developer of GuardNova, a specialist in computer vision, neural network algorithms, and intelligent security systems, took on this task. With more than ten years of experience in the industry, he has been involved in the implementation of Ajax security solutions, projects at the intersection of IoT and video analytics, and the development of proprietary software modules for behavior analysis and action recognition.
In this article, we take a detailed look at how the Guard-N 4.0 system, created under Yevhen's leadership, works, where it can be used, and what technical solutions allow it to process data from thousands of cameras simultaneously, ensuring high performance and the ability to generate operational reports. We also consider the prospects for implementing such technologies and how they help make the surrounding space safer.

From "pixels" to Understanding Actions
Modern cameras use advanced technologies to improve video quality even in challenging shooting conditions. HDR support captures detail in high-contrast lighting, while advanced stabilization reduces motion blur and vibration. Yet despite all these improvements, a camera still transmits an image as millions of individual points, or pixels, without discerning the meaning of what is happening.
The Guard-N 4.0 system goes one step further: it does not simply capture images, but interprets them, recognizing the actions and behavior of people in the frame.
"This is no longer video surveillance in the traditional sense, but semantic analysis of human behavior—a technology that attempts to 'understand' what a person is doing and whether their actions pose a potential threat," explains Yevhen Petrov, founder and technical director of the company.
To enable the system not just to "see" but to understand, deep knowledge of video systems, networks, and integrations was required. Yevhen Petrov's prior experience proved decisive here. Before founding GuardNova, he worked at VidiNova as CTO, where he designed and deployed smart-home systems, CCTV, and network infrastructure. He managed teams, designed audio and video systems, and was responsible for network security, routing, and the integration of external solutions. This experience let him master the principles of video transmission, latency, and stream scaling, as well as the logic of sensor interaction and automation. The knowledge he gained underpinned GuardNova's future solutions—from building the data-processing architecture to integrating JSON-based protocols, camera calibration, and behavior-analysis algorithms. Skills spanning hardware operations through systems integration and network configuration set the technical direction for Guard-N 4.0 and defined the approach to creating the new architecture.
Guard-N 4.0 is built on neural models that locate 33 body keypoints—the head, arms, legs, shoulders, and other joints. Based on these data, it constructs a geometric model where each point is linked to the others by angles and temporal dependencies. Algorithms developed by GuardNova then analyze the motion of these points, assessing their duration, context, and deviations from natural behavior patterns.
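The geometric model described above can be illustrated with a short sketch. Assuming a MediaPipe-style 33-point layout (the article does not disclose Guard-N 4.0's actual indexing), the angle at a joint can be computed from three keypoints; the indices and coordinates below are purely illustrative:

```python
import math

# Hypothetical keypoint indices, assuming a MediaPipe-style 33-point
# layout; the article does not specify Guard-N 4.0's exact scheme.
LEFT_SHOULDER, LEFT_ELBOW, LEFT_WRIST = 11, 13, 15

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_a = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_a))

# keypoints: 33 (x, y) pairs as a pose model might emit (made-up values).
keypoints = [(0.0, 0.0)] * 33
keypoints[LEFT_SHOULDER] = (0.40, 0.30)
keypoints[LEFT_ELBOW] = (0.45, 0.45)
keypoints[LEFT_WRIST] = (0.42, 0.60)

elbow_angle = joint_angle(keypoints[LEFT_SHOULDER],
                          keypoints[LEFT_ELBOW],
                          keypoints[LEFT_WRIST])
```

Tracking how such angles change over time is what gives the system the "temporal dependencies" the article mentions: a sudden drop in torso angle followed by stillness looks very different from sitting down.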
If a person falls and remains motionless for ten seconds, the system classifies the event as a "fall." Raised arms are interpreted as a "distress signal," while an arm extended forward is read as a "shooter's pose" and flagged as a potential threat. If an item remains in view without an owner, the event is marked as a "suspicious object." All these events are automatically written to the database and surfaced in the interface: the operator immediately sees the relevant moment and can open the corresponding video clip with a single click, without rewinding.
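The rules above can be sketched as a simple classifier. The ten-second stillness window comes from the article; everything else (the feature names, the track dictionary) is an assumption made for illustration, not GuardNova's published rule set:

```python
# Illustrative thresholds; the article cites a ten-second stillness
# window for falls, but the remaining features are assumptions.
FALL_STILL_SECONDS = 10.0

def classify(track):
    """Map hypothetical per-person track features to an event label.

    track keys (all assumed): on_ground (bool), seconds_still (float),
    arms_raised (bool), arm_extended_forward (bool).
    """
    if track.get("on_ground") and track.get("seconds_still", 0) >= FALL_STILL_SECONDS:
        return "fall"
    if track.get("arms_raised"):
        return "distress_signal"
    if track.get("arm_extended_forward"):
        return "shooter_pose"
    return "normal"

events = [
    classify({"on_ground": True, "seconds_still": 12.0}),
    classify({"arms_raised": True}),
    classify({"arm_extended_forward": True}),
    classify({"on_ground": True, "seconds_still": 3.0}),
]
# events == ["fall", "distress_signal", "shooter_pose", "normal"]
```

In a production system these labels would more likely come from a learned model over the keypoint sequences, with hand-written rules serving as a fallback or sanity check.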

How It Was Trained and Developed
One of the key tasks facing Yevhen Petrov's team was to train the system so that it could work confidently in unpredictable, "live" conditions—where light, angles, crowd density, and movement are constantly changing. Most open datasets contain typical scenes and do not reflect the full complexity of real video streams, so the system undergoes additional training on its own video recordings.
To solve this problem, Yevhen's team used a combined approach: base models trained on open datasets serve as the foundation, with a specially created dataset layered on top. "To teach the system to recognize rare and dangerous situations, we used synthetic video: fully simulated scenes that allow us to safely show the system complex scenarios without risk to people."
Another important solution proposed by Yevhen Petrov was data tokenization. Instead of analyzing "raw" images, Guard-N 4.0 breaks videos down into individual frames and converts them into a set of numerical features—tokens. Essentially, the system translates the video stream into a sequence of meaningful elements that can be worked with like text: searching for specific events, grouping them by type, and generating reports.
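The "video as text" idea can be made concrete with a toy example. Here each frame is reduced to a handful of symbolic tokens that can be searched like words in a document; the token vocabulary below is invented for illustration, not GuardNova's actual representation:

```python
# Toy sketch of the tokenization idea: each frame becomes a short list
# of symbolic tokens (all names here are illustrative).
frames = [
    ["person", "standing", "zone:entrance"],
    ["person", "falling", "zone:lobby"],
    ["person", "on_ground", "motionless", "zone:lobby"],
    ["object", "unattended", "zone:lobby"],
]

def find_frames(tokens, query):
    """Return indices of frames whose token set contains every query token."""
    return [i for i, t in enumerate(tokens) if set(query) <= set(t)]

hits = find_frames(frames, ["on_ground", "zone:lobby"])
# hits == [2]
```

Because the tokens carry no imagery, queries like "all unattended objects in the lobby this week" become cheap set operations over metadata rather than expensive scans over stored video.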
This approach significantly reduces network load, speeds up analysis, and protects user data, since images of faces and objects are not stored. As a result, Guard-N 4.0 describes each scene not as a set of pixels but as actions and context: events the system can recognize and interpret.

New Approach to Data Processing
To ensure that Guard-N 4.0 could operate stably even under heavy loads and process data from thousands of cameras simultaneously, Yevhen Petrov's team implemented a hybrid Edge-Cloud architecture. This approach allows the system to divide computations between local devices and the cloud: critical processes are performed directly "on site," while centralized analysis and training take place in the cloud environment.
Thanks to this, Guard-N 4.0 responds to events instantly. For example, if a person falls or an attack is attempted, the operator receives a notification without delay.
"At the same time, only short metadata such as 'fall in frame 3, time 14:32' is transmitted to the network, rather than video streams, which reduces the load on the channels and helps to comply with the privacy requirements of different countries," explains Yevhen Petrov, the system's creator and developer.
The cloud component is used for continuous improvement of the system: it collects anonymized data, updates the models, and pushes them back to the devices. This ongoing training loop enables the algorithms to become more accurate without interrupting operations.
The approach applied in Guard-N 4.0 aligns with global trends: hybrid edge-cloud architectures are now actively discussed at international conferences and within professional associations. In October, Yevhen Petrov will present the team's results at IEEE UEMCON at the IBM Center in New York, focusing on how combining cloud and on-site computation helps video analytics systems operate faster and more reliably.

Prospects for Use
Analysts estimate that by the end of the decade, the number of installed video cameras worldwide will exceed 1.5 billion devices. They are used in urban infrastructure, at industrial facilities, in commercial buildings, and in private security systems. The scale of deployment is growing, and with it the volume of data generated by video systems every day.
According to the expert, storing and processing such volumes is becoming increasingly difficult. The video analytics field is moving toward an event-driven model, where structured events with timestamps and context replace continuous video streams. This approach reduces infrastructure load, accelerates analysis, and enables working not with images but with data.
The system developed by Yevhen Petrov and GuardNova was created specifically to work in such a multi-scenario environment. In urban infrastructure, it helps to record falls, conflicts, and suspicious activities, while at industrial facilities, it monitors compliance with safety regulations and the presence of personnel in hazardous areas. In private and corporate systems, Guard-N 4.0 is used for perimeter protection and integrates with Ajax and NVR solutions, providing real-time event notifications.
Guard-N 4.0's architecture scales without degrading performance and integrates with existing infrastructure. All software is developed in-house, ensuring a high level of cyber security and resistance to external interference—a critical factor for systems that process video in real time.
Such solutions reflect the industry's overall trajectory: video is ceasing to be an archive of images and is becoming a source of structured data suitable for analysis and forecasting.
"Today we are teaching cameras to understand what is happening, not just to record it. I am confident that over time we will enable such systems to anticipate situations and remain an unobtrusive part of the digital security ecosystem," the expert concludes.
© 2025 TECHTIMES.com All rights reserved. Do not reproduce without permission.




