By Paul Kong
AI has become a trending topic this year. Whether it is artificial intelligence, machine learning or deep learning with neural nets, the terms are mentioned seemingly interchangeably in conversation and marketing materials. Before moving forward, it’s time to define some terms and set boundaries so our conversations and expectations can be better informed.
• AI (Artificial Intelligence)
AI is the theory and development of computer systems that can perform tasks which usually require human intelligence. For the security industry, AI might represent a self-aware intelligence, similar to the human brain, that is capable of learning by itself, reasoning, and making decisions, even when presented with new and complex situations. We are still a long way away from such a machine, but there’s no doubt we are headed in this ultimate direction. Because we’re far from actual AI, it is helpful for the industry to refrain from using the term when describing some of the current sub-domains of AI such as machine learning and deep learning.
• Machine Learning
Machine learning is a subset of AI and more accurately reflects where the technology stands today and where it will be in the near future. Stanford University defines machine learning as “the science of getting computers to act without being explicitly programmed.” Algorithms analyze statistical data and compare it against known data models, allowing a machine to make informed predictions and, in a sense, “learn.” Self-driving cars are a good example: they are not yet completely autonomous and certainly not 100 per cent safe. Using these same technologies, we can teach cameras or servers to recognize patterns in objects and to alert us or take action when they find matches. There can be no AI without machine learning, since “learning” is an essential part of any AI.
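The "learning from examples rather than explicit rules" idea can be shown with a toy sketch. This is a minimal 1-nearest-neighbour classifier, not any vendor's algorithm, and the feature values (aspect ratio and speed of a tracked object) are invented for illustration.

```python
# Toy illustration of "learning from data": a 1-nearest-neighbour
# classifier labels a new point by copying the label of the closest
# training example. No rules for "person" or "vehicle" are hand-written;
# the behaviour comes entirely from the examples it was given.
import math

def nearest_neighbour(train, point):
    """train: list of ((x, y), label); returns the label of the closest example."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    closest = min(train, key=lambda ex: dist(ex[0], point))
    return closest[1]

# Hypothetical training data: (aspect ratio, speed) of tracked objects.
train = [((0.4, 1.5), "person"), ((0.5, 1.2), "person"),
         ((2.5, 14.0), "vehicle"), ((3.0, 18.0), "vehicle")]

print(nearest_neighbour(train, (0.45, 1.4)))   # → person
print(nearest_neighbour(train, (2.8, 16.0)))   # → vehicle
```

As the article notes, the predictions are only as good as the training examples: feed this the wrong data and it will confidently mislabel everything.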
• Deep learning and convolutional neural networks
Deep learning takes machine learning to a new level, based on neural network theory that mimics the complexity of the human brain. Deep learning is a specialized subset of machine learning algorithms and is typically used to analyze video and still images. Although deep learning is still in the early stages of development, it is a natural fit for security system analytics, for both server-based and on-the-edge processing.
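The building block of the convolutional networks used for image analysis is the convolution itself: sliding a small filter over the image and summing elementwise products. This sketch shows a single 3x3 vertical-edge filter on a tiny made-up image; real networks stack many learned filters and layers.

```python
# Core CNN operation: slide a small kernel across an image, producing a
# feature map whose values are large where the kernel's pattern appears.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = sum(image[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw))
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge kernel applied to an image whose right half is bright.
image = [[0, 0, 9, 9]] * 4
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
print(conv2d(image, kernel))  # → [[27, 27], [27, 27]]
```

In a trained network the kernel values are not hand-chosen as here; they are learned from labelled data, which is what makes the approach a form of machine learning rather than classic rule-based video analytics.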
Current use and limitations for security and surveillance
Machine and deep learning-based analytics are actively being marketed and productized by some vendors.
However, many customers have been dismayed at the number of false positives (or false negatives) that are generated. The algorithms in use are frequently not mature enough to provide the 100 per cent accuracy that many businesses require for real-time event notification and decision making. It is not necessarily the algorithms’ fault, as they are ultimately only as good as the data they are given.
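A quick worked example shows why raw accuracy can hide a false-alarm problem. The counts below are hypothetical, not measurements from any product: with rare events, a system can catch almost everything real and still bury operators in false alarms.

```python
# Precision: what share of alarms were real events.
# Recall: what share of real events produced an alarm.
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical day: 10 real events, 9 caught, 1 missed, 90 false alarms.
p, r = precision_recall(tp=9, fp=90, fn=1)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.09 recall=0.90
```

A recall of 0.90 sounds impressive, but a precision of 0.09 means barely one alarm in eleven is real, which is exactly the experience that has dismayed customers.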
Existing data models can be purchased from third-party companies, but the question remains: How accurate are they?
Additionally, many of these new AI-based analytics are offered as “post processes” run on a server or workstation, when a majority of customer use cases demand real-time feedback.
It takes a significant amount of computational power for a machine to “self-learn” in the field, so learning might be best left to powerful server farms in R&D laboratories and universities for now. Real-time analytics also require that video streams be uncompressed to be analyzed. While this might make sense for installations with a few cameras connected to a server, it is clearly an unacceptable resource drain to decode (open) each compressed camera stream when hundreds of cameras are involved.
Once an algorithm and data set are created, they can be “packed up” and embedded on the edge to perform real-time detection and alerts for the specific conditions the model has been trained to recognize. It will not self-learn, but it can provide a useful function for recognizing known objects or behaviours.
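The "pack up and embed" step described above can be sketched as follows. All the expensive learning happens offline; the edge device receives only frozen parameters and runs a cheap per-frame score. The weights, threshold, and feature vector here are invented for illustration.

```python
# Sketch of edge inference with a frozen model: training happened
# elsewhere; the camera only evaluates a fixed logistic score per frame.
import math

FROZEN_WEIGHTS = [0.8, -0.5, 1.2]   # produced offline, shipped to the device
BIAS = -0.3
THRESHOLD = 0.5

def edge_score(features):
    """Logistic score for one frame's feature vector; no learning occurs here."""
    z = BIAS + sum(w * x for w, x in zip(FROZEN_WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))

def should_alert(features):
    return edge_score(features) > THRESHOLD

print(should_alert([1.0, 0.2, 0.9]))  # → True
```

The design trade-off matches the article's point: the device cannot adapt to new conditions on its own, but inference this cheap can run on every frame before compression, which is what real-time alerting needs.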
Having deep learning analytics on the edge — analyzing the image before it is compressed and sent to storage — is clearly what everyone has in mind when they imagine the usefulness of such technology to alert staff and make decisions in real time.
Value proposition of deep learning-based analytics
The more we can automate video processing and decision making, the more we can save operators from redundant and mundane tasks. Computers capable of sophisticated analysis, self-learning and basic decision making are much better and faster at analyzing large volumes of data while humans are best utilized where advanced intelligence, situational analysis and dynamic interactions are required. Increased value comes from the efficiencies gained when each resource is used most effectively. The goal is to help operators make better informed judgements by providing detailed input and analytics. This way, false alarms can be drastically reduced. Precise object classification will be a big part of the future, as will action recognition and behaviour analysis.
We see airport security, subways and other mass transit customers as early adopters of this technology. Detecting objects left behind is crucial in the world we live in today. The benefits to retail organizations wanting to optimize their business operations are equally important. It may even be possible to rank reactions to products based on learned postures and skeletal movements.
Systems integrators will be able to sell value far beyond a security system. Integrations with other systems, even customer relationship management (CRM) systems, will be not only possible but desirable as a way to enhance metrics for retail establishments.
What to expect in the near term
Solutions for the near future may follow a hybrid model in which on-the-edge hardware combines with server software to deliver a powerful combination of analysis, early warning and detection capabilities. Comprehensive facial recognition will not be practical on the edge anytime soon, so that is a good example of where servers can do some heavy lifting. Object recognition and classification are perfectly suited for in-camera analytics. We can easily imagine that deep learning-based analytics might take the same progression as traditional analytics.
The first implementations were server-based, then there was a migration to a hybrid approach where some algorithms were in the camera, but a server was still required for in-depth analysis, and ultimately the entire analytic process could be hosted on the edge.
The future looks bright for image classification
Being able to identify what’s going on in a still or moving image is one of the most interesting areas of development in machine and deep learning. With moving images, analysis goes beyond simply classifying objects, to classifying the behaviours and moods of those in the frame. Already, vendors are talking about “aggression detection” as a desirable feature, but it’s easy to imagine that the cost to an organization of a false positive or false negative in such a scenario could be very high.
Whether it’s big players like Facebook, Google or Nvidia, significant investments are being made in AI and machine learning to classify images and objects as well as text and speech. Some of this technology development will trickle down into the security industry, and much of it will have to be custom developed to suit the needs of surveillance work flows.
It may be acceptable to mislabel a person’s face on Facebook, but security organizations should not be willing to make such a mistake. This is one of many reasons why our industry must insist on higher standards.
Customer expectations for AI and all its variants are understandably high. However, careful consideration is needed when choosing any AI type solution, as the reality of where the technology is today and the marketing hype may not line up.
At Hanwha Techwin, our R&D department continues to actively develop deep learning algorithms. We are training edge devices to correctly recognize actions and motion of interest separately from normal environmental variables such as wind, snow and rain. Our continued focus is to streamline operations and deliver truly actionable intelligence. Object classification and skeleton-based action recognition let us better detect violent postures and abnormal behaviours.
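To make "skeleton-based" concrete, here is a deliberately simple posture check, not Hanwha's method: once a pose estimator has produced 2D joint positions, a downstream rule or classifier can reason about their geometry. The joint names and coordinates below are assumptions for illustration.

```python
# Toy posture check on skeleton keypoints: flag a "hands raised" pose
# by testing whether both wrists sit above the head keypoint.
# Image coordinates are assumed, with y increasing downward.
def hands_raised(skeleton):
    """skeleton: dict mapping joint name -> (x, y) in image coordinates."""
    head_y = skeleton["head"][1]
    return (skeleton["left_wrist"][1] < head_y and
            skeleton["right_wrist"][1] < head_y)

pose = {"head": (100, 50), "left_wrist": (70, 30), "right_wrist": (130, 25)}
print(hands_raised(pose))  # → True
```

Production systems replace this hand-written rule with a classifier trained on labelled pose sequences, which is what lets them separate genuinely abnormal behaviour from, say, someone stretching.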
We are currently developing our Wisenet 7 chip, which will be focused on providing deep learning analytics on the edge. Hanwha invests significant resources in developing its own SoC. This allows us to make edge devices smarter, while focusing on an analytics engine geared toward video surveillance-specific intelligence. Our development so far has shown that the benefits to surveillance analytics, as this field continues to evolve, will be substantial.
Paul Kong is the technical director for Hanwha Techwin (www.hanwhatechwin.com).
This article originally appeared in the August/September 2018 issue of SP&T News.