Current State of Enterprise Video
Enterprises install large numbers of closed-circuit television (CCTV) cameras for video surveillance. As of 2016, an estimated 350 million surveillance cameras were installed worldwide. These cameras generate a huge amount of video around the clock, and typically a human security officer watches the feed for criminal, suspicious or safety-related behavior. This approach has several problems:
1. Manual monitoring is expensive and prone to human error. A 1999 study (Green, 1999) found that after 20 minutes, guards watching a video scene will miss up to 95 percent of all activity.
2. Video is typically stored in its raw format to ensure that no significant information is lost. This requires large investments in storage and video management infrastructure. With the growth of high-definition video, storage requirements keep rising.
3. Surveillance video is used primarily after the fact, to look for evidence of suspected activities during investigations or lawsuits. Acting on video feeds in real time is not commonplace.
Applying Deep Learning to Understand Behavior from Video
Deep learning is a class of machine learning that is loosely inspired by how the brain works. With the explosion of compute capacity and the availability of large data sets, deep learning has been shown to be very powerful in finding patterns in unstructured data, such as images, text, speech and video. Deep learning is rapidly replacing traditional computer vision and natural language processing techniques because of its accuracy, surpassing human accuracy in many cases.
Deep learning models are built using deep neural networks, which consist of multiple layers of neurons connected together in a specific architecture. The network is trained by feeding it examples of the data that you want it to learn (e.g., cats, dogs, faces). Intuitively, each layer of the network learns different types of patterns: the first layer learns to find edges, the second layer learns to find object parts, the next layer learns to find objects, and so on. Each higher layer learns more complicated combinations of patterns relevant to the training data set.
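To make "the first layer learns to find edges" concrete, here is a minimal sketch in plain Python. It assumes a hand-set, Sobel-style kernel and a tiny made-up image; in a trained network the kernel values would be learned from data rather than written by hand, and the function name `conv2d_valid` is only illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image with no padding.

    This is cross-correlation, which is what deep learning
    frameworks usually call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

# Illustrative 4x6 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-set kernel that responds to dark-to-bright vertical edges.
# A real first layer would learn values like these during training.
vertical_edge = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

response = conv2d_valid(image, vertical_edge)
for row in response:
    print(row)
```

The response is zero over the flat regions and positive only where the window straddles the dark-to-bright boundary, which is the sense in which an early layer "finds edges". Higher layers combine many such responses into more complex patterns.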
Many deep neural network architectures exist, differing in how they connect the neurons in the network. Convolutional Neural Networks (CNNs) are a popular deep learning architecture for understanding images. Recurrent Neural Networks (RNNs), and specifically Long Short-Term Memory networks (LSTMs), are popular architectures for finding patterns in sequential data. For video, a combination of CNNs and LSTMs can be used to extract the spatial patterns from each frame as well as the temporal patterns across a sequence of frames.
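The CNN-plus-LSTM data flow can be sketched in a few dozen lines of plain Python. This is a toy under stated assumptions, not a description of any production system: the weights are random rather than trained, the "CNN" is a single convolution layer with global average pooling, and every name here (`frame_features`, `lstm_step`, `encode_clip`) is hypothetical.

```python
import math
import random

def conv2d_relu(frame, kernel):
    """Spatial filter: slide a kernel over one frame, then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [max(0.0, sum(frame[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw)))
         for j in range(len(frame[0]) - kw + 1)]
        for i in range(len(frame) - kh + 1)
    ]

def frame_features(frame, kernels):
    """CNN stand-in: one globally average-pooled value per kernel."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_relu(frame, kernel)
        feats.append(sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])))
    return feats

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, weights):
    """One LSTM cell update for a single time step."""
    z = x + h + [1.0]  # frame features, previous hidden state, bias term

    def gate(name, activation):
        return [activation(sum(w * v for w, v in zip(row, z)))
                for row in weights[name]]

    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("candidate", math.tanh)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def encode_clip(frames, kernels, weights, hidden_size):
    """Run the spatial step on each frame and feed the results through the LSTM."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for frame in frames:
        h, c = lstm_step(frame_features(frame, kernels), h, c, weights)
    return h  # final hidden state summarizes the whole clip

# Random (untrained) parameters and a random 4-frame, 5x6-pixel clip.
rng = random.Random(0)
NUM_KERNELS, HIDDEN = 2, 3
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(NUM_KERNELS)]
weights = {
    name: [[rng.uniform(-0.5, 0.5) for _ in range(NUM_KERNELS + HIDDEN + 1)]
           for _ in range(HIDDEN)]
    for name in ("input", "forget", "output", "candidate")
}
clip = [[[rng.random() for _ in range(6)] for _ in range(5)] for _ in range(4)]
clip_vector = encode_clip(clip, kernels, weights, HIDDEN)
print(clip_vector)
```

In a real model, the final vector would typically feed a classifier that outputs a behavior label, and both the kernels and the LSTM weights would be learned from labeled video rather than drawn at random.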
What if deep learning could be applied to find behavioral patterns in video surveillance data? The benefits would be manifold:
1. Automatically identifying behavior patterns directly from the video feed can reduce the need for manual monitoring, saving cost and reducing human error.
2. Only video that contains meaningful behaviors needs to be stored and managed, greatly reducing storage infrastructure needs.
3. Identifying behaviors in real time can enable a fast response to those behaviors.
4. Additional patterns can be extracted from the sequence of behaviors to predict outcomes or ensure compliance.
5. Deep learning technology continuously learns from video data to distinguish finer and finer details related to the behavior.
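As a rough illustration of the selective-storage idea in the list above, the sketch below filters clips by the output of a behavior recognizer. Everything in it is assumed for illustration: the `ClipResult` fields, the behavior labels, the 0.8 confidence threshold, and the clip sizes are made up and do not come from any real system.

```python
from dataclasses import dataclass

@dataclass
class ClipResult:
    clip_id: str
    behavior: str      # label from a hypothetical behavior recognizer
    confidence: float  # recognizer confidence in [0, 1]
    size_mb: float     # size of the raw clip on disk

def select_clips(results, routine_label="routine", min_confidence=0.8):
    """Keep only clips with a confident, non-routine behavior."""
    return [r for r in results
            if r.behavior != routine_label and r.confidence >= min_confidence]

# Made-up recognizer output for five clips.
results = [
    ClipResult("cam1-0001", "routine", 0.97, 120.0),
    ClipResult("cam1-0002", "loitering", 0.91, 118.0),
    ClipResult("cam2-0001", "routine", 0.88, 125.0),
    ClipResult("cam2-0002", "fall", 0.55, 122.0),
    ClipResult("cam3-0001", "fall", 0.93, 115.0),
]

kept = select_clips(results)
stored_mb = sum(r.size_mb for r in kept)
total_mb = sum(r.size_mb for r in results)
print(f"storing {len(kept)} of {len(results)} clips: "
      f"{stored_mb:.0f} MB of {total_mb:.0f} MB")
```

Note that this simple rule drops the low-confidence "fall" clip. A deployed system would more likely route uncertain detections to a human reviewer than discard them, so the threshold is a design choice with real consequences.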
Enterprise Behavior Recognition
Samsung SDS has developed a pioneering platform that leverages deep learning technology to understand behavior from video for enterprise use cases.
Understanding behavior automatically from video data enables a variety of use cases with high business value across multiple industry verticals. Some example use cases:
Law Enforcement:
Retail:
Healthcare:
Manufacturing:
Samsung SDS’ enterprise video intelligence platform uses cutting-edge deep learning technology to enable multiple enterprise behavior recognition use cases. We aim to help enterprises unlock the value in their video data, and we are looking to work closely with customers to apply our technology. For more information about this platform, please contact us at bd.sdsa@samsung.com.
Rajesh Anantharaman is part of SDSA’s Artificial Intelligence team based in Silicon Valley. He is focused on helping solve customer problems and bringing new AI-based solutions to market, including Samsung SDS’ enterprise video intelligence platform.