Cashless Shopping Stores – A Step into the Future?
Amazon is moving towards autonomous retail with its “Just Walk Out” technology.
On March 4th of this year, Amazon opened its first cashless supermarket in Europe. The concept is simple and convenient for both retailers and consumers. Upon entering the store, the customer authenticates via an app and is tracked in the store using frame-based image recognition. Sensors such as shelf scales assist in tracking the shopping process. Payment is made automatically upon leaving the store.
Stores with high visitor frequency, such as those in airports or football stadiums, could benefit the most from this technology.
However, the investment costs are very high even for Amazon, with the upgrade of a pilot store estimated to cost around three million US dollars.
It is still unclear when the first such supermarkets will open in Germany.
To use this concept, it is essential that all customers agree to the data protection policies. This is challenging for German retailers, however, as customer surveillance has a negative connotation for most people.
Under the GDPR, sensitive customer information may not be analyzed or processed without consent.
The New Way of Video Recording
Our approach at HS Analysis is to analyze purchasing behavior separately from the person to guarantee maximum information gain and the protection of the customer’s privacy.
With event-based cameras, one of the latest camera technologies, the raw data stream does not consist of individual images, as with conventional digital cameras, but of mathematical matrices that capture pixel changes. Neuromorphic cameras, as they are also called, work similarly to the human visual system.
Unlike conventional image-based cameras, the event-based photodetector detects local changes in individual pixels. Figure 1 illustrates the difference with a recording of a running wildcat. A conventional camera takes an image at timed intervals, and all image information, such as the background, the colors, and the cat, is present in each image. The event-based photosensor detects only pixel changes, in this case the running cat, and records the duration of the change for each pixel individually.
Only movements are tracked. In addition to protecting the person's privacy, this approach has another advantage: in our own experience, the data volume is approximately five times lower than with conventional digital cameras. This benefits image processing with neural networks, as static image information is not captured.
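The difference between the two recording principles can be illustrated with a small Python sketch. It is a simplified simulation and not a description of how an event-based sensor works internally: real sensors react asynchronously per pixel, whereas here two tiny grayscale frames are simply compared. The function name `frames_to_events`, the threshold value, and the event layout `(x, y, t, polarity)` are assumptions chosen for illustration.

```python
def frames_to_events(prev_frame, next_frame, t, threshold=15):
    """Emit one event per pixel whose brightness changed by more than threshold.

    Frames are lists of rows with grayscale values 0-255 (illustrative format).
    Each event is (x, y, t, polarity): +1 for brighter, -1 for darker.
    """
    events = []
    for y, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for x, (before, after) in enumerate(zip(prev_row, next_row)):
            diff = after - before
            if abs(diff) > threshold:
                events.append((x, y, t, 1 if diff > 0 else -1))
    return events


# A 4x4 scene with a static background (value 50) and a bright object (200)
# that moves one pixel to the right between the two frames.
frame_a = [[50] * 4 for _ in range(4)]
frame_b = [[50] * 4 for _ in range(4)]
frame_a[1][1] = 200
frame_b[1][2] = 200

events = frames_to_events(frame_a, frame_b, t=1)
print(events)  # [(1, 1, 1, -1), (2, 1, 1, 1)]
print(len(events), "events vs.", 16, "pixels in a full frame")
```

Only the two pixels touched by the movement produce events; the 14 static background pixels produce no data at all.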
Initial data collection has already been carried out. There are also recordings of this experiment.
Biological Programming with Spiking Neural Networks
Like most inventions, Spiking Neural Networks (SNNs) have their origin in nature. The SNN architecture is an imitation of the brain. The human brain is the most powerful and optimized computer nature has produced. As shown in Figure 2, the brain remains in rest mode most of the time to conserve energy.
Computation, particularly for the internet and increasingly for artificial intelligence, is estimated to generate a considerable share of global CO2 emissions (approximately 10% by some estimates), so there is great interest in finding a resource-efficient solution. One solution could be SNNs.
The functionality can be imagined as follows. As in the brain, the smallest building block is the neuron. When excited, the inputs, which are connected to the nervous system, apply a voltage to the neuron. When the breakthrough voltage, also known as the threshold voltage, is exceeded, the neuron fires and a reaction occurs.
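The threshold behavior described above can be sketched with a leaky integrate-and-fire (LIF) neuron, a common simplified neuron model for SNNs. The text does not state which neuron model is used, so the model choice, the class name `LIFNeuron`, and the values for threshold and leak are assumptions for illustration.

```python
class LIFNeuron:
    """Minimal leaky integrate-and-fire neuron (illustrative, discrete time)."""

    def __init__(self, threshold=1.0, leak=0.5):
        self.threshold = threshold  # voltage that must be exceeded to fire
        self.leak = leak            # fraction of the voltage kept per time step
        self.voltage = 0.0          # membrane potential, starts at rest

    def step(self, input_current):
        """Advance one time step; return True if the neuron spikes."""
        self.voltage = self.voltage * self.leak + input_current
        if self.voltage > self.threshold:
            self.voltage = 0.0      # reset to rest after firing
            return True
        return False


neuron = LIFNeuron(threshold=1.0, leak=0.5)
inputs = [0.0, 0.625, 0.625, 0.625, 0.0, 0.0]
spikes = [neuron.step(current) for current in inputs]
print(spikes)  # [False, False, False, True, False, False]
```

A single input of 0.625 is not enough to fire the neuron. Only the accumulated voltage of three consecutive inputs exceeds the threshold, after which the neuron returns to rest.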
Event-Based Videos as Input for SNNs
In combination with event-based camera data, as described in the section “The New Way of Video Recording,” the advantages of the deep learning network are multiplied. Computational efficiency has a particularly positive effect on energy consumption: if the input image does not change, the input is not processed.
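The efficiency argument can be made concrete with a sketch that simply counts units of work. It assumes that a frame-based pipeline touches every pixel of every frame, while an event-driven pipeline does work only for changed pixels. The function names and the toy data are illustrative, and the numbers are not a measurement of real energy consumption.

```python
def frame_based_workload(num_frames, pixels_per_frame):
    """Every pixel of every frame is processed, changed or not."""
    return num_frames * pixels_per_frame


def event_driven_workload(events_per_step):
    """Only time steps with events cause work; empty steps cost nothing."""
    return sum(len(events) for events in events_per_step if events)


# Ten time steps of a 4x4 sensor; something moves only in steps 3 and 4.
events_per_step = [[] for _ in range(10)]
events_per_step[3] = [(1, 1, 3, -1), (2, 1, 3, 1)]
events_per_step[4] = [(2, 1, 4, -1), (3, 1, 4, 1)]

print(frame_based_workload(num_frames=10, pixels_per_frame=16))  # 160
print(event_driven_workload(events_per_step))                    # 4
```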
The Advantages for Retail
What does this mean for retail? Amazon’s concept is not feasible for many retailers. It is impossible to guarantee that all customers have agreed to the data protection policies. Therefore, a surveillance system is needed that can analyze customer purchasing behavior without infringing on privacy.
Event-based data would be a promising way to track movements without revealing the person’s identity. This is an important building block for the automation of retail. However, it is unlikely that purchased products can be identified through event-based data clouds. Possible solutions could be RFID chips, shelf scales, or another neural network trained on product recognition.
Purchasing Behavior
It has been shown that customers become familiar with their preferred store after just a few visits, which drastically reduces their shopping time. By analyzing customer movements throughout the store, it can be determined when it is time to rearrange the shelves.
Store Design
A closer look at individual customer behavior could reveal which product placements could be optimized.
Shopping Atmosphere
Nothing is worse for sales than stressed customers. Analyzing movements may also make it possible to draw conclusions about customers' emotions. Monitoring with a DVS (dynamic vision sensor, another name for an event-based camera) could also identify stress points within the store, for example by detecting crowded areas.
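One conceivable way to find crowded areas is to divide the camera view into grid cells and count the events per cell over a time window. The sketch below assumes events given as `(x, y, t, polarity)` tuples in pixel coordinates. The cell size and the event-count threshold are arbitrary illustrative values, and because event counts measure movement rather than the number of people, a real system would need calibration.

```python
from collections import Counter


def crowded_cells(events, cell_size=10, min_events=3):
    """Return the grid cells whose event count reaches min_events.

    events: iterable of (x, y, t, polarity) tuples (assumed layout).
    A cell is identified by (column, row) in a grid of cell_size pixels.
    """
    counts = Counter((x // cell_size, y // cell_size) for x, y, _t, _p in events)
    return {cell: n for cell, n in counts.items() if n >= min_events}


# Toy data: four events near the top-left corner, one far away.
events = [(2, 3, 0, 1), (4, 7, 1, -1), (9, 9, 2, 1), (5, 5, 3, 1), (42, 18, 4, -1)]
print(crowded_cells(events))  # {(0, 0): 4}
```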
HSA KIT
At HS Analysis, artificial intelligence has been used successfully for years for image recognition in the medical field.
HS Analysis stands out with its know-how and academic expertise and provides a solid foundation for research and integration of existing technologies in the retail sector. This expertise was used to expand the annotation tool HSA KIT with the HSA TRAC module.
HSA TRAC
With the HSA TRAC module, event-based video data can be annotated and processed.
The user selects and annotates keyframes, and the annotations for the frames in between are interpolated to ensure the most accurate object detection possible.
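Keyframe interpolation can be sketched as linear interpolation of a bounding box between two annotated keyframes. How HSA TRAC interpolates internally is not described here, so linear interpolation, the function name `interpolate_box`, and the box layout `(x, y, width, height)` are assumptions for illustration.

```python
def interpolate_box(frame_a, box_a, frame_b, box_b, frame):
    """Linearly interpolate a bounding box for a frame between two keyframes.

    Boxes are (x, y, width, height) tuples; frame_a < frame_b is assumed.
    """
    if not frame_a <= frame <= frame_b or frame_a == frame_b:
        raise ValueError("frame must lie between two distinct keyframes")
    weight = (frame - frame_a) / (frame_b - frame_a)
    return tuple(a + weight * (b - a) for a, b in zip(box_a, box_b))


# Keyframes annotated at frame 0 and frame 10; estimate the box at frame 5.
box = interpolate_box(0, (10, 20, 40, 80), 10, (30, 20, 40, 100), frame=5)
print(box)  # (20.0, 20.0, 40.0, 90.0)
```

With this approach the annotator only corrects the frames where the interpolated box visibly drifts from the object, and each correction becomes a new keyframe.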
Annotated ground truth data can then be analyzed with advanced AI modules.
By connecting multiple camera systems, rooms can be virtually networked. This networking enables comprehensive tracking.
The models can be trained within HSA TRAC without programming knowledge. The expert receives a numerical and visual overview of the results of various low-energy AI models and can choose the optimal solution. Model training is iterative and allows for the controlled creation of analysis methods. The process is reproducible, traceable, and objective. The user knows which number and version of the ground truth data (GTD) belong to which versions of the low-energy AI models. This makes it possible to understand in a controlled way how a model deteriorates or improves as new GTD are added.
In HSA TRAC, the trained low-energy AI models can then be used for the automated processing and analysis of mass data.
Models are trained using GTD. After training, they are validated with a well-founded model evaluation, both visually and with metrics. Model architectures are provided for specific use cases and can also be optimized by the user. The user receives numerical results. Based on these results, the software can create predictions for the use cases, so that possible developments in business processes can be foreseen. This enables optimization within the shortest time. For example, supply chains of goods can be optimized depending on the region and the product to be delivered so that the recipient receives their ordered goods as quickly as possible.
HSA TRAC offers a high-quality ground truth data library.
No AI is perfect and ready to use out of the box; it must be adapted and optimized for specific data scenarios. With HSA TRAC, this is possible without programming knowledge. HSA provides full customer support and expertise in DVS camera integration. Our deep learning engineers have a deep understanding of event-based data and can adapt deep learning networks precisely to the use cases and the resulting data clusters.