Conventional cameras produce redundant output, especially at high frame rates. Dynamic vision sensors (DVSs), on the other hand, generate asynchronous and sparse brightness-change events only when an object in the field of view is in motion. Such event-based output can be processed as a 1D time sequence, or it can be converted to 2D frames that resemble conventional camera frames. Frames created, e.g., by accumulating a fixed number of events, can be used as input for conventional deep learning algorithms, thus upgrading existing computer vision pipelines with low-power, low-redundancy sensors. This paper describes a hand symbol recognition system that can quickly be trained to incrementally learn new symbols recorded with an event-based camera, without forgetting previously learned classes. Using the iCaRL incremental learning algorithm, we show that the system can learn up to 16 new symbols with only 4000 samples per symbol and achieve a final symbol accuracy of over 80%. The system achieves a latency of under 0.5 s, and training requires 3 minutes for 5 epochs on an NVIDIA 1080TI GPU.
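As a rough illustration of the event-to-frame conversion mentioned in the abstract, the following Python sketch accumulates a fixed number of DVS events into a 2D count frame suitable as CNN input. The sensor resolution, the event count per frame, and the event field layout are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

SENSOR_H, SENSOR_W = 180, 240     # assumed DVS resolution (hypothetical)
EVENTS_PER_FRAME = 2000           # assumed fixed event count per frame (hypothetical)

def events_to_frame(events, height=SENSOR_H, width=SENSOR_W):
    """Accumulate events (each assumed to be a tuple of x, y, timestamp, polarity)
    into a single-channel frame by counting events per pixel."""
    frame = np.zeros((height, width), dtype=np.float32)
    for x, y, _t, _p in events:
        frame[y, x] += 1.0
    # Normalize so the frame is a well-scaled input for a conventional network.
    if frame.max() > 0:
        frame /= frame.max()
    return frame

def stream_to_frames(event_stream, n=EVENTS_PER_FRAME):
    """Yield one frame for every n events drawn from an iterable event stream."""
    buffer = []
    for ev in event_stream:
        buffer.append(ev)
        if len(buffer) == n:
            yield events_to_frame(buffer)
            buffer = []
```

Fixing the number of events per frame (rather than a fixed time window) keeps the information content per frame roughly constant regardless of how fast the scene is moving, which is one common motivation for this accumulation scheme.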