Automatic fall detection in videos could enable timely delivery of medical care to injured elderly people who have fallen while living alone. Deep ConvNets have been used to detect fall actions, but problems remain in deep video representations for fall detection. First, video frames are fed directly into deep ConvNets, so the visual features of human actions may be interfered with by the surrounding environment. Second, redundant frames increase the difficulty of temporally encoding human actions. To address these problems, this paper presents the trajectory-weighted deep-convolutional rank-pooling descriptor (TDRD) for fall detection, which is robust to surrounding environments and can effectively describe the dynamics of human actions in long videos. First, the CNN feature map of each frame is extracted by a deep ConvNet. Then, we present a new kind of trajectory attention map, built from improved dense trajectories, to optimally localize the subject area. Next, the CNN feature map of each frame is weighted by its corresponding trajectory attention map to obtain a trajectory-weighted convolutional feature of the human region. Further, we propose a cluster-pooling method to reduce the temporal redundancy of a video's trajectory-weighted convolutional features. Finally, rank pooling is used to encode the dynamics of the cluster-pooled sequence, yielding our TDRD. With TDRD and SVM classifiers, we obtain superior results on the SDUFall dataset and comparable performance on the UR and Multiple Cameras datasets.
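The pipeline described above (attention-weighted CNN features, temporal cluster pooling, then rank pooling) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: the attention maps and CNN feature maps are taken as given arrays, cluster pooling is approximated by averaging contiguous temporal segments, and rank pooling is approximated by ridge regression of frame order on time-smoothed features, a common formulation of linear rank pooling.

```python
import numpy as np

def trajectory_weighted_features(feature_maps, attention_maps):
    # feature_maps: (T, H, W, C) CNN feature maps, one per frame.
    # attention_maps: (T, H, W) trajectory attention maps (assumed given).
    # Weight each feature map spatially by its attention map, then
    # average-pool over space to get one C-dim vector per frame.
    weighted = feature_maps * attention_maps[..., None]
    T, H, W, C = weighted.shape
    return weighted.reshape(T, H * W, C).mean(axis=1)

def cluster_pool(frames, n_clusters):
    # Stand-in for the paper's cluster pooling: split the frame
    # sequence into contiguous segments and average each, reducing
    # temporal redundancy from T frames to n_clusters vectors.
    segments = np.array_split(frames, n_clusters)
    return np.stack([seg.mean(axis=0) for seg in segments])

def rank_pool(seq, reg=1.0):
    # Rank pooling sketch: fit a linear ranker w so that <w, v_t>
    # increases with t; the learned w serves as the video descriptor.
    # Features are smoothed with a running time-varying mean, and w is
    # obtained by ridge regression against the frame indices.
    T, C = seq.shape
    smoothed = np.cumsum(seq, axis=0) / np.arange(1, T + 1)[:, None]
    t = np.arange(1, T + 1, dtype=float)
    A = smoothed.T @ smoothed + reg * np.eye(C)
    return np.linalg.solve(A, smoothed.T @ t)

# Toy usage with random stand-in data (shapes are hypothetical).
rng = np.random.default_rng(0)
fmaps = rng.normal(size=(10, 4, 4, 8))        # 10 frames, 4x4x8 maps
amaps = np.abs(rng.normal(size=(10, 4, 4)))   # attention per frame
per_frame = trajectory_weighted_features(fmaps, amaps)  # (10, 8)
pooled = cluster_pool(per_frame, n_clusters=5)          # (5, 8)
descriptor = rank_pool(pooled)                          # (8,) TDRD-like vector
```

The resulting descriptor would then be fed to an SVM classifier, as the abstract describes.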