Violence detection in surveillance videos is a difficult task, because it requires extracting spatio-temporal features across different video environments and camera perspectives. In this paper, several architectures are proposed to perform this task with high performance, using the UBI-Fights dataset as a comprehensive case study. The proposed architectures combine Convolutional Block Attention Modules (CBAM) with other simple layers (e.g., ConvLSTM2D or Conv2D&LSTM). In addition, the Categorical Focal Loss (CFL) is used as the loss function when training the various architectures, to increase the focus on the most important features. The proposed architectures are evaluated mainly with the Area Under the Curve (AUC) and the Equal Error Rate (EER), which indicate an architecture's ability to identify violence correctly with little confusion between classes. The results show that the proposed architectures achieve better results than the state-of-the-art techniques. For example, the Conv2D&LSTM-based architecture obtains an AUC of 0.9493 and an EER of 0.0507, outperforming most of the other proposed architectures as well as the state of the art.
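
The abstract names the loss function and the two evaluation metrics but gives no formulas. As a rough illustration only, the NumPy sketch below shows one common form of the categorical focal loss, together with a simple threshold sweep that estimates the AUC and the EER from binary labels and scores. The function names, the defaults `gamma=2.0` and `alpha=0.25`, and the threshold-sweep approach are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def categorical_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean categorical focal loss over a batch (illustrative sketch).

    y_true: one-hot labels, shape (n_samples, n_classes)
    y_pred: predicted class probabilities, same shape
    gamma, alpha: assumed defaults, not values reported in the paper
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() never receives 0.
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    cross_entropy = -y_true * np.log(p)
    # Down-weight examples that are already classified with high confidence.
    modulating = alpha * (1.0 - p) ** gamma
    return float(np.mean(np.sum(modulating * cross_entropy, axis=1)))

def auc_and_eer(labels, scores):
    """Estimate AUC and EER by sweeping a threshold over the scores.

    labels: 1 for the positive class (e.g. violence), 0 otherwise
    scores: higher means more likely positive
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Thresholds from strictest to loosest; +inf yields the (0, 0) ROC point.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    tpr = np.array([(scores[labels] >= t).mean() for t in thresholds])
    fpr = np.array([(scores[~labels] >= t).mean() for t in thresholds])
    # Trapezoidal area under the ROC curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    # EER: operating point where false positive and false negative rates are closest.
    fnr = 1.0 - tpr
    idx = int(np.argmin(np.abs(fpr - fnr)))
    eer = float((fpr[idx] + fnr[idx]) / 2.0)
    return auc, eer
```

With `gamma=0` and `alpha=1` the loss reduces to ordinary categorical cross-entropy, which is a convenient sanity check. The EER returned here is only as fine-grained as the set of distinct scores, so on small test sets it is an approximation.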