Wearing masks is a simple and effective way to curb the spread of COVID-19 in public places such as train stations, classrooms, and streets, and computer vision technology can help urge people to wear them. However, existing detection methods mainly target simple scenes, and faces are often missed in dense crowds with varying scales and occlusions. Moreover, data captured by surveillance cameras in public places are difficult to collect for centralized training because of individual privacy. To address these problems, we propose a cascaded network. The first level is the Dilation RetinaNet Face Location (DRFL) network, which contains an Enhanced Receptive Field Context (ERFC) module built on dilated convolution, aiming to reduce network parameters and locate faces of different scales. To suit embedded camera devices, the second level is the SRNet20 network, which is created by Neural Architecture Search (NAS). Because privacy protection makes surveillance video difficult to share in practice, SRNet20 is trained with federated learning. We have also built a masked face dataset containing about 20,000 images. Experiments show that the face location mAP is 90.6% on the Wider Face dataset and the masked face classification mAP is 98.5% on our dataset, which indicates that the cascaded network detects masked faces well in dense crowd scenes.
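As a rough illustration of why dilated convolution can enlarge the receptive field without adding parameters, the sketch below compares a dilated 3x3 kernel with a dense kernel of the same spatial extent. The kernel size, dilation rates, and channel count are illustrative assumptions; the abstract does not state the values used in the ERFC module, and the helper names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a square 2D convolution (bias ignored).

    Dilation does not appear here: it spreads the taps apart
    but does not add any weights.
    """
    return kernel_size * kernel_size * in_channels * out_channels


if __name__ == "__main__":
    channels = 64  # assumed channel count, for illustration only
    for dilation in (1, 2, 3):  # assumed dilation rates
        extent = effective_kernel_size(3, dilation)
        dilated = conv_weight_count(3, channels, channels)
        dense = conv_weight_count(extent, channels, channels)
        print(f"dilation={dilation}: covers {extent}x{extent}, "
              f"{dilated} weights vs {dense} for a dense kernel")
```

Under these assumptions, a 3x3 kernel with dilation 3 spans the same 7x7 window as a dense 7x7 kernel while keeping the weight count of an ordinary 3x3 layer, which matches the stated aim of covering faces at several scales with fewer parameters.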
               