Automated people counting in crowd scenes is challenging due to large variations in scale, density, and background clutter. To tackle these challenges, we propose a novel cross-level parallel network (CLPNet) that extracts multiple low-level features from VGG16 and fuses them with specific scale aggregation modules in the high-level stage. To deal with scale variation, we design five different aggregation modules for multiscale fusion. Furthermore, the ground truth is processed to eliminate the mismatches caused by the scale variation between heads and density maps. To cope with background clutter, cross-level feature fusion is implemented: higher-level semantic information can effectively separate heads from the background, while the fusion recovers lost low-level detail. To address the variation in density, we design a parallel network in which two separate channels each focus on a different density level, yielding more accurate counting results. Finally, we evaluate the proposed CLPNet on four representative crowd counting datasets, i.e., ShanghaiTech, UCF_CC_50, WorldExpo’10, and UCF_QNRF. The experimental results demonstrate that, with its cross-level and multiscale structure, CLPNet achieves superior performance compared with state-of-the-art crowd counting methods.
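For readers unfamiliar with density-map ground truth, the following is a minimal sketch of the generic idea, not the ground-truth processing used by CLPNet, which the abstract does not specify. It assumes a fixed-width Gaussian per annotated head; the function names, the kernel size, and the value of `sigma` are illustrative choices. The point it shows is that each head contributes unit mass, so summing the map recovers the head count.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def density_map(height, width, heads, sigma=2.0, size=9):
    """Build a density map from (row, col) head annotations.

    Assumption: a fixed sigma for every head. Scale-adaptive variants
    exist, but the abstract gives no details on CLPNet's choice.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for row, col in heads:
        # Keep only the kernel cells that fall inside the image.
        cells = []
        for ky in range(size):
            for kx in range(size):
                y, x = row + ky - half, col + kx - half
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[ky][kx]))
        # Renormalize so a head near the border still contributes exactly 1.
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap

def count(dmap):
    """The estimated count is the sum over the density map."""
    return sum(sum(row) for row in dmap)
```

With a fixed `sigma`, a distant small head and a nearby large head receive the same blob size, which is one plausible reading of the head/density-map mismatch the abstract mentions.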
               