ABSTRACT Crowd counting and localisation are essential tasks in crowd analysis and are vital to ensure public safety. However, these tasks via UAV bring new obstacles compared with video surveillance… Click to show full abstract
ABSTRACT Crowd counting and localisation are essential tasks in crowd analysis and are vital to ensure public safety. However, these tasks via UAV bring new obstacles compared with video surveillance (e.g. viewpoint and scale variations, background clutter, and small scales). To overcome the difficulties, this research presents a novel network named PDNet. It employs the multi-task learning approach to combine the point regression and density map regression. PDNet includes a backbone to extract multi-scale features, a Dilated Feature Fusion module (DFF), a Density Map Attention module (DMA), a density map branch and a point branch. Aims of DFF is to address the difficulties of small targets and scale variations by establishing relationships between targets and their surroundings. DMA is created to address the challenges of complicated backgrounds, allowing the PDNet to focus on the target's location. In addition, the density map branch and point branch are designed for density maps regression and point regression, respectively. Experiments on the DroneCrowd dataset demonstrate that our proposed network outperforms state-of-the-art approaches in terms of localisation, L-mAP (53.85%), L-AP@10 (59.14%), L-AP@15 (63.64%), and L-AP@20 (66.21%), and we improved counting performance and significantly reduced inference time. In addition, ablation experiments are conducted to prove the modules' effectiveness.
               
Click one of the above tabs to view related content.