Abstract

Crowd counting is a challenging vision task that aims to accurately estimate the number of people in a single image. To this end, we propose a novel architecture called the De-background Detail Convolutional Network (DDCN), which learns a mapping from the input image to the corresponding crowd density map. DDCN focuses on removing background interference from crowds and on reducing the mapping range from input to output, a design that makes the mapping easier to learn. The proposed DDCN is composed of three components: a decomposer, a feature extraction CNN, and a regression CNN. Specifically, the decomposer produces a detail layer by subtracting the background interference from the crowd image. The feature extraction CNN extracts high-level features, and the regression CNN estimates the density map. In addition, a weighted Euclidean loss is designed to calculate the Euclidean distances of the crowd and the background separately, with different loss weights, which further improves counting performance. Extensive experiments were conducted on three crowd counting datasets to validate the performance of DDCN. The experimental results demonstrate that DDCN achieves performance improvements over state-of-the-art methods.
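The abstract does not give the exact form of the weighted Euclidean loss, so the following is only a minimal NumPy sketch of the idea under stated assumptions: crowd pixels are taken to be those where the ground-truth density is positive, the function name `weighted_euclidean_loss` is hypothetical, and the weights `crowd_weight` and `background_weight` are illustrative placeholders, not values from the paper.

```python
import numpy as np


def weighted_euclidean_loss(pred, target, crowd_weight=1.0, background_weight=0.5):
    """Squared Euclidean loss with separate weights for crowd and background.

    Assumption (not from the paper): a pixel counts as "crowd" when the
    ground-truth density is strictly positive, and as "background" otherwise.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    crowd_mask = target > 0
    squared_error = (pred - target) ** 2

    # Sum the squared errors over each region separately.
    crowd_term = squared_error[crowd_mask].sum()
    background_term = squared_error[~crowd_mask].sum()

    return 0.5 * (crowd_weight * crowd_term + background_weight * background_term)


# Toy 2x2 density maps: one crowd pixel, three background pixels.
target = [[0.0, 1.0], [0.0, 0.0]]
pred = [[1.0, 0.0], [0.0, 0.0]]
loss = weighted_euclidean_loss(pred, target)
```

With both weights set to 1, this reduces to the ordinary (halved) squared Euclidean loss over the whole density map, so the two weights only control how strongly errors in each region are penalized.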
               