In this paper, we propose a large-scale video-based animal counting dataset collected by drones (AnimalDrone) for agriculture and wildlife protection. The dataset consists of two subsets, i.e., PartA, captured on site by drones, and PartB, collected from the Internet, with rich annotations of more than 4 million objects in 53,644 frames and corresponding attributes in terms of density, altitude, and view. We also develop a new graph regularized flow attention network (GFAN) to perform density map estimation on video clips of dense crowds with arbitrary crowd density, perspective, and flight altitude. Specifically, our GFAN method leverages optical flow to warp the multi-scale feature maps of sequential frames to exploit temporal relations, and then combines the enhanced features to predict the density maps. We further introduce a multi-granularity loss function, comprising a pixel-wise density loss and a region-wise count loss, that encourages the network to focus on discriminative features for objects at different scales. In addition, a graph regularizer is imposed on the density maps of multiple consecutive frames to maintain temporal coherency. Extensive experiments demonstrate the effectiveness of the proposed method compared with several state-of-the-art counting algorithms. The AnimalDrone dataset is available at https://github.com/VisDrone/AnimalDrone.
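To make the idea of a multi-granularity loss concrete, the following is a minimal NumPy sketch that combines a pixel-wise density term with a region-wise count term. It is an illustration only and not the GFAN formulation, which the abstract does not specify: the function names, the uniform `grid` partition, the squared-error and absolute-error choices, and the `weight` value are all assumptions.

```python
import numpy as np


def region_counts(density, grid):
    """Sum a (H, W) density map over a grid x grid partition of cells.

    Assumption for this sketch: H and W are divisible by `grid`.
    """
    h, w = density.shape
    assert h % grid == 0 and w % grid == 0, "sketch assumes divisible sizes"
    # Axes after reshape: (cell row, row in cell, cell column, column in cell).
    cells = density.reshape(grid, h // grid, grid, w // grid)
    return cells.sum(axis=(1, 3))


def multi_granularity_loss(pred, target, grid=2, weight=0.1):
    """Pixel-wise density loss plus a weighted region-wise count loss.

    `grid` and `weight` are hypothetical hyperparameters chosen for
    illustration; they do not come from the paper.
    """
    # Pixel-wise term: mean squared error between the two density maps.
    pixel_loss = np.mean((pred - target) ** 2)
    # Region-wise term: mean absolute error between per-cell object counts.
    count_loss = np.mean(
        np.abs(region_counts(pred, grid) - region_counts(target, grid))
    )
    return pixel_loss + weight * count_loss


# Toy example: a 4x4 ground-truth map holding 3 objects in total.
target = np.zeros((4, 4))
target[0, 0] = 1.0
target[3, 3] = 2.0
print(multi_granularity_loss(target, target))          # identical maps
print(multi_granularity_loss(np.zeros((4, 4)), target))  # empty prediction
```

The count term penalizes errors in how many objects fall in each region even when individual pixel errors are small, which is one plausible reading of why the two granularities are combined.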
               