Region-based object detection infers object regions for one or more categories in an image. Owing to recent advances in deep learning and region proposal methods, object detectors based on convolutional neural networks (CNNs) have flourished and produced promising detection results. However, the accuracy of convolutional object detectors is often degraded by low feature discriminability caused by geometric variation or transformation of an object. In this paper, we propose deformable part region (DPR) learning, which allows decomposed part regions to deform according to the geometric transformation of an object. Because ground truth for the part models is unavailable in many cases, we design part model losses for detection and segmentation, and learn the geometric parameters by minimizing an integral loss that includes those part losses. As a result, we can train our DPR network without extra supervision and make multi-part models deformable according to object geometric variation. Moreover, we propose a novel feature aggregation tree (FAT) to learn more discriminative region of interest (RoI) features via bottom-up tree construction. The FAT learns stronger semantic features by aggregating part RoI features along the bottom-up pathways of the tree. We also present a spatial and channel attention mechanism for aggregating features from different nodes. Based on the proposed DPR and FAT networks, we design a new cascade architecture that refines detection tasks iteratively. Without bells and whistles, we achieve impressive detection and segmentation results on the MSCOCO and PASCAL VOC datasets. Our Cascade D-PRD achieves 57.9 box AP with the Swin-L backbone. We also provide an extensive ablation study to demonstrate the effectiveness and usefulness of the proposed methods for large-scale object detection.
               