Multilabel remote sensing (RS) image annotation is a challenging and time-consuming task that requires a considerable amount of expert knowledge. Most existing RS image annotation methods are based on handcrafted features and require multistage processes that are neither efficient nor effective enough. An RS image can be assigned a single label at the scene level to depict the overall understanding of the scene and multiple labels at the object level to represent its major components. The multiple labels can be used as supervision for annotation, whereas the single label can be used as additional information to exploit scene-level similarity relationships. By exploiting these dual-level semantic concepts, we propose an end-to-end deep learning framework for object-level multilabel annotation of RS images. The proposed framework consists of a shared convolutional neural network for discriminative feature learning, a classification branch for multilabel annotation, and an embedding branch for preserving the scene-level similarity relationships. In the classification branch, an attention mechanism is introduced to generate attention-aware features, and skip-layer connections are incorporated to combine information from multiple layers. The rationale behind the embedding branch is that images with the same scene-level semantic concept should have similar visual representations. The proposed method adopts the binary cross-entropy loss for classification and the triplet loss for image embedding learning. Evaluations on three multilabel RS image data sets demonstrate the effectiveness and superiority of the proposed method in comparison with state-of-the-art methods.
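As a rough illustration of how the two training objectives named in the abstract could be combined, the following is a minimal pure-Python sketch and not the authors' implementation. The abstract does not say how the losses are weighted or which distance the triplet loss uses, so the function names, the squared Euclidean distance, the `margin` value, and the `weight` balancing factor are all assumptions made for this example. The inputs `probs` are assumed to be per-label probabilities (for instance, sigmoid outputs of the classification branch), and `anchor`, `positive`, and `negative` are assumed to be embedding vectors in which the positive shares the anchor's scene-level label and the negative does not.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over the object-level labels of one image."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pushes the same-scene pair closer than the
    different-scene pair by at least `margin` (hypothetical value)."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def joint_loss(probs, targets, anchor, positive, negative,
               weight=1.0, margin=1.0):
    """Classification loss plus weighted embedding loss.
    The weighted sum is an assumption; the abstract does not specify it."""
    cls = binary_cross_entropy(probs, targets)
    emb = triplet_loss(anchor, positive, negative, margin)
    return cls + weight * emb
```

In this sketch, a triplet whose negative is already much farther from the anchor than the positive contributes zero embedding loss, so only triplets that violate the margin influence the shared features.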
               