Referring image segmentation (RIS) has achieved impressive results with fully convolutional networks (FCNs). However, previous RIS methods require a large number of pixel-level annotations. In this article, we present a weakly supervised RIS method that uses only bounding box (BB) annotations. In the first stage, we introduce an adversarial boundary loss to extract the object contour from the BB, which is then used to select appropriate region proposals for pseudo-ground-truth (PGT) generation. In the second stage, we design a co-training (Co-T) strategy to purify the pseudo-labels. Specifically, we train two networks and interactively guide each to pick clean labels for the other, which weakens the effect of noisy labels on model training. Experimental results on four benchmark datasets demonstrate that the proposed method produces high-quality masks at a speed of 63 frames/s.
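To make the Co-T label-purification step concrete, below is a minimal PyTorch sketch assuming a small-loss selection criterion, in which each network nominates its lowest-loss samples as clean for its peer (as in co-teaching-style training). The function name `co_training_step`, the `keep_ratio` hyperparameter, and the per-sample binary cross-entropy loss are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def co_training_step(net_a, net_b, opt_a, opt_b, images, pseudo_masks,
                     keep_ratio=0.7):
    """One Co-T update: each network selects the samples it considers
    cleanest (lowest loss) and passes them to its peer for training.

    `keep_ratio` (fraction of the batch treated as clean) is an assumed
    hyperparameter, not a value from the paper.
    """
    # Score every sample under both networks without tracking gradients.
    with torch.no_grad():
        loss_a = F.binary_cross_entropy_with_logits(
            net_a(images), pseudo_masks, reduction="none").mean(dim=(1, 2, 3))
        loss_b = F.binary_cross_entropy_with_logits(
            net_b(images), pseudo_masks, reduction="none").mean(dim=(1, 2, 3))

    k = max(1, int(keep_ratio * images.size(0)))
    # Each network nominates its small-loss (presumed clean) samples...
    clean_for_b = torch.topk(-loss_a, k).indices  # chosen by A, trains B
    clean_for_a = torch.topk(-loss_b, k).indices  # chosen by B, trains A

    # ...and the peer trains only on those samples, so noisy pseudo-labels
    # that one network overfits to are filtered out by the other.
    opt_a.zero_grad()
    F.binary_cross_entropy_with_logits(
        net_a(images[clean_for_a]), pseudo_masks[clean_for_a]).backward()
    opt_a.step()

    opt_b.zero_grad()
    F.binary_cross_entropy_with_logits(
        net_b(images[clean_for_b]), pseudo_masks[clean_for_b]).backward()
    opt_b.step()
```

Keeping the two selection paths crossed (A's picks train B, and vice versa) is the key design choice: it prevents a single network from reinforcing its own mistakes on noisy pseudo-labels.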