Abstract Weakly-supervised semantic segmentation (WSSS) using only tags can significantly ease the label costing, because full supervision needs pixel-level labeling. It is, however, a very challenging task because it is… Click to show full abstract
Abstract Weakly-supervised semantic segmentation (WSSS) using only tags can significantly ease the label costing, because full supervision needs pixel-level labeling. It is, however, a very challenging task because it is not straightforward to associate tags to visual appearance. Existing researches can only do tag-based WSSS on simple images, where only two or three tags exist in each image, and different images usually have different tags, such as the PASCAL VOC dataset. Therefore, it is easy to relate the tags to visual appearance and supervise the segmentation. However, real-world scenes are much more complex. Especially, the autonomous driving scenes usually contain nearly 20 tags in each image and those tags can repetitively appear from image to image, which means the existing simple image strategy does not work. In this paper, we propose to solve the problem by using region based deep clustering. The key idea is that, since each tagged object is repetitively appearing from image to image, it allows us to find the common appearance through region clustering, and particular deep neural network based clustering. Later, we relate the clustered region appearance to tags and utilize the tags to supervise the segmentation. Furthermore, regions found by clustering with weak supervision can be very noisy. We further propose a mechanic to improve and refine the supervision in an iterative manner. To our best knowledge, it is the first time that image tags weakly-supervised semantic segmentation can be applied in complex autonomous driving datasets with still images. Experimental results on the Cityscapes and CamVid datasets demonstrate the effectiveness of our method.
               
Click one of the above tabs to view related content.