In interactive image segmentation methods, users can participate in and influence the segmentation process through their interactions, such as scribbles or bounding boxes. Similarly, deep interactive segmentation uses the users’ interactions to guide the network to learn the target of interest. This article mainly considers mouse clicking, which is the simplest interaction mode. The key issues in a deep interactive segmentation framework are therefore how to characterize the click interaction effectively (we call this “click encoding”) and how to fuse the click-related information with the network. However, the current click encoding method concentrates only on the spatial information of the clicks, so the region affected by each click is difficult to control, which reduces the stability of the network. To address this, we propose a feature-interactive map that builds a close relationship between interaction information and target semantics. The region affected by the feature-interactive map is determined by semantic information. Furthermore, we introduce an interactive nonlocal block by embedding the feature-interactive map into a nonlocal block, so that the long-range dependencies of the interaction information can be captured. Finally, based on the early fusion strategy, the features of the interactive nonlocal block are fused with the high-level features, thus amplifying the impact of position and semantics on the final prediction results. Comprehensive experiments demonstrate that our click embedding approach significantly boosts the efficiency of the network and achieves state-of-the-art performance.
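
The contrast between a spatial-only click encoding and one whose affected region follows semantics can be illustrated with a small sketch. This is not the formulation used in the article, which the abstract does not specify. It assumes, purely for illustration, that the spatial encoding is a Gaussian map centred on the click and that a semantics-aware map can be approximated by the cosine similarity between the feature vector at the click and the features at every other location. The function names `gaussian_click_map` and `feature_similarity_map` and the parameter `sigma` are hypothetical.

```python
import numpy as np


def gaussian_click_map(shape, click, sigma=10.0):
    """Spatial-only encoding (assumed form): a Gaussian centred on the click.

    shape: (H, W) of the map; click: (row, col) of the clicked pixel.
    The affected region depends only on distance, not on image content.
    """
    rows, cols = np.indices(shape)
    squared_dist = (rows - click[0]) ** 2 + (cols - click[1]) ** 2
    return np.exp(-squared_dist / (2.0 * sigma ** 2))


def feature_similarity_map(features, click):
    """Semantics-aware encoding (assumed form, not the article's definition).

    features: array of shape (C, H, W), e.g. a backbone feature map.
    Returns an (H, W) map of cosine similarities between the feature
    vector at the click and the feature vector at every location.
    """
    channels, height, width = features.shape
    flat = features.reshape(channels, height * width)
    # Normalise each location's feature vector to unit length.
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    query = flat[:, click[0] * width + click[1]]
    return (query @ flat).reshape(height, width)
```

In this sketch, the Gaussian map spreads the same way regardless of what the image contains, whereas the similarity map is high wherever the features resemble those at the click, so its extent is set by the content rather than by a fixed radius.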
               