Current multi-object tracking and segmentation (MOTS) methods have made great progress in the simultaneous detection and tracking of heterogeneous objects such as cars and pedestrians. However, these scenes consist of dissimilar objects, which are easier to track than smaller, homogeneous objects that look alike. This paper is therefore the first to explore MOTS algorithms for the simultaneous detection and tracking of homogeneous objects. To this end, video data were acquired in an apple orchard using a wearable camera and unmanned aerial vehicles (UAVs). The dataset, called APPLE MOTS, contains almost 86,000 manually annotated apple masks and is the first public dataset in which apple instances are labelled consistently across frames. Applying the MOTS architectures TrackR-CNN and PointTrack indicates that they could be suitable for the joint detection (MOTSP: 80.4) and tracking (sMOTSA: 38.7, MOTSA: 52.9) of apples. This letter exposes the challenge of tracking homogeneous objects, whose shape and colour are similar, while detection performance remains state-of-the-art.
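The reported scores are the standard MOTS metrics. The sketch below states them as formulas. Let M be the set of ground-truth masks. A hypothesis mask counts as a true positive (TP) when its mask IoU with a ground-truth mask exceeds 0.5. FP counts false positives and IDS counts identity switches. The soft true-positive count is the sum of the IoUs over all TPs:

- MOTSA = (|TP| − |FP| − |IDS|) / |M|
- sMOTSA = (soft TP − |FP| − |IDS|) / |M|
- MOTSP = soft TP / |TP|

The sketch makes two assumptions. First, the mask matching and identity-switch counting have already been done upstream. Second, the counts are aggregated over a whole sequence. The class and function names are hypothetical and do not come from the paper or any evaluation toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MotsCounts:
    """Hypothetical container for counts aggregated over a sequence."""
    tp_ious: List[float]   # mask IoU of every true-positive match (each > 0.5)
    num_fp: int            # |FP|: hypothesis masks matched to no ground truth
    num_id_switches: int   # |IDS|: identity switches between matched tracks
    num_gt: int            # |M|: total number of ground-truth masks


def soft_tp(c: MotsCounts) -> float:
    # Soft TP count: each true positive contributes its IoU instead of 1.
    return sum(c.tp_ious)


def motsa(c: MotsCounts) -> float:
    # MOTSA = (|TP| - |FP| - |IDS|) / |M|
    return (len(c.tp_ious) - c.num_fp - c.num_id_switches) / c.num_gt


def smotsa(c: MotsCounts) -> float:
    # sMOTSA = (soft TP - |FP| - |IDS|) / |M|
    return (soft_tp(c) - c.num_fp - c.num_id_switches) / c.num_gt


def motsp(c: MotsCounts) -> float:
    # MOTSP = soft TP / |TP|: mean mask IoU of the true positives.
    if not c.tp_ious:
        return 0.0
    return soft_tp(c) / len(c.tp_ious)


if __name__ == "__main__":
    # Toy counts, invented for illustration only (not APPLE MOTS results).
    counts = MotsCounts(tp_ious=[0.9, 0.8, 0.7], num_fp=1, num_id_switches=1, num_gt=4)
    print(f"MOTSA={motsa(counts):.3f} sMOTSA={smotsa(counts):.3f} MOTSP={motsp(counts):.3f}")
```

Because the soft TP count is never larger than |TP|, sMOTSA is always at most MOTSA. This is consistent with the reported pair (sMOTSA 38.7 vs. MOTSA 52.9). The gap between the two reflects imperfect mask quality on top of the tracking errors.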