Satellite videos have recently provided a new way to dynamically monitor the Earth's surface, and their interpretation has attracted growing attention. In this article, we focus on the problem of vehicle tracking in satellite videos. Satellite videos usually have low resolution, which leads to two phenomena: 1) a vehicle target typically occupies only a few pixels, and 2) vehicles often have similar appearances, which easily causes tracking errors within the observed region. Popular general-purpose trackers usually focus on representing the target and distinguishing it from the background, which is insufficient for this problem. Consequently, in this article, we propose to learn the motion and background of the target to help the tracker recognize it more accurately. A prediction network based on a fully convolutional network (FCN), learned from previous tracking results, is proposed to predict the probability of the target's location at each pixel of the next frame. In addition, a segmentation method is introduced to generate a feasible region for the target in each frame and to assign high probability to that region. For quantitative comparison, we manually annotated 20 representative vehicle targets from nine satellite videos taken by JiLin-1, and we also selected two public satellite video datasets for experiments. Extensive experimental results demonstrate the superiority of the proposed method.
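The two ideas in the abstract, a per-pixel location probability for the next frame and a segmentation-derived feasible region that receives high probability, can be illustrated with a minimal sketch. This is not the paper's learned FCN: the motion prior below is a simple Gaussian extrapolation standing in for the network's output, and `feasible_mask` is a hypothetical binary mask standing in for the segmentation result.

```python
import numpy as np

def motion_prior(shape, prev_pos, velocity, sigma=2.0):
    """Gaussian location prior centred on the position extrapolated
    from the target's previous motion (a stand-in for the learned
    FCN prediction network described in the abstract)."""
    h, w = shape
    cy, cx = prev_pos[0] + velocity[0], prev_pos[1] + velocity[1]
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def location_probability(prior, feasible_mask):
    """Keep probability mass only inside the feasible region
    (a binary mask standing in for the segmentation output),
    then normalise into a per-pixel probability map."""
    p = prior * feasible_mask
    total = p.sum()
    return p / total if total > 0 else p

# Example: 32x32 frame, target last seen at (10, 10), moving (+2, +1).
mask = np.zeros((32, 32))
mask[5:20, 5:20] = 1.0  # feasible region, e.g. a road segment
prob = location_probability(motion_prior((32, 32), (10, 10), (2, 1)), mask)
peak = np.unravel_index(prob.argmax(), prob.shape)  # -> (12, 11)
```

The peak of the resulting map lands at the motion-extrapolated position inside the feasible region, while pixels outside the region get zero probability, which is the behaviour the abstract attributes to combining the prediction network with the segmentation step.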