When analyzing large-scale video datasets, selecting training videos via active learning can significantly reduce the annotation cost of supervised learning without sacrificing classifier accuracy. To further reduce the computational overhead of exhaustive comparisons between high-dimensional low-level visual features, we have developed a novel video hash coding model based on an active learning framework. The model optimizes mean average precision directly by explicitly considering the structural information in video clips when learning the optimal hash functions. This structural information includes the temporal consistency between successive frames and the local visual patterns shared by videos with the same semantic labels. Rather than relying on the similarity between paired videos, we use a ranking-based loss function to directly optimize mean average precision. Combined with the active learning component, we then jointly evaluate the unlabeled training videos according to their uncertainty and average precision values, using an efficient algorithm based on structured SVM. Extensive experiments on several benchmark datasets demonstrate that our approach achieves significantly higher search accuracy than traditional query refinement schemes.
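The mean-average-precision objective at the core of the abstract can be illustrated with a minimal sketch: rank database items by Hamming distance between binary hash codes, then score the ranking with average precision. The function names and toy codes below are illustrative assumptions, not the paper's actual model or implementation.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query's binary code."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists, kind="stable")

def average_precision(ranked_relevance):
    """Average precision of a ranked list of binary relevance labels."""
    hits = 0
    precisions = []
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return float(np.mean(precisions)) if precisions else 0.0

# Toy example: 4-bit hash codes; items 0 and 2 are relevant to the query.
query = np.array([1, 0, 1, 1])
db = np.array([
    [1, 0, 1, 1],  # Hamming distance 0
    [0, 1, 0, 0],  # Hamming distance 4
    [1, 0, 1, 0],  # Hamming distance 1
])
relevance = np.array([1, 0, 1])

order = hamming_rank(query, db)       # → [0, 2, 1]
ap = average_precision(relevance[order])  # both relevant items ranked first → 1.0
```

A ranking-based loss, as the abstract describes, would learn hash functions that maximize this quantity averaged over queries (mean average precision), rather than matching pairwise similarities.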