Dynamic scene classification has been extensively studied in computer vision because of its wide range of applications. The key to dynamic scene classification lies in jointly characterizing spatial appearance and temporal dynamics to obtain an informative representation, which remains an open problem in the literature. In this paper, we propose a unified framework to extract spatial and temporal features for dynamic scene representation. More specifically, we deploy two variants of deep convolutional neural networks to encode spatial appearance and short-term dynamics into short-term deep features (STDF). Based on STDF, we propose using the autoregressive moving average model to extract long-term frequency features (LTFF). By combining STDF and LTFF, we establish the long–short-term feature (LSTF) representations of dynamic scenes. The LSTF characterizes both the spatial and temporal patterns of dynamic scenes, yielding a comprehensive and informative representation that enables more accurate classification. Extensive experiments on three dynamic scene classification benchmarks show that the proposed LSTF achieves high performance and substantially surpasses state-of-the-art methods.
               