Articles with "video question" as a keyword

Uncovering the Temporal Context for Video Question Answering

Published in 2017 in "International Journal of Computer Vision"

DOI: 10.1007/s11263-017-1033-7

Abstract: In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder–decoder approach using Recurrent Neural Networks to learn the…

Keywords: video; question; temporal context; video question

VideoQA-SC: Adaptive Semantic Communication for Video Question Answering

Published in 2024 in "IEEE Journal on Selected Areas in Communications"

DOI: 10.1109/jsac.2025.3559160

Abstract: Although semantic communication (SC) has shown its potential in efficiently transmitting multimodal data such as texts, speeches and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal…

Keywords: video; video question; reconstruction; semantic communication

ERM: Energy-Based Refined-Attention Mechanism for Video Question Answering

Published in 2023 in "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2022.3212463

Abstract: Spatiotemporal attention learning remains a challenging video question answering (VideoQA) task as it requires a sufficient understanding of cross-modal spatiotemporal information. Existing methods usually leverage different cross-modal attention mechanisms to reveal potential associations between video…

Keywords: video question; question; cross modal; energy based

CFMMC-Align: Coarse-Fine Multi-Modal Contrastive Alignment Network for Traffic Event Video Question Answering

Published in 2024 in "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2024.3409453

Abstract: Traffic video question answering (TrafficVQA) constitutes a specialized VideoQA task designed to enhance the basic comprehension and intricate reasoning capacities of videos, specifically focusing on traffic events. Recent VideoQA models employ pretrained visual and textual…

Keywords: cfmmc align; video; traffic; video question

Collaborative Aware Bidirectional Semantic Reasoning for Video Question Answering

Published in 2025 in "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2024.3490665

Abstract: Video question answering (VideoQA) is the challenging task of accurately responding to natural language questions based on a given video. Most previous methods focus on designing complex cross-modal interactions to perform question-oriented video scene mining…

Keywords: semantic reasoning; video question; collaborative aware; bidirectional semantic

Unifying the Video and Question Attentions for Open-Ended Video Question Answering

Published in 2017 in "IEEE Transactions on Image Processing"

DOI: 10.1109/tip.2017.2746267

Abstract: Video question answering is an important task toward scene understanding and visual data retrieval. However, current visual question answering works mainly focus on a single static image, which is distinct from the dynamic and sequential…

Keywords: question; ended video; video question; open ended

Graph-Based Multi-Interaction Network for Video Question Answering

Published in 2021 in "IEEE Transactions on Image Processing"

DOI: 10.1109/tip.2021.3051756

Abstract: Video question answering is an important task combining both Natural Language Processing and Computer Vision, which requires a machine to obtain a thorough understanding of the video. Most existing approaches simply capture spatio-temporal information in…

Keywords: video; multi interaction; video question; graph based

Cross-Attentional Spatio-Temporal Semantic Graph Networks for Video Question Answering

Published in 2022 in "IEEE Transactions on Image Processing"

DOI: 10.1109/tip.2022.3142526

Abstract: Due to the rich spatio-temporal visual content and complex multimodal relations, Video Question Answering (VideoQA) has become a challenging task and attracted increasing attention. Current methods usually leverage visual attention, linguistic attention, or self-attention to…

Keywords: video question; temporal semantic; spatio temporal; attention

Video Question Answering With Prior Knowledge and Object-Sensitive Learning

Published in 2022 in "IEEE Transactions on Image Processing"

DOI: 10.1109/tip.2022.3205212

Abstract: Video Question Answering (VideoQA), which explores spatial-temporal visual information of videos given a linguistic query, has received unprecedented attention over recent years. One of the main challenges lies in locating relevant visual and linguistic information,…

Keywords: prior knowledge; video question; knowledge; object sensitive

A Universal Quaternion Hypergraph Network for Multimodal Video Question Answering

Published in 2023 in "IEEE Transactions on Multimedia"

DOI: 10.1109/tmm.2021.3120544

Abstract: Fusion and interaction of multimodal features are essential for video question answering. Structural information composed of the relationships between different objects in videos is very complex, which restricts understanding and reasoning. In this paper, we…

Keywords: video question; quaternion; hypergraph; video

TR-Adapter: Parameter-Efficient Transfer Learning for Video Question Answering

Published in 2025 in "IEEE Transactions on Multimedia"

DOI: 10.1109/tmm.2024.3521708

Abstract: In recent years, the use of large-scale pre-trained models for vision-language tasks has gained significant attention and has shown promising results in the video question answering. However, the increasing size of these models has made…

Keywords: efficient transfer; video question; language; question answering