Articles with "video question" as a keyword



Uncovering the Temporal Context for Video Question Answering

Published in 2017 at "International Journal of Computer Vision"

DOI: 10.1007/s11263-017-1033-7

Abstract: In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder–decoder approach using Recurrent Neural Networks to learn the…

Keywords: video; question; temporal context; video question ...

ERM: Energy-Based Refined-Attention Mechanism for Video Question Answering

Published in 2023 at "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2022.3212463

Abstract: Spatiotemporal attention learning remains a challenging video question answering (VideoQA) task as it requires a sufficient understanding of cross-modal spatiotemporal information. Existing methods usually leverage different cross-modal attention mechanisms to reveal potential associations between video…

Keywords: video question; question; cross modal; energy based ...

Unifying the Video and Question Attentions for Open-Ended Video Question Answering

Published in 2017 at "IEEE Transactions on Image Processing"

DOI: 10.1109/tip.2017.2746267

Abstract: Video question answering is an important task toward scene understanding and visual data retrieval. However, current visual question answering works mainly focus on a single static image, which is distinct from the dynamic and sequential…

Keywords: question; ended video; video question; open ended ...

Graph-Based Multi-Interaction Network for Video Question Answering

Published in 2021 at "IEEE Transactions on Image Processing"

DOI: 10.1109/tip.2021.3051756

Abstract: Video question answering is an important task combining both Natural Language Processing and Computer Vision, which requires a machine to obtain a thorough understanding of the video. Most existing approaches simply capture spatio-temporal information in…

Keywords: video; multi interaction; video question; graph based ...

Cross-Attentional Spatio-Temporal Semantic Graph Networks for Video Question Answering

Published in 2022 at "IEEE Transactions on Image Processing"

DOI: 10.1109/tip.2022.3142526

Abstract: Due to the rich spatio-temporal visual content and complex multimodal relations, Video Question Answering (VideoQA) has become a challenging task and attracted increasing attention. Current methods usually leverage visual attention, linguistic attention, or self-attention to…

Keywords: video question; temporal semantic; spatio temporal; attention ...

Video Question Answering With Prior Knowledge and Object-Sensitive Learning

Published in 2022 at "IEEE Transactions on Image Processing"

DOI: 10.1109/tip.2022.3205212

Abstract: Video Question Answering (VideoQA), which explores spatial-temporal visual information of videos given a linguistic query, has received unprecedented attention over recent years. One of the main challenges lies in locating relevant visual and linguistic information,…

Keywords: prior knowledge; video question; knowledge; object sensitive ...

A Universal Quaternion Hypergraph Network for Multimodal Video Question Answering

Published in 2023 at "IEEE Transactions on Multimedia"

DOI: 10.1109/tmm.2021.3120544

Abstract: Fusion and interaction of multimodal features are essential for video question answering. Structural information composed of the relationships between different objects in videos is very complex, which restricts understanding and reasoning. In this paper, we…

Keywords: video question; quaternion; hypergraph; video ...

Memory Augmented Deep Recurrent Neural Network for Video Question Answering

Published in 2020 at "IEEE Transactions on Neural Networks and Learning Systems"

DOI: 10.1109/tnnls.2019.2938015

Abstract: Video question answering (VideoQA) is a very important but challenging multimedia task, which automatically analyzes questions and videos and generates accurate answers. However, research on VideoQA is still in its infancy. In this article, we…

Keywords: video; memory augmented; augmented deep; video question ...

Learning to Answer Visual Questions from Web Videos

Published in 2022 at "IEEE Transactions on Pattern Analysis and Machine Intelligence"

DOI: 10.1109/tpami.2022.3173208

Abstract: Recent methods for visual question answering rely on large-scale annotated datasets. Manual annotation of questions and answers for videos, however, is tedious, expensive and prevents scalability. In this work, we propose to avoid manual annotation…

Keywords: video question; question; learning answer; videoqa ...