A Universal Quaternion Hypergraph Network for Multimodal Video Question Answering

Fusion and interaction of multimodal features are essential for video question answering. However, the structural information formed by the relationships among objects in a video is highly complex, which makes understanding and reasoning difficult. In this paper, we propose a quaternion hypergraph network (QHGN) for multimodal video question answering that jointly models multimodal features and structural information. Because quaternion operations are well suited to multimodal interaction, the four components of each quaternion vector are used to represent the multimodal features. Furthermore, we construct a hypergraph from the visual objects detected in the video. Most importantly, we theoretically derive a quaternion hypergraph convolution operator to perform multimodal and relational reasoning. The question and candidate answers are embedded in quaternion space, and a Q&A reasoning module is designed to select the correct answer accurately. Moreover, the unified framework can be extended to other video-text tasks by using different quaternion decoders. Experimental evaluations on the TVQA and DramaQA datasets show that our method achieves state-of-the-art performance.
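As an illustration only, the Python sketch below shows the two ingredients the abstract combines: the Hamilton product, which mixes all four quaternion components (and so, assuming each component holds one modality, all four modalities), and a hypergraph convolution step. It is not the authors' QHGN operator, whose derivation is in the full paper. The propagation rule here is the standard normalized hypergraph convolution Dv^-1/2 H De^-1 H^T Dv^-1/2 X with all hyperedge weights set to 1, followed by a right Hamilton product with a single quaternion weight. The function names, the one-quaternion-per-node feature layout, and the toy incidence matrix are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A quaternion a + b*i + c*j + d*k stored as (a, b, c, d).
# Hypothetical layout: each of the four slots holds one modality's feature.
Quat = Tuple[float, float, float, float]


def hamilton(p: Quat, q: Quat) -> Quat:
    """Hamilton product p * q (non-commutative: i*j = k but j*i = -k)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quat_hypergraph_conv(
    x: Sequence[Quat], incidence: Sequence[Sequence[int]], weight: Quat
) -> List[Quat]:
    """One propagation step Dv^-1/2 H De^-1 H^T Dv^-1/2 X, then a right
    Hamilton product with a quaternion weight (hyperedge weights fixed to 1).

    incidence[v][e] is 1 if node v belongs to hyperedge e, else 0.
    Assumes every node lies in at least one hyperedge and no hyperedge is empty.
    """
    n, m = len(x), len(incidence[0])
    dv = [sum(incidence[v]) for v in range(n)]                       # node degrees
    de = [sum(incidence[v][e] for v in range(n)) for e in range(m)]  # hyperedge sizes

    # Node -> hyperedge: average the Dv^-1/2-scaled features of member nodes.
    edges = []
    for e in range(m):
        acc = [0.0] * 4
        for v in range(n):
            if incidence[v][e]:
                scale = 1.0 / math.sqrt(dv[v])
                for k in range(4):
                    acc[k] += scale * x[v][k]
        edges.append([c / de[e] for c in acc])

    # Hyperedge -> node: sum incident hyperedge messages, rescale by Dv^-1/2,
    # then mix the four components with the quaternion weight.
    out = []
    for v in range(n):
        acc = [0.0] * 4
        for e in range(m):
            if incidence[v][e]:
                for k in range(4):
                    acc[k] += edges[e][k]
        scale = 1.0 / math.sqrt(dv[v])
        node = (acc[0] * scale, acc[1] * scale, acc[2] * scale, acc[3] * scale)
        out.append(hamilton(node, weight))
    return out


if __name__ == "__main__":
    # Toy hypergraph: hyperedge 0 = {node 0, node 1}, hyperedge 1 = {node 1, node 2}.
    H = [[1, 0], [1, 1], [0, 1]]
    X = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)]
    W = (1.0, 0.0, 0.0, 0.0)  # identity quaternion: output is pure propagation
    for row in quat_hypergraph_conv(X, H, W):
        print([round(c, 4) for c in row])
```

Because the propagation coefficients are real numbers, they commute with quaternion multiplication, so applying the weight before or after aggregation gives the same result; the non-commutativity of the Hamilton product only matters for how the weight is combined with each node's four components.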

Keywords: video question; quaternion; hypergraph; video; question answering

Journal Title: IEEE Transactions on Multimedia
Year Published: 2023
