Articles with "video grounding" as a keyword



Intra- and Inter-modal Multilinear Pooling with Multitask Learning for Video Grounding

Published in 2020 at "Neural Processing Letters"

DOI: 10.1007/s11063-020-10205-y

Abstract: Video grounding aims to temporally localize an action in an untrimmed video referred to by a query in natural language, which plays an important role in fine-grained video understanding. Given temporal proposals of limited granularity,…

Keywords: inter modal; video; intra inter; video grounding ...

Efficient Video Grounding With Which-Where Reading Comprehension

Published in 2022 at "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2022.3174136

Abstract: Video grounding aims at localizing the temporal moment related to the given language description, which is very helpful to many cross-modal content understanding applications like visual question answering and sentence-video search. Existing approaches usually directly…

Keywords: efficient video; reading comprehension; video grounding; decision space ...

Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Published in 2024 at "IEEE Transactions on Image Processing"

DOI: 10.1109/tip.2024.3365249

Abstract: Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding…

Keywords: video grounding; video; verification; language ...

Learning Feature Semantic Matching for Spatio-Temporal Video Grounding

Published in 2024 at "IEEE Transactions on Multimedia"

DOI: 10.1109/tmm.2024.3387696

Abstract: Spatio-temporal video grounding (STVG) aims to localize a spatio-temporal tube, including temporal boundaries and object bounding boxes, that semantically corresponds to a given language description in an untrimmed video. The existing one-stage solutions in this…

Keywords: video grounding; video; temporal video; query ...