Published in 2025 at "IEEE Transactions on Multimedia"
DOI: 10.1109/tmm.2024.3521708
Abstract: In recent years, the use of large-scale pre-trained models for vision-language tasks has gained significant attention and has shown promising results in video question answering. However, the increasing size of these models has made…
Keywords:
efficient transfer;
video question;
language;
question answering