Published in 2024 in "International Journal of Computer Vision"
DOI: 10.1007/s11263-025-02440-4
Abstract: Recent advancements in multimodal fusion have witnessed the remarkable success of vision-language (VL) models, which excel in various multimodal applications such as image captioning and visual question answering. However, building VL models requires substantial hardware…
Keywords:
efficient vision;
fusion;
vision;
language