LAUSR: vision language

ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning

Published in 2024 at "International Journal of Computer Vision"

DOI: 10.1007/s11263-025-02440-4

Abstract: Recent advancements in multimodal fusion have witnessed the remarkable success of vision-language (VL) models, which excel in various multimodal applications such as image captioning and visual question answering. However, building VL models requires substantial hardware…

Keywords: efficient vision; fusion; vision; language ...

Investigating the capabilities of large vision language models in dog emotion recognition

Published in 2025 at "Scientific Reports"

DOI: 10.1038/s41598-025-25199-7

Abstract: Identifying emotional states in animals is a key challenge in behavioural science and a prerequisite for developing reliable welfare assessments, ethical frameworks, and robust human–animal communication models. Recently, large vision-language models (LVLMs) such as GPT-4o,…

Keywords: emotion; large vision; language; vision language ...

Situation classification of living environment by daily life support robot using pre-trained large-scale vision-language model

Published in 2025 at "Advanced Robotics"

DOI: 10.1080/01691864.2025.2487608

Abstract: Various conditions exist in individual daily life environments. It is important for a daily life support robot to observe states in the daily life environment and perform tasks depending on the living environment. Today, pre-trained…

Keywords: pre trained; environment; daily life; language ...

VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models

Published in 2025 at "IEEE Access"

DOI: 10.1109/access.2025.3535837

Abstract: The fast advancement of Large Vision-Language Models (LVLMs) has shown immense potential. These models are increasingly capable of tackling abstract visual tasks. Geometric structures, particularly graphs with their inherent flexibility and complexity, serve as an…

Keywords: benchmark generator; large vision; language; vision language ...

Vision-Language Transformer for Interpretable Pathology Visual Question Answering

Published in 2022 at "IEEE Journal of Biomedical and Health Informatics"

DOI: 10.1109/jbhi.2022.3163751

Abstract: Pathology visual question answering (PathVQA) attempts to answer a medical question posed by pathology images. Despite its great potential in healthcare, it is not widely adopted because it requires interactions on both the image (vision)…

Keywords: vision language; question; language; pathology ...

A Vision Language Correlation Framework for Screening Disabled Retina

Published in 2024 at "IEEE Journal of Biomedical and Health Informatics"

DOI: 10.1109/jbhi.2024.3462653

Abstract: Retinopathy is a group of retinal disabilities that causes severe visual impairments or complete blindness. Due to the capability of optical coherence tomography to reveal early retinal abnormalities, many researchers have utilized it to develop…

Keywords: language correlation; proposed framework; framework; language ...

Medical Vision-Language Modeling With Semantic Interaction and Adaptive Refinement Prompting for Bias Mitigation

Published in 2025 at "IEEE Journal of Biomedical and Health Informatics"

DOI: 10.1109/jbhi.2025.3631270

Abstract: Vision-Language Models (VLMs) have demonstrated impressive capabilities across various medical tasks, including report generation and visual question answering (VQA). However, pixel-level tasks such as image segmentation remain relatively underexplored, despite their critical importance for clinical…

Keywords: medical vision; semantic interaction; language; vision language ...

Multi-Agent Collaborative Decision-Making Using Small Vision-Language Models for Autonomous Driving

Published in 2025 at "IEEE Internet of Things Journal"

DOI: 10.1109/jiot.2025.3624038

Abstract: Autonomous vehicles face challenges in complex environments due to the computational inefficiency of large language models (LLMs) and the lack of multiagent collaboration in existing decision-making approaches. This article proposes a small vision–language model (VLM)-based…

Keywords: small vision; decision making; language; vision language ...

OSClip: Domain-Adaptive Prompt Tuning of Vision-Language Models for Open-Set Remote Sensing Image Classification

Published in 2025 at "IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing"

DOI: 10.1109/jstars.2025.3617915

Abstract: Remote sensing image classification models face significant challenges when adapting to new domains due to variations in image acquisition conditions, sensor types, and scene categories. Conventional domain adaptation methods rely on multistage adaptation pipelines with…

Keywords: adaptation; image; remote sensing; language ...

Enhancing Scene Understanding for Vision-and-Language Navigation by Knowledge Awareness

Published in 2024 at "IEEE Robotics and Automation Letters"

DOI: 10.1109/lra.2024.3483042

Abstract: Vision-and-Language Navigation (VLN) has garnered widespread attention and research interest due to its potential applications in real-world scenarios. Despite significant progress in the VLN field in recent years, limitations persist. Many agents struggle to make…

Keywords: navigation; history; knowledge; language ...

A Hierarchical Vision-Language and Reinforcement Learning Framework for Robotic Task and Motion Planning in Collaborative Manipulation

Published in 2026 at "IEEE Robotics and Automation Letters"

DOI: 10.1109/lra.2025.3629984

Abstract: Vision-language-action models (VLAs) use an end-to-end learning architecture, which can realize the integration of visual perception, semantic understanding and motion control. However, when tackling with the dynamic or long-horizon tasks, VLAs have poor robustness and…

Keywords: task; language; vision language; reinforcement learning ...
