Articles with "visual question" as a keyword




A survey of deep learning-based visual question answering

Published in 2021 at "Journal of Central South University"

DOI: 10.1007/s11771-021-4641-x

Abstract: With the warming up and continuous development of machine learning, especially deep learning, the research on visual question answering field has made significant progress, with important theoretical research significance and practical application value. Therefore, it…

Keywords: survey deep; visual question; question answering; research ...

Multi visual and textual embedding on visual question answering for blind people

Published in 2021 at "Neurocomputing"

DOI: 10.1016/j.neucom.2021.08.117

Abstract: Visual impairment community, especially blind people have a thirst for assistance from advanced technologies for understanding and answering the image. Through the development and intersection between vision and language, Visual Question Answering (VQA)…

Keywords: image; question; visual question; multi visual ...

Co-Attention Network With Question Type for Visual Question Answering

Published in 2019 at "IEEE Access"

DOI: 10.1109/access.2019.2908035

Abstract: Visual Question Answering (VQA) is a challenging multi-modal learning task since it requires an understanding of both visual and textual modalities simultaneously. Therefore, the approaches used to represent the images and questions in a fine-grained…

Keywords: attention; question; visual question; question type ...

A Semantic Weight Adaptive Model Based on Visual Question Answering

Published in 2025 at "IEEE Access"

DOI: 10.1109/access.2024.3442129

Abstract: Visual Question Answering (VQA) is an advanced artificial intelligence task that combines computer vision and natural language processing technologies. Its core objective is to enable computers to accurately answer natural language questions posed by users…

Keywords: semantic weight; language; model; question answering ...

Medical Knowledge-Based Differential Image Visual Question Answering

Published in 2025 at "IEEE Access"

DOI: 10.1109/access.2025.3565695

Abstract: Visual Question Answering (VQA) technology shows great promise for cross-disciplinary applications, with its integration into the medical field emerging as a major research focus in recent years. The current mainstream medical visual question answering (VQA)…

Keywords: module; image; medical knowledge; question answering ...

Visual Question Generation From Remote Sensing Images

Published in 2023 at "IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing"

DOI: 10.1109/jstars.2023.3261361

Abstract: Visual question generation (VQG) is a fundamental task in vision-language understanding that aims to generate relevant questions about the given input image. In this article, we propose a paragraph-based VQG approach for generating intelligent questions…

Keywords: visual question; remote sensing; question generation; sensing images ...

PERS: Parameter-Efficient Multimodal Transfer Learning for Remote Sensing Visual Question Answering

Published in 2024 at "IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing"

DOI: 10.1109/jstars.2024.3447086

Abstract: Remote sensing (RS) visual question answering (VQA) provides accurate answers through the analysis of RS images (RSIs) and associated questions. Recent research has increasingly adopted transformers for feature extraction. However, this trend leads to escalating…

Keywords: remote sensing; sensing visual; efficient multimodal; question answering ...

See and Learn More: Dense Caption-Aware Representation for Visual Question Answering

Published in 2024 at "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2023.3291379

Abstract: With the rapid development of deep learning models, great improvements have been achieved in the Visual Question Answering (VQA) field. However, modern VQA models are easily affected by language priors, which ignore image information and…

Keywords: language; question answering; dense captions; visual question ...

Deep Fuzzy Multiteacher Distillation Network for Medical Visual Question Answering

Published in 2024 at "IEEE Transactions on Fuzzy Systems"

DOI: 10.1109/tfuzz.2024.3402086

Abstract: Medical visual question answering (medical VQA) is a critical cross-modal interaction task that garnered considerable attention in the medical domain. Several existing methods commonly leverage the vision-and-language pretraining paradigms to mitigate the limitation of small-scale…

Keywords: logic; language; medical visual; distillation ...

RSVQA: Visual Question Answering for Remote Sensing Data

Published in 2020 at "IEEE Transactions on Geoscience and Remote Sensing"

DOI: 10.1109/tgrs.2020.2988782

Abstract: This article introduces the task of visual question answering for remote sensing data (RSVQA). Remote sensing images contain a wealth of information, which can be useful for a wide range of tasks, including land cover…

Keywords: remote sensing; visual question; sensing data; question answering ...

Bi-Modal Transformer-Based Approach for Visual Question Answering in Remote Sensing Imagery

Published in 2022 at "IEEE Transactions on Geoscience and Remote Sensing"

DOI: 10.1109/tgrs.2022.3192460

Abstract: Recently, vision-language models based on transformers are gaining popularity for joint modeling of visual and textual modalities. In particular, they show impressive results when transferred to several downstream tasks such as zero and few-shot classification.…

Keywords: modal transformer; visual question; remote sensing; approach ...