Published in 2023 at "IEEE Access"
DOI: 10.1109/access.2023.3263512
Abstract: As a fundamental branch in cross-modal retrieval, image-text retrieval is still a challenging problem largely due to the complementary and imbalanced relationship between different modalities. However, existing works have not effectively scanned and aligned the…
Keywords:
text retrieval;
foreground background;
image;
image text; ...
Published in 2025 at "IEEE Geoscience and Remote Sensing Letters"
DOI: 10.1109/lgrs.2024.3494543
Abstract: Existing remote sensing (RS) image-text retrieval methods generally fall into two categories: dual-stream approaches and single-stream approaches. Dual-stream models are efficient but often lack sufficient interaction between visual and textual modalities, while single-stream models offer…
Keywords:
remote sensing;
text retrieval;
stream;
image text; ...
Published in 2022 at "IEEE Transactions on Circuits and Systems for Video Technology"
DOI: 10.1109/tcsvt.2022.3182426
Abstract: Image-text retrieval is a fundamental and vital task in multi-media retrieval and has received growing attention since it connects heterogeneous data. Previous methods that perform well on image-text retrieval mainly focus on the interaction between…
Keywords:
text retrieval;
level representation;
image;
image text; ...
Published in 2023 at "IEEE Transactions on Circuits and Systems for Video Technology"
DOI: 10.1109/tcsvt.2022.3207910
Abstract: Video-text retrieval is a crucial task that has been a powerful application for multi-media data analysis and attracted tremendous interest in the research area. The core steps are feature representations and alignment to overcome the…
Keywords:
video text;
text retrieval;
video;
local alignment; ...
Published in 2024 at "IEEE Transactions on Circuits and Systems for Video Technology"
DOI: 10.1109/tcsvt.2024.3422869
Abstract: Video-Text Retrieval is a fundamental task in multi-modal understanding and has attracted increasing attention from both academia and industry communities in recent years. Generally, video inherently contains multi-grained semantic and each video corresponds to several…
Keywords:
text retrieval;
video;
phrase;
reliable phrase; ...
Published in 2025 at "IEEE Transactions on Circuits and Systems for Video Technology"
DOI: 10.1109/tcsvt.2025.3574726
Abstract: Large-scale image-text pre-trained models have shown promising transferability to various downstream tasks. Video-text retrieval benefits from it by transferring pre-trained CLIP to video-text domain. Although these pre-trained models have shown impressive performance, full fine-tuning becomes…
Keywords:
prompt tuning;
video text;
text retrieval;
agent; ...
Published in 2024 at "IEEE Transactions on Multimedia"
DOI: 10.1109/tmm.2023.3316077
Abstract: Current image-text retrieval methods mainly utilize region features that provide object-level information to represent images, making the retrieval results more accurate and interpretable. However, there are several issues with region features, such as lack of…
Keywords:
text retrieval;
text;
image text;
visual features; ...
Published in 2024 at "IEEE Transactions on Pattern Analysis and Machine Intelligence"
DOI: 10.1109/tpami.2024.3496576
Abstract: The task of partial scene text retrieval involves localizing and searching for text instances that are the same or similar to a given query text from an image gallery. However, existing methods can only handle…
Keywords:
text retrieval;
text line;
text;
partial patches; ...
Published in 2025 at "PLOS One"
DOI: 10.1371/journal.pone.0333084
Abstract: Vision-language pre-training (VLP) methods have significantly advanced cross-modal tasks in recent years. However, image-text retrieval still faces two critical challenges: inter-modal matching deficiency and intra-modal fine-grained localization deficiency. These issues significantly impede the accuracy of…
Keywords:
image text;
text retrieval;
dual stage;