Published in 2023 at "IEEE Transactions on Circuits and Systems for Video Technology"
DOI: 10.1109/tcsvt.2022.3207910
Abstract: Video-text retrieval is a crucial task that has been a powerful application for multi-media data analysis and attracted tremendous interest in the research area. The core steps are feature representations and alignment to overcome the…
Keywords: video text; text retrieval; video; local alignment; ...

Published in 2024 at "IEEE Transactions on Circuits and Systems for Video Technology"
DOI: 10.1109/tcsvt.2023.3303945
Abstract: We present a framework for learning cross-modal video representations by directly pre-training on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on…
Keywords: text tasks; training; video text; network; ...

Published in 2024 at "IEEE Transactions on Circuits and Systems for Video Technology"
DOI: 10.1109/tcsvt.2023.3337247
Abstract: Existing video text detection methods mostly track texts with appearance feature only, thus are easily influenced by the change of perspective and illumination. In this paper, we propose an end-to-end video text detector that tracks…
Keywords: video; text detection; video text; robust feature; ...

Published in 2024 at "IEEE Transactions on Circuits and Systems for Video Technology"
DOI: 10.1109/tcsvt.2024.3422869
Abstract: Video-Text Retrieval is a fundamental task in multi-modal understanding and has attracted increasing attention from both academia and industry communities in recent years. Generally, video inherently contains multi-grained semantic and each video corresponds to several…
Keywords: text retrieval; video; phrase; reliable phrase; ...

Published in 2025 at "IEEE Transactions on Circuits and Systems for Video Technology"
DOI: 10.1109/tcsvt.2025.3574726
Abstract: Large-scale image-text pre-trained models have shown promising transferability to various downstream tasks. Video-text retrieval benefits from it by transferring pre-trained CLIP to video-text domain. Although these pre-trained models have shown impressive performance, full fine-tuning becomes…
Keywords: prompt tuning; video text; text retrieval; agent; ...

Published in 2025 at "IEEE Transactions on Multimedia"
DOI: 10.1109/tmm.2025.3535394
Abstract: Video-to-text generation is a challenging task that involves translating video contents into accurate and expressive sentences. Existing methods often ignore the importance of establishing fine-grained semantics within visual representations and exploring textual knowledge implied by…
Keywords: video; text generation; language; video text; ...

Published in 2020 at "IEEE Transactions on Neural Networks and Learning Systems"
DOI: 10.1109/tnnls.2020.2997020
Abstract: Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable…
Keywords: retrieval; text; image text; video text; ...