Articles with "video text" as a keyword



Temporal Multimodal Graph Transformer With Global-Local Alignment for Video-Text Retrieval

Published in 2023 at "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2022.3207910

Abstract: Video-text retrieval is a crucial task that has been a powerful application for multi-media data analysis and attracted tremendous interest in the research area. The core steps are feature representations and alignment to overcome the…

Keywords: video text; text retrieval; video; local alignment ...

SNP-S3: Shared Network Pre-Training and Significant Semantic Strengthening for Various Video-Text Tasks

Published in 2024 at "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2023.3303945

Abstract: We present a framework for learning cross-modal video representations by directly pre-training on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on…

Keywords: text tasks; training; video text; network ...

Video Text Detection With Robust Feature Representation

Published in 2024 at "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2023.3337247

Abstract: Existing video text detection methods mostly track texts with appearance feature only, thus are easily influenced by the change of perspective and illumination. In this paper, we propose an end-to-end video text detector that tracks…

Keywords: video; text detection; video text; robust feature ...

Reliable Phrase Feature Mining for Hierarchical Video-Text Retrieval

Published in 2024 at "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2024.3422869

Abstract: Video-Text Retrieval is a fundamental task in multi-modal understanding and has attracted increasing attention from both academia and industry communities in recent years. Generally, video inherently contains multi-grained semantic and each video corresponds to several…

Keywords: text retrieval; video; phrase; reliable phrase ...

Agent-Based Control Prompt Tuning for Video-Text Retrieval

Published in 2025 at "IEEE Transactions on Circuits and Systems for Video Technology"

DOI: 10.1109/tcsvt.2025.3574726

Abstract: Large-scale image-text pre-trained models have shown promising transferability to various downstream tasks. Video-text retrieval benefits from it by transferring pre-trained CLIP to video-text domain. Although these pre-trained models have shown impressive performance, full fine-tuning becomes…

Keywords: prompt tuning; video text; text retrieval; agent ...

Vision-Language Relational Transformer for Video-to-Text Generation

Published in 2025 at "IEEE Transactions on Multimedia"

DOI: 10.1109/tmm.2025.3535394

Abstract: Video-to-text generation is a challenging task that involves translating video contents into accurate and expressive sentences. Existing methods often ignore the importance of establishing fine-grained semantics within visual representations and exploring textual knowledge implied by…

Keywords: video; text generation; language; video text ...

Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals

Published in 2020 at "IEEE Transactions on Neural Networks and Learning Systems"

DOI: 10.1109/tnnls.2020.2997020

Abstract: Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable…

Keywords: retrieval; text; image text; video text ...