YOLO-Former: Marrying YOLO and Transformer for Foreign Object Detection

The automatic detection of foreign objects between platform screen doors (PSDs) and metro train doors is critical to the safety of people and property and to keeping trains running normally. However, some existing works only determine whether a foreign object is present and cannot indicate its category. Besides, although deep-learning-based object detection algorithms can indicate both the presence and the category of foreign objects, most of them only harness the information in region proposals, ignoring global contextual information. Furthermore, their performance comes at a considerable cost in computational complexity, so they cannot be readily deployed in the metro environment. To address these issues and better implement foreign object detection (FOD), we present You Only Look Once-Transformer (YOLO-Former), a simple but efficient model. YOLO-Former is built on YOLOv5 through the following procedure. First, the vision transformer (ViT) is introduced for dynamic attention and global modeling, addressing the limitation that the original YOLOv5 only utilizes information in region proposals and has insufficient ability to capture global information. Second, the convolutional block attention module (CBAM) and the stem module are used to further improve feature expression ability and reduce floating-point operations (FLOPs). Finally, we design several variants with different widths and depths to suit different deployment requirements. Experiments on the FOD dataset (FODD) and the PASCAL VOC dataset demonstrate that YOLO-Former-x consistently outperforms other state-of-the-art methods by significant margins (0.5–11.3 mean average precision (mAP) on FODD and 0.6–13.6 on the PASCAL VOC dataset). Last but not least, YOLO-Former-x maintains real-time processing speed (27.32 and 28.17 frames/s (FPS) on TITAN Xp).
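
To make the "global modeling" idea in the abstract concrete, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation of a vision transformer. It is an illustration under stated assumptions, not the authors' implementation: the function name `self_attention`, the token count, the feature dimension, and the random projection matrices are all hypothetical. The point it shows is that every output token is a weighted mix of all input tokens, so each spatial location can draw on the whole feature map rather than a local region.

```python
import numpy as np


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    tokens: (num_tokens, dim) array, e.g. a flattened feature map.
    w_q, w_k, w_v: (dim, dim) projection matrices (hypothetical, random here).
    Returns the attended tokens and the (num_tokens, num_tokens) weight matrix.
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    # Pairwise similarity between every query and every key.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row mixes information from all tokens.
    return weights @ v, weights


# Assumed toy sizes: a 4x4 feature map flattened to 16 tokens of dimension 8.
rng = np.random.default_rng(0)
num_tokens, dim = 16, 8
tokens = rng.standard_normal((num_tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))

out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` sums to one and spans all 16 tokens, which is what lets a transformer block aggregate context from the entire image in a single layer.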

Keywords: information; yolo former; foreign object; yolo; object detection

Journal Title: IEEE Transactions on Instrumentation and Measurement
Year Published: 2022
