LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

MICER: a pre-trained encoder-decoder architecture for molecular image captioning

Photo by joakimnadell from unsplash

MOTIVATION Automatic recognition of chemical structures from molecular images provides an important avenue for the rediscovery of chemicals. Traditional rule-based approaches that rely on expert knowledge and fail to consider… Click to show full abstract

MOTIVATION Automatic recognition of chemical structures from molecular images provides an important avenue for the rediscovery of chemicals. Traditional rule-based approaches that rely on expert knowledge and fail to consider all the stylistic variations of molecular images usually suffer from cumbersome recognition processes and low generalization ability. Deep learning-based methods that integrate different image styles and automatically learn valuable features are flexible, but currently under-researched and have limitations, and are therefore not fully exploited. RESULTS MICER, an encoder-decoder-based, reconstructed architecture for molecular image captioning, combines transfer learning, attention mechanisms, and several strategies to strengthen effectiveness and plasticity in different datasets. The effects of stereochemical information, molecular complexity, data volume, and pre-trained encoders on MICER performance were evaluated. Experimental results show that the intrinsic features of the molecular images and the sub-model match have a significant impact on the performance of this task. These findings inspire us to design the training dataset and the encoder for the final validation model, and the experimental results suggest that the MICER model consistently outperforms the state-of-the-art methods on four datasets. MICER was more reliable and scalable due to its interpretability and transfer capacity and provides a practical framework for developing comprehensive and accurate automated molecular structure identification tools to explore unknown chemical space. AVAILABILITY https://github.com/Jiacai-Yi/MICER. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

Keywords: architecture molecular; encoder decoder; molecular image; pre trained; image captioning; image

Journal Title: Bioinformatics
Year Published: 2022

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.