
Illuminating the druggable genome through patent bioactivity data



The patent literature is a potentially valuable source of bioactivity data. The SureChEMBL database (https://www.surechembl.org/) is a publicly available, large-scale resource containing compounds extracted daily from the full text, images and attachments of patent documents by an automated text- and image-mining pipeline. In this paper we describe a process to prioritise 3.7 million life-science-relevant patents obtained from SureChEMBL by how likely they were to contain bioactivity data for potent small molecules on less-studied targets, as defined by the classification developed by the Illuminating the Druggable Genome (IDG) project. The overall goal was to select a smaller number of patents that could be manually curated and incorporated into the ChEMBL database. We describe the approach taken and the results obtained, and provide some illustrative examples.
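As a toy illustration of the kind of prioritisation described above, the sketch below ranks patents by how many less-studied targets they mention and ignores patents from which no compounds were extracted. It is not the scoring scheme used in the paper, which the abstract does not specify. The `Patent` fields, the weighting formula, and the choice of the IDG target development levels Tbio and Tdark as "less-studied" are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: treat these IDG target development levels (TDLs) as "less-studied".
LESS_STUDIED_TDLS = {"Tbio", "Tdark"}


@dataclass
class Patent:
    patent_id: str
    target_tdls: list[str]  # hypothetical field: TDL of each target mentioned
    n_compounds: int        # hypothetical field: compounds extracted from the patent


def priority_score(patent: Patent) -> float:
    """Toy score: count less-studied targets, boosted by compound count (capped at 100).

    Patents with no extracted compounds score 0, since they cannot yield
    small-molecule bioactivity data.
    """
    if patent.n_compounds == 0:
        return 0.0
    understudied = sum(1 for tdl in patent.target_tdls if tdl in LESS_STUDIED_TDLS)
    return understudied * (1 + min(patent.n_compounds, 100) / 100)


def shortlist(patents: list[Patent], n: int) -> list[str]:
    """Return IDs of the n highest-scoring patents, skipping zero-score ones."""
    ranked = sorted(patents, key=priority_score, reverse=True)
    return [p.patent_id for p in ranked if priority_score(p) > 0][:n]


# Illustrative (made-up) input: a shortlist for manual curation.
patents = [
    Patent("P1", ["Tdark", "Tclin"], 50),
    Patent("P2", ["Tclin"], 200),
    Patent("P3", ["Tbio", "Tdark"], 10),
    Patent("P4", ["Tdark"], 0),
]
print(shortlist(patents, 2))  # ['P3', 'P1']
```

Capping the compound count keeps a single large patent on a well-studied target (P2 above) from outranking smaller patents that mention understudied targets, which matches the stated aim of favouring less-studied targets.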

Keywords: genome patent; illuminating druggable; druggable genome; bioactivity data; bioactivity

Journal Title: PeerJ
Year Published: 2022
