"Illuminating the druggable genome through patent bioactivity data"

The patent literature is a potentially valuable source of bioactivity data. The SureChEMBL database (https://www.surechembl.org/) is a publicly available, large-scale resource containing compounds that an automated text- and image-mining pipeline extracts daily from the full text, images and attachments of patent documents. In this paper we describe a process to prioritise 3.7 million life-science-relevant patents obtained from SureChEMBL by how likely they were to contain bioactivity data for potent small molecules on less-studied targets, as defined by the target classification developed by the Illuminating the Druggable Genome (IDG) project. The overall goal was to select a smaller number of patents that could be manually curated and incorporated into the ChEMBL database. We describe the approach taken and the results obtained, and provide some illustrative examples.

Keywords: genome patent; illuminating druggable; druggable genome; bioactivity data; bioactivity

Journal Title: PeerJ
Year Published: 2022
