LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Secure and Scalable Collection of Biomedical Data for Machine Learning Applications

Photo from wikipedia

Recently, digitization of biomedical processes has accelerated, in no small part due to the use of machine learning techniques which require large amounts of labeled data. This chapter focuses on… Click to show full abstract

Recently, digitization of biomedical processes has accelerated, in no small part due to the use of machine learning techniques which require large amounts of labeled data. This chapter focuses on the prerequisite steps to the training of any algorithm: data collection and labeling. In particular, we tackle how data collection can be set up with scalability and security to avoid costly and delaying bottlenecks. Unprecedented amounts of data are now available to companies and academics, but digital tools in the biomedical field encounter a problem of scale, since high-throughput workflows such as high content imaging and sequencing can create several terabytes per day. Consequently data transport, aggregation, and processing is challenging.A second challenge is maintenance of data security. Biomedical data can be personally identifiable, may constitute important trade-secrets, and be expensive to produce. Furthermore, human biomedical data is often immutable, as is the case with genetic information. These factors make securing this type of data imperative and urgent. Here we address best practices to achieve security, with a focus on practicality and scalability. We also address the challenge of obtaining usable, rich metadata from the collected data, which is a major challenge in the biomedical field because of the use of fragmented and proprietary formats. We detail tools and strategies for extracting metadata from biomedical scientific file formats and how this underutilized metadata plays a key role in creating labeled data for use in the training of neural networks.

Keywords: machine learning; secure scalable; biomedical data; collection; scalable collection

Journal Title: Methods in molecular biology
Year Published: 2021

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.