Big data in clinical biochemistry

The term ‘big data’ is used to refer both to data-sets and to how the data are analysed and used. The data may be heterogeneous and complex and may change or accumulate rapidly. A given data-set may combine many different databases, so connecting them together can present a big challenge.

Big data appeal to providers of services. By knowing more about the target audience, companies can target their products more accurately and perhaps more cheaply, thus increasing profit or reducing the cost to the consumer. Using big data may allow providers of public services, such as local boroughs, transport providers and health authorities, to understand what services are needed and to target them at the appropriate consumers.

The public is naturally suspicious about how data are collected and used, and this concern may be well founded. For instance, combining socioeconomic status (postcode) with health data, tax returns, Google searches, credit card bills and police data may reveal information about us of which we ourselves are unaware. The fear is that the data may be used to control us and to benefit the provider rather than the consumer, a fear borne out recently by the revelation of data leaks at Facebook.

The UK is fortunate in having a unified health system, albeit one whose component parts are often difficult to unite. In principle, therefore, all public health service data about patients in the UK are available to inform health policy and health outcomes, and to identify the factors that can help to improve health and to provide healthcare as rationally and cost-effectively as possible. Sources of data include GP records (collected and anonymized in the General Practice Research Database and the Health Improvement Network), individual hospital data, and laboratory databases.
Health sources include genetics databases, pathology and laboratory medicine, pharmacy, outpatient attendances and socioeconomic status (postcodes). The main problems with using these data are maintaining anonymity, ensuring data compatibility, analysing the data, and identifying the owner(s) of the combined data.

It is possible to identify individuals simply by knowing their date of birth, sex and postcode, or because they have rare conditions or uncommon single nucleotide polymorphisms. Using effective algorithms to anonymize data is therefore crucial if data are to be released safely.

Combining health data from different sources requires common identifiers for people, healthcare professionals and locations. While everyone registered with the NHS has an NHS number, not everyone in the UK is registered, and certain organizations, such as prisons and the armed forces, tend not to use NHS numbers. Hospitals often use their own registration numbers internally in preference to NHS numbers, and the format may differ between departments: for instance, one department may append a check letter, and the number may be stored either as a number or as a character string. These factors increase the difficulty of combining data between departments and trusts. Even dates may be recorded in different formats. Analysing large data-sets requires the ability to scale up familiar methods. These include database
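The re-identification risk described above, where date of birth, sex and postcode together can single out a person, is often quantified with k-anonymity: each combination of these quasi-identifiers should be shared by at least k records before the data are released. The Python sketch below illustrates that general technique; it is not a method taken from this article, and the `Record` fields, the threshold `k` and the sample records are hypothetical. It also shows one common anonymization step, generalization, which coarsens the date of birth to a year and the postcode to its outward code (district).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

Key = Tuple[str, str, str]


@dataclass(frozen=True)
class Record:
    # Hypothetical extract layout; real data-sets will differ.
    date_of_birth: str  # ISO 8601, YYYY-MM-DD
    sex: str
    postcode: str  # UK postcode, with or without the space
    diagnosis: str  # the sensitive attribute to be released


def quasi_identifier(r: Record) -> Key:
    """Full date of birth, sex and postcode: often enough to single out a person."""
    return (r.date_of_birth, r.sex, r.postcode)


def generalize(r: Record) -> Key:
    """Coarsen to year of birth, sex and outward code (postcode district)."""
    compact = r.postcode.replace(" ", "").upper()
    # The inward code of a UK postcode is always its last three characters.
    return (r.date_of_birth[:4], r.sex, compact[:-3])


def at_risk(records: List[Record], key: Callable[[Record], Key], k: int = 5) -> List[Record]:
    """Return the records whose key is shared by fewer than k records in the set."""
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        Record("1970-03-14", "F", "AB1 1AA", "type 2 diabetes"),
        Record("1970-11-02", "F", "AB1 2BB", "chronic kidney disease"),
        Record("1970-07-21", "F", "AB1 3CC", "hypothyroidism"),
        Record("1985-01-09", "M", "CD9 4DD", "gout"),
    ]
    print(len(at_risk(records, quasi_identifier, k=2)))  # 4: every record is unique
    print(len(at_risk(records, generalize, k=2)))  # 1: only the 1985 record stands alone
```

Raising k or coarsening further lowers the re-identification risk at the cost of analytic detail. k-anonymity alone does not protect a group whose members all share the same diagnosis, so it is a starting point rather than a guarantee.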
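The identifier and date problems above have to be resolved before records from two sources can be joined. The Python sketch below is one hedged illustration, not a procedure from the article. It uses the NHS number as the common key, validates its final modulus 11 check digit, restores leading zeros that are lost when an identifier is stored as an integer, and tries a small list of date formats, reading ambiguous numeric dates day-first as in UK records. The format list, the `nhs_number` field name and the `link` helper are hypothetical; local hospital numbers with check letters would need their own site-specific rules.

```python
from datetime import date, datetime
from typing import Any, Dict, List, Optional, Tuple, Union

# Hypothetical formats seen across departments; numeric dates are day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y", "%Y%m%d")

Row = Dict[str, Any]


def normalize_nhs_number(raw: Union[int, str]) -> Optional[str]:
    """Return a canonical 10-digit NHS number, or None if it fails validation."""
    if isinstance(raw, int):
        # Stored as a number: leading zeros may have been dropped.
        digits = str(raw).zfill(10)
    else:
        # Stored as text: drop spaces, hyphens and other separators.
        digits = "".join(ch for ch in raw if ch in "0123456789")
    # Reject wrong lengths, negative integers and the all-zero placeholder.
    if len(digits) != 10 or not digits.isdigit() or set(digits) == {"0"}:
        return None
    # Modulus 11: weight the first nine digits 10, 9, ..., 2.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    if check == 10 or check != int(digits[9]):
        return None
    return digits


def normalize_date(raw: str) -> Optional[date]:
    """Parse a date written in any of DATE_FORMATS, or return None."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def link(left: List[Row], right: List[Row], key: str = "nhs_number") -> List[Tuple[Row, Row]]:
    """Inner-join two sources on the normalized NHS number; invalid keys are skipped."""
    index: Dict[str, List[Row]] = {}
    for row in right:
        nhs = normalize_nhs_number(row[key])
        if nhs is not None:
            index.setdefault(nhs, []).append(row)
    pairs: List[Tuple[Row, Row]] = []
    for row in left:
        nhs = normalize_nhs_number(row[key])
        if nhs is not None:
            pairs.extend((row, match) for match in index.get(nhs, []))
    return pairs


if __name__ == "__main__":
    # Hypothetical rows: the same patient, keyed and dated differently by two departments.
    lab = [{"nhs_number": 9434765919, "test": "HbA1c", "sampled": "14/03/2019"}]
    pharmacy = [{"nhs_number": "943 476 5919", "drug": "metformin", "issued": "2019-03-14"}]
    for lab_row, rx_row in link(lab, pharmacy):
        same_day = normalize_date(lab_row["sampled"]) == normalize_date(rx_row["issued"])
        print(lab_row["test"], rx_row["drug"], same_day)  # HbA1c metformin True
```

Records that fail validation are dropped from the join rather than guessed at; in practice those rows would be routed to a manual or probabilistic matching step.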

Journal Title: Annals of Clinical Biochemistry
Year Published: 2019
