The term “big data” was first coined in 1997 in the computer science literature to refer to data that is too large to be warehoused in traditional data storage systems and analyzed by traditional data processing applications. Since then, its definition has evolved to encompass exceptionally high data volume compared to traditional datasets, the velocity at which data are collected and made available for analysis, and the variety of types and sources from which data can be drawn.2–5 These are known as the three Vs of big data: volume, velocity, and variety. Common examples of very large datasets are video subscription libraries (e.g., Netflix), social media, video or audio data, and DNA sequencing databases. The term big data has also come to refer to the advanced data analytic methods (otherwise known as big data analytics, or BDA) used to extract information from such data.

Big data and BDA are commonly employed in nonmedical sectors, including banking, communications, media, manufacturing, insurance, and energy, to improve operational management and decision-making. For example, supply chain management can be enhanced by the provision of accurate inventory and operational data, as well as real-time corrections for supplier changes. In retail, user analytics can detect purchasing patterns and suggest products frequently purchased together to end users. Governments are using big data for fraud detection and cybersecurity. In these sectors, 80% of executives have reported their big data investments as a success.

In healthcare, adoption of big data and BDA has lagged, but both are increasingly employed in research and non-research settings. Applications are broad: diagnostics, preventative medicine, precision medicine, cost reduction, and population health are a few examples.
For instance, machine learning has been used in radiology for automated abnormality detection and interpretation of findings, as well as for automated correlation of imaging results with other laboratory and clinical parameters. Databases from California-based Kaiser Permanente, covering over 2 million patient follow-up years, were used to identify an association between rofecoxib (a nonsteroidal anti-inflammatory medication) and an increased risk of coronary artery disease, leading to its withdrawal from the market. The Department of Health in Western Australia has used data from the jurisdiction's drug and alcohol services to visualize drug-related hospitalizations, arrests, ambulance requests, and mortality, identifying at-risk populations and regions. In cancer research, next-generation sequencing is producing large volumes of genomic data used for molecular analysis of tumors and, via machine learning techniques, for predictive analytics.

Received: 21 April 2022; Revised: 5 May 2022; Accepted: 8 May 2022