We are delighted to present this special issue of the Distributed and Parallel Databases journal (DPDB) on large-scale data curation and metadata management. Data curation and annotation are becoming essential mechanisms for capturing a wide variety of metadata related to data. This metadata may carry different semantics, ranging from the data's lineage and provenance and its quality, to knowledge and discussion messages exchanged among scientists, related articles or documents, links to relevant statistics about the data, and flags on erroneous or conflicting values. This metadata may be represented in many different formats, including free-text values, articles or binary files, images, structured information such as provenance, and semi-structured content such as email messages. Creating and maintaining annotated databases and metadata repositories requires a great deal of effort (and cost) from many scientists and domain experts. Yet the benefit gained from the maintained annotations is still very limited, because comprehensive solutions are lacking for automating large-scale metadata management tasks such as storing annotations, extracting meaningful information from large sets of annotations, and propagating annotations through operations. Thus, the value of the knowledge hidden in this metadata remains largely unexplored. The growing volume, profound complex-