Secondary use of health data is hampered in part by substantial semantic heterogeneity. Many efforts are under way to align local terminologies with international standards. Given increasing concerns about data privacy, we focused here on machine learning methods that align biological data elements using aggregated features that could be shared as open data. A 3-step methodology (feature engineering, blocking strategy, and supervised learning) was proposed. The first results, although modest, are encouraging for the future development of this approach.
               