BACKGROUND Outcomes are variables monitored during a clinical trial to assess the impact of an intervention on human health. Automatic assessment of the semantic similarity of trial outcomes is needed for a number of tasks, such as detecting outcome switching (unjustified changes to the pre-defined outcomes of a trial) and implementing Core Outcome Sets (minimal sets of outcomes that should be reported in a particular medical domain).

OBJECTIVE We aimed to build an algorithm for assessing the semantic similarity of pairs of primary and reported outcomes. We focused on approaches that do not require manually curated domain-specific resources such as ontologies and thesauri.

METHODS We tested several approaches: single similarity measures (based on strings, stems and lemmas, paths and distances in an ontology, and vector representations of phrases), classifiers using combinations of single measures as features, and a deep learning approach consisting of fine-tuning pre-trained deep language representations. We tested the language models provided by BERT (trained on general-domain texts), BioBERT, and SciBERT (trained on biomedical and scientific texts, respectively). We explored whether results could be improved by taking into account variant ways of referring to an outcome (e.g., using the name of a measurement tool instead of the outcome name, or using abbreviations). We release an open corpus annotated for the similarity of pairs of outcomes.

RESULTS Classifiers using combinations of single measures as features outperformed the single measures, while deep learning algorithms using the BioBERT and SciBERT models outperformed the classifiers. BioBERT reached the best F-measure of 89.75%. Adding outcome variants did not improve the results for the best-performing single measures or for the classifiers, but it did improve the performance of the deep learning algorithms: BioBERT achieved an F-measure of 93.38%.

CONCLUSIONS Deep learning approaches using pre-trained language representations outperformed the other approaches for assessing the similarity of trial outcomes, without relying on any manually curated domain-specific resources (ontologies and other lexical resources). Adding outcome variants further improved the performance of the deep learning algorithms.
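The best-performing approach reduces to fine-tuning a BERT-style model for binary sentence-pair classification, where the pair is a primary outcome and a reported outcome. Below is a minimal sketch of that setup using the Hugging Face transformers library; the checkpoint name, hyperparameters, and toy data are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: fine-tuning a biomedical BERT model to classify outcome pairs as
# similar/dissimilar. Assumes the `transformers` and `torch` packages; the
# checkpoint and toy examples below are placeholders, not the paper's data.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "dmis-lab/biobert-base-cased-v1.1"  # assumed checkpoint; any BERT variant works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Toy (primary outcome, reported outcome, label) triples; 1 = similar.
pairs = [
    ("overall survival", "OS at 12 months", 1),
    ("pain intensity on a visual analogue scale", "VAS pain score", 1),
    ("quality of life", "incidence of adverse events", 0),
]

# BERT-style models encode a pair as one sequence: [CLS] a [SEP] b [SEP]
enc = tokenizer(
    [p[0] for p in pairs], [p[1] for p in pairs],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([p[2] for p in pairs])

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    out = model(**enc, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Predict similarity for an unseen pair.
model.eval()
with torch.no_grad():
    test = tokenizer("progression-free survival", "PFS", return_tensors="pt")
    pred = model(**test).logits.argmax(dim=-1).item()
print("similar" if pred == 1 else "dissimilar")
```

Note that the pair encoding is what distinguishes this from the single-measure baselines: both outcome phrases attend to each other inside one transformer pass, which is also where adding outcome variants (abbreviations, measurement-tool names) as extra training pairs can plausibly help.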