Continual Knowledge Infusion into Pre-trained Biomedical Language Models.

MOTIVATION
Biomedical language models produce meaningful concept representations that are useful for a variety of biomedical natural language processing (bioNLP) applications such as named entity recognition, relationship extraction, and question answering. Recent research has shown that contextualized language models (e.g., BioBERT, BioELMo) possess tremendous representational power and achieve impressive accuracy gains. However, these models are still unable to learn high-quality representations for concepts with little contextual information (i.e., rare words). Infusing complementary information from knowledge bases (KBs) is likely to help when the corpus-specific information is insufficient to learn robust representations. Moreover, as the biomedical domain contains numerous KBs, it is imperative to develop approaches that can integrate them in a continual fashion.

RESULTS
We propose a new representation learning approach that progressively fuses the semantic information from multiple KBs into pre-trained biomedical language models. Since most KBs in the biomedical domain are expressed as parent-child hierarchies, we model the hierarchical KBs and propose a new knowledge modeling strategy that encodes their topological properties at a granular level. Moreover, the proposed continual learning technique efficiently updates the concept representations to accommodate new knowledge while preserving the memory efficiency of contextualized language models. Altogether, the proposed approach generates knowledge-powered embeddings with high fidelity and learning efficiency. Extensive experiments on bioNLP tasks validate the efficacy of the proposed approach and demonstrate its capability to generate robust concept representations.

SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
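The abstract names two technical ingredients, a granular encoding of parent-child KB hierarchies and a continual fusion of KB knowledge into a pre-trained language model, but gives no implementation details. The Python sketch below is therefore purely illustrative: propagate_hierarchy, KnowledgeFusionAdapter, the gating design, and the one-adapter-per-KB continual scheme are all assumptions for the sake of concreteness, not the paper's actual method.

```python
import torch
import torch.nn as nn


def propagate_hierarchy(embeddings: dict, parent_of: dict, alpha: float = 0.5) -> dict:
    """Toy hierarchical-KB encoding: pull each concept vector toward its
    parent's so parent-child neighbours end up close in the space. The
    paper's granular topological encoding is unspecified; this is only
    an illustrative stand-in."""
    encoded = {}
    for concept, vec in embeddings.items():
        parent = parent_of.get(concept)
        if parent in embeddings:
            encoded[concept] = alpha * vec + (1.0 - alpha) * embeddings[parent]
        else:
            encoded[concept] = vec  # root concepts keep their own vector
    return encoded


class KnowledgeFusionAdapter(nn.Module):
    """Hypothetical gated fusion layer. Training one lightweight adapter
    per KB while the contextual encoder stays frozen would let new KBs be
    added continually without growing the backbone's memory footprint."""

    def __init__(self, hidden_dim: int, kb_dim: int):
        super().__init__()
        self.project = nn.Linear(kb_dim, hidden_dim)       # KB space -> LM space
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)  # per-dimension mixing

    def forward(self, contextual: torch.Tensor, kb_embed: torch.Tensor) -> torch.Tensor:
        kb = self.project(kb_embed)
        g = torch.sigmoid(self.gate(torch.cat([contextual, kb], dim=-1)))
        return g * contextual + (1.0 - g) * kb  # gated convex blend


# Usage sketch: fuse a dummy BioBERT-sized vector with a KB embedding.
adapter = KnowledgeFusionAdapter(hidden_dim=768, kb_dim=128)
contextual = torch.randn(1, 768)  # stand-in for a contextual concept vector
kb_vector = torch.randn(1, 128)   # stand-in for the concept's KB embedding
fused = adapter(contextual, kb_vector)  # knowledge-powered representation
```

The gated blend is just one plausible way to let corpus-rich concepts rely on the contextual encoder while rare concepts lean on KB knowledge; the paper may realize the same goal differently.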

Keywords: information; biomedical language; language models; continual knowledge

Journal Title: Bioinformatics
Year Published: 2021
