LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Deep contextualized embeddings for quantifying the informative content in biomedical text summarization

Photo by maxchen2k from unsplash

BACKGROUND AND OBJECTIVE Capturing the context of text is a challenging task in biomedical text summarization. The objective of this research is to show how contextualized embeddings produced by a… Click to show full abstract

BACKGROUND AND OBJECTIVE Capturing the context of text is a challenging task in biomedical text summarization. The objective of this research is to show how contextualized embeddings produced by a deep bidirectional language model can be utilized to quantify the informative content of sentences in biomedical text summarization. METHODS We propose a novel summarization method that utilizes contextualized embeddings generated by the Bidirectional Encoder Representations from Transformers (BERT) model, a deep learning model that recently demonstrated state-of-the-art results in several natural language processing tasks. We combine different versions of BERT with a clustering method to identify the most relevant and informative sentences of input documents. Using the ROUGE toolkit, we evaluate the summarizer against several methods previously described in literature. RESULTS The summarizer obtains state-of-the-art results and significantly improves the performance of biomedical text summarization in comparison to a set of domain-specific and domain-independent methods. The largest language model not specifically pretrained on biomedical text outperformed other models. However, among language models of the same size, the one further pretrained on biomedical text obtained best results. CONCLUSIONS We demonstrate that a hybrid system combining a deep bidirectional language model and a clustering method yields state-of-the-art results without requiring labor-intensive creation of annotated features or knowledge bases or computationally demanding domain-specific pretraining. This study provides a starting point towards investigating deep contextualized language models for biomedical text summarization.

Keywords: biomedical text; text summarization; contextualized embeddings; text; language

Journal Title: Computer methods and programs in biomedicine
Year Published: 2020

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.