
Comparison of text-based and linked-based metrics in terms of estimating the similarity of articles


The aim of this study is to identify the power of text-based metrics (Cosine and Lucene similarity), linked-based metrics (co-citation, bibliographic coupling, Amsler, PageRank, and HITS), and their combination in estimating the similarity of articles to each other. The experiments were conducted on a test collection of 26,262 articles in the PubMed Central Open Access Subset (PMC OAS) of CITREC, for which both linked-based metrics and the full text needed to calculate text-based metrics were available. Thirty articles were selected as primary articles, and the articles related to each of them were retrieved using the MeSH similarity metric. The similarity of the retrieved documents was then also computed with the text-based and linked-based metrics. In the next stage, the text-based, linked-based, and hybrid metrics were entered into a generalized regression model to determine their power in estimating article similarity; finally, the performance of the models was compared using mean squared error and correlation. The results showed that, among the text-based metrics, both Cosine and Lucene similarity entered the model. Among the linked-based metrics, HITS (hub), HITS (authority), PageRank, and co-citation had the highest power, in that order, whereas bibliographic coupling and Amsler did not enter the model. Overall, a comparison of the text-based, linked-based, and hybrid models indicated that the linked-based model estimates the similarity between articles better than the text-based model, and that combining text-based and linked-based metrics does little to improve estimation power. Despite the importance and wide application of text-based and linked-based metrics for measuring article similarity, no previous study was found that examines the power of these metrics, alone and in comparison with each other, in estimating the similarity of articles.
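
As a rough illustration of three of the metrics named in the abstract, the following Python sketch computes cosine similarity over raw term-frequency vectors, together with co-citation and bibliographic coupling counts on a toy citation graph. It is a simplified sketch and not the study's or CITREC's implementation: the function names, the whitespace tokenization, the unnormalized counts, and the toy data are all assumptions made for illustration. Amsler, PageRank, HITS, and Lucene similarity are not covered.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors.

    Assumption: lowercase whitespace tokenization, no stemming or IDF weighting.
    """
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def bibliographic_coupling(cites, a, b):
    """Number of references that documents a and b have in common."""
    return len(cites.get(a, set()) & cites.get(b, set()))


def co_citation(cites, a, b):
    """Number of documents whose reference lists contain both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


if __name__ == "__main__":
    # Hypothetical toy graph: each key cites the documents in its set.
    toy_cites = {
        "A": {"X", "Y", "Z"},
        "B": {"Y", "Z", "W"},
        "C": {"A", "B"},
        "D": {"A", "B", "X"},
        "E": {"A"},
    }
    print(cosine_similarity("gene expression in cancer cells",
                            "gene expression in tumor cells"))
    print(bibliographic_coupling(toy_cites, "A", "B"))
    print(co_citation(toy_cites, "A", "B"))
```

In this toy graph, A and B share two references (Y and Z) and are cited together by two documents (C and D), so both linked-based counts are 2; the two example sentences share four of their five terms, giving a cosine of about 0.8.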

Keywords: linked based; text based; based metrics; similarity articles

Journal Title: Journal of Librarianship and Information Science
Year Published: 2023
