
Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications



Recently, tremendous advances have been observed in information retrieval systems designed to search for relevant knowledge in scientific publications. Although these techniques are quite powerful, there is still room for improvement in searching full-text publication datasets for metadata relating to algorithms: for instance, efficiency-related metrics such as precision, recall, f-measure and accuracy, and other useful metadata such as the datasets deployed and the algorithmic run-time complexity. In this study, we propose a novel deep learning-based feature engineering approach that improves search capabilities by mining algorithm-specific metadata from full-text scientific publications. Traditional term frequency-inverse document frequency (TF-IDF)-based approaches function like a ‘bag of words’ model and thus fail to capture either the text’s semantics or the word sequence. In this work, we design a semantically enriched synopsis of each full-text document by adding algorithm-specific deep metadata text lines to enhance the search mechanism of algorithm search systems. These text lines are classified by our deep learning-based bi-directional long short-term memory (LSTM) model. The bi-directional LSTM model outperformed the support vector machine by 9.46%, with a 0.81 f1-score on a dataset of 37,000 algorithm-specific deep metadata text lines that had been tagged by four human experts. Lastly, we present a case study on 21,940 full-text publications downloaded from ACL (https://aclweb.org/) to show the effectiveness of deep learning-based advanced feature engineering search compared to the conventional TF-IDF-based (Lucene) search.
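The abstract's point that TF-IDF behaves like a 'bag of words' and ignores word sequence can be illustrated with a small sketch. This is not the authors' pipeline or Lucene's scoring: the function names `tfidf_vectors` and `cosine`, the toy sentences, and the smoothed IDF formula are all assumptions chosen for illustration, using only the Python standard library.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one
    common variant and not necessarily what any given search engine uses.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


# Toy sentences (invented for this sketch): the first two make opposite
# claims but contain exactly the same words.
docs = [
    "the svm outperforms the lstm on this dataset",
    "the lstm outperforms the svm on this dataset",
    "we report run-time complexity for each algorithm",
]
vecs = tfidf_vectors(docs)
same_words = cosine(vecs[0], vecs[1])   # same words, different order
different = cosine(vecs[0], vecs[2])    # no shared vocabulary

print(vecs[0] == vecs[1])
print(same_words, different)
```

In this sketch the first two sentences receive identical TF-IDF vectors even though they state opposite results, which is the kind of sequence information a sequence model such as a bi-directional LSTM is able to take into account.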

Keywords: feature engineering; text; information retrieval; full text

Journal Title: Scientometrics
Year Published: 2019
