Coverage-based query subtopic diversification leveraging semantic relevance

Generally, users are reserved in describing their search intention when submitting queries to a search engine. As a result, a large number of search queries are short, ambiguous, and open to multiple interpretations. Given the enormous size of the web, ignoring the information needs underlying such queries can misguide the search engine. To mitigate these issues, an effective approach is to diversify the search results according to the query subtopics with diverse intents. The task of identifying the possible subtopics with diverse intents underlying a query is known as subtopic mining. This paper aims to mine and diversify the subtopics underlying a query. Our method first extracts noun phrases containing the query terms from the top-retrieved web documents. We also extract query suggestions and completions from commercial search engines. The extracted candidates that are highly related to the query are then selected as subtopics. We introduce a new relatedness score function to estimate the degree of relatedness between the query and a candidate. To estimate the relevance between the query and a subtopic, this paper introduces a semantic relevance measure using a locally trained sentence embedding model. Finally, we propose a novel coverage-based diversification technique that ranks the subtopics by combining their relevance with their coverage as estimated from the web documents. The experimental results on two NTCIR English subtopic mining datasets demonstrate that our proposed method achieves new state-of-the-art performance and significantly outperforms some known related methods in terms of the relevance (D-nDCG) and diversity (D#-nDCG) metrics at a cutoff of 10.
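
The abstract does not give the actual scoring formula, so the following is only a minimal sketch of what a coverage-based ranking of subtopics could look like. It assumes each subtopic already has a relevance score in [0, 1] and a set of IDs for the documents it covers. The function name `diversify`, the mixing weight `lam`, and the greedy selection rule are illustrative assumptions, not the paper's method.

```python
from typing import Dict, List, Set


def diversify(relevance: Dict[str, float],
              coverage: Dict[str, Set[str]],
              k: int = 10,
              lam: float = 0.5) -> List[str]:
    """Greedily rank subtopics by relevance plus newly covered documents.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        score = lam * relevance + (1 - lam) * share of not-yet-covered documents
    """
    # Every document mentioned by any subtopic; used to normalise coverage.
    all_docs: Set[str] = set().union(*coverage.values())
    covered: Set[str] = set()
    ranked: List[str] = []
    remaining = set(relevance)

    def score(subtopic: str) -> float:
        new_docs = coverage.get(subtopic, set()) - covered
        novelty = len(new_docs) / len(all_docs) if all_docs else 0.0
        return lam * relevance[subtopic] + (1 - lam) * novelty

    while remaining and len(ranked) < k:
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        covered |= coverage.get(best, set())
        remaining.remove(best)
    return ranked


# Toy data (made up for illustration).
relevance = {
    "apple fruit": 0.9,
    "apple fruit nutrition": 0.85,
    "apple company": 0.6,
}
coverage = {
    "apple fruit": {"d1", "d2"},
    "apple fruit nutrition": {"d1", "d2"},
    "apple company": {"d3", "d4"},
}
print(diversify(relevance, coverage, k=3))
```

With this toy data, "apple company" is ranked above the more relevant "apple fruit nutrition", because the latter covers only documents that the first-ranked subtopic has already covered. That is the general trade-off between relevance and coverage the abstract describes.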

Keywords: search; semantic relevance; query subtopic; coverage; coverage based; query

Journal Title: Knowledge and Information Systems
Year Published: 2020
