LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Dynamic clustering for short text stream based on Dirichlet process

Photo from wikipedia

Due to the explosive growth of short text on various social media platforms, short text stream clustering has become an increasingly prominent issue. Unlike traditional text streams, short text stream… Click to show full abstract

Due to the explosive growth of short text on various social media platforms, short text stream clustering has become an increasingly prominent issue. Unlike traditional text streams, short text stream data present the following characteristics: short length, weak signal, high volume, high velocity, topic drift, etc. Existing methods cannot simultaneously address two major problems very well: inferring the number of topics and topic drift. Therefore, we propose a dynamic clustering algorithm for short text streams based on the Dirichlet process (DCSS), which can automatically learn the number of topics in documents and solve the topic drift problem of short text streams. To solve the sparsity problem of short texts, DCSS considers the correlation of the topic distribution at neighbouring time points and uses the inferred topic distribution of past documents as a prior of the topic distribution at the current moment while simultaneously allowing newly streamed documents to change the posterior distribution of topics. We conduct experiments on two widely used datasets, and the results show that DCSS outperforms existing methods and has better stability.

Keywords: based dirichlet; dynamic clustering; text stream; text; short text

Journal Title: Applied Intelligence
Year Published: 2021

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.