Sparse topic models (STMs) are widely used for learning semantically rich, sparse latent representations of large-scale short texts, mainly by imposing sparse priors or appropriate regularizers on topic models. However, these STMs struggle to model the sparse structure and patterns of a corpus accurately: their sparse priors often fail to achieve true sparseness, and their regularizers ignore prior information about the relevance between sparse coefficients. In this paper, we propose a novel Bayesian hierarchical topic model called Bayesian Sparse Topical Coding with Poisson Distribution (BSTC-P), built on Sparse Topical Coding with Sparse Groups (STCSG). Unlike traditional STMs, it imposes a hierarchical sparse prior to exploit the prior information of relevance between sparse coefficients. Furthermore, we propose a sparsity-enhanced variant, Bayesian Sparse Topical Coding with Normal Distribution (BSTC-N), derived via mathematical approximation. We adopt a superior hierarchical sparsity-inducing prior with the aim of reaching the sparsest optimal solution. Experimental results on the Newsgroups and Twitter datasets show that both BSTC-P and BSTC-N find clearer latent semantic representations and consequently outperform existing methods on document classification tasks.
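To make the sparse-coding view concrete, the sketch below illustrates the kind of objective the abstract refers to: word counts are modeled with a Poisson likelihood whose rate is the inner product of a per-word sparse code with a topic dictionary, plus an L1 penalty (equivalent to a Laplace prior) that induces sparsity. All names, shapes, and the penalty choice are illustrative assumptions, not the paper's actual formulation or its hierarchical prior.

```python
import numpy as np

def stc_poisson_objective(counts, codes, beta, lam=0.1, eps=1e-10):
    """Illustrative Poisson sparse-coding objective (hypothetical names/shapes).

    counts: (N,) observed word counts for one document
    codes:  (N, K) nonnegative sparse codes, one row per word
    beta:   (K, N) topic dictionary (topic-word weights)
    lam:    L1 penalty weight; acts like a Laplace (sparsity-inducing) prior
    """
    # Poisson rate for word n is the inner product codes[n, :] @ beta[:, n]
    rates = np.einsum('nk,kn->n', codes, beta) + eps
    # Negative log Poisson likelihood, up to a constant in `counts`
    nll = np.sum(rates - counts * np.log(rates))
    # L1 penalty drives many code entries exactly to zero during optimization
    return nll + lam * np.abs(codes).sum()
```

The abstract's point is that a flat L1 penalty like the one above treats all coefficients independently; the proposed BSTC models instead place a hierarchical prior over the codes so that related coefficients share sparsity information.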