Probabilistic topic models, such as latent Dirichlet allocation (LDA), are often used to discover the hidden semantic structure of a collection of documents. In recent years, various inference algorithms have been developed for learning topic models, among which Gibbs sampling methods remain a popular choice. In this paper, we aim to improve the inference of topic models within the Gibbs sampling framework. We extend a state-augmentation-based Gibbs sampling method by maximizing the replications of latent states, and propose a new generic deterministic inference method, named maximal latent state replication (MAX), for learning a family of probabilistic topic models. One key benefit of the proposed method lies in the deterministic nature of its inference, which may help to improve both its running efficiency and its predictive perplexity. We have conducted extensive experiments on real-world, publicly available datasets, and the results validate that the proposed method, MAX, significantly outperforms state-of-the-art baselines for inference in existing well-known topic models.
               