Mining sensitive information from online networks is important for understanding social stability online. Capturing online public opinion on sensitive information helps reveal Internet users' attitudes toward important social events. Artificial intelligence techniques can extract topics from online texts. However, current topic recognition models have a low recognition rate for sensitive information and often generate inaccurate topic keywords. In this paper, a topic recognition method for sensitive network information based on a sensitive word weighted latent Dirichlet allocation (LDA) model is proposed. First, a basic sensitive word vocabulary is constructed by manual collection, and word embeddings are obtained by training Word2vec on a large network corpus. The semantic similarity between word embeddings is then calculated to extend the basic sensitive word vocabulary. Second, the extended sensitive word vocabulary is embedded in the LDA model. On the one hand, this improves the semantic understanding and recognition ability of LDA for sensitive topic words and raises the quality of the generated topic words. On the other hand, it improves the relevance between topic keywords and their topics and uncovers more fine-grained keywords. The experimental results show that the sensitive word weighted-LDA model effectively improves both the quantity and the quality of the sensitive-information topics recognized. This work contributes to the development of artificial intelligence, and the corpus generated in this paper is useful for research on text classification, clustering, information retrieval, and related tasks.
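The two steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify how similarity is thresholded or how sensitive words are weighted inside LDA. The sketch assumes a cosine-similarity threshold for extending the vocabulary and a simple multiplicative weight on the counts of sensitive words, which could then be fed to a standard LDA implementation. The function names, the `threshold` and `weight` parameters, and the toy three-dimensional vectors are all hypothetical; in practice the vectors would come from a trained Word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extend_vocabulary(seed_words, embeddings, threshold=0.8):
    """Add every word whose embedding is close to at least one seed word.

    `threshold` is an assumed cut-off, not a value taken from the paper.
    """
    seeds = [s for s in seed_words if s in embeddings]
    extended = set(seed_words)
    for word, vec in embeddings.items():
        if word in extended:
            continue
        if any(cosine(vec, embeddings[s]) >= threshold for s in seeds):
            extended.add(word)
    return extended


def weighted_counts(tokens, sensitive_words, weight=2.0):
    """Term counts in which sensitive words count `weight` times.

    This is an assumed, simplified weighting scheme for illustration only.
    """
    counts = {}
    for tok in tokens:
        increment = weight if tok in sensitive_words else 1.0
        counts[tok] = counts.get(tok, 0.0) + increment
    return counts


# Toy embeddings standing in for vectors learned by Word2vec (hypothetical).
embeddings = {
    "riot": [0.9, 0.1, 0.0],
    "unrest": [0.85, 0.2, 0.05],
    "weather": [0.0, 0.1, 0.95],
}

vocab = extend_vocabulary({"riot"}, embeddings, threshold=0.8)
counts = weighted_counts(["unrest", "weather", "unrest"], vocab, weight=2.0)
```

Here "unrest" joins the vocabulary because its vector lies close to that of the seed word "riot", while "weather" does not, so occurrences of "unrest" receive the higher weight in the resulting counts.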
               