In this paper, a novel method is proposed for denoising knowledge in large-scale Chinese online encyclopedias. Firstly, the initial similarity of the triples is obtained by a similarity computation method that integrates the Edit-Distance and TongYiCiCiLin similarity algorithms. Secondly, a novel nuclear field-like potential function over the Infobox knowledge triples is constructed by means of the semantic tags of Chinese encyclopedia entries. Finally, large-scale knowledge triple clustering and denoising are performed with the improved potential function in order to minimize the influence of massive repetition and ambiguity in the Chinese open encyclopedia Knowledge Base (KB). The proposed method solves the problems of semantic duplication, ambiguity and inappropriate classification of knowledge triples that arise when constructing Chinese KBs. The experimental results indicate that the open-domain Chinese encyclopedia KBs constructed by the proposed method outperform those built by state-of-the-art methods.
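To make the first step more concrete, the following is a minimal sketch of how an edit-distance similarity and a synonym-dictionary similarity might be combined into one initial score. It is not the method of the paper: the weighted-sum form, the weight `alpha`, the function names, and the toy dictionary `toy_codes` (with made-up category codes standing in for TongYiCiCiLin entries) are all assumptions introduced for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; identical strings score 1.0."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def synonym_similarity(a: str, b: str, synonym_codes: dict) -> float:
    """Hypothetical stand-in for a thesaurus-based similarity:
    1.0 if the two words share any category code, otherwise 0.0."""
    codes_a = synonym_codes.get(a, set())
    codes_b = synonym_codes.get(b, set())
    return 1.0 if codes_a & codes_b else 0.0


def combined_similarity(a: str, b: str, synonym_codes: dict,
                        alpha: float = 0.5) -> float:
    """Assumed weighted sum of the two similarities (alpha is illustrative)."""
    return (alpha * edit_similarity(a, b)
            + (1.0 - alpha) * synonym_similarity(a, b, synonym_codes))


# Toy dictionary with made-up codes, not real TongYiCiCiLin entries.
toy_codes = {"出生日期": {"X01"}, "生日": {"X01"}}
score = combined_similarity("出生日期", "生日", toy_codes)
```

In this toy case the two attribute names differ on the surface but share a category code, so the combined score is higher than the edit-distance similarity alone would give, which is the kind of near-duplicate a denoising step would aim to group.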
               