TriTag-NFPF: Knowledge Denoising for Chinese Encyclopedia based on Triple Tag-Constructed Potential Function

This paper proposes a novel method for denoising knowledge in large-scale Chinese online encyclopedias. First, the initial similarity of the triples is computed by a method that integrates the Edit-Distance and TongYiCiCiLin similarity algorithms. Second, a novel nuclear field-like potential function over the Infobox knowledge triples is constructed from the semantic tags of Chinese encyclopedia entries. Finally, large-scale clustering and denoising of the knowledge triples are performed with this improved potential function, in order to minimize the influence of massive repetition and ambiguity in the Chinese open encyclopedia Knowledge Base (KB). The proposed method addresses the problems of semantic duplication, ambiguity, and inappropriate classification of knowledge triples that arise when constructing Chinese KBs. The experimental results indicate that the open-domain Chinese encyclopedia KBs constructed with the proposed method outperform those built with state-of-the-art methods.
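
To make the first step more concrete, the sketch below shows one way an edit-distance similarity and a thesaurus-based similarity could be blended into a single score. It is only an illustration under stated assumptions, not the method from the paper: the abstract does not say how the two similarities are integrated, so the linear weighting and the parameter alpha are assumptions, and SYNONYM_GROUPS is a hypothetical toy stand-in for the TongYiCiCiLin thesaurus.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Hypothetical toy synonym groups; a real system would load the
# TongYiCiCiLin thesaurus here instead.
SYNONYM_GROUPS = [
    {"别名", "又名", "别称"},
    {"出生日期", "生日"},
]


def synonym_similarity(a: str, b: str) -> float:
    """1.0 if the terms are equal or share a synonym group, else 0.0 (assumed scoring)."""
    if a == b:
        return 1.0
    return 1.0 if any(a in g and b in g for g in SYNONYM_GROUPS) else 0.0


def combined_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Assumed linear blend of surface-form and thesaurus similarity."""
    return alpha * edit_similarity(a, b) + (1.0 - alpha) * synonym_similarity(a, b)
```

For example, combined_similarity("别名", "又名") blends a surface similarity of 0.5 (one substitution over two characters) with a synonym similarity of 1.0. The blend lets two attribute names that look different but mean the same thing still score as similar, which is the kind of semantic duplication the abstract targets.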

Keywords: knowledge; method; Chinese encyclopedia; knowledge denoising; potential function

Journal Title: IEEE Access
Year Published: 2019
