Bayesian network classifiers (BNCs) are powerful tools for graphically encoding the dependency relationships among variables in a directed acyclic graph and for reasoning under uncertainty. Ever-increasing data volumes make the need for BNCs that are highly scalable and classify significantly more accurately ever more urgent. Numerous approaches have been proposed to mine conditional dependencies among the attributes of labeled training data under the framework of supervised learning, whereas the specific characteristics of unlabeled testing instances have received less attention; this can lead to overfitting and degraded classification performance. In this paper, we argue that the knowledge learned from the labeled training dataset and that learned from the unlabeled testing instance are complementary in nature. The testing instance is pre-assigned each possible label in turn to make it complete, and the log-likelihood function is then introduced and redefined to measure the extent to which the learned BNC fits the training data or the testing instance. A heuristic search strategy is applied to learn two kinds of arbitrary k-dependence BNCs (a general BNC modeling the training dataset and a local BNC modeling the testing instance), which work as an ensemble to make the final prediction under the framework of semi-supervised learning. Experimental evaluation on 40 publicly available datasets from the UCI machine learning repository shows that the proposed algorithm achieves classification performance competitive with state-of-the-art BNCs and their variants, such as CFWNB, WATAN, FKDB, SKDB, and IWAODE.
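The abstract's decision rule can be sketched in code. The following is a minimal, illustrative Python sketch, not the paper's implementation: naive Bayes (the 0-dependence special case of a k-dependence BNC) stands in for both the general and the local model, the heuristic structure search is omitted entirely, and all names here (fit_counts, log_likelihood, predict) are hypothetical. It shows only the core loop: complete the testing instance with each candidate label, score the completion under both models via log-likelihood, and let the ensemble's summed score pick the prediction.

```python
import numpy as np

def fit_counts(X, y, n_values, n_classes, alpha=1.0):
    """Laplace-smoothed class priors and per-attribute conditionals P(x_j | c)."""
    n, d = X.shape
    prior = np.array([(np.sum(y == c) + alpha) / (n + alpha * n_classes)
                      for c in range(n_classes)])
    cond = np.full((n_classes, d, n_values), alpha)
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            cond[c, j, v] += 1.0
    cond /= cond.sum(axis=2, keepdims=True)
    return prior, cond

def log_likelihood(x, c, prior, cond):
    """Fit of the completed instance (x, c): log P(c) + sum_j log P(x_j | c)."""
    return np.log(prior[c]) + sum(np.log(cond[c, j, v]) for j, v in enumerate(x))

def predict(x, general, local, n_classes):
    """Pre-assign each candidate label, score under both models, ensemble by summing."""
    scores = [log_likelihood(x, c, *general) + log_likelihood(x, c, *local)
              for c in range(n_classes)]
    return int(np.argmax(scores))

# Toy usage: both models are fitted on the same labeled data here, whereas the
# paper re-learns the local BNC's structure for each individual testing instance.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 5))   # 200 instances, 5 ternary attributes
y = rng.integers(0, 2, size=200)        # binary class label
general = fit_counts(X, y, n_values=3, n_classes=2)
local = general
print(predict(X[0], general, local, n_classes=2))
```

In the paper the two scorers differ by construction (general structure learned from the training set, local structure learned per completed testing instance), which is what makes the ensemble complementary; collapsing both to one naive Bayes model, as above, only demonstrates the scoring-and-argmax mechanics.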