Self-training can train an effective classifier by exploiting both labeled and unlabeled instances. During self-training, high-confidence instances are usually selected iteratively and added to the training set for learning. Unfortunately, the structure information of these high-confidence instances is often so similar that it leads to local over-fitting during the iterations. To avoid this over-fitting and improve the classification performance of self-training, a novel divide-and-conquer ensemble self-training framework based on probability difference is proposed. First, the probability difference of each instance is computed from the category probabilities output by each classifier, and the instances are divided into low-fuzzy and high-fuzzy sets for each classifier according to this difference. Then a divide-and-conquer strategy is adopted: the low-fuzzy instances agreed on by all classifiers are labeled automatically, while the high-fuzzy instances are labeled manually. Finally, the newly labeled instances are added to the training set for the next self-training iteration. By selecting low-fuzzy instances with accurate structure information and high-fuzzy instances with more comprehensive structure information, the method expands the training set and effectively improves generalization performance. The method is well suited to noisy data sets and can capture structure information even with few labeled instances. The effectiveness of the proposed method is verified through comparative experiments on University of California Irvine (UCI) data sets.
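The framework described above lends itself to a compact illustration. Below is a minimal Python sketch, assuming scikit-learn-style base classifiers and taking the probability difference to be the gap between each classifier's top two class probabilities; the threshold `theta`, the choice of three base learners, and the `oracle_label` helper (standing in for the manual-labeling step) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def probability_difference(model, X):
    """Gap between the top two class probabilities for each instance.

    A small gap means the classifier is ambiguous (high fuzziness);
    a large gap means it is confident (low fuzziness).
    """
    proba = np.sort(model.predict_proba(X), axis=1)
    return proba[:, -1] - proba[:, -2]


def ensemble_self_train(X_l, y_l, X_u, oracle_label, theta=0.4, max_iter=10):
    """Divide-and-conquer ensemble self-training (illustrative sketch).

    `oracle_label` stands in for manual labeling; `theta` is an assumed
    fuzziness threshold. Class labels are assumed to be integer-encoded.
    """
    base = [GaussianNB(), DecisionTreeClassifier(), KNeighborsClassifier()]
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        models = [clone(c).fit(X_l, y_l) for c in base]
        diffs = np.stack([probability_difference(m, X_u) for m in models])
        low = np.all(diffs >= theta, axis=0)   # every classifier confident
        high = np.all(diffs < theta, axis=0)   # every classifier ambiguous
        new_X, new_y = [], []
        if low.any():
            # Low-fuzzy instances: label automatically by majority vote.
            votes = np.stack([m.predict(X_u[low]) for m in models])
            maj = np.apply_along_axis(
                lambda v: np.bincount(v).argmax(), 0, votes)
            new_X.append(X_u[low])
            new_y.append(maj)
        if high.any():
            # High-fuzzy instances: query the (human) oracle.
            new_X.append(X_u[high])
            new_y.append(oracle_label(X_u[high]))
        if not new_X:
            break
        X_l = np.vstack([X_l] + new_X)
        y_l = np.concatenate([y_l] + new_y)
        X_u = X_u[~(low | high)]
    return [clone(c).fit(X_l, y_l) for c in base]
```

In this sketch, lowering `theta` routes more instances to automatic labeling and fewer to the manual oracle; the paper's actual division rule, base learners, and stopping criterion may differ.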