Traditional feature selection methods assume that the entire input feature set is available from the beginning. However, streaming features (SF) are an integral part of many real-world applications. In this scenario, the number of training examples is fixed while the number of features grows over time as new features stream in. A critical challenge for streamwise feature selection (SFS) is that the entire feature set is unavailable before learning starts. Several efforts have been made to address the SFS problem; however, they all require some prior knowledge about the entire feature set. In this paper, the SFS problem is considered from the rough sets (RS) perspective. The main motivation is that RS-based data mining does not require any domain knowledge other than the given dataset. The proposed method uses the significance analysis concepts of RS theory to control the unknown feature space in SFS problems. The algorithm is evaluated extensively on several high-dimensional datasets in terms of compactness, classification accuracy, and running time. Experimental results demonstrate that the algorithm achieves better results than existing SFS algorithms.
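
To make the idea of RS significance analysis concrete, the following is a minimal Python sketch. It is not the paper's algorithm. It assumes the standard rough set definitions: the dependency degree of the decision on a feature subset is the fraction of examples whose equivalence class under that subset has a single label, and the significance of a candidate feature is the increase in dependency when it is added. The function names (`dependency`, `significance`, `streamwise_select`), the `threshold` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def dependency(rows, labels, features):
    """Rough set dependency degree of the labels on `features`.

    Examples are grouped into equivalence classes by their values on
    `features`; the result is the fraction of examples that fall in
    classes containing only one label (the positive region).
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[f] for f in features)
        classes[key].append(i)
    positive = sum(
        len(members)
        for members in classes.values()
        if len({labels[i] for i in members}) == 1
    )
    return positive / len(rows)


def significance(rows, labels, selected, candidate):
    """Gain in dependency degree from adding `candidate` to `selected`."""
    with_candidate = dependency(rows, labels, selected + [candidate])
    return with_candidate - dependency(rows, labels, selected)


def streamwise_select(rows, labels, feature_stream, threshold=0.0):
    """Illustrative streamwise loop (an assumption, not the paper's method).

    Features arrive one at a time; a feature is kept only if its
    significance relative to the features kept so far exceeds `threshold`.
    """
    selected = []
    for feature in feature_stream:
        if significance(rows, labels, selected, feature) > threshold:
            selected.append(feature)
    return selected


# Toy data: "a" determines the label, "b" is uninformative,
# and "c" duplicates "a".
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
labels = [0, 0, 1, 1]

selected = streamwise_select(rows, labels, ["b", "a", "c"])
print(selected)
```

In this toy run the loop never needs to know how many features will eventually arrive: each decision depends only on the fixed examples and the features kept so far, which is the property the abstract highlights for the RS perspective.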