Feature selection is an important preprocessing step for high-dimensional data mining and machine learning; in rough set theory, it is viewed as selecting the optimal granularity with which to describe the target concept. Current research on rough sets focuses mainly on granularity selection in flat classification scenarios, whereas organizing hundreds of labels for hierarchical classification (HC) can provide additional external information and achieve better performance in terms of both accuracy and efficiency. However, HC also faces the following problems: 1) current measures fail to characterize the uncertainty in HC; 2) the optimal granularity of the target concept in HC cannot be selected; and 3) there is no valid approach to selecting features in a decision system with hierarchical classification (HieDS). To address these problems, this article introduces HC into rough set theory and proposes an approach to granularity selection for HC. First, we introduce the knowledge distance to reflect the uncertainty of HC and define related characteristic functions to describe a HieDS. Then, from the perspective of uncertainty, we present granularity selection for the target concept and feature selection based on these characteristic functions. Finally, we experimentally realize granularity selection and demonstrate that the proposed method performs well in a HieDS in terms of both feature selection and classification accuracy.
               