We propose a weighted outlier mining method called WATCH to identify outliers in high-dimensional categorical datasets. WATCH is composed of two distinctive modules: 1) feature grouping by the virtue of… Click to show full abstract
We propose a weighted outlier mining method called WATCH to identify outliers in high-dimensional categorical datasets. WATCH is composed of two distinctive modules: 1) feature grouping by the virtue of correlation measurement among features and 2) outlier mining by assigning scores to objects in each feature groups. At the heart of WATCH is the feature grouping module, which groups an array of features into multiple groups to discover various aspects of feature patterns in each group. The outlier mining module detects outliers from high-dimensional categorical datasets. Except for the number of outliers specified by users, WATCH is conducive to bypassing the optimization of any user-given parameter. We implement and evaluate WATCH using synthetic and real-world datasets. Our experimental results show that WATCH is a promising and practical algorithm to detect outliers in high-dimensional categorical datasets, because WATCH achieves high performance in terms of precision, efficiency, and interpretability.
               
Click one of the above tabs to view related content.