SPiForest, a new isolation-based approach to outlier detection, constructs iTrees in the space containing all attributes by probability density-based inverse sampling. Most existing iForest (iF)-based approaches can precisely and quickly detect outliers scattered around one or more normal clusters. However, the performance of these methods degrades sharply when they face outliers whose "few and different" nature disappears in subspaces (e.g., anomalies surrounded by normal samples). To solve this problem, we propose SPiForest, which differs from existing approaches in how it builds each iTree. First, SPiForest uses principal component analysis (PCA) to find the principal components and estimates each component's probability density function (pdf). Second, SPiForest uses the inv-pdf, which is inversely proportional to the pdf estimated from the given dataset, to generate support points in the space containing all attributes. Third, the hyperplane determined by these support points is used to isolate the outliers in that space. These steps are then repeated to build an iTree. Finally, many iTrees form a forest for outlier detection. SPiForest provides two benefits: 1) it isolates outliers with fewer hyperplanes, which significantly improves accuracy, and 2) it effectively detects outliers whose "few and different" nature disappears in subspaces. Comparative analyses and experiments show that SPiForest achieves a significant improvement in area under the curve (AUC) over state-of-the-art methods. Specifically, our method improves AUC by up to 17.7% compared with iF-based algorithms.
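
The first three steps can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify these details, so several choices are assumptions made only for illustration: the pdf of each component is estimated with a histogram, the inv-pdf is smoothed so that empty bins do not receive unbounded weight, and the hyperplane is taken to pass through d support points in d dimensions. All function names (`pca_axes`, `sample_inv_pdf`, `support_points`, `hyperplane_through`) and parameters (`bins`, `smooth`) are hypothetical.

```python
import numpy as np


def pca_axes(X):
    """Return the data mean and the principal axes (one per row) via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt


def sample_inv_pdf(values, rng, size=1, bins=16, smooth=0.05):
    """Sample from a histogram density inversely proportional to the estimated pdf."""
    hist, edges = np.histogram(values, bins=bins, density=True)
    # Assumption: smoothing keeps empty bins from dominating the inverse weights.
    inv = 1.0 / (hist + smooth * hist.max())
    probs = inv / inv.sum()
    idx = rng.choice(bins, size=size, p=probs)
    # Draw uniformly inside each chosen bin.
    return rng.uniform(edges[idx], edges[idx + 1])


def support_points(X, rng, n_points):
    """Generate support points in the space containing all attributes."""
    mean, axes = pca_axes(X)
    scores = (X - mean) @ axes.T  # data projected onto each principal component
    coords = np.column_stack(
        [sample_inv_pdf(scores[:, j], rng, size=n_points)
         for j in range(axes.shape[0])]
    )
    return mean + coords @ axes  # map back to the original attributes


def hyperplane_through(points):
    """Unit normal and anchor point of the hyperplane through d points in d dimensions."""
    diffs = points[1:] - points[0]
    # The last right singular vector spans the null space of the differences.
    _, _, vt = np.linalg.svd(diffs, full_matrices=True)
    return vt[-1], points[0]


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
P = support_points(X, rng, n_points=3)
normal, anchor = hyperplane_through(P)
left_mask = (X - anchor) @ normal <= 0  # one isolation split of the data
```

Because the inv-pdf favors low-density regions, the support points tend to fall where the data are sparse, so a hyperplane through them is more likely to cut off isolated samples than a uniformly random split would be. Repeating the split recursively on each side would give one tree under these assumptions.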
               