In the regime of change-point detection, a nonparametric framework based on scan statistics utilizing graphs representing similarities among observations is gaining attention due to its flexibility and good performances for… Click to show full abstract
In the regime of change-point detection, a nonparametric framework based on scan statistics utilizing graphs representing similarities among observations is gaining attention due to its flexibility and good performances for high-dimensional and non-Euclidean data sequences, which are ubiquitous in this big data era. However, this graph-based framework encounters problems when there are repeated observations in the sequence, which often happens for discrete data, such as network data. In this work, we extend the graph-based framework to solve this problem by averaging or taking union of all possible optimal graphs resulted from repeated observations. We consider both the single change-point alternative and the changed-interval alternative, and derive analytic formulas to control the Type I error for the new methods, making them fast applicable to large datasets. The extended methods are illustrated on an application in detecting changes in a sequence of dynamic networks over time. All proposed methods are implemented in an R package gSeg available on CRAN.
               
Click one of the above tabs to view related content.