Given a dataset, an outlier can be defined as an observation that does not follow the statistical properties of the majority of the data. Computation of the location estimate is of fundamental importance in data analysis, and it is well known in statistics that classical methods, such as taking the sample average, can be greatly affected by the presence of outliers in the data. Using the median instead of the mean resolves this issue only partially. For the univariate case, a robust version of the median is the Least Trimmed Absolute Deviation (LTAD) estimator introduced in Tableman (Stat Probab Lett 19(5):387–398, 1994), which has desirable asymptotic properties such as robustness, consistency, high breakdown point and normality. There are different generalizations of the LTAD for multivariate data, depending on the choice of norm. In Chatzinakos et al. (J Comb Optim, 2015) we present such a generalization using the Euclidean norm and propose a solution technique for the resulting combinatorial optimization problem, based on a necessary condition, that results in a highly convergent local search algorithm. In this subsequent work, we use the $$L^1$$ norm to generalize the LTAD to higher dimensions, and show that the resulting mixed integer programming problem has an integral relaxation after applying an appropriate data transformation. Moreover, we utilize the structure of the problem to show that the resulting LPs can be solved efficiently using a subgradient optimization approach. The robust statistical properties of the proposed estimator are verified by extensive computational results.
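As a rough illustration of the univariate idea only, the following sketch computes a location estimate by searching for the subset of `h` observations with the smallest sum of absolute deviations from its own median. It is not the multivariate $$L^1$$ method of this work, nor necessarily the exact formulation in Tableman (1994); the function name `ltad_location` and the coverage parameter `h` are assumptions made for the example. It relies on the fact that, in one dimension, the `h` points closest to any candidate location are contiguous in sorted order, so only sliding windows need to be checked.

```python
import statistics


def ltad_location(data, h):
    """Illustrative univariate trimmed-absolute-deviation location estimate.

    Hypothetical helper: scans every window of h consecutive sorted values
    and returns the median of the window whose sum of absolute deviations
    from its own median is smallest.
    """
    xs = sorted(data)
    n = len(xs)
    if not 1 <= h <= n:
        raise ValueError("h must satisfy 1 <= h <= len(data)")

    best_cost = None
    best_location = None
    for start in range(n - h + 1):
        window = xs[start:start + h]
        m = statistics.median(window)
        cost = sum(abs(x - m) for x in window)
        if best_cost is None or cost < best_cost:
            best_cost, best_location = cost, m
    return best_location


# Five clustered points plus two gross outliers (made-up data).
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0, 120.0]
print(ltad_location(sample, h=5))
```

On this made-up sample the sample mean is pulled far toward the two large values, while the trimmed estimate stays inside the cluster of five points. The brute-force scan costs O(n·h) after sorting, which is adequate for a sketch but not tuned for large datasets.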
               