Local depth

https://doi.org/10.1016/j.jspi.2010.08.001

Abstract

Local depth is a generalization of ordinary depth able to reveal local features of the probability distribution. Liu's simplicial depth is primarily used, but results for Tukey's halfspace depth are also derived. It is shown that the maximizers of local depth can help to detect the mode(s) of a probability distribution. This work is devoted to the univariate case, but the main definitions are stated in the general multivariate case. Theoretical results and applications are illustrated with several examples.

Introduction

Data depth is a general method for the analysis of probability distributions and data sets. Of particular interest are geometrical depth functions, like Liu's simplicial depth (Liu, 1990), Oja's simplicial volume depth (Oja, 1983; Zuo and Serfling, 2000) and Tukey's halfspace depth (Tukey, 1975), because no distributional assumptions are required and the results are distribution-free. Depth methods allow several features and properties of univariate distributions to be extended to multivariate distributions. Notable examples are new functionals describing multivariate location and spread (e.g., deepest points and depth regions), depth L-statistics and general purpose graphical displays for data exploration and comparison in any dimension, e.g., the DD-plot and the scale curve (Liu et al., 1999). Inferential applications include multivariate analogues of the Wilcoxon rank sum test (Liu and Singh, 1993), diagnostics of non-normality (Liu et al., 1999) and tests of location and scale differences derived from the DD-plot (Li and Liu, 2004). Depth methods are also used in classification and clustering (Ghosh and Chaudhuri, 2005; Jornsten, 2004) with results which appear competitive with classical methods.

The basic idea underlying all applications of data depth is to rank the points of the space according to their degree of centrality with the location(s) of the deepest point(s) identifying the center(s) of the distribution. Regular depth functions, including halfspace, simplicial and simplicial volume depth, are decreasing from the center and this monotonicity property implies that they admit just one maximizer, whatever the distribution may be, unimodal or multimodal. This fact has long been known (e.g., Zuo and Serfling, 2000, p. 461, see also Rafalin et al., 2006) and is perfectly coherent when the focus is on the global features of the distribution, disregarding local features like the peaks of the density function. However, the existence of multiple centers, their locations and relative importance are of interest in several theoretical and applied domains, e.g., mixture distributions and cluster analysis. In this paper we suggest simple modifications of simplicial and halfspace depth so as to allow them to record the local space geometry, near any given point. For simplicial depth, the proposal is to consider only random simplices with size not greater than a fixed threshold. For halfspace depth, halfspaces are replaced by infinite slabs with finite width. The formal definitions are given in Section 2. When the tuning parameter (simplex size, slab width) tends to infinity, the ordinary definitions are recovered, thus regular depth is a special case of local depth and this could be an advantage over alternative approaches, like Rafalin et al. (2006). Indeed, the role of the tuning parameter is to provide a continuum of degrees of detail of the description, from very narrow neighbourhoods of the points up to the whole space. 
Hence local depth is a class of center-outward ranking functions serving multiple purposes, according to the value of the tuning parameter: low values describe the centrality of the points of the space conditional on a small window around them, while higher values force wider windows and therefore produce rankings more and more similar to ordinary depth. The stationary points of local depth, also indexed by the window size, define a (set) parameter of the distribution including the points corresponding to local maxima/minima of local depth, coincident with (or near to) the modes and anti-modes of the distribution. The rationale is different from the familiar descriptions provided by the moments or, in the univariate case, by the quantiles: this new parameter corresponds to the points with comparatively higher/lower probability mass in the surrounding window. Again, as the window size grows, the parameter converges to the (unique) overall center of the distribution. In summary, local depth provides a broad framework for nonparametric analysis, including all the issues of ordinary depth as particular instances.
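
To make the size-threshold idea concrete, the following is a minimal Python sketch of a sample version of local simplicial depth on the real line. It assumes that a univariate simplex is the closed interval spanned by two observations and that its size is the interval length. The function name `local_simplicial_depth_1d`, the toy sample and the threshold values are illustrative choices, not taken from the paper.

```python
from itertools import combinations
import math


def local_simplicial_depth_1d(x, sample, tau=math.inf):
    """Fraction of sample pairs whose interval covers x and has length <= tau.

    Assumption: in one dimension a simplex is the closed interval spanned by
    two observations, and its size is the interval length.
    tau = inf gives the ordinary sample simplicial depth.
    """
    pairs = list(combinations(sample, 2))
    covering = sum(
        1
        for a, b in pairs
        if min(a, b) <= x <= max(a, b) and abs(a - b) <= tau
    )
    return covering / len(pairs)


# Hypothetical toy sample: two well-separated groups with an empty gap.
sample = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
points = (0.3, 2.7, 5.1)

for tau in (0.5, math.inf):
    depths = [round(local_simplicial_depth_1d(x, sample, tau), 3) for x in points]
    print("tau =", tau, "depths at", points, "->", depths)
```

With the threshold 0.5 the point 2.7 in the empty gap gets local depth 0, whereas with an infinite threshold (ordinary simplicial depth) it is the deepest of the three evaluation points.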

In the univariate case, the local versions of simplicial and halfspace depth are simple enough to allow a detailed investigation, at least for absolutely continuous distributions. The main result is that, unlike their global counterparts, they can indeed register multiple modes. This suggests that in the sample case the maximizers of local depth can be used to estimate the mode(s) of the underlying distribution. The details are given in Sections 3 (Maximizers of local depth) and 4 (Sample data).

Local depth has some contact points with the density function, and the relationship in the univariate case is described in Section 2.3. However, the method retains some distinctive features. The ranks it produces are probabilities (or estimates of probabilities, in the sample case) based on nonparametric functionals and they are invariant to all affine transformations. We argue that, in the univariate case, for suitably low values of the tuning parameter, sample local depth can be a valid substitute for a density estimate, but at the present stage it is still unclear whether this result extends to the general multivariate case.
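
A heuristic sketch of the link with the density, under assumptions that are not taken from this page: the univariate local simplicial depth, written here as $ld_s(x;\tau)$, is taken to be the probability that the interval spanned by two independent copies $X_1, X_2$ covers $x$ and has length at most $\tau$, and $F$ is assumed to have a continuous density $f$. Conditioning on the left endpoint $u$ and using the symmetry between $X_1$ and $X_2$ gives

$$
ld_s(x;\tau) = 2\int_{x-\tau}^{x} f(u)\,\{F(u+\tau) - F(x)\}\,du .
$$

For small $\tau$, substituting $u = x - s$ and using $f(u) \approx f(x)$ and $F(x+\tau-s) - F(x) \approx f(x)\,(\tau - s)$ yields

$$
ld_s(x;\tau) \approx 2 f(x)^2 \int_{0}^{\tau} (\tau - s)\,ds = \tau^2 f(x)^2 .
$$

Under these assumptions, local depth with a small window ranks points in the same order as the density, which is consistent with its use as a substitute for a density estimate.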

To illustrate the topics, examples are provided throughout, both with specific distribution models and real data. The final section is devoted to an overall discussion.

Section snippets

Local simplicial depth

We consider a $p$-dimensional random vector $X$ with probability distribution function $F$ on the Euclidean space $\mathbb{R}^p$. Let $X_i$, $i = 1, \ldots, p+1$, be $p+1$ independent copies of $X$. We write $S_{p+1} \equiv S(X_1, \ldots, X_{p+1})$ for the (closed) random simplex whose vertices are $X_1, \ldots, X_{p+1}$. Liu's simplicial depth is the coverage function of this random set.

Definition 1 (Liu, 1990)

The simplicial depth function $d_s(\cdot; F) \equiv d_s(\cdot)$ maps $x \in \mathbb{R}^p$ to the probability that $S_{p+1}$ covers $x$, that is $d_s(x) = P_F(S_{p+1} : x \in S_{p+1})$.
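
As a worked special case, and assuming $p = 1$ with $F$ continuous, the random simplex is the interval spanned by $X_1$ and $X_2$, and the definition reduces to a closed form:

$$
d_s(x) = P_F\{\min(X_1, X_2) \le x \le \max(X_1, X_2)\}
       = P_F(X_1 \le x \le X_2) + P_F(X_2 \le x \le X_1)
       = 2F(x)\{1 - F(x)\}.
$$

This expression is largest where $F(x) = 1/2$, that is at the median, where it equals $1/2$, and it decreases as $F(x)$ moves away from $1/2$ in either direction.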

The ordinary measure of the size of a subset of $\mathbb{R}^p$ is its

Maximizers of local depth

For any depth function $d(\cdot; F)$ the functional

$$\theta_F = \arg\max_x d(x; F) \tag{4}$$

defines the center of the distribution and provides a generalization of the standard notion of median to the multivariate setting. If $F$ is absolutely continuous, there is a unique point satisfying (4) and the depth decreases monotonically as a point moves away from $\theta_F$ along any given ray through the center. This uniqueness of the center holds not only for unimodal distributions, but also for uniform, U-shaped (in the univariate case)
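
The contrast between a single global center and several local maximizers can be explored numerically. The sketch below is illustrative only and rests on assumptions not taken from the paper: the univariate local simplicial depth is computed as $2\int_{x-\tau}^{x} f(u)\{F(u+\tau) - F(x)\}\,du$ (the probability that the interval spanned by two independent copies covers $x$ and has length at most $\tau$), the model is an equal-weight mixture of N(-2, 1) and N(2, 1), the integral is approximated by the midpoint rule, maximizers are searched on a grid, and $\tau = 20$ stands in for a very large window. All function names are hypothetical.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Illustrative model (an assumption): equal-weight mixture of N(-2, 1) and N(2, 1).
def F(x):
    return 0.5 * norm_cdf(x + 2.0) + 0.5 * norm_cdf(x - 2.0)


def f(x):
    return 0.5 * norm_pdf(x + 2.0) + 0.5 * norm_pdf(x - 2.0)


def local_depth(x, tau, steps=1000):
    """Midpoint-rule value of 2 * int_{x-tau}^{x} f(u) * (F(u + tau) - F(x)) du."""
    h = tau / steps
    Fx = F(x)
    total = 0.0
    for k in range(steps):
        u = x - tau + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += f(u) * (F(u + tau) - Fx)
    return 2.0 * h * total


def local_maximizers(tau, grid):
    """Interior grid points whose depth strictly exceeds both neighbours."""
    values = [local_depth(x, tau) for x in grid]
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    ]


grid = [i / 10 for i in range(-40, 41)]  # -4.0, -3.9, ..., 4.0
small_tau_maxima = local_maximizers(0.5, grid)
large_tau_maxima = local_maximizers(20.0, grid)
print("tau = 0.5:", small_tau_maxima)
print("tau = 20 :", large_tau_maxima)
```

For this model the small window produces one grid maximizer near each mixture component, while the large window leaves a single maximizer at the median 0, matching the behaviour of ordinary depth.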

Sample data

In this section the sampling theory of local depth is considered. We concentrate on simplicial depth and related quantities because local halfspace depth provides equivalent results in the univariate case. The distributional properties of sample simplicial depth and its maximizers in the general multivariate case carry over to the local version with only minor modifications. The most important results are uniform consistency and asymptotic normality of the empirical local simplicial depth process

Discussion

Local depth is a generalization of ordinary depth able to reveal local features of the distribution, like local minima and maxima of the density. Common to the different definitions is the idea of considering constant-size neighbourhoods of each point of the space, whose radius is the τ parameter, with essentially the same meaning as the window size in density function estimation. The usual definition is recovered when τ → ∞, thus local depth includes ordinary depth as a particular case. The most

References (18)

  • R. Jornsten. Clustering and classification based on the L1 data depth. Journal of Multivariate Analysis (2004)
  • H. Oja. Descriptive statistics for multivariate distributions. Statistics and Probability Letters (1983)
  • Agostinelli, C., Romanazzi, M., 2008. Multivariate local depth. Technical Report, Department of Statistics, Ca’ Foscari...
  • M.A. Arcones et al. Limit theorems for U-processes. The Annals of Probability (1993)
  • M.A. Arcones et al. Estimators related to U-processes with applications to multivariate medians: asymptotic normality. The Annals of Statistics (1994)
  • A.K. Ghosh et al. On maximum depth and related classifiers. Scandinavian Journal of Statistics (2005)
  • W. Hardle. Smoothing Techniques with Applications in S (1990)
  • P.J. Kelly et al. Geometry and Convexity (1979)
  • J. Li et al. New nonparametric tests of multivariate locations and scales using data depth. Statistical Science (2004)
There are more references available in the full text version of this article.

