Abstract
Slepian and Sudakov–Fernique type inequalities, which compare expectations of maxima of Gaussian random vectors under certain restrictions on the covariance matrices, play an important role in probability theory, especially in empirical process and extreme value theories. Here we give explicit comparisons of expectations of smooth functions and distribution functions of maxima of Gaussian random vectors without any restriction on the covariance matrices. We also establish an anti-concentration inequality for the maximum of a Gaussian random vector, which derives a useful upper bound on the Lévy concentration function for the Gaussian maximum. The bound is dimension-free and applies to vectors with arbitrary covariance matrices. This anti-concentration inequality plays a crucial role in establishing bounds on the Kolmogorov distance between maxima of Gaussian random vectors. These results have immediate applications in mathematical statistics. As an example of application, we establish a conditional multiplier central limit theorem for maxima of sums of independent random vectors where the dimension of the vectors is possibly much larger than the sample size.
Similar content being viewed by others
1 Introduction
We derive a bound on the difference in expectations of smooth functions of maxima of finite dimensional Gaussian random vectors. We also derive a bound on the Kolmogorov distance between distributions of these maxima. The key property of these bounds is that they depend on the dimension \(p\) of Gaussian random vectors only through \(\log p\), and on the max norm of the difference between the covariance matrices of the vectors. These results extend and complement the work of [7] that derived an explicit Sudakov–Fernique type bound on the difference of expectations of maxima of Gaussian random vectors. See also [1], Chapter 2. As an application, we establish a conditional multiplier central limit theorem for maxima of sums of independent random vectors where the dimension of the vectors is possibly much larger than the sample size. In all these results, we allow for arbitrary covariance structures between the coordinates in random vectors, which is plausible especially in applications to high-dimensional statistics. We stress that the derivation of bounds on the Kolmogorov distance is by no means trivial and relies on a new anti-concentration inequality for maxima of Gaussian random vectors, which is another main result of this paper (see Comment 4 for what anti-concentration inequalities here precisely refer to and how they differ from the concentration inequalities). These anti-concentration bounds are non-trivial in the following sense: (i) they apply to every dimension \(p\) and they are dimension-free in the sense that the bounds depend on the dimension \(p\) only through the expectation of the maximum of the Gaussian random vector, thereby admitting direct extensions to the infinite dimensional case, namely, separable Gaussian processes (see [10] for this extension and applications to empirical processes). This dimension-free nature is parallel to the Gaussian concentration inequality, which states that the supremum concentrates around the expected supremum. (ii) They allow for arbitrary covariance structures between the coordinates in Gaussian random vectors, and (iii) they are sharp in the sense that there is an example for which the bound is tight up to a dimension independent constant. We note that these anti-concentration bounds are sharper than those that result from application of the universal reverse isoperimetric inequality of [2] (see also [3], pp. 386–367).
Comparison inequalities for Gaussian random vectors play an important role in probability theory, especially in empirical process and extreme value theories. We refer the reader to [7, 12, 14, 16–18, 24], and [30] for standard references on this topic. The anti-concentration phenomenon has attracted considerable interest in the context of random matrix theory and the Littlewood–Offord problem in number theory. See, for example, [22, 23], and [29] who remarked that concentration is better understood than anti-concentration. Those papers were concerned with the anti-concentration in the Euclidean norm for sums of independent random vectors, and the topic and the proof technique here are substantially different from theirs.
Either of the comparison or anti-concentration bounds derived in the paper have many immediate statistical applications, especially in the context of high-dimensional statistical inference, where the dimension \(p\) of vectors of interest is much larger than the sample size (see [5] for a textbook treatment of the recent developments of high-dimensional statistics). In particular, the results established here are helpful in deriving an invariance principle for sums of high-dimensional random vectors, and also in establishing the validity of the multiplier bootstrap for inference in practice. We refer the reader to our companion paper [9], where the results established here are applied in several important statistical problems, particularly the analysis of Dantzig selector of [6] in the non-Gaussian setting.
The proof strategy for our anti-concentration inequalities is to directly bound the density function of the maximum of a Gaussian random vector. The paper by [20] is concerned with bounding such a density (see [20], Proposition 3.12) but under positive covariances restriction. This is related to but different from our anti-concentration bounds. The crucial assumption in their Proposition 3.12 is positivity of all the covariances between the coordinates in the Gaussian random vector, which does not hold in our targeted applications in high-dimensional statistics, for example, analysis of Danzig selector. Moreover, their upper bound on the density depends on the inverse of the lower bound on the covariances—and hence, for example, if there are two independent coordinates in the Gaussian random vector, then the upper bound becomes infinite. Our anti-concentration bounds do not require such positivity (or other) assumptions on covariances and hence are not implied by the results of [20]. Another method for deriving reverse isoperimetric inequalities is to use geometric results of [19], as shown by [13], which leads to dimension-dependent anti-concentration inequalities, which are essentially different from ours. Moreover, our density-bounding proof technique is substantially different from that of [20] based on Malliavin calculus or [19] based on geometric arguments.
The rest of the paper is organized as follows. In Sect. 2, we present comparison bounds for Gaussian random vectors and its application, namely the conditional multiplier central limit theorem. In Sect. 3, we present anti-concentration bounds for maxima of Gaussian random vectors. In Sects. 4 and 5, we give proofs of the theorems in Sects. 2 and 3. The Appendix contains a proof of a technical lemma.
Notation. Denote by \((\Omega ,{\mathcal {F}},{\text {P}})\) the underlying probability space. For \(a,b \in {\mathbb {R}}\), we write \(a_{+} = \max \{ 0,a \}\) and \(a \vee b = \max \{ a,b \}\). Let \(1(\cdot )\) denote the indicator function. The transpose of a vector \(z\) is denoted by \(z^{T}\). For a function \(g: {\mathbb {R}}\rightarrow {\mathbb {R}}\), we use the notation \(\Vert g \Vert _{\infty } = \sup _{z \in {\mathbb {R}}} | g(z) |\). Let \(\phi (\cdot )\) and \(\Phi (\cdot )\) denote the density and distribution functions of the standard Gaussian distribution, respectively: \(\phi (x) = (1/\sqrt{2 \pi }) e^{-x^{2}/2}\) and \(\Phi (x) = \int _{-\infty }^{x} \phi (t) dt\).
2 Comparison bounds and multiplier bootstrap
2.1 Comparison bounds
Let \(X =(X_{1},\dots ,X_{p})^{T}\) and \(Y=(Y_{1},\dots ,Y_{p})^{T}\) be centered Gaussian random vectors in \({\mathbb {R}}^{p}\) with positive semi-definite covariance matrices \(\Sigma ^{X} =(\sigma ^{X}_{jk})_{1 \le j,k \le p}\) and \(\Sigma ^{Y}=(\sigma ^{Y}_{jk})_{1 \le j,k \le p}\), respectively. The purpose of this section is to give error bounds on the difference of the expectations of smooth functions and the distribution functions of
in terms of \(p\),
The problem of comparing distributions of maxima is of intrinsic difficulty since the maximum function \(z=(z_{1},\dots ,z_{p})^{T} \mapsto \max _{1 \le j \le p} z_{j}\) is non-differentiable. To circumvent the problem, we use a smooth approximation of the maximum function. For \(z=(z_{1},\dots ,z_{p})^{T} \in {\mathbb {R}}^{p}\), consider the function:
which approximates the maximum function, where \(\beta > 0\) is the smoothing parameter that controls the level of approximation (we call this function the “smooth max function”). Indeed, an elementary calculation shows that for every \(z \in {\mathbb {R}}^{p}\),
This smooth max function arises in the definition of “free energy” in spin glasses. See, for example, [21, 27]. Here is the first theorem of this section.
Theorem 1
(Comparison bounds for smooth functions) For every \(g \in C^2({\mathbb {R}})\) with \(\Vert g' \Vert _{\infty } \vee \Vert g'' \Vert _{\infty } < \infty \) and every \(\beta >0\),
Proof
See Sect. 4.\(\square \)
Comment 1
Minimizing the second bound with respect to \(\beta > 0\), we have
This result extends the work of [7], which derived the following Sudakov–Fernique type bound on the expectation of the difference between two Gaussian maxima:
Theorem 1 is not applicable to functions of the form \(g(z) = 1(z \le x)\) and hence does not directly lead to a bound on the Kolmogorov distance between \(\max _{1 \le j \le p} X_{j}\) and \(\max _{1 \le j \le p} Y_{j}\) (recall that the Kolmogorov distance between (the distributions) of two real valued random variables \(\xi \) and \(\eta \) is defined by \(\sup _{x \in {\mathbb {R}}} | {\text {P}}( \xi \le x ) - {\text {P}}(\eta \le x) |\)). Nevertheless, we have the following bounds on the Kolmogorov distance. Recall \(a_p={\text {E}}[ \max _{1 \le j \le p} (Y_{j}/\sigma ^{Y}_{jj})]\).
Theorem 2
(Comparison of distributions) Suppose that \(p \ge 2\) and \(\sigma ^{Y}_{jj} > 0\) for all \(1 \le j \le p\). Then
where \(C>0\) depends only on \(\min _{1 \le j \le p} \sigma _{jj}^{Y}\) and \(\max _{1 \le j \le p} \sigma _{jj}^{Y}\) (the right side is understood to be \(0\) when \(\Delta = 0\)). Moreover, in the worst case, \(a_{p} \le \sqrt{2 \log p}\), so that
where as before \(C' > 0\) depends only on \(\min _{1 \le j \le p} \sigma _{jj}^{Y}\) and \(\max _{1 \le j \le p} \sigma _{jj}^{Y}\).
Proof
See Sect. 4.\(\square \)
The first bound (2) is generally sharper than the latter. To see this, consider the simple case where \(a_{p} = O(1)\) as \(p \rightarrow \infty \), which would happen, for example, when \(Y_{1},\dots ,Y_{p}\) come from discretization of a single continuous Gaussian process. Then the right side on (2) is \(o(1)\) if \(\Delta (\log p) \log \log p= o(1)\), while the second bound requires \(\Delta (\log p)^{2}= o(1)\).
Comment 2
(On the proof strategy) Bounding the Kolmogorov distance between \(\max _{1 \le j \le p}X_{j}\) and \(\max _{1 \le j \le p}Y_{j}\) is not immediate from Theorem 1 and this step relies on the anti-concentration inequality for the maximum of a Gaussian random vector, which we will study in Sect. 3. More formally, by smoothing the indicator and maximum functions, we obtain from Theorem 1 a bound of the following form:
where \( \mathcal {L}(\max _{1 \le j \le p} Y_{j},\epsilon )\) is the Lévy concentration function for \(\max _{1 \le j \le p}Y_{j}\) (see Definition 1 in Sect. 3 for the formal definition), and \(\beta ,\delta > 0\) are smoothing parameters (see Eqs. (10) and (11) in the proof of Theorem 2 given in Sect. 4 for the derivation of the above bound). The bound (2) then follows from bounding the Lévy concentration function by using the anti-concentration inequality derived in Sect. 3, and optimizing the bound with respect to \(\beta ,\delta \).
The proof of Theorem 2 is substantially different from the (“textbook”) proof of classical Slepian’s inequality. The simplest form of Slepian’s inequality states that
whenever \(\sigma _{jj}^{X} = \sigma _{jj}^{Y}\) and \(\sigma _{jk}^{X} \le \sigma _{jk}^{Y}\) for all \(1 \le j,k \le p\). This inequality is immediately deduced from the following expression:
where we assume \(\Sigma ^{X}\) and \(\Sigma ^{Y}\) are non-singular, and \(\sigma _{jj}^{X}=\sigma _{jj}^{Y}, 1 \le \forall j \le p\). Here \(f_{t}\) denotes the density function of \(N(0,t \Sigma ^{X} + (1-t) \Sigma ^{Y})\). See [14], p. 82, for this expression. The expression (3) is of importance and indeed a source of many interesting probabilistic results (see, for example, [18, 30] for recent related works). It is not clear (or at least non-trivial), however, whether a bound similar in nature to Theorem 2 can be deduced from the expression (3) when there is no restriction on the covariance matrices except for the condition that \(\sigma _{jj}^{X} = \sigma _{jj}^{Y},1 \le \forall j \le p\), and here we take the different route.
The key features of Theorem 2 are: (i) the bound on the Kolmogorov distance between the maxima of Gaussian random vectors in \({\mathbb {R}}^{p}\) depends on the dimension \(p\) only through \(\log p\) and the maximum difference of the covariance matrices \(\Delta \), and (ii) it allows for arbitrary covariance matrices for \(X\) and \(Y\) (except for the nondegeneracy condition that \(\sigma _{jj}^{Y} > 0, \ 1 \le \forall j \le p\)). These features have an important implication to statistical applications, as discussed below.
2.2 Conditional multiplier central limit theorem
Consider the following problem. Suppose that \(n\) independent centered random vectors in \({\mathbb {R}}^{p}\) of observations \(Z_{1},\dots ,Z_{n}\) are given. Here \(Z_{1},\dots ,Z_{n}\) are generally non-Gaussian, and the dimension \(p\) is allowed to increase with \(n\) (that is, the case where \(p=p_{n} \rightarrow \infty \) as \(n \rightarrow \infty \) is allowed). We suppress the possible dependence of \(p\) on \(n\) for the notational convenience. Suppose that each \(Z_{i}\) has a finite covariance matrix \({\text {E}}[ Z_{i} Z_{i}^{T} ]\). Consider the following normalized sum:
The problem here is to approximate the distribution of \(\max _{1 \le j \le p} S_{n,j}\).
Statistics of this form arise frequently in modern statistical applications. The exact distribution of \(\max _{1 \le j \le p} S_{n,j}\) is generally unknown. An intuitive idea to approximate the distribution of \(\max _{1 \le j \le p} S_{n,j}\) is to use the Gaussian approximation. Let \(V_{1},\dots ,V_{n}\) be independent Gaussian random vectors in \({\mathbb {R}}^{p}\) such that \(V_{i} \sim N(0,{\text {E}}[ Z_{i} Z_{i}^{T} ])\), and define
It is expected that the distribution of \(\max _{1 \le j \le p} T_{n,j}\) is close to that of \(\max _{1 \le j \le p} S_{n,j}\) in the following sense:
When \(p\) is fixed, (4) will follow from the classical Lindeberg–Feller central limit theorem, subject to the Lindeberg conditions. The recent paper by [9] established conditions under which this Gaussian approximation (4) holds even when \(p\) is comparable or much larger than \(n\). For example, [9] proved that if \(c_{1} \le n^{-1} \sum _{i=1}^{n} {\text {E}}[ Z_{ij}^{2} ] \le C_{1}\) and \({\text {E}}[ \exp (| Z_{ij} |/ C_{1} ) ] \le 2\) for all \(1 \le i \le n\) and \(1 \le j \le p\) for some \(0 < c_{1} < C_{1}\), then (4) holds as long as \(\log p = o(n^{1/7})\).
The Gaussian approximation (4) is in itself an important step, but in the general case where the covariance matrix \(n^{-1} \sum _{i=1}^{n} {\text {E}}[ Z_{i} Z_{i}^{T} ]\) is unknown, it is not directly applicable for purposes of statistical inference. In such cases, the following multiplier bootstrap procedure will be useful. Let \(\eta _{1},\dots ,\eta _{n}\) be independent standard Gaussian random variables independent of \(Z_{1}^{n} := \{ Z_{1},\dots ,Z_{n} \}\). Consider the following randomized sum:
Since conditional on \(Z_{1}^{n}\),
it is natural to expect that the conditional distribution of \(\max _{1 \le j \le p} S_{n,j}^{\eta }\) is “close” to the distribution of \(\max _{1 \le j \le p} T_{n,j}\) and hence that of \(\max _{1 \le j \le p} S_{n,j}\). Note here that the conditional distribution of \(S^{\eta }_{n}\) is completely known, which makes this distribution useful for purposes of statistical inference. The following proposition makes this intuition rigorous.
Proposition 1
(Conditional multiplier central limit theorem) Work with the setup as described above. Suppose that \(p \ge 2\) and there are some constants \(0 < c_{1} < C_{1}\) such that \(c_{1} \le n^{-1}\sum _{i=1}^{n} \mathrm {E}[Z_{ij}^{2}] \le C_{1}\) for all \(1 \le j \le p\). Moreover, suppose that \(\widehat{\Delta } := \max _{1 \le j,k \le p}|n^{-1}\sum _{i=1}^{n}(Z_{ij}Z_{ik} - \mathrm {E}[Z_{ij}Z_ {ik}] )|\) obeys the following conditions: as \(n \rightarrow \infty \),
Then we have
Here recall that \(p\) is allowed to increase with \(n\).
Proof
Follows immediately from Theorem 2.\(\square \)
We call this result a “conditional multiplier central limit theorem,” where the terminology follows that in empirical process theory. See [28], Chapter 2.9. The notable features of this proposition, which inherit from the features of Theorem 2 discussed above, are: (i) (6) can hold even when \(p\) is much larger than \(n\), and (ii) it allows for arbitrary covariance matrices for \(Z_{i}\) (except for the mild scaling condition that \(c_{1} \le n^{-1} \sum _{i=1}^{n} {\text {E}}[ Z_{ij}^{2} ] \le C_{1}\)). The second point is clearly desirable in statistical applications as the information on the true covariance structure is generally (but not always) unavailable. For the first point, we have the following estimate on \({\text {E}}[ \widehat{\Delta } ]\).
Lemma 1
Let \(p \ge 2\). There exists a universal constant \(C > 0\) such that
Proof
See the Appendix. \(\square \)
Hence with help of Lemma 2.2.2 in [28], we can find various primitive conditions under which (5) holds.
Example 1
Consider the following examples. Here for the sake of simplicity, we use the worst case bound \({\text {E}}[ \max _{1 \le j \le p} T_{n,j} ] \le \sqrt{2 C_{1} \log p}\), so that conditions (5) reduce to \(\widehat{\Delta } = o_{{\text {P}}}((\log p)^{-2})\).
Case a Suppose that \({\text {E}}[ \exp (| Z_{ij} |/ C_{1} ) ] \le 2\) for all \(1 \le i \le n\) and \(1 \le j \le p\) for some \(C_{1} > 0\). In this case, it is not difficult to verify that \(\widehat{\Delta } = o_{{\text {P}}}((\log p)^{-2})\) as soon as \(\log p = o(n^{1/5})\).
Case b Another type of \(Z_{ij}\) which arises in regression applications is of the form \(Z_{ij} = \varepsilon _{i} x_{ij}\) where \(\varepsilon _{i}\) are stochastic with \({\text {E}}[ \epsilon _{i} ] = 0\) and \(\max _{1 \le i \le n}{\text {E}}[ | \varepsilon _{i} |^{4q} ] = O(1)\) for some \(q \ge 1\), and \(x_{ij}\) are non-stochastic (typically, \(\varepsilon _{i}\) are “errors” and \(x_{ij}\) are “regressors”). Suppose that \(x_{ij}\) are normalized in such a way that \(n^{-1} \sum _{i=1}^{n} x_{ij}^{2} = 1\), and there are bounds \(B_{n} \ge 1\) such that \(\max _{1 \le i \le n} \max _{1 \le j \le p} | x_{ij} | \le B_{n}\), where we allow \(B_{n} \rightarrow \infty \). In this case, \(\widehat{\Delta } = o_{{\text {P}}}((\log p)^{-2})\) as soon as
since \(\max _{1 \le j \le p} ( n^{-1}\sum _{i=1}^{n} {\text {E}}[ (\varepsilon _{i} x_{ij})^{4} ]) \le B_{n}^{2} \max _{1 \le i \le n} {\text {E}}[ \varepsilon _{i}^{4} ] = O(B_{n}^{2})\) and \({\text {E}}[ \max _{1 \le i \le n} \max _{1 \le j \le p} (\varepsilon _{i} x_{ij})^{4} ] \le B_{n}^{4} {\text {E}}[\max _{1 \le i \le n} \varepsilon _{i}^{4} ] = O(n^{1/q} B_{n}^{4})\).
Importantly, in these examples, for (6) to hold, \(p\) can increase exponentially in some fractional power of \(n\).
3 Anti-concentration bounds
The following theorem provides bounds on the Lévy concentration function of the maximum of a Gaussian random vector in \({\mathbb {R}}^{p}\), where the terminology is borrowed from [23].
Definition 1
([23], Definition 3.1) The Lévy concentration function of a real valued random variable \(\xi \) is defined for \(\epsilon > 0\) as
Theorem 3
(Anti-concentration) Let \((X_{1},\dots ,X_{p})^{T}\) be a centered Gaussian random vector in \({\mathbb {R}}^{p}\) with \(\sigma ^{2}_{j} := \mathrm {E}[X_{j}^{2}] > 0\) for all \(1 \le j \le p\). Moreover, let \(\underline{\sigma } := \min _{1 \le j \le p} \sigma _{j}, \overline{\sigma } := \max _{1 \le j \le p} \sigma _{j}\), and \(a_{p} := \mathrm {E}[\max _{1 \le j \le p}(X_{j}/\sigma _{j})]\).
-
(i)
If the variances are all equal, namely \(\underline{\sigma } = \overline{\sigma } = \sigma \), then for every \(\epsilon > 0\),
$$\begin{aligned} \mathcal {L}\left( \max _{1 \le j \le p} X_{j},\epsilon \right) \le 4 \epsilon (a_p + 1)/\sigma . \end{aligned}$$ -
(ii)
If the variances are not equal, namely \(\underline{\sigma } < \overline{\sigma }\), then for every \(\epsilon > 0\),
$$\begin{aligned} \mathcal {L}\left( \max _{1 \le j \le p} X_{j},\epsilon \right) \le C \epsilon \{ a_p + \sqrt{1 \vee \log (\underline{\sigma }/\epsilon )} \}, \end{aligned}$$
where \(C>0\) depends only on \(\underline{\sigma }\) and \(\overline{\sigma }\).
Since \(X_{j}/\sigma _{j} \sim N(0,1)\), by a standard calculation, we have \(a_{p} \le \sqrt{2 \log p}\) in the worst case (see, for example, [27], Proposition 1.1.3), so that the following simpler corollary follows immediately from Theorem 3.
Corollary 1
Let \((X_{1},\dots ,X_{p})^{T}\) be a centered Gaussian random vector in \({\mathbb {R}}^{p}\) with \(\sigma ^{2}_{j} := \mathrm {E}[X_{j}^{2}] > 0\) for all \(1 \le j \le p\). Let \(\underline{\sigma } := \min _{1 \le j \le p} \sigma _{j}\) and \(\overline{\sigma } := \max _{1 \le j \le p} \sigma _{j}\). Then for every \(\epsilon > 0\),
where \(C>0\) depends only on \(\underline{\sigma }\) and \(\overline{\sigma }\). When \(\sigma _{j}\) are all equal, \(\log (p/\epsilon )\) on the right side can be replaced by \(\log p\).
Comment 3
(Anti-concentration vs. small ball probabilities) The problem of bounding the Lévy concentration function \( \mathcal {L}(\max _{1 \le j \le p} X_{j},\epsilon )\) is qualitatively different from the problem of bounding \({\text {P}}( \max _{1 \le j \le p} | X_{j} | \le x )\). For a survey on the latter problem, called the “small ball problem”, we refer the reader to [17].
Comment 4
(Concentration vs. anti-concentration) Concentration inequalities refer to inequalities bounding \({\text {P}}(| \xi - x | > \epsilon )\) for a random variable \(\xi \) (typically \(x\) is the mean or median of \(\xi \)). See the monograph [15] for a study of the concentration of measure phenomenon. Anti-concentration inequalities in turn refer to reverse inequalities, that is, inequalities bounding \({\text {P}}(| \xi - x | \le \epsilon )\). Theorem 3 provides anti-concentration inequalities for \(\max _{1 \le j \le p} X_{j}\). [29] remarked that “concentration is better understood than anti-concentration”. In the present case, the Gaussian concentration inequality (see [15], Theorem 7.1) states that
where the mean can be replace by the median. This inequality is well known and dates back to [4] and [26]. To the best of our knowledge, however, the reverse inequalities in Theorem 3 were not known and are new.
Comment 5
(Anti-concentration for maximum of moduli, \(\max _{1 \le j \le p} | X_{j} |\)) Versions of Theorem 3 and Corollary 1 continue to hold for \(\max _{1 \le j \le p} | X_{j} |\). That is, for example, when \(\sigma _{j}\) are all equal (\(\sigma _{j} = \sigma \)), \( \mathcal {L}(\max _{1 \le j \le p} | X_{j} |, \epsilon ) \le 4 ( a_{p}'+1)/\sigma \), where \(a_{p}' := {\text {E}}[ \max _{1 \le j \le p} | X_{j} |/\sigma ]\). To see this, observe that \(\max _{1 \le j \le p} | X_{j} | = \max _{1 \le j \le 2p} X_{j}'\) where \(X_{j}'=X_{j}\) for \(j=1,\dots ,p\) and \(X_{p+j}' = - X_{j}\) for \(j=1,\dots ,p\). Hence we may apply Theorem 3 to \((X_{1}',\dots ,X_{2p}')^{T}\) to obtain the desired conclusion.
Comment 6
(A sketch of the proof of Theorem 3) For the reader’s convenience, we provide a sketch of the proof of Theorem 3. We focus here on the simple case where all the variances are equal to one \((\sigma _{1}=\cdots =\sigma _{p}=1\)). Then the distribution of \(Z=\max _{1 \le j \le p} X_{j}\) is absolutely continuous and its density can be written as \(\phi (z) G(z)\) where the map \(z \mapsto G(z)\) is non-decreasing. Consequently, it is then not difficult to see that \(G(z) \le {\text {P}}(Z > z)/\{ 1- \Phi (z) \} \le 2 (z \vee 1) e^{-(z-a_{p})_{+}^{2}/2}/\phi (z)\), where the second inequality follows from Mill’s inequality combined with the Gaussian concentration inequality. Hence the density of \(Z\) is bounded by \(2 (z \vee 1) e^{-(z-a_{p})_{+}^{2}/2}\), which immediately leads to the bound \( \mathcal {L}(\max _{1 \le j \le p} X_{j},\epsilon ) \le 4 (a_{p}+1)\epsilon \).
In a trivial example where \(p=1\), it is immediate to see that \({\text {P}}( | X_{1} - x| \le \epsilon ) \le \epsilon \sqrt{2/(\pi \sigma _{1}^{2})}\). A non-trivial case is the situation where \(p \rightarrow \infty \). In such a case, it is typically not known whether \(\max _{1 \le j \le p} X_{j}\) has a limiting distribution as \(p \rightarrow \infty \), even after normalization (recall that except for \(\underline{\sigma } > 0\), we allow for general covariance structures between \(X_{1},\dots ,X_{p}\)), and therefore it is not trivial at all whether, for every sequence \(\epsilon = \epsilon _{p} \rightarrow 0\) (or at some rate), \( \mathcal {L}(\max _{1 \le j \le p} X_{j},\epsilon ) \rightarrow 0\) or how fast \(\epsilon = \epsilon _{p} \rightarrow 0\) should be to guarantee that \( \mathcal {L}(\max _{1 \le j \le p} X_{j},\epsilon ) \rightarrow 0\). Theorem 3 answers this question with explicit, non-asymptotic bounds.
Importantly, the bounds in Theorem 3 are dimension-free in the sense that, similar to the Gaussian concentration inequality, they depend on the dimension \(p\) only through \(a_{p}\)—the expectation of the maximum of the (normalized) Gaussian random vector. Hence Theorem 3 admits direct extensions to the infinite dimensional case, namely separable Gaussian processes, as long as the corresponding expectation is finite. See our companion paper [10] for formal treatments and applications of this extension.
The presence of \(a_{p}\) on the bounds is essential and can not be removed in general, as the following example suggests. This shows that there does not exist a substantially sharper estimate of the universal bound of the concentration function than that given in Theorem 3. Potentially, there could be refinements but they would have to rely on the particular (hence non-universal) features of the covariance structure between \(X_{1},\dots ,X_{p}\).
Example 2
(Partial converse of Theorem 3) Let \(X_{1},\dots ,X_{p}\) be independent standard Gaussian random variables. By Theorem 1.5.3 of [14], as \(p \rightarrow \infty \),
where
and \(G(0,1)\) denotes the standard Gumbel distribution, that is, the distribution having the density \(g(x) = e^{-x} e^{-e^{-x}}\) for \(x \in \mathbb {R}\). In fact, we can show that the density of \(b_{p} (\max _{1 \le j \le p}X_{j} - d_{p})\) converges to that of \(G(0,1)\) locally uniformly. To see this, we begin with noting that the density of \(b_{p} (\max _{1 \le j \le p}X_{j} - d_{p})\) is given by
Pick any \(x \in \mathbb {R}\). Since, by the weak convergence result (7),
we have \([\Phi (d_{p}+b_{p}^{-1}x)]^{p-1} \rightarrow e^{-e^{-x}}\). Hence it remains to show that
Taking the logarithm of the left side yields
Expanding \((d_{p}+b_{p}^{-1}x)^{2}\) gives that
by which we have (8) \(= -x + o(1)\). This shows that \(g_{p}(x) \rightarrow g(x)\) for all \(x \in {\mathbb {R}}\). Moreover, this convergence takes place locally uniformly in \(x\), that is, for every \(K > 0\), \(g_{p}(x) \rightarrow g(x)\) uniformly in \(x \in [-K, K]\).
On the other hand, the density of \(\max _{1 \le j \le p} X_{j}\) is given by \(f_{p}(x) = p \phi (x) [ \Phi (x) ]^{p-1}\). By this form, for every \(K > 0\), there exist a constant \(c > 0\) and a positive integer \(p_{0}\) depending only on \(K\) such that for \(p \ge p_{0}\),
which shows that for \(p \ge p_{0}\),
Therefore, we conclude that for \(p \ge p_{0}\),
By the Gaussian maximal inequality and Lemma 2.3.15 of [11], we have
Hence, by the previous result, for every \(K' > 0\), there exist a constant \(c' > 0\) and a positive integer \(p_{0}'\) depending only on \(K'\) such that for \(p \ge p_{0}'\), \(a_{p} \ge 1\) and
\(\square \)
4 Proofs for Section 2
4.1 Proof of Theorem 1
Here for a smooth function \(f: {\mathbb {R}}^{p} \rightarrow {\mathbb {R}}\), we write \(\partial _{j} f (z) = \partial f(z)/\partial z_{j}\) for \(z = (z_{1},\dots ,z_{p})^{T}\). We shall use the following version of Stein’s identity.
Lemma 2
(Stein’s identity) Let \(W =(W_{1},\dots ,W_{p})^{T}\) be a centered Gaussian random vector in \({\mathbb {R}}^{p}\). Let \(f: {\mathbb {R}}^{p} \rightarrow {\mathbb {R}}\) be a \(C^1\)-function such that \(\mathrm {E}[| \partial _{j}f(W) |] < \infty \) for all \(1 \le j \le p\). Then for every \(1 \le j \le p\),
Proof of Lemma 2
See Section A.6 of [27]; also [8] and [25].
We will use the following properties of the smooth max function.
Lemma 3
For every \(1 \le j,k \le p\),
where
Moreover,
Proof of Lemma 3
The first property was noted in [7]. The other properties follow from a direct calculation.\(\square \)
Lemma 4
Let \(m := g \circ F_{\beta }\) with \(g \in C^{2}({\mathbb {R}})\). Then for every \(1 \le j,k \le p\),
where \(\pi _{j}\) and \(w_{jk}\) are defined in Lemma 3.
Proof of lemma 4
The proof follows from a direct calculation.\(\square \)
Proof of Theorem 1
Without loss of generality, we may assume that \(X\) and \(Y\) are independent, so that \({\text {E}}[X_{j} Y_k]=0\) for all \(1 \le j,k \le p\). Consider the following Slepian interpolation between \(X\) and \(Y\):
Let \(m := g \circ F_\beta \) and \(\Psi (t):={\text {E}}[m (Z(t))]\). Then
Here we have
where the second equality follows from applying Lemma 2 to \(W = (t^{-1/2}X_{j}-(1-t)^{-1/2}Y_{j}, Z(t)^{T})^{T}\) and \(f(W)=\partial _{j} m(Z(t))\). Hence
By Lemmas 3 and 4,
Therefore, we have
which leads to the first assertion. The second assertion follows from the inequality (1). This completes the proof.\(\square \)
4.2 Proof of Theorem 2
The final assertion follows from the inequality \(a_{p} \le \sqrt{2 \log p}\) (see, for example, [27], Proposition 1.1.3). Hence we prove (2). We first note that we may assume that \(0 < \Delta < 1\) since otherwise the proof is trivial (take \(C \ge 2\) in (2)). In what follows, let \(C>0\) be a generic constant that depends only on \(\min _{1 \le j \le p} \sigma _{jj}^{Y}\) and \(\max _{1 \le j \le p} \sigma _{jj}^{Y}\), and its value may change from place to place. For \(\beta > 0\), define \(e_{p,\beta } := \beta ^{-1} \log p\). Consider and fix a \(C^{2}\)-function \(g_0:{\mathbb {R}}\rightarrow [0,1]\) such that \(g_0(t)=1\) for \(t \le 0\) and \(g_0(t)=0\) for \(t \ge 1\). For example, we may take
For given \(x \in {\mathbb {R}}, \beta > 0\) and \(\delta > 0\), define \(g_{x,\beta ,\delta } (t)=g_0(\delta ^{-1}(t-x-e_{p,\beta }))\). For this function \(g_{x,\beta ,\delta }, \Vert g_{x,\beta ,\delta }' \Vert _{\infty } = \delta ^{-1} \Vert g_{0}' \Vert _{\infty }\) and \(\Vert g_{x,\beta ,\delta }'' \Vert _{\infty } = \delta ^{-2} \Vert g_{0}'' \Vert _{\infty }\). Moreover,
For arbitrary \(x \in {\mathbb {R}}, \beta > 0\) and \(\delta > 0\), observe that
where the first inequality follows from the inequality (1), the second from the inequality (9), the third from Theorem 1, the fourth from the inequality (9), and the last from the inequality (1). We wish to compare \({\text {P}}(\max _{1 \le j \le p} Y_{j} \le x+e_{p,\beta }+\delta ) \) with \({\text {P}}(\max _{1 \le j \le p} Y_{j} \le x ) \), and this is where the anti-concentration inequality plays its role. By Theorem 3, we have
Therefore,
Take \(a = a_{p} \vee \log ^{1/2}(1/\Delta )\), and choose \(\beta \) and \(\delta \) in such a way that
Recall that \(p \ge 2\) and \(0 < \Delta < 1\). Observe that \((\delta ^{-2}+\beta \delta ^{-1})\Delta \le C \Delta ^{1/3} (1 \vee a)^{2/3} \log ^{1/3} p, (e_{p,\beta }+\delta ) (1 \vee a_{p}) \le C \Delta ^{1/3} (1 \vee a)^{2/3} \log ^{1/3} p\), and since \(\delta \ge \Delta ^{1/3} (1 \vee a)^{-1/3}\), we have \(\log (1/\delta ) \le (1/3) \log ((1 \vee a)/\Delta )\). Hence the right side on (12) is bounded by \(C \Delta ^{1/3} \{ (1 \vee a)^{2/3} \log ^{1/3} p + (1 \vee a)^{-1/3} ( \log ^{1/3} p ) \log ^{1/2}((1 \vee a)/\Delta ) \}\). In addition, \((1 \vee a)^{-1} \log ^{1/2} (1 \vee a)\) is bounded by a universal constant, so that
The second term inside the bracket is bounded by \(( \log ^{1/3} p ) \log ^{1/3}(1/\Delta )\) as \((1 \vee a)^{-1/3} \le (\log (1/\Delta ))^{-1/6}\), and first term is bound by \((1 \vee a_{p})^{2/3} \log ^{1/3} p + ( \log ^{1/3} p ) \log ^{1/3}(1/\Delta )\). Adjusting the constant \(C\), the right side on the above displayed equation is bounded by \(C \Delta ^{1/3} \{ 1 \vee a_{p}^{2} \vee \log (1/\Delta ) \}^{1/3} \log ^{1/3}p\).
For the opposite direction, observe that
The rest of the proof is similar and hence omitted.\(\square \)
5 Proof of Theorem 3
The proof of Theorem 3 uses some properties of Gaussian measures. We begin with preparing technical tools. The following two facts were essentially noted in [31, 32] (note: [31] and [32] did not contain a proof of Lemma 5, which we find non-trivial). For the sake of completeness, we give their proofs after the proof of Theorem 3.
Lemma 5
Let \((W_{1},\dots ,W_{p})^{T}\) be a (not necessarily centered) Gaussian random vector in \({\mathbb {R}}^{p}\) with \({{\mathrm{Var}}}(W_{j}) = 1\) for all \(1 \le j \le p\). Suppose that \({{\mathrm{Corr}}}(W_{j},W_{k})<1\) whenever \(j \ne k\). Then the distribution of \(\max _{1 \le j \le p} W_{j}\) is absolutely continuous with respect to the Lebesgue measure and a version of the density is given by
Lemma 6
Let \((W_{0},W_{1},\dots ,W_{p})^{T}\) be a (not necessarily centered) Gaussian random vector with \({{\mathrm{Var}}}(W_{j}) =1 \) for all \(0 \le j \le p\). Suppose that \(\mathrm {E}[W_{0} ]\ge 0\). Then the map
is non-decreasing on \({\mathbb {R}}\).
Let us also recall (a version of) the Gaussian concentration (more precisely, deviation) inequality. See, for example, [15], Theorem 7.1, for its proof.
Lemma 7
Let \((X_{1},\dots ,X_{p})^{T}\) be a centered Gaussian random vector in \({\mathbb {R}}^{p}\) with \(\max _{1 \le j \le p} \mathrm {E}[X_{j}^2] \le \sigma ^{2}\) for some \(\sigma ^{2}> 0\). Then for every \(r > 0\),
We are now in position to prove Theorem 3.
Proof of Theorem 3
The proof consists of three steps.
Step 1. This step reduces the analysis to the unit variance case. Pick any \(x\ge 0\). Let \(W_{j} :=(X_{j}-x)/\sigma _{j}+x/\underline{\sigma }\). Then \({\text {E}}[W_{j}]\ge 0\) and \({{\mathrm{Var}}}(W_{j})=1\). Define \(Z:= \max _{1 \le j \le p} W_{j}\). Then we have
Step 2. This step bounds the density of \(Z\). Without loss of generality, we may assume that \({{\mathrm{Corr}}}(W_{j},W_{k})<1\) whenever \(j \ne k\). Since the marginal distribution of \(W_{j}\) is \(N(\mu _{j},1)\) where \(\mu _{j}:={\text {E}}[W_{j}]=(x/\underline{\sigma }-x/\sigma _j)\ge 0\), by Lemma 5, \(Z\) has density of the form
where the map \(z \mapsto G_{p}(z)\) is non-decreasing by Lemma 6. Define \(\bar{z}:= (1/\underline{\sigma }-1/\overline{\sigma })x\), so that \(\mu _{j}\le \bar{z}\) for every \(1 \le j \le p\). Moreover, define \(\bar{Z}:=\max _{1 \le j \le p}(W_{j}-\mu _{j})\). Then
where the last inequality is due to the Gaussian concentration inequality (Lemma 7). Note that \(W_{j}-\mu _{j}=X_{j}/\sigma _{j}\), so that
Therefore, for every \(z \in {\mathbb {R}}\),
Mill’s inequality states that for \(z>0\),
and in particular \((1+z^2)/z^2 \le 2\) when \(z>1\). Moreover, \(\phi (z)/\{ 1-\Phi (z) \}\le 1.53 \le 2 \) for \(z \in (-\infty , 1)\). Therefore,
Hence we conclude from this, (16), and (15) that
Step 3. By Step 2, for every \(y \in {\mathbb {R}}\) and \(t > 0\), we have
where the last inequality follows from the fact that the map \(z \mapsto z e^{-(z-a)^{2}/2}\) (with \(a > 0\)) is non-increasing on \([a+1,\infty )\). Combining this inequality with Step 1, for every \(x \ge 0\) and \(\epsilon > 0\), we have
This inequality also holds for \(x<0\) by the similar argument, and hence it holds for every \(x \in {\mathbb {R}}\).
If \(\underline{\sigma }=\overline{\sigma } =\sigma \), then we have
which leads to the first assertion of the theorem.
On the other hand, consider the case where \(\underline{\sigma } < \overline{\sigma }\). Suppose first that \(0 < \epsilon \le \underline{\sigma }\). By the Gaussian concentration inequality (Lemma 7), for \(|x|\ge \epsilon +\overline{\sigma }(a_p+\sqrt{2\log (\underline{\sigma }/\epsilon )})\), we have
For \(|x| \le \epsilon +\overline{\sigma }(a_p+\sqrt{2\log (\underline{\sigma }/\epsilon )})\), by (17) and using \(\epsilon \le \underline{\sigma }\), we have
Combining (18) and (19), we obtain the inequality in (ii) for \(0 < \epsilon \le \underline{\sigma }\) (with a suitable choice of \(C\)). If \(\epsilon > \underline{\sigma }\), the inequality in (ii) trivially follows by taking \(C \ge 1/\underline{\sigma }\). This completes the proof. \(\square \)
Proof of Lemma 5
Let \(M := \max _{1 \le j \le p} W_{j}\). The absolute continuity of the distribution of \(M\) is deduced from the fact that \({\text {P}}(M \in A) \le \sum _{j=1}^{p} {\text {P}}(W_{j} \in A)\) for every Borel measurable subset \(A\) of \({\mathbb {R}}\). Hence, to show that a version of the density of \(M\) is given by (13), it is enough to show that \(\lim _{\epsilon \downarrow 0} \epsilon ^{-1} {\text {P}}(x < M \le x+\epsilon )\) equals the right side on (13) for a.e. \(x \in {\mathbb {R}}\).
For every \(x \in {\mathbb {R}}\) and \(\epsilon > 0\), observe that
Note that the events \(A^{x,\epsilon }_{1},A^{x,\epsilon }_{2},\dots ,A^{x,\epsilon }_{p}\) are disjoint. For \(A^{x,\epsilon }_{1}\), since
where the events on the right side are disjoint, we have
where \(\mu _{i} := {\text {E}}[ W_{i} ]\). We show that for every \(1 \le i \le p\) and a.e. \(x \in {\mathbb {R}}\), the map \(u \mapsto {\text {P}}(W_{j} \le x, \forall j \ne i \mid W_{i} = u)\) is right continuous at \(x\). Let \(X_{j}=W_{j}-\mu _{j}\) so that \(X_{j}\) are standard Gaussian random variables. Then
Pick \(i=1\). Let \(V_{j} = X_{j} - {\text {E}}[ X_{j} X_{1}] X_{1}\) be the residual from the orthogonal projection of \(X_{j}\) on \(X_{1}\). Note that the vector \((V_{j})_{2 \le j \le p}\) and \(X_{1}\) are jointly Gaussian and uncorrelated, and hence independent, by which we have
Define \(J := \{ j \in \{ 2,\dots ,p \} : {\text {E}}[ X_{j} X_{1}] \le 0 \}\) and \(J^{c} := \{ 2,\dots , p\} \backslash J\). Then
where \(x_j=x-\mu _{j}-{\text {E}}[X_{j}X_1](x-\mu _{1})\). Here each \(V_{j}\) either degenerates to \(0\) (which occurs only when \(X_{j}\) and \(X_{1}\) are perfectly negatively correlated, that is, \({\text {E}}[ X_{j} X_{1} ] = -1\)) or has a non-degenerate Gaussian distribution, and hence for every \(x \in {\mathbb {R}}\) except for at most \((p-1)\) points \((\mu _{1}+\mu _{j})/2, 2 \le j \le p\),
Hence for \(i=1\) and a.e. \(x \in {\mathbb {R}}\), the map \(u \mapsto {\text {P}}(W_{i} \le x, \forall j \ne i \mid W_{i} = u)\) is right continuous at \(x\). The same conclusion clearly holds for \(2 \le i \le p\). Therefore, we conclude that, for a.e. \(x \in {\mathbb {R}}\), as \(\epsilon \downarrow 0\),
In the rest of the proof, we show that, for every \(2 \le i \le p\) and \(x \in {\mathbb {R}}\), \({\text {P}}(A_{i}^{x,\epsilon }) = o(\epsilon )\) as \(\epsilon \downarrow 0\), which leads to the desired conclusion. Fix any \(2 \le i \le p\). The probability \({\text {P}}(A_{i}^{x,\epsilon })\) is bounded by a sum of terms of the form \({\text {P}}(x < W_{j} \le x+\epsilon , x < W_{k} \le x+\epsilon )\) with \(j \ne k\). Recall that \({{\mathrm{Corr}}}(W_{j},W_{k})<1\). Assume that \({{\mathrm{Corr}}}(W_{j},W_{k})=-1\). Then for every \(x \in {\mathbb {R}}\), \({\text {P}}(x < W_{j} \le x+\epsilon , x < W_{k} \le x+\epsilon )\) is zero for sufficiently small \(\epsilon \). Otherwise, \((W_{j},W_{k})^{T}\) obeys a two-dimensional, non-degenerate Gaussian distribution and hence \({\text {P}}(x < W_{j} \le x+\epsilon , x < W_{k} \le x+\epsilon ) = O(\epsilon ^{2}) = o(\epsilon )\) as \(\epsilon \downarrow 0\) for every \(x \in {\mathbb {R}}\). This completes the proof.
Proof of Lemma 6
Since \({\text {E}}[ W_{0} ] \ge 0\), the map \(x \mapsto \exp \{ \mathrm {E}[W_{0}]x - (\mathrm {E}[W_{0}])^{2}/2 \}\) is non-decreasing. Thus it suffices to show that the map
is non-decreasing. As in the proof of Lemma 5, let \(X_{j}=W_{j}-{\text {E}}[ W_{j} ]\) and let \(V_{j} = X_{j} - {\text {E}}[ X_{j} X_{0}] X_{0}\) be the residual from the orthogonal projection of \(X_{j}\) on \(X_{0}\). Note that the vector \((V_{j})_{1 \le j \le p}\) and \(X_{0}\) are independent. Hence the probability in (20) equals
where the latter is non-decreasing in \(x\) on \({\mathbb {R}}\) since \({\text {E}}[X_{j}X_{0}] \le 1\).
References
Adler, R., Taylor, J.: Random Fields and Geometry. Springer, New York (2007)
Ball, K.: The reverse isoperimetric problem for Gaussian measure. Discrete Comput. Geom. 10, 411–420 (1993)
Bentkus, V.: On the dependence of the Berry-Esseen bound on dimension. J. Stat. Plann. Infer. 113, 385–402 (2003)
Borell, C.: The Brunn–Minkowski inequality in Gauss space. Invent. Math. 30, 205–216 (1975)
Bühlmann, P., van de Geer, S.: Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, New York (2011)
Candes, E., Tao, T.: The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann. Stat. 35, 2313–2351 (2007)
Chatterjee, S.: An error bound in the Sudakov–Fernique inequality (2005). arXiv:math/0510424
Chen, L., Goldstein, L., Shao, Q.-M.: Normal Approximation by Stein’s Method. Springer, New York (2011)
Chernozhukov, V., Chetverikov, D., Kato, K.: Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 41, 2786–2819 (2013a)
Chernozhukov, V., Chetverikov, D., Kato, K.: Anti-concentration and honest, adaptive confidence bands, (2013b). arXiv:1303:7152
Dudley, R.M.: Uniform Central Limit Theorems. Cambridge University Press, Cambridge (1999)
Gordon, Y.: Some inequalities for Gaussian processes and applications. Israel J. Math. 50, 265–289 (1985)
Klivans, A.R., O’Donnell, R., Servedio, R.A.: Learning geometric concepts via Gaussian surface area. In: Proceedings of 49th IEEE Symposium on Foundations of Computer Science (FOCS), pp. 541–550 (2008)
Leadbetter, M., Lindgren, G., Rootzén, H.: Extremes and Related Properties of Random Sequences and Processes. Springer, New York (1983)
Ledoux, M.: Concentration of Measure Phenomenon. American Mathematical Society (2001)
Ledoux, M., Talagrand, M.: Probability in Banach Spaces. Springer, New York (1991)
Li, W., Shao, Q.-M.: Gaussian processes: inequalities, small ball probabilities and applications. In: Handbook of Statistics, 19, pp. 533–597. North-Holland (2001)
Li, W., Shao, Q.-M.: A normal comparison inequality and applications. Probab. Theory Relat. Fields 122, 494–508 (2002)
Nazarov, F.: On the maximal perimeter of a convex set in \(\mathbb{R}^n\) with respect to a Gaussian measure. In: Geometric Aspects of Functional Analysis, Lecture Notes in Mathematics 1807, pp. 169–187. Springer, New York (2003)
Nourdin, I., Viens, F.G.: Density formula and concentration inequalities with Malliavin calculus. Electron. J. Probab. 14, 2287–2309 (2009)
Panchenko, D.: The Sherrington–Kirkpatrick Model. Springer, New York (2013)
Rudelson, M., Vershynin, R.: The Littlewood–Offord problem and invertibility of random matrices. Adv. Math. 218, 600–633 (2008)
Rudelson, M., Vershynin, R.: Smallest singular value of a random rectangular matrix. Commun. Pure Appl. Anal. 62, 1707–1739 (2009)
Slepian, D.: The one-sided barrier problem for Gaussian noise. Bell Syst. Tech. J. 41, 463–501 (1962)
Stein, C.: Estimation of the mean of a multivariate normal distribution. Ann. Stat. 9, 1135–1151 (1981)
Sudakov, V.N., Tsirel’son, B.S.: Extremal properties of half-spaces for spherically invariant measures. J. Soviet Math. 9, 9–18 (1978)
Talagrand, M.: Spin Glasses: A Challenge for Mathematicians. Springer, New York (2003)
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York (1996)
Vershynin, R., Rudelson, M.: Anti-concentration inequalities. In: Conference Proceeding for Phenomena in High Dimensions, Samos, Greece (2007)
Yan, L.: Comparison inequalities for one sided normal probabilities. J. Theor. Probab. 22, 827–836 (2009)
Ylvisaker, D.: The expected number of zeros of a stationary Gaussian process. Ann. Math. Stat. 36, 1043–1046 (1965)
Ylvisaker, D.: A note on the absence of tangencies in Gaussian sample paths. Ann. Math. Stat. 39, 261–262 (1968)
Acknowledgments
V. Chernozhukov and D. Chetverikov are supported by a National Science Foundation grant. K. Kato is supported by the Grant-in-Aid for Young Scientists (B) (22730179, 25780152), the Japan Society for the Promotion of Science. We would like to thank the Editors and an anonymous referee for their careful review.
Author information
Authors and Affiliations
Corresponding author
Appendix A: Proof of Lemma 1
Appendix A: Proof of Lemma 1
Lemma 1 follows from the following maximal inequality and Hölder’s inequality. Here we write \(a \lesssim b\) if \(a\) is smaller than or equal to \(b\) up to a universal positive constant.
Lemma 8
Let \(Z_{1},\dots ,Z_{n}\) be independent random vectors in \({\mathbb {R}}^{p}\) with \(p \ge 2\). Define \(M := \max _{1 \le i \le n} \max _{1 \le j \le p} | Z_{ij} |\) and \(\sigma ^{2} := \max _{1 \le j \le p} \sum _{i=1}^{n}\mathrm {E}[Z_{ij}^{2}]\). Then
We shall use the following lemma.
Lemma 9
Let \(V_{1},\dots ,V_{n}\) be independent random vectors in \({\mathbb {R}}^{p}\) with \(p \ge 2\) such that \(V_{ij} \ge 0\) for all \(1 \le i \le n\) and \(1 \le j \le p\). Then
Proof of Lemma 9
We make use of the symmetrization technique. Let \(\varepsilon _{1},\dots ,\varepsilon _{n}\) be independent Rademacher random variables (that is, \({\text {P}}(\varepsilon _{i} = 1) = {\text {P}}( \varepsilon _{i} = -1) = 1/2\)) independent of \(V_{1}^{n} := \{ V_{1},\dots ,V_{n} \}\). Then by the triangle inequality and Lemma 2.3.1 in [28],
By Lemmas 2.2.2 and 2.2.7 in [28], we have
where \(B := \max _{1 \le i \le n} \max _{1 \le j \le p} V_{ij}\). Hence by Fubini’s theorem and the Cauchy-Schwarz inequality,
Therefore, we have
Solving this inequality, we conclude that \(I \lesssim a + b^{2}\).\(\square \)
Proof of Lemma 8
Let \(\varepsilon _{1},\dots ,\varepsilon _{n}\) be independent Rademacher random variables independent of \(Z_{1},\dots ,Z_{n}\). Then arguing as in the previous proof, we have
By Lemma 9 applied to \(V_{ij} = Z_{ij}^{2}\), we have
This implies the desired conclusion.\(\square \)
Rights and permissions
About this article
Cite this article
Chernozhukov, V., Chetverikov, D. & Kato, K. Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probab. Theory Relat. Fields 162, 47–70 (2015). https://doi.org/10.1007/s00440-014-0565-9
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00440-014-0565-9
Keywords
- Slepian inequality
- Anti-concentration
- Lévy concentration function
- Maximum of Gaussian random vector
- Conditional multiplier central limit theorem