1 Introduction

We derive a bound on the difference in expectations of smooth functions of maxima of finite dimensional Gaussian random vectors. We also derive a bound on the Kolmogorov distance between the distributions of these maxima. The key property of these bounds is that they depend on the dimension \(p\) of the Gaussian random vectors only through \(\log p\), and on the max norm of the difference between the covariance matrices of the vectors. These results extend and complement the work of [7], which derived an explicit Sudakov–Fernique type bound on the difference of expectations of maxima of Gaussian random vectors. See also [1], Chapter 2. As an application, we establish a conditional multiplier central limit theorem for maxima of sums of independent random vectors where the dimension of the vectors is possibly much larger than the sample size. In all these results, we allow for arbitrary covariance structures between the coordinates of the random vectors, which is important especially in applications to high-dimensional statistics. We stress that the derivation of bounds on the Kolmogorov distance is by no means trivial and relies on a new anti-concentration inequality for maxima of Gaussian random vectors, which is another main result of this paper (see Comment 4 for what anti-concentration inequalities precisely refer to here and how they differ from concentration inequalities). These anti-concentration bounds are non-trivial in the following sense: (i) they apply to every dimension \(p\) and are dimension-free in the sense that they depend on the dimension \(p\) only through the expectation of the maximum of the Gaussian random vector, thereby admitting direct extensions to the infinite dimensional case, namely, separable Gaussian processes (see [10] for this extension and applications to empirical processes); this dimension-free nature is parallel to the Gaussian concentration inequality, which states that the supremum concentrates around the expected supremum; (ii) they allow for arbitrary covariance structures between the coordinates of the Gaussian random vectors; and (iii) they are sharp in the sense that there is an example for which the bound is tight up to a dimension independent constant. We note that these anti-concentration bounds are sharper than those that result from an application of the universal reverse isoperimetric inequality of [2] (see also [3], pp. 386–367).

Comparison inequalities for Gaussian random vectors play an important role in probability theory, especially in empirical process and extreme value theories. We refer the reader to [7, 12, 14, 16–18, 24], and [30] for standard references on this topic. The anti-concentration phenomenon has attracted considerable interest in the context of random matrix theory and the Littlewood–Offord problem in number theory. See, for example, [22, 23], and [29], who remarked that concentration is better understood than anti-concentration. Those papers were concerned with anti-concentration in the Euclidean norm for sums of independent random vectors, and the topic and proof techniques here are substantially different from theirs.

Both the comparison and the anti-concentration bounds derived in this paper have many immediate statistical applications, especially in the context of high-dimensional statistical inference, where the dimension \(p\) of the vectors of interest is much larger than the sample size (see [5] for a textbook treatment of recent developments in high-dimensional statistics). In particular, the results established here are helpful in deriving an invariance principle for sums of high-dimensional random vectors, and also in establishing the validity of the multiplier bootstrap for inference in practice. We refer the reader to our companion paper [9], where the results established here are applied to several important statistical problems, particularly the analysis of the Dantzig selector of [6] in the non-Gaussian setting.

The proof strategy for our anti-concentration inequalities is to directly bound the density function of the maximum of a Gaussian random vector. The paper [20] is also concerned with bounding such a density (see [20], Proposition 3.12), but under a positivity restriction on the covariances; this is related to but different from our anti-concentration bounds. The crucial assumption in their Proposition 3.12 is positivity of all the covariances between the coordinates of the Gaussian random vector, which does not hold in our targeted applications in high-dimensional statistics, for example, the analysis of the Dantzig selector. Moreover, their upper bound on the density depends on the inverse of the lower bound on the covariances, so that, for example, if there are two independent coordinates in the Gaussian random vector, the upper bound becomes infinite. Our anti-concentration bounds do not require such positivity (or other) assumptions on the covariances and hence are not implied by the results of [20]. Another method for deriving reverse isoperimetric inequalities is to use the geometric results of [19], as shown by [13]; this approach leads to dimension-dependent anti-concentration inequalities, which are essentially different from ours. Moreover, our density-bounding proof technique is substantially different from that of [20], which is based on Malliavin calculus, and that of [19], which is based on geometric arguments.

The rest of the paper is organized as follows. In Sect. 2, we present comparison bounds for Gaussian random vectors and their application, namely the conditional multiplier central limit theorem. In Sect. 3, we present anti-concentration bounds for maxima of Gaussian random vectors. In Sects. 4 and 5, we give the proofs of the theorems in Sects. 2 and 3, respectively. The Appendix contains a proof of a technical lemma.

Notation. Denote by \((\Omega ,{\mathcal {F}},{\text {P}})\) the underlying probability space. For \(a,b \in {\mathbb {R}}\), we write \(a_{+} = \max \{ 0,a \}\) and \(a \vee b = \max \{ a,b \}\). Let \(1(\cdot )\) denote the indicator function. The transpose of a vector \(z\) is denoted by \(z^{T}\). For a function \(g: {\mathbb {R}}\rightarrow {\mathbb {R}}\), we use the notation \(\Vert g \Vert _{\infty } = \sup _{z \in {\mathbb {R}}} | g(z) |\). Let \(\phi (\cdot )\) and \(\Phi (\cdot )\) denote the density and distribution functions of the standard Gaussian distribution, respectively: \(\phi (x) = (1/\sqrt{2 \pi }) e^{-x^{2}/2}\) and \(\Phi (x) = \int _{-\infty }^{x} \phi (t) dt\).

2 Comparison bounds and multiplier bootstrap

2.1 Comparison bounds

Let \(X =(X_{1},\dots ,X_{p})^{T}\) and \(Y=(Y_{1},\dots ,Y_{p})^{T}\) be centered Gaussian random vectors in \({\mathbb {R}}^{p}\) with positive semi-definite covariance matrices \(\Sigma ^{X} =(\sigma ^{X}_{jk})_{1 \le j,k \le p}\) and \(\Sigma ^{Y}=(\sigma ^{Y}_{jk})_{1 \le j,k \le p}\), respectively. The purpose of this section is to give error bounds on the difference of the expectations of smooth functions and the distribution functions of

$$\begin{aligned} \max _{1 \le j \le p} X_{j} \quad \text {and} \quad \max _{1 \le j \le p} Y_{j} \end{aligned}$$

in terms of \(p\),

$$\begin{aligned} \Delta :=\max _{1\le j,k\le p} | \sigma ^{X}_{jk}-\sigma ^{Y}_{jk} |, \ \quad \text {and}\quad \ a_p:={\text {E}}\left[ \max _{1 \le j \le p} (Y_{j}/\sigma ^{Y}_{jj})\right] . \end{aligned}$$

The problem of comparing distributions of maxima is of intrinsic difficulty since the maximum function \(z=(z_{1},\dots ,z_{p})^{T} \mapsto \max _{1 \le j \le p} z_{j}\) is non-differentiable. To circumvent the problem, we use a smooth approximation of the maximum function. For \(z=(z_{1},\dots ,z_{p})^{T} \in {\mathbb {R}}^{p}\), consider the function:

$$\begin{aligned} F_{\beta }(z):=\beta ^{-1}\log \left( \sum _{j=1}^{p}\exp (\beta z_{j})\right) , \end{aligned}$$

which approximates the maximum function, where \(\beta > 0\) is the smoothing parameter that controls the level of approximation (we call this function the “smooth max function”). Indeed, an elementary calculation shows that for every \(z \in {\mathbb {R}}^{p}\),

$$\begin{aligned} 0 \le F_{\beta }(z)- \max _{1 \le j \le p} z_{j} \le \beta ^{-1} \log p. \end{aligned}$$
(1)
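As a quick numerical illustration of (1), the following minimal Python sketch evaluates \(F_{\beta}\) with a numerically stable log-sum-exp and verifies the two-sided bound; the dimension, the test vector, and the values of \(\beta\) are arbitrary illustrative choices, not part of the results of this paper.

```python
# Numerical check of (1): 0 <= F_beta(z) - max_j z_j <= log(p) / beta.
# The dimension p, the vector z, and the values of beta are illustrative choices.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
p = 1000
z = rng.normal(size=p)                       # an arbitrary point z in R^p

for beta in [1.0, 5.0, 25.0]:
    F_beta = logsumexp(beta * z) / beta      # smooth max function F_beta(z)
    gap = F_beta - z.max()
    assert 0.0 <= gap <= np.log(p) / beta    # inequality (1)
    print(f"beta={beta:5.1f}  F_beta - max = {gap:.4f}  log(p)/beta = {np.log(p)/beta:.4f}")
```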

This smooth max function arises in the definition of “free energy” in spin glasses. See, for example, [21, 27]. Here is the first theorem of this section.

Theorem 1

(Comparison bounds for smooth functions) For every \(g \in C^2({\mathbb {R}})\) with \(\Vert g' \Vert _{\infty } \vee \Vert g'' \Vert _{\infty } < \infty \) and every \(\beta >0\),

$$\begin{aligned}&\left| \mathrm {E}[g(F_{\beta }(X)) - g(F_{\beta } (Y))] \right| \le (\Vert g'' \Vert _{\infty }/2 + \beta \Vert g' \Vert _{\infty })\Delta ,\\&\text {and hence}\\&\left| \mathrm {E} \left[ g\left( \max \limits _{1 \le j \le p}X_{j}\right) - g\left( \max \limits _{1 \le j \le p}Y_{j}\right) \right] \right| \le (\Vert g'' \Vert _{\infty }/2 + \beta \Vert g' \Vert _{\infty }) \Delta + 2 \beta ^{-1} \Vert g' \Vert _{\infty } \log p. \end{aligned}$$

Proof

See Sect. 4.\(\square \)

Comment 1

Minimizing the second bound with respect to \(\beta > 0\), we have

$$\begin{aligned} \left| {\text {E}}\left[ g \left( \max _{1 \le j \le p}X_{j}\right) - g\left( \max _{1 \le j \le p} Y_{j}\right) \right] \right| \le \Vert g'' \Vert _{\infty }\Delta /2+ 2\Vert g' \Vert _{\infty }\sqrt{2\Delta \log p}. \end{aligned}$$

This result extends the work of [7], which derived the following Sudakov–Fernique type bound on the expectation of the difference between two Gaussian maxima:

$$\begin{aligned} \left| {\text {E}}\left[ \max _{1 \le j \le p}X_{j} - \max _{1 \le j \le p} Y_{j}\right] \right| \le 2\sqrt{2\Delta \log p}. \end{aligned}$$
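As a sanity check of this bound, the following Monte Carlo sketch in Python compares the simulated difference of the expected maxima with \(2\sqrt{2\Delta \log p}\); the covariance matrices, the perturbation, and the simulation size are our own illustrative choices.

```python
# Monte Carlo check of the Sudakov-Fernique type bound above.  The covariance
# matrices, the perturbation, and the simulation size are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
p, n_mc = 50, 200_000
A = rng.normal(size=(p, p)) / np.sqrt(p)
Sigma_Y = A @ A.T + np.eye(p)                  # covariance matrix of Y
Sigma_X = Sigma_Y + 0.01 * np.eye(p)           # perturbed covariance, so Delta = 0.01

X = rng.multivariate_normal(np.zeros(p), Sigma_X, size=n_mc)
Y = rng.multivariate_normal(np.zeros(p), Sigma_Y, size=n_mc)
Delta = np.abs(Sigma_X - Sigma_Y).max()

lhs = abs(X.max(axis=1).mean() - Y.max(axis=1).mean())
rhs = 2.0 * np.sqrt(2.0 * Delta * np.log(p))
print(f"|E max X - E max Y| ~ {lhs:.4f}   2*sqrt(2*Delta*log p) = {rhs:.4f}")
```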

Theorem 1 is not applicable to functions of the form \(g(z) = 1(z \le x)\) and hence does not directly lead to a bound on the Kolmogorov distance between \(\max _{1 \le j \le p} X_{j}\) and \(\max _{1 \le j \le p} Y_{j}\) (recall that the Kolmogorov distance between (the distributions of) two real-valued random variables \(\xi \) and \(\eta \) is defined by \(\sup _{x \in {\mathbb {R}}} | {\text {P}}( \xi \le x ) - {\text {P}}(\eta \le x) |\)). Nevertheless, we have the following bounds on the Kolmogorov distance. Recall \(a_p={\text {E}}[ \max _{1 \le j \le p} (Y_{j}/\sigma ^{Y}_{jj})]\).

Theorem 2

(Comparison of distributions) Suppose that \(p \ge 2\) and \(\sigma ^{Y}_{jj} > 0\) for all \(1 \le j \le p\). Then

$$\begin{aligned}&\sup \limits _{x \in \mathbb {R}} \left| \mathrm {P}\left( \max \limits _{1 \le j \le p}X_{j} \le x\right) - \mathrm {P}\left( \max \limits _{1 \le j \le p}Y_{j} \le x\right) \right| \nonumber \\&\qquad \le C \Delta ^{1/3} \left\{ 1\vee a_p^2\vee \log (1/\Delta ) \right\} ^{1/3} \log ^{1/3} p, \end{aligned}$$
(2)

where \(C>0\) depends only on \(\min _{1 \le j \le p} \sigma _{jj}^{Y}\) and \(\max _{1 \le j \le p} \sigma _{jj}^{Y}\) (the right side is understood to be \(0\) when \(\Delta = 0\)). Moreover, in the worst case, \(a_{p} \le \sqrt{2 \log p}\), so that

$$\begin{aligned} \sup \limits _{x \in \mathbb {R}} \left| \mathrm {P}\left( \max \limits _{1 \le j \le p}X_{j} \le x\right) - \mathrm {P}\left( \max \limits _{1 \le j \le p}Y_{j} \le x\right) \right| \le C' \Delta ^{1/3} \left\{ 1 \vee \log (p/\Delta )\right\} ^{2/3}, \end{aligned}$$

where as before \(C' > 0\) depends only on \(\min _{1 \le j \le p} \sigma _{jj}^{Y}\) and \(\max _{1 \le j \le p} \sigma _{jj}^{Y}\).

Proof

See Sect. 4.\(\square \)

The first bound (2) is generally sharper than the second. To see this, consider the simple case where \(a_{p} = O(1)\) as \(p \rightarrow \infty \), which would happen, for example, when \(Y_{1},\dots ,Y_{p}\) come from a discretization of a single continuous Gaussian process. Then the right side of (2) is \(o(1)\) if \(\Delta (\log p) \log \log p= o(1)\), while the second bound requires \(\Delta (\log p)^{2}= o(1)\).
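To illustrate Theorem 2 numerically, the following Monte Carlo sketch in Python estimates the Kolmogorov distance between the two maxima for several values of \(\Delta\); the covariance family, the interpolation parameter, and the simulation sizes are illustrative choices only.

```python
# Monte Carlo illustration of Theorem 2: the empirical Kolmogorov distance between
# the maxima shrinks as Delta decreases.  All design choices below are illustrative.
import numpy as np

rng = np.random.default_rng(2)
p, n_mc = 50, 200_000
A = rng.normal(size=(p, p)) / np.sqrt(p)
Sigma_Y = A @ A.T + np.eye(p)

def ecdf(sample, t):
    return np.searchsorted(np.sort(sample), t, side="right") / len(sample)

for eps in [0.2, 0.05, 0.01]:
    Sigma_X = (1.0 - eps) * Sigma_Y + eps * np.eye(p)   # interpolate toward the identity
    X = rng.multivariate_normal(np.zeros(p), Sigma_X, size=n_mc).max(axis=1)
    Y = rng.multivariate_normal(np.zeros(p), Sigma_Y, size=n_mc).max(axis=1)
    grid = np.linspace(min(X.min(), Y.min()), max(X.max(), Y.max()), 400)
    kol = np.abs(ecdf(X, grid) - ecdf(Y, grid)).max()
    Delta = np.abs(Sigma_X - Sigma_Y).max()
    print(f"Delta = {Delta:.4f}   empirical Kolmogorov distance ~ {kol:.4f}")
```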

Comment 2

(On the proof strategy) Bounding the Kolmogorov distance between \(\max _{1 \le j \le p}X_{j}\) and \(\max _{1 \le j \le p}Y_{j}\) is not immediate from Theorem 1 and this step relies on the anti-concentration inequality for the maximum of a Gaussian random vector, which we will study in Sect. 3. More formally, by smoothing the indicator and maximum functions, we obtain from Theorem 1 a bound of the following form:

$$\begin{aligned} \inf _{\beta ,\delta > 0} \left\{ \mathcal {L}\left( \max _{1 \le j \le p} Y_{j},\beta ^{-1}\log p+\delta \right) + C(\delta ^{-2}+\beta \delta ^{-1}) \Delta \right\} , \end{aligned}$$

where \( \mathcal {L}(\max _{1 \le j \le p} Y_{j},\epsilon )\) is the Lévy concentration function for \(\max _{1 \le j \le p}Y_{j}\) (see Definition 1 in Sect. 3 for the formal definition), and \(\beta ,\delta > 0\) are smoothing parameters (see Eqs. (10) and (11) in the proof of Theorem 2 given in Sect. 4 for the derivation of the above bound). The bound (2) then follows from bounding the Lévy concentration function by using the anti-concentration inequality derived in Sect. 3, and optimizing the bound with respect to \(\beta ,\delta \).

The proof of Theorem 2 is substantially different from the (“textbook”) proof of classical Slepian’s inequality. The simplest form of Slepian’s inequality states that

$$\begin{aligned} {\text {P}}\left( \max _{1 \le j \le p} X_{j} \le x\right) \le {\text {P}}\left( \max _{1 \le j \le p} Y_{j} \le x\right) , \ \forall x \in {\mathbb {R}}, \end{aligned}$$

whenever \(\sigma _{jj}^{X} = \sigma _{jj}^{Y}\) and \(\sigma _{jk}^{X} \le \sigma _{jk}^{Y}\) for all \(1 \le j,k \le p\). This inequality is immediately deduced from the following expression:

$$\begin{aligned}&{\text {P}}\left( \max _{1 \le j \le p} X_{j}\le x \right) -{\text {P}}\left( \max _{1 \le j \le p} Y_{j}\le x \right) \nonumber \\&\quad = \sum _{1 \le j < k \le p} (\sigma _{jk}^{X} - \sigma _{jk}^{Y}) \int \limits _{0}^{1} \left\{ \int \limits _{-\infty }^{x} \cdots \int \limits _{-\infty }^{x} \frac{\partial ^{2} f_{t}(z)}{\partial z_{j}\partial z_{k}} dz \right\} dt, \end{aligned}$$
(3)

where we assume that \(\Sigma ^{X}\) and \(\Sigma ^{Y}\) are non-singular and \(\sigma _{jj}^{X}=\sigma _{jj}^{Y}, 1 \le \forall j \le p\). Here \(f_{t}\) denotes the density function of \(N(0,t \Sigma ^{X} + (1-t) \Sigma ^{Y})\). See [14], p. 82, for this expression. The expression (3) is of importance and indeed a source of many interesting probabilistic results (see, for example, [18, 30] for recent related works). It is not clear (or at least non-trivial), however, whether a bound similar in nature to Theorem 2 can be deduced from the expression (3) when there is no restriction on the covariance matrices except for the condition that \(\sigma _{jj}^{X} = \sigma _{jj}^{Y},1 \le \forall j \le p\), and here we take a different route.

The key features of Theorem 2 are: (i) the bound on the Kolmogorov distance between the maxima of Gaussian random vectors in \({\mathbb {R}}^{p}\) depends on the dimension \(p\) only through \(\log p\) and the maximum difference of the covariance matrices \(\Delta \), and (ii) it allows for arbitrary covariance matrices for \(X\) and \(Y\) (except for the non-degeneracy condition that \(\sigma _{jj}^{Y} > 0, \ 1 \le \forall j \le p\)). These features have important implications for statistical applications, as discussed below.

2.2 Conditional multiplier central limit theorem

Consider the following problem. Suppose that we are given \(n\) independent centered random vectors \(Z_{1},\dots ,Z_{n}\) in \({\mathbb {R}}^{p}\). Here \(Z_{1},\dots ,Z_{n}\) are generally non-Gaussian, and the dimension \(p\) is allowed to increase with \(n\) (that is, the case where \(p=p_{n} \rightarrow \infty \) as \(n \rightarrow \infty \) is allowed). We suppress the possible dependence of \(p\) on \(n\) for notational convenience. Suppose that each \(Z_{i}\) has a finite covariance matrix \({\text {E}}[ Z_{i} Z_{i}^{T} ]\). Consider the following normalized sum:

$$\begin{aligned} S_{n} := (S_{n,1},\dots ,S_{n,p})^{T} = \frac{1}{\sqrt{n}} \sum _{i=1}^{n} Z_{i}. \end{aligned}$$

The problem here is to approximate the distribution of \(\max _{1 \le j \le p} S_{n,j}\).

Statistics of this form arise frequently in modern statistical applications. The exact distribution of \(\max _{1 \le j \le p} S_{n,j}\) is generally unknown. An intuitive idea to approximate the distribution of \(\max _{1 \le j \le p} S_{n,j}\) is to use the Gaussian approximation. Let \(V_{1},\dots ,V_{n}\) be independent Gaussian random vectors in \({\mathbb {R}}^{p}\) such that \(V_{i} \sim N(0,{\text {E}}[ Z_{i} Z_{i}^{T} ])\), and define

$$\begin{aligned} T_{n} := (T_{n,1},\dots ,T_{n,p})^{T} := \frac{1}{\sqrt{n}} \sum _{i=1}^{n} V_{i} \sim N\left( 0,n^{-1} \sum \limits _{i=1}^{n} {\text {E}}[Z_{i}Z_{i}^{T}] \right) . \end{aligned}$$

It is expected that the distribution of \(\max _{1 \le j \le p} T_{n,j}\) is close to that of \(\max _{1 \le j \le p} S_{n,j}\) in the following sense:

$$\begin{aligned} \sup _{x \in {\mathbb {R}}} \left| {\text {P}}\left( \max _{1 \le j \le p} S_{n,j} \le x \right) - {\text {P}}\left( \max _{1 \le j \le p} T_{n,j} \le x \right) \right| \rightarrow 0, \quad n \rightarrow \infty . \end{aligned}$$
(4)

When \(p\) is fixed, (4) follows from the classical Lindeberg–Feller central limit theorem, subject to the Lindeberg conditions. The recent paper [9] established conditions under which the Gaussian approximation (4) holds even when \(p\) is comparable to or much larger than \(n\). For example, [9] proved that if \(c_{1} \le n^{-1} \sum _{i=1}^{n} {\text {E}}[ Z_{ij}^{2} ] \le C_{1}\) and \({\text {E}}[ \exp (| Z_{ij} |/ C_{1} ) ] \le 2\) for all \(1 \le i \le n\) and \(1 \le j \le p\) for some \(0 < c_{1} < C_{1}\), then (4) holds as long as \(\log p = o(n^{1/7})\).
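The following Monte Carlo sketch in Python gives a rough numerical sense of (4) in a simple design with independent centered exponential entries, so that \(\mathrm{E}[Z_iZ_i^T]=I_p\) and \(T_n \sim N(0,I_p)\); the design and the sample sizes are illustrative choices and are not taken from [9].

```python
# Rough Monte Carlo illustration of the Gaussian approximation (4).
# Design: Z_i has iid centered exponential entries, so E[Z_i Z_i^T] = I_p and
# T_n ~ N(0, I_p).  All sizes are illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
n, p, n_mc = 200, 100, 5_000

S_max = np.array([(rng.exponential(size=(n, p)) - 1.0).sum(axis=0).max() / np.sqrt(n)
                  for _ in range(n_mc)])
T_max = rng.normal(size=(n_mc, p)).max(axis=1)      # max of T_n ~ N(0, I_p)

grid = np.linspace(min(S_max.min(), T_max.min()), max(S_max.max(), T_max.max()), 400)
cdf = lambda sample, t: np.searchsorted(np.sort(sample), t, side="right") / len(sample)
print("empirical Kolmogorov distance ~", np.abs(cdf(S_max, grid) - cdf(T_max, grid)).max())
```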

The Gaussian approximation (4) is in itself an important step, but in the general case where the covariance matrix \(n^{-1} \sum _{i=1}^{n} {\text {E}}[ Z_{i} Z_{i}^{T} ]\) is unknown, it is not directly applicable for purposes of statistical inference. In such cases, the following multiplier bootstrap procedure will be useful. Let \(\eta _{1},\dots ,\eta _{n}\) be independent standard Gaussian random variables independent of \(Z_{1}^{n} := \{ Z_{1},\dots ,Z_{n} \}\). Consider the following randomized sum:

$$\begin{aligned} S_{n}^{\eta } := \left( S_{n,1}^{\eta }, \dots , S_{n,p}^{\eta } \right) ^{T} := \frac{1}{\sqrt{n}} \sum _{i=1}^{n} \eta _{i} Z_{i}. \end{aligned}$$

Since conditional on \(Z_{1}^{n}\),

$$\begin{aligned} S_{n}^{\eta } \sim N\left( 0,n^{-1} \textstyle \sum \limits _{i=1}^{n} Z_{i}Z_{i}^{T}\right) , \end{aligned}$$

it is natural to expect that the conditional distribution of \(\max _{1 \le j \le p} S_{n,j}^{\eta }\) is “close” to the distribution of \(\max _{1 \le j \le p} T_{n,j}\) and hence that of \(\max _{1 \le j \le p} S_{n,j}\). Note here that the conditional distribution of \(S^{\eta }_{n}\) is completely known, which makes this distribution useful for purposes of statistical inference. The following proposition makes this intuition rigorous.

Proposition 1

(Conditional multiplier central limit theorem) Work with the setup as described above. Suppose that \(p \ge 2\) and there are some constants \(0 < c_{1} < C_{1}\) such that \(c_{1} \le n^{-1}\sum _{i=1}^{n} \mathrm {E}[Z_{ij}^{2}] \le C_{1}\) for all \(1 \le j \le p\). Moreover, suppose that \(\widehat{\Delta } := \max _{1 \le j,k \le p}|n^{-1}\sum _{i=1}^{n}(Z_{ij}Z_{ik} - \mathrm {E}[Z_{ij}Z_ {ik}] )|\) obeys the following conditions: as \(n \rightarrow \infty \),

$$\begin{aligned} \widehat{\Delta } \left( \mathrm {E}\left[ \max _{1 \le j \le p} T_ {n,j}\right] \right) ^{2} \log p = o_{\mathrm {P}}(1), \ \widehat{\Delta } (\log p) (1 \vee \log \log p)= o_{\mathrm {P}}(1). \end{aligned}$$
(5)

Then we have

$$\begin{aligned} \sup _{x \in \mathbb {R}} \left| \mathrm {P}\left( \max _{1 \le j \le p} S_{n,j}^{\eta } \le x \mid Z_{1}^{n} \right) -\mathrm {P}\left( \max _ {1 \le j \le p} T_{n,j} \le x \right) \right| \mathop {\rightarrow }\limits ^{\mathrm {P}} 0, \ \text {as} \ n \rightarrow \infty . \end{aligned}$$
(6)

Here recall that \(p\) is allowed to increase with \(n\).

Proof

Follows immediately from Theorem 2.\(\square \)
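To make the procedure concrete, here is a minimal Python sketch of the Gaussian multiplier bootstrap behind Proposition 1, in a design with iid centered exponential entries (so that \(T_n \sim N(0,I_p)\)); all sizes and the use of the 95% quantile are our own choices for the illustration.

```python
# Minimal sketch of the Gaussian multiplier bootstrap of Proposition 1.
# Design and sizes are illustrative; eta_i are iid N(0,1) independent of the data.
import numpy as np

rng = np.random.default_rng(4)
n, p, n_boot = 400, 200, 5_000
Z = rng.exponential(size=(n, p)) - 1.0            # one observed data set Z_1,...,Z_n

eta = rng.normal(size=(n_boot, n))                # Gaussian multipliers
S_eta_max = (eta @ Z / np.sqrt(n)).max(axis=1)    # draws of max_j S^eta_{n,j} given Z_1^n

# Compare a conditional quantile with the corresponding quantile of max_j T_{n,j},
# which here is the maximum of p iid N(0,1) variables since E[Z_i Z_i^T] = I_p.
q_boot = np.quantile(S_eta_max, 0.95)
q_gauss = np.quantile(rng.normal(size=(100_000, p)).max(axis=1), 0.95)
print(f"multiplier bootstrap 95% quantile ~ {q_boot:.3f}, Gaussian 95% quantile ~ {q_gauss:.3f}")
```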

We call this result a “conditional multiplier central limit theorem,” where the terminology follows that in empirical process theory. See [28], Chapter 2.9. The notable features of this proposition, which are inherited from the features of Theorem 2 discussed above, are: (i) (6) can hold even when \(p\) is much larger than \(n\), and (ii) it allows for arbitrary covariance matrices for \(Z_{i}\) (except for the mild scaling condition that \(c_{1} \le n^{-1} \sum _{i=1}^{n} {\text {E}}[ Z_{ij}^{2} ] \le C_{1}\)). The second point is clearly desirable in statistical applications, as the information on the true covariance structure is generally (but not always) unavailable. For the first point, we have the following estimate on \({\text {E}}[ \widehat{\Delta } ]\).

Lemma 1

Let \(p \ge 2\). There exists a universal constant \(C > 0\) such that

$$\begin{aligned} \mathrm {E}[\widehat{\Delta }] \le C \left[ \max _{1 \le j \le p} \left( n^{-1} \sum \limits _{i=1}^{n} \mathrm {E}[ Z_{ij}^{4}]\right) ^{1/2} \sqrt{\frac{ \log p}{n}} + \left( \mathrm {E}\left[ \max _{1 \le i \le n} \max _{1 \le j \le p}Z_{ij}^{4}\right] \right) ^{1/2}\frac{\log p}{n} \right] . \end{aligned}$$

Proof

See the Appendix. \(\square \)

Hence, with the help of Lemma 2.2.2 in [28], we can find various primitive conditions under which (5) holds.

Example 1

Consider the following examples. Here for the sake of simplicity, we use the worst case bound \({\text {E}}[ \max _{1 \le j \le p} T_{n,j} ] \le \sqrt{2 C_{1} \log p}\), so that conditions (5) reduce to \(\widehat{\Delta } = o_{{\text {P}}}((\log p)^{-2})\).

Case a Suppose that \({\text {E}}[ \exp (| Z_{ij} |/ C_{1} ) ] \le 2\) for all \(1 \le i \le n\) and \(1 \le j \le p\) for some \(C_{1} > 0\). In this case, it is not difficult to verify that \(\widehat{\Delta } = o_{{\text {P}}}((\log p)^{-2})\) as soon as \(\log p = o(n^{1/5})\).

Case b Another type of \(Z_{ij}\), which arises in regression applications, is of the form \(Z_{ij} = \varepsilon _{i} x_{ij}\) where \(\varepsilon _{i}\) are stochastic with \({\text {E}}[ \varepsilon _{i} ] = 0\) and \(\max _{1 \le i \le n}{\text {E}}[ | \varepsilon _{i} |^{4q} ] = O(1)\) for some \(q \ge 1\), and \(x_{ij}\) are non-stochastic (typically, \(\varepsilon _{i}\) are “errors” and \(x_{ij}\) are “regressors”). Suppose that \(x_{ij}\) are normalized in such a way that \(n^{-1} \sum _{i=1}^{n} x_{ij}^{2} = 1\), and there are bounds \(B_{n} \ge 1\) such that \(\max _{1 \le i \le n} \max _{1 \le j \le p} | x_{ij} | \le B_{n}\), where we allow \(B_{n} \rightarrow \infty \). In this case, \(\widehat{\Delta } = o_{{\text {P}}}((\log p)^{-2})\) as soon as

$$\begin{aligned} \max \{ B_{n}^{2} (\log p)^{5}, B_{n}^{4q/(2q-1)} (\log p)^{6q/(2q-1)} \} = o(n), \end{aligned}$$

since \(\max _{1 \le j \le p} ( n^{-1}\sum _{i=1}^{n} {\text {E}}[ (\varepsilon _{i} x_{ij})^{4} ]) \le B_{n}^{2} \max _{1 \le i \le n} {\text {E}}[ \varepsilon _{i}^{4} ] = O(B_{n}^{2})\) and \({\text {E}}[ \max _{1 \le i \le n} \max _{1 \le j \le p} (\varepsilon _{i} x_{ij})^{4} ] \le B_{n}^{4} {\text {E}}[\max _{1 \le i \le n} \varepsilon _{i}^{4} ] = O(n^{1/q} B_{n}^{4})\).

Importantly, in these examples, for (6) to hold, \(p\) can increase exponentially in some fractional power of \(n\).
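The following small simulation in Python illustrates the \(\sqrt{\log p / n}\) scaling of \(\mathrm{E}[\widehat{\Delta}]\) from Lemma 1 in the sub-exponential design of Case (a); the specific sizes are illustrative choices.

```python
# Monte Carlo look at Delta_hat from Proposition 1 under a sub-exponential design
# as in Case (a); the sizes below are illustrative choices.
import numpy as np

rng = np.random.default_rng(5)
for n, p in [(200, 50), (800, 50), (3200, 50)]:
    Z = rng.exponential(size=(n, p)) - 1.0          # E[Z_ij Z_ik] = 1(j = k)
    emp_cov = Z.T @ Z / n
    Delta_hat = np.abs(emp_cov - np.eye(p)).max()
    print(f"n={n:5d}  Delta_hat={Delta_hat:.4f}  sqrt(log p / n)={np.sqrt(np.log(p)/n):.4f}")
```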

3 Anti-concentration bounds

The following theorem provides bounds on the Lévy concentration function of the maximum of a Gaussian random vector in \({\mathbb {R}}^{p}\), where the terminology is borrowed from [23].

Definition 1

([23], Definition 3.1) The Lévy concentration function of a real valued random variable \(\xi \) is defined for \(\epsilon > 0\) as

$$\begin{aligned} \mathcal {L}(\xi ,\epsilon ) = \sup _{x \in {\mathbb {R}}} {\text {P}}( | \xi - x | \le \epsilon ). \end{aligned}$$

Theorem 3

(Anti-concentration) Let \((X_{1},\dots ,X_{p})^{T}\) be a centered Gaussian random vector in \({\mathbb {R}}^{p}\) with \(\sigma ^{2}_{j} := \mathrm {E}[X_{j}^{2}] > 0\) for all \(1 \le j \le p\). Moreover, let \(\underline{\sigma } := \min _{1 \le j \le p} \sigma _{j}, \overline{\sigma } := \max _{1 \le j \le p} \sigma _{j}\), and \(a_{p} := \mathrm {E}[\max _{1 \le j \le p}(X_{j}/\sigma _{j})]\).

  1. (i)

    If the variances are all equal, namely \(\underline{\sigma } = \overline{\sigma } = \sigma \), then for every \(\epsilon > 0\),

    $$\begin{aligned} \mathcal {L}\left( \max _{1 \le j \le p} X_{j},\epsilon \right) \le 4 \epsilon (a_p + 1)/\sigma . \end{aligned}$$
  2. (ii)

    If the variances are not equal, namely \(\underline{\sigma } < \overline{\sigma }\), then for every \(\epsilon > 0\),

    $$\begin{aligned} \mathcal {L}\left( \max _{1 \le j \le p} X_{j},\epsilon \right) \le C \epsilon \{ a_p + \sqrt{1 \vee \log (\underline{\sigma }/\epsilon )} \}, \end{aligned}$$

where \(C>0\) depends only on \(\underline{\sigma }\) and \(\overline{\sigma }\).

Since \(X_{j}/\sigma _{j} \sim N(0,1)\), by a standard calculation, we have \(a_{p} \le \sqrt{2 \log p}\) in the worst case (see, for example, [27], Proposition 1.1.3), so that the following simpler corollary follows immediately from Theorem 3.

Corollary 1

Let \((X_{1},\dots ,X_{p})^{T}\) be a centered Gaussian random vector in \({\mathbb {R}}^{p}\) with \(\sigma ^{2}_{j} := \mathrm {E}[X_{j}^{2}] > 0\) for all \(1 \le j \le p\). Let \(\underline{\sigma } := \min _{1 \le j \le p} \sigma _{j}\) and \(\overline{\sigma } := \max _{1 \le j \le p} \sigma _{j}\). Then for every \(\epsilon > 0\),

$$\begin{aligned} \mathcal {L}\left( \max _{1 \le j \le p} X_{j},\epsilon \right) \le C \epsilon \sqrt{1 \vee \log (p/\epsilon )}, \end{aligned}$$

where \(C>0\) depends only on \(\underline{\sigma }\) and \(\overline{\sigma }\). When \(\sigma _{j}\) are all equal, \(\log (p/\epsilon )\) on the right side can be replaced by \(\log p\).
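For a numerical sense of Corollary 1, the following Monte Carlo sketch in Python approximates the Lévy concentration function on a grid and compares it with \(\epsilon\sqrt{\log p}\); the equicorrelated unit-variance design with \(\rho = 0.5\) and all sizes are illustrative choices.

```python
# Monte Carlo illustration of Corollary 1 in an equicorrelated, unit-variance design.
# The correlation rho, the dimension p, and the sample sizes are illustrative choices.
import numpy as np

rng = np.random.default_rng(6)
p, n_mc, rho = 200, 100_000, 0.5
X = np.sqrt(rho) * rng.normal(size=(n_mc, 1)) + np.sqrt(1 - rho) * rng.normal(size=(n_mc, p))
M = X.max(axis=1)                                    # draws of max_j X_j

grid = np.linspace(M.min(), M.max(), 400)
for eps in [0.2, 0.1, 0.05]:
    # Levy concentration function sup_x P(|M - x| <= eps), approximated over the grid
    conc = max(np.mean(np.abs(M - x) <= eps) for x in grid)
    print(f"eps={eps:.2f}  L(max X, eps) ~ {conc:.4f}  eps*sqrt(log p) = {eps*np.sqrt(np.log(p)):.4f}")
```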

Comment 3

(Anti-concentration vs. small ball probabilities) The problem of bounding the Lévy concentration function \( \mathcal {L}(\max _{1 \le j \le p} X_{j},\epsilon )\) is qualitatively different from the problem of bounding \({\text {P}}( \max _{1 \le j \le p} | X_{j} | \le x )\). For a survey on the latter problem, called the “small ball problem”, we refer the reader to [17].

Comment 4

(Concentration vs. anti-concentration) Concentration inequalities refer to inequalities bounding \({\text {P}}(| \xi - x | > \epsilon )\) for a random variable \(\xi \) (typically \(x\) is the mean or median of \(\xi \)). See the monograph [15] for a study of the concentration of measure phenomenon. Anti-concentration inequalities in turn refer to reverse inequalities, that is, inequalities bounding \({\text {P}}(| \xi - x | \le \epsilon )\). Theorem 3 provides anti-concentration inequalities for \(\max _{1 \le j \le p} X_{j}\). [29] remarked that “concentration is better understood than anti-concentration”. In the present case, the Gaussian concentration inequality (see [15], Theorem 7.1) states that

$$\begin{aligned} {\text {P}}\left( \left| \max _{1 \le j \le p} X_{j} - {\text {E}}\left[ \max _{1 \le j \le p} X_{j} \right] \right| \ge r\right) \le 2 e^{-r^{2}/(2 \overline{\sigma }^{2})}, \ r > 0, \end{aligned}$$

where the mean can be replaced by the median. This inequality is well known and dates back to [4] and [26]. To the best of our knowledge, however, the reverse inequalities in Theorem 3 are new.

Comment 5

(Anti-concentration for maximum of moduli, \(\max _{1 \le j \le p} | X_{j} |\)) Versions of Theorem 3 and Corollary 1 continue to hold for \(\max _{1 \le j \le p} | X_{j} |\). That is, for example, when \(\sigma _{j}\) are all equal (\(\sigma _{j} = \sigma \)), \( \mathcal {L}(\max _{1 \le j \le p} | X_{j} |, \epsilon ) \le 4 \epsilon ( a_{p}'+1)/\sigma \), where \(a_{p}' := {\text {E}}[ \max _{1 \le j \le p} | X_{j} |/\sigma ]\). To see this, observe that \(\max _{1 \le j \le p} | X_{j} | = \max _{1 \le j \le 2p} X_{j}'\) where \(X_{j}'=X_{j}\) for \(j=1,\dots ,p\) and \(X_{p+j}' = - X_{j}\) for \(j=1,\dots ,p\). Hence we may apply Theorem 3 to \((X_{1}',\dots ,X_{2p}')^{T}\) to obtain the desired conclusion.

Comment 6

(A sketch of the proof of Theorem 3) For the reader’s convenience, we provide a sketch of the proof of Theorem 3. We focus here on the simple case where all the variances are equal to one \((\sigma _{1}=\cdots =\sigma _{p}=1)\). Then the distribution of \(Z=\max _{1 \le j \le p} X_{j}\) is absolutely continuous and its density can be written as \(\phi (z) G(z)\), where the map \(z \mapsto G(z)\) is non-decreasing. It is then not difficult to see that \(G(z) \le {\text {P}}(Z > z)/\{ 1- \Phi (z) \} \le 2 (z \vee 1) e^{-(z-a_{p})_{+}^{2}/2}/\phi (z)\), where the second inequality follows from Mill's inequality combined with the Gaussian concentration inequality. Hence the density of \(Z\) is bounded by \(2 (z \vee 1) e^{-(z-a_{p})_{+}^{2}/2}\), which immediately leads to the bound \( \mathcal {L}(\max _{1 \le j \le p} X_{j},\epsilon ) \le 4 (a_{p}+1)\epsilon \).

In the trivial example where \(p=1\), it is immediate that \({\text {P}}( | X_{1} - x| \le \epsilon ) \le \epsilon \sqrt{2/(\pi \sigma _{1}^{2})}\). A non-trivial case is the situation where \(p \rightarrow \infty \). In that case, it is typically not known whether \(\max _{1 \le j \le p} X_{j}\) has a limiting distribution as \(p \rightarrow \infty \), even after normalization (recall that, except for \(\underline{\sigma } > 0\), we allow for general covariance structures between \(X_{1},\dots ,X_{p}\)). It is therefore not at all trivial whether \( \mathcal {L}(\max _{1 \le j \le p} X_{j},\epsilon _{p}) \rightarrow 0\) for every sequence \(\epsilon _{p} \rightarrow 0\) (or at some rate), nor how fast \(\epsilon _{p} \rightarrow 0\) should be to guarantee that \( \mathcal {L}(\max _{1 \le j \le p} X_{j},\epsilon _{p}) \rightarrow 0\).

Importantly, the bounds in Theorem 3 are dimension-free in the sense that, similar to the Gaussian concentration inequality, they depend on the dimension \(p\) only through \(a_{p}\)—the expectation of the maximum of the (normalized) Gaussian random vector. Hence Theorem 3 admits direct extensions to the infinite dimensional case, namely separable Gaussian processes, as long as the corresponding expectation is finite. See our companion paper [10] for formal treatments and applications of this extension.

The presence of \(a_{p}\) in the bounds is essential and cannot be removed in general, as the following example shows. In this sense, there is no universal bound on the concentration function that is substantially sharper than that given in Theorem 3. Potentially, there could be refinements, but they would have to rely on particular (hence non-universal) features of the covariance structure between \(X_{1},\dots ,X_{p}\).

Example 2

(Partial converse of Theorem 3) Let \(X_{1},\dots ,X_{p}\) be independent standard Gaussian random variables. By Theorem 1.5.3 of [14], as \(p \rightarrow \infty \),

$$\begin{aligned} b_{p} \left( \max _{1 \le j \le p}X_{j} - d_{p}\right) \mathop {\rightarrow }\limits ^{d} G(0,1), \end{aligned}$$
(7)

where

$$\begin{aligned} b_{p} := \sqrt{2 \log p}, \ \ d_{p} := b_{p} - \frac{ \log (4 \pi ) + \log \log p }{2 b_{p}}, \end{aligned}$$

and \(G(0,1)\) denotes the standard Gumbel distribution, that is, the distribution having the density \(g(x) = e^{-x} e^{-e^{-x}}\) for \(x \in \mathbb {R}\). In fact, we can show that the density of \(b_{p} (\max _{1 \le j \le p}X_{j} - d_{p})\) converges to that of \(G(0,1)\) locally uniformly. To see this, we begin by noting that the density of \(b_{p} (\max _{1 \le j \le p}X_{j} - d_{p})\) is given by

$$\begin{aligned} g_{p}(x) = \frac{p}{b_{p}}\phi (d_{p} + b_{p}^{-1}x) [\Phi (d_{p}+b_{p}^{-1}x)]^{p-1}. \end{aligned}$$

Pick any \(x \in \mathbb {R}\). Since, by the weak convergence result (7),

$$\begin{aligned}{}[\Phi (d_{p}+b_{p}^{-1}x)]^p ={\text {P}}\left( b_{p} \left( \max _{1 \le j \le p}X_{j} - d_{p}\right) \le x\right) \rightarrow e^{-e^{-x}}, \ p \rightarrow \infty , \end{aligned}$$

we have \([\Phi (d_{p}+b_{p}^{-1}x)]^{p-1} \rightarrow e^{-e^{-x}}\). Hence it remains to show that

$$\begin{aligned} \frac{p}{b_{p}}\phi (d_{p} + b_{p}^{-1}x) \rightarrow e^{-x}. \end{aligned}$$

Taking the logarithm of the left side yields

$$\begin{aligned} \log p -\log b_{p} - \log (\sqrt{2\pi }) - (d_{p}+b_{p}^{-1}x)^{2}/2. \end{aligned}$$
(8)

Expanding \((d_{p}+b_{p}^{-1}x)^{2}\) gives that

$$\begin{aligned} d_{p}^{2} + 2 d_{p} b_{p}^{-1}x + b_{p}^{-2}x^{2} = b_{p}^{2} - \log \log p - \log (4\pi ) + 2x + o(1), \ p \rightarrow \infty , \end{aligned}$$

by which we have (8) \(= -x + o(1)\). This shows that \(g_{p}(x) \rightarrow g(x)\) for all \(x \in {\mathbb {R}}\). Moreover, this convergence takes place locally uniformly in \(x\), that is, for every \(K > 0\), \(g_{p}(x) \rightarrow g(x)\) uniformly in \(x \in [-K, K]\).

On the other hand, the density of \(\max _{1 \le j \le p} X_{j}\) is given by \(f_{p}(x) = p \phi (x) [ \Phi (x) ]^{p-1}\). From this expression, for every \(K > 0\), there exist a constant \(c > 0\) and a positive integer \(p_{0}\) depending only on \(K\) such that for \(p \ge p_{0}\),

$$\begin{aligned} \inf _{x \in [ d_{p} - K b_{p}^{-1}, d_{p} +K b_{p}^{-1}]} b_{p}^{-1} f_{p}(x) = \inf _{x \in [-K,K]} g_{p}(x) \ge \inf _{x \in [-K,K]} g(x) + o(1) \ge c, \end{aligned}$$

which shows that for \(p \ge p_{0}\),

$$\begin{aligned} f_{p}(x) \ge c b_{p}, \ \forall x \in [ d_{p} - Kb_{p}^{-1}, d_{p} +K b_{p}^{-1}]. \end{aligned}$$

Therefore, we conclude that for \(p \ge p_{0}\),

$$\begin{aligned} {\text {P}}\left( \left| \max _{1 \le j \le p} X_{j} - d_{p} \right| \le \epsilon \right) = \int \limits _{d_{p}-\epsilon }^{d_{p}+\epsilon } f_{p}(x) dx \ge 2 c \epsilon b_{p}, \ \forall \epsilon \in [ 0, K b_{p}^{-1} ]. \end{aligned}$$

By the Gaussian maximal inequality and Lemma 2.3.15 of [11], we have

$$\begin{aligned} \sqrt{\log p}/12 \le \mathrm {E}\left[ \max _{1 \le j \le p}X_{j}\right] \le \sqrt{2 \log p}. \end{aligned}$$

Hence, by the previous result, for every \(K' > 0\), there exist a constant \(c' > 0\) and a positive integer \(p_{0}'\) depending only on \(K'\) such that for \(p \ge p_{0}'\), \(a_{p} \ge 1\) and

$$\begin{aligned} \mathcal {L}\left( \max _{1 \le j \le p} X_{j},\epsilon \right) \ge {\text {P}}\left( \left| \max _{1 \le j \le p} X_{j} - d_{p} \right| \le \epsilon \right) \ge c' \epsilon a_{p}, \ \forall \epsilon \in [0, K'a^{-1}_{p} ]. \end{aligned}$$

\(\square \)
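A small numerical companion to Example 2: simulating \(\max_{1\le j\le p} X_j\) exactly through its distribution function \(\Phi(x)^p\), the ratio \(\mathrm{P}(|\max_j X_j - d_p| \le \epsilon)/(\epsilon b_p)\) stays bounded away from zero as \(p\) grows, in line with the lower bound above; the values of \(p\), \(\epsilon\), and the simulation size in the Python sketch below are illustrative choices.

```python
# Numerical look at Example 2: for p iid standard normals, the concentration
# function near d_p is of order eps * b_p = eps * sqrt(2 log p).  We simulate
# max_j X_j exactly via its CDF Phi(x)^p; p, eps, and n_mc are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n_mc = 500_000
for p in [100, 10_000, 1_000_000]:
    b_p = np.sqrt(2 * np.log(p))
    d_p = b_p - (np.log(4 * np.pi) + np.log(np.log(p))) / (2 * b_p)
    M = norm.ppf(rng.random(n_mc) ** (1.0 / p))   # max of p iid N(0,1) draws
    eps = 0.5 / b_p
    ratio = np.mean(np.abs(M - d_p) <= eps) / (eps * b_p)
    print(f"p={p:8d}  P(|max - d_p| <= eps) / (eps * b_p) ~ {ratio:.3f}")
```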

4 Proofs for Section 2

4.1 Proof of Theorem 1

Here for a smooth function \(f: {\mathbb {R}}^{p} \rightarrow {\mathbb {R}}\), we write \(\partial _{j} f (z) = \partial f(z)/\partial z_{j}\) for \(z = (z_{1},\dots ,z_{p})^{T}\). We shall use the following version of Stein’s identity.

Lemma 2

(Stein’s identity) Let \(W =(W_{1},\dots ,W_{p})^{T}\) be a centered Gaussian random vector in \({\mathbb {R}}^{p}\). Let \(f: {\mathbb {R}}^{p} \rightarrow {\mathbb {R}}\) be a \(C^1\)-function such that \(\mathrm {E}[| \partial _{j}f(W) |] < \infty \) for all \(1 \le j \le p\). Then for every \(1 \le j \le p\),

$$\begin{aligned} \mathrm {E}[W_{j}f(W)]=\sum _{k=1}^p\mathrm {E}[W_{j}W_{k}] \mathrm {E}[\partial _{k} f(W)]. \end{aligned}$$

Proof of Lemma 2

See Section A.6 of [27]; also [8] and [25].\(\square \)
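For a quick Monte Carlo check of Lemma 2 in Python, one can take the smooth max function \(F_\beta\) as the test function \(f\) (its gradient is the vector of softmax weights \(\pi_j\), cf. Lemma 3 below); the covariance matrix, the value of \(\beta\), and the simulation size are arbitrary illustrative choices.

```python
# Monte Carlo check of Stein's identity E[W_j f(W)] = sum_k E[W_j W_k] E[d_k f(W)]
# with f = F_beta.  The covariance matrix, beta, and the sizes are illustrative.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(8)
p, n_mc, beta = 5, 1_000_000, 2.0
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)
W = rng.multivariate_normal(np.zeros(p), Sigma, size=n_mc)

f_W = logsumexp(beta * W, axis=1) / beta             # F_beta(W)
e = np.exp(beta * (W - W.max(axis=1, keepdims=True)))
grad_W = e / e.sum(axis=1, keepdims=True)            # gradient of F_beta: pi_j(W)

j = 0
lhs = np.mean(W[:, j] * f_W)                         # E[W_j f(W)]
rhs = Sigma[j] @ grad_W.mean(axis=0)                 # sum_k E[W_j W_k] E[d_k f(W)]
print(f"E[W_j f(W)] ~ {lhs:.4f}   Stein right-hand side ~ {rhs:.4f}")
```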

We will use the following properties of the smooth max function.

Lemma 3

For every \(1 \le j,k \le p\),

$$\begin{aligned} \partial _{j} F_{\beta }(z) = \pi _{j}(z), \quad \partial _{j} \partial _k F_{\beta }(z) = \beta w_{jk} (z), \end{aligned}$$

where

$$\begin{aligned} \pi _{j}(z) := e^{\beta z_{j}}/\sum \limits _{m=1}^pe^{\beta z_m}, \ w_{jk}(z) := 1 (j = k) \pi _{j} (z)- \pi _{j}(z) \pi _{k} (z). \end{aligned}$$

Moreover,

$$\begin{aligned} \pi _{j}(z) \ge 0, \sum \limits _{j=1}^p \pi _{j} (z) = 1, \ \sum \limits _{j,k=1}^p |w_{jk}(z)| \le 2. \end{aligned}$$

Proof of Lemma 3

The first property was noted in [7]. The other properties follow from a direct calculation.\(\square \)
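The identities in Lemma 3 are also easy to confirm numerically; the following Python sketch (the point \(z\) and the value of \(\beta\) are arbitrary choices) checks \(\partial_j F_\beta = \pi_j\) by central differences and evaluates the two bounds.

```python
# Numerical check of Lemma 3: grad F_beta = (pi_j), sum_j pi_j = 1, sum_{j,k}|w_jk| <= 2.
# The point z and the value of beta are arbitrary illustrative choices.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(9)
p, beta, h = 20, 3.0, 1e-6
z = rng.normal(size=p)

F = lambda x: logsumexp(beta * x) / beta              # smooth max function
e = np.exp(beta * (z - z.max()))
pi = e / e.sum()                                      # pi_j(z)
w = np.diag(pi) - np.outer(pi, pi)                    # w_jk(z); Hessian of F_beta = beta * w

num_grad = np.array([(F(z + h * np.eye(p)[j]) - F(z - h * np.eye(p)[j])) / (2 * h)
                     for j in range(p)])
print("max |pi_j - central difference| :", np.abs(pi - num_grad).max())
print("sum_j pi_j =", pi.sum(), " sum_{j,k} |w_jk| =", np.abs(w).sum())
```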

Lemma 4

Let \(m := g \circ F_{\beta }\) with \(g \in C^{2}({\mathbb {R}})\). Then for every \(1 \le j,k \le p\),

$$\begin{aligned} \partial _{j} \partial _k m(z) = (g'' \circ F_\beta ) (z) \pi _{j} (z) \pi _{k} (z)+ \beta (g' \circ F_\beta ) (z) w_{jk}(z), \end{aligned}$$

where \(\pi _{j}\) and \(w_{jk}\) are defined in Lemma 3.

Proof of Lemma 4

The proof follows from a direct calculation.\(\square \)

Proof of Theorem 1

Without loss of generality, we may assume that \(X\) and \(Y\) are independent, so that \({\text {E}}[X_{j} Y_k]=0\) for all \(1 \le j,k \le p\). Consider the following Slepian interpolation between \(X\) and \(Y\):

$$\begin{aligned} Z(t):=\sqrt{t} X+\sqrt{1-t} Y, \ t \in [0,1]. \end{aligned}$$

Let \(m := g \circ F_\beta \) and \(\Psi (t):={\text {E}}[m (Z(t))]\). Then

$$\begin{aligned} |{\text {E}}[m(X)]-{\text {E}}[ m(Y) ]|= | \Psi (1) - \Psi (0)| =\left| \int \limits _{0}^{1} \Psi '(t) dt\right| . \end{aligned}$$

Here we have

$$\begin{aligned} \Psi '(t)&= \frac{1}{2} \sum _{j=1}^p {\text {E}}[ \partial _{j} m(Z(t)) (t^{-1/2}X_{j}-(1-t)^{-1/2}Y_{j}) ] \\&= \frac{1}{2} \sum _{j=1}^p \sum _{k=1}^p (\sigma _{jk}^{X}-\sigma _{jk}^{Y}) {\text {E}}[\partial _{j} \partial _k m (Z(t)) ], \end{aligned}$$

where the second equality follows from applying Lemma 2 to \(W = (t^{-1/2}X_{j}-(1-t)^{-1/2}Y_{j}, Z(t)^{T})^{T}\) and \(f(W)=\partial _{j} m(Z(t))\). Hence

$$\begin{aligned} \left| \int \limits _{0}^{1} \Psi '(t) dt\right|&\le \frac{1}{2} \sum _{j,k=1}^{p} | \sigma _{jk}^{X}-\sigma _{jk}^{Y} | \left| \int \limits _0^1 {\text {E}}[\partial _{j} \partial _k m(Z(t))] dt \right| \\&\le \frac{1}{2} \max _{1 \le j,k \le p} | \sigma _{jk}^{X}-\sigma _{jk}^{Y} | \int \limits _0^1 \sum _{j,k=1}^{p} \left| {\text {E}}[\partial _{j} \partial _k m(Z(t))]\right| dt \\&= \frac{\Delta }{2} \int \limits _0^1 \sum _{j,k=1}^{p} \left| {\text {E}}[\partial _{j} \partial _k m(Z(t))]\right| dt. \end{aligned}$$

By Lemmas 3 and 4,

$$\begin{aligned} \sum _{j,k=1}^{p} \left| \partial _{j} \partial _k m(Z(t))\right| \le \left| (g'' \circ F_{\beta })(Z(t))\right| + 2 \beta \left| (g' \circ F_{\beta })(Z(t))\right| . \end{aligned}$$

Therefore, we have

$$\begin{aligned}&\left| {\text {E}}[g(F_{\beta }(X))-g(F_{\beta }(Y))]\right| \\&\quad \le \Delta \times \left\{ \frac{1}{2} \int \limits _0^1 {\text {E}}[ | (g'' \circ F_{\beta })(Z(t))| ] dt + \beta \int \limits _0^1 {\text {E}}[| (g' \circ F_{\beta })(Z(t))| ]dt\right\} \\&\quad \le \Delta ( \Vert g'' \Vert _{\infty }/2 + \beta \Vert g' \Vert _{\infty }), \end{aligned}$$

which leads to the first assertion. The second assertion follows from the inequality (1). This completes the proof.\(\square \)

4.2 Proof of Theorem 2

The final assertion follows from the inequality \(a_{p} \le \sqrt{2 \log p}\) (see, for example, [27], Proposition 1.1.3). Hence we prove (2). We first note that we may assume that \(0 < \Delta < 1\) since otherwise the proof is trivial (take \(C \ge 2\) in (2)). In what follows, let \(C>0\) be a generic constant that depends only on \(\min _{1 \le j \le p} \sigma _{jj}^{Y}\) and \(\max _{1 \le j \le p} \sigma _{jj}^{Y}\), and its value may change from place to place. For \(\beta > 0\), define \(e_{p,\beta } := \beta ^{-1} \log p\). Consider and fix a \(C^{2}\)-function \(g_0:{\mathbb {R}}\rightarrow [0,1]\) such that \(g_0(t)=1\) for \(t \le 0\) and \(g_0(t)=0\) for \(t \ge 1\). For example, we may take

$$\begin{aligned} g_{0}(t) ={\left\{ \begin{array}{ll} 0, &{} t \ge 1, \\ 30\int _{t}^{1} s^{2}(1-s)^{2} ds, &{} 0 < t < 1, \\ 1, &{} t \le 0. \end{array}\right. } \end{aligned}$$

For given \(x \in {\mathbb {R}}\), \(\beta > 0\) and \(\delta > 0\), define \(g_{x,\beta ,\delta } (t)=g_0(\delta ^{-1}(t-x-e_{p,\beta }))\). For this function \(g_{x,\beta ,\delta }\), we have \(\Vert g_{x,\beta ,\delta }' \Vert _{\infty } = \delta ^{-1} \Vert g_{0}' \Vert _{\infty }\) and \(\Vert g_{x,\beta ,\delta }'' \Vert _{\infty } = \delta ^{-2} \Vert g_{0}'' \Vert _{\infty }\). Moreover,

$$\begin{aligned} 1(t \le x+e_{p,\beta }) \le g_{x,\beta ,\delta }(t) \le 1(t \le x + e_{p,\beta } + \delta ), \ \forall t \in {\mathbb {R}}. \end{aligned}$$
(9)

For arbitrary \(x \in {\mathbb {R}}, \beta > 0\) and \(\delta > 0\), observe that

$$\begin{aligned} {\text {P}}\left( \max _{1 \le j \le p} X_{j} \le x \right)&\le {\text {P}}( F_{\beta } (X) \le x + e_{p,\beta }) \le {\text {E}}[g_{x,\beta ,\delta }(F_\beta (X))] \nonumber \\&\le {\text {E}}[g_{x,\beta ,\delta }(F_\beta (Y))] + C(\delta ^{-2}+\beta \delta ^{-1})\Delta \nonumber \\&\le {\text {P}}( F_\beta (Y) \le x+e_{p,\beta }+\delta ) + C(\delta ^{-2}+\beta \delta ^{-1})\Delta \nonumber \\&\le {\text {P}}\left( \max _{1 \le j \le p} Y_{j} \le x+e_{p,\beta }+\delta \right) + C(\delta ^{-2}+\beta \delta ^{-1})\Delta , \end{aligned}$$
(10)

where the first inequality follows from the inequality (1), the second from the inequality (9), the third from Theorem 1, the fourth from the inequality (9), and the last from the inequality (1). We wish to compare \({\text {P}}(\max _{1 \le j \le p} Y_{j} \le x+e_{p,\beta }+\delta ) \) with \({\text {P}}(\max _{1 \le j \le p} Y_{j} \le x ) \), and this is where the anti-concentration inequality plays its role. By Theorem 3, we have

$$\begin{aligned}&{\text {P}}\left( \max _{1 \le j \le p} Y_{j} \le x+e_{p,\beta }+\delta \right) - {\text {P}}\left( \max _{1 \le j \le p} Y_{j} \le x \right) \nonumber \\&\quad = {\text {P}}\left( x < \max _{1 \le j \le p} Y_{j} \le x+e_{p,\beta }+\delta \right) \le \mathcal {L}\left( \max _{1 \le j \le p} Y_{j}, e_{p,\beta }+\delta \right) \\&\quad \le C (e_{p,\beta } + \delta ) \sqrt{1 \vee a_{p}^{2} \vee \log \{ 1 / (e_{p,\beta } + \delta ) \}} \le C (e_{p,\beta } + \delta ) \sqrt{1 \vee a_{p}^{2} \vee \log (1 / \delta )}. \nonumber \end{aligned}$$
(11)

Therefore,

$$\begin{aligned}&{\text {P}}\left( \max _{1\le j\le p}X_{j}\le x \right) - {\text {P}}\left( \max _{1\le j\le p}Y_{j}\le x\right) \nonumber \\&\quad \le C \left\{ (\delta ^{-2}+\beta \delta ^{-1})\Delta + (e_{p,\beta }+\delta )\sqrt{1 \vee a_{p}^{2} \vee \log (1/\delta )} \right\} . \end{aligned}$$
(12)

Take \(a = a_{p} \vee \log ^{1/2}(1/\Delta )\), and choose \(\beta \) and \(\delta \) in such a way that

$$\begin{aligned} \beta = \delta ^{-1}\log p \ \quad \text {and}\quad \ \delta = \Delta ^{1/3} (1 \vee a)^{-1/3} (2\log p)^{1/3}. \end{aligned}$$

Recall that \(p \ge 2\) and \(0 < \Delta < 1\). Observe that \((\delta ^{-2}+\beta \delta ^{-1})\Delta \le C \Delta ^{1/3} (1 \vee a)^{2/3} \log ^{1/3} p\) and \((e_{p,\beta }+\delta ) (1 \vee a_{p}) \le C \Delta ^{1/3} (1 \vee a)^{2/3} \log ^{1/3} p\), and since \(\delta \ge \Delta ^{1/3} (1 \vee a)^{-1/3}\), we have \(\log (1/\delta ) \le (1/3) \log ((1 \vee a)/\Delta )\). Hence the right side of (12) is bounded by \(C \Delta ^{1/3} \{ (1 \vee a)^{2/3} \log ^{1/3} p + (1 \vee a)^{-1/3} ( \log ^{1/3} p ) \log ^{1/2}((1 \vee a)/\Delta ) \}\). In addition, \((1 \vee a)^{-1} \log ^{1/2} (1 \vee a)\) is bounded by a universal constant, so that

$$\begin{aligned}&\text {right side of (12)} \\&\quad \le C \Delta ^{1/3} \{ (1 \vee a)^{2/3} \log ^{1/3} p + (1 \vee a)^{-1/3} ( \log ^{1/3} p ) \log ^{1/2}(1/\Delta ) \}. \end{aligned}$$

The second term inside the bracket is bounded by \(( \log ^{1/3} p ) \log ^{1/3}(1/\Delta )\) as \((1 \vee a)^{-1/3} \le (\log (1/\Delta ))^{-1/6}\), and the first term is bounded by \((1 \vee a_{p})^{2/3} \log ^{1/3} p + ( \log ^{1/3} p ) \log ^{1/3}(1/\Delta )\). Adjusting the constant \(C\), the right side of the above display is bounded by \(C \Delta ^{1/3} \{ 1 \vee a_{p}^{2} \vee \log (1/\Delta ) \}^{1/3} \log ^{1/3}p\).

For the opposite direction, observe that

$$\begin{aligned} {\text {P}}\left( \max _{1 \le j \le p} X_{j} \le x \right)&\ge {\text {P}}( F_{\beta } (X) \le x) \ge {\text {E}}[g_{x-e_{p,\beta }-\delta ,\beta ,\delta }(F_\beta (X))] \\&\ge {\text {E}}[g_{x-e_{p,\beta }-\delta ,\beta ,\delta }(F_\beta (Y))] - C(\delta ^{-2}+\beta \delta ^{-1})\Delta \\&\ge {\text {P}}(F_{\beta }(Y) \le x - \delta ) - C(\delta ^{-2}+\beta \delta ^{-1})\Delta \\&\ge {\text {P}}\left( \max _{1 \le j \le p} Y_{j} \le x - e_{p,\beta } - \delta \right) - C(\delta ^{-2}+\beta \delta ^{-1})\Delta . \end{aligned}$$

The rest of the proof is similar and hence omitted.\(\square \)

5 Proof of Theorem 3

The proof of Theorem 3 uses some properties of Gaussian measures. We begin by preparing some technical tools. The following two facts were essentially noted in [31, 32] (note: [31] and [32] did not contain a proof of Lemma 5, which we find non-trivial). For the sake of completeness, we give their proofs after the proof of Theorem 3.

Lemma 5

Let \((W_{1},\dots ,W_{p})^{T}\) be a (not necessarily centered) Gaussian random vector in \({\mathbb {R}}^{p}\) with \({{\mathrm{Var}}}(W_{j}) = 1\) for all \(1 \le j \le p\). Suppose that \({{\mathrm{Corr}}}(W_{j},W_{k})<1\) whenever \(j \ne k\). Then the distribution of \(\max _{1 \le j \le p} W_{j}\) is absolutely continuous with respect to the Lebesgue measure and a version of the density is given by

$$\begin{aligned} f(x) = \phi (x) \sum _{j=1}^{p} e^{ \mathrm {E} [W_{j}]x-(\mathrm {E} [W_{j}])^2/2 } \cdot \mathrm {P}\left( W_{k} \le x, \forall k \ne j \mid W_{j}=x\right) . \end{aligned}$$
(13)

Lemma 6

Let \((W_{0},W_{1},\dots ,W_{p})^{T}\) be a (not necessarily centered) Gaussian random vector with \({{\mathrm{Var}}}(W_{j}) =1 \) for all \(0 \le j \le p\). Suppose that \(\mathrm {E}[W_{0} ]\ge 0\). Then the map

$$\begin{aligned} x \mapsto e^{ \mathrm{E}[W_{0} ] x-(\mathrm{E}[W_{0}])^2/2 } \cdot {\mathrm {P}}( W_{j} \le x, 1 \le \forall j \le p \mid W_{0} = x) \end{aligned}$$
(14)

is non-decreasing on \({\mathbb {R}}\).

Let us also recall (a version of) the Gaussian concentration (more precisely, deviation) inequality. See, for example, [15], Theorem 7.1, for its proof.

Lemma 7

Let \((X_{1},\dots ,X_{p})^{T}\) be a centered Gaussian random vector in \({\mathbb {R}}^{p}\) with \(\max _{1 \le j \le p} \mathrm {E}[X_{j}^2] \le \sigma ^{2}\) for some \(\sigma ^{2}> 0\). Then for every \(r > 0\),

$$\begin{aligned} \mathrm {P}\left( \max _{1 \le j \le p} X_{j} \ge \mathrm {E}\left[ \max _{1 \le j \le p} X_{j} \right] + r \right) \le e^{-r^{2}/(2\sigma ^{2})}. \end{aligned}$$

We are now in a position to prove Theorem 3.

Proof of Theorem 3

The proof consists of three steps.

Step 1. This step reduces the analysis to the unit variance case. Pick any \(x\ge 0\). Let \(W_{j} :=(X_{j}-x)/\sigma _{j}+x/\underline{\sigma }\). Then \({\text {E}}[W_{j}]\ge 0\) and \({{\mathrm{Var}}}(W_{j})=1\). Define \(Z:= \max _{1 \le j \le p} W_{j}\). Then we have

$$\begin{aligned}&{\text {P}}\left( \left| \max _{1 \le j \le p}X_{j}-x \right| \le \epsilon \right) \le {\text {P}}\left( \left| \max _{1 \le j \le p}\frac{X_{j}-x}{\sigma _j}\right| \le \frac{\epsilon }{\underline{\sigma }}\right) \\&\quad \le \sup _{y\in {\mathbb {R}}}{\text {P}}\left( \left| \max _{1 \le j \le p}\frac{X_{j}-x}{\sigma _j}+\frac{x}{\underline{\sigma }}-y\right| \le \frac{\epsilon }{\underline{\sigma }}\right) =\sup _{y\in {\mathbb {R}}}{\text {P}}\left( \left| Z-y\right| \le \frac{\epsilon }{\underline{\sigma }}\right) . \end{aligned}$$

Step 2. This step bounds the density of \(Z\). Without loss of generality, we may assume that \({{\mathrm{Corr}}}(W_{j},W_{k})<1\) whenever \(j \ne k\). Since the marginal distribution of \(W_{j}\) is \(N(\mu _{j},1)\) where \(\mu _{j}:={\text {E}}[W_{j}]=(x/\underline{\sigma }-x/\sigma _j)\ge 0\), by Lemma 5, \(Z\) has density of the form

$$\begin{aligned} f_{p}(z) = \phi (z) G_{p} (z), \end{aligned}$$
(15)

where the map \(z \mapsto G_{p}(z)\) is non-decreasing by Lemma 6. Define \(\bar{z}:= (1/\underline{\sigma }-1/\overline{\sigma })x\), so that \(\mu _{j}\le \bar{z}\) for every \(1 \le j \le p\). Moreover, define \(\bar{Z}:=\max _{1 \le j \le p}(W_{j}-\mu _{j})\). Then

$$\begin{aligned}&\int \limits _{z}^{\infty } \phi (u) du G_{p}(z) \le \int \limits _{z}^{\infty } \phi (u) G_{p}(u) du = {\text {P}}( Z> z) \\&\qquad \le {\text {P}}( \bar{Z} > z - \bar{z} ) \le \exp \left\{ -\frac{(z-\bar{z}-{\text {E}}[ \bar{Z} ])_{+}^{2}}{2} \right\} , \end{aligned}$$

where the last inequality is due to the Gaussian concentration inequality (Lemma 7). Note that \(W_{j}-\mu _{j}=X_{j}/\sigma _{j}\), so that

$$\begin{aligned} {\text {E}}[\bar{Z}]={\text {E}}\left[ \max _{1 \le j \le p}(X_{j}/\sigma _{j}) \right] =: a_p. \end{aligned}$$

Therefore, for every \(z \in {\mathbb {R}}\),

$$\begin{aligned} G_{p} (z) \le \frac{1}{1-\Phi (z)} \exp \left\{ -\frac{(z-\bar{z}-a_p)_{+}^{2}}{2} \right\} . \end{aligned}$$
(16)

Mill’s inequality states that for \(z>0\),

$$\begin{aligned} z \le \frac{\phi (z)}{1-\Phi (z)}\le z \frac{1+z^2}{z^2}, \end{aligned}$$

and in particular \((1+z^2)/z^2 \le 2\) when \(z>1\). Moreover, \(\phi (z)/\{ 1-\Phi (z) \}\le 1.53 \le 2 \) for \(z \in (-\infty , 1)\). Therefore,

$$\begin{aligned} \phi (z)/\{ 1-\Phi (z) \} \le 2 (z \vee 1), \ \forall z \in {\mathbb {R}}. \end{aligned}$$

Hence we conclude from this, (16), and (15) that

$$\begin{aligned} f_{p}(z) \le 2(z \vee 1) \exp \left\{ -\frac{(z-\bar{z}-a_p)_{+}^{2}}{2} \right\} , \ \forall z \in {\mathbb {R}}. \end{aligned}$$

Step 3. By Step 2, for every \(y \in {\mathbb {R}}\) and \(t > 0\), we have

$$\begin{aligned} {\text {P}}\left( | Z - y | \le t \right)&= \int \limits _{y -t}^{y+t} f_{p}(z) dz \le 2 t \max _{z \in [y -t, y+t]}f_{p}(z) \le 4 t (\bar{z}+ a_p + 1), \end{aligned}$$

where the last inequality follows from the fact that the map \(z \mapsto z e^{-(z-a)^{2}/2}\) (with \(a > 0\)) is non-increasing on \([a+1,\infty )\). Combining this inequality with Step 1, for every \(x \ge 0\) and \(\epsilon > 0\), we have

$$\begin{aligned} {\text {P}}\left( \left| \max _{1 \le j \le p}X_{j} - x \right| \le \epsilon \right) \le 4 \epsilon \left\{ \left( 1/\underline{\sigma }-1/\overline{\sigma }\right) |x|+a_p + 1 \right\} /\underline{\sigma }. \end{aligned}$$
(17)

This inequality also holds for \(x<0\) by a similar argument, and hence it holds for every \(x \in {\mathbb {R}}\).

If \(\underline{\sigma }=\overline{\sigma } =\sigma \), then we have

$$\begin{aligned} {\text {P}}\left( \left| \max _{1 \le j \le p}X_{j} - x \right| \le \epsilon \right) \le 4 \epsilon (a_p + 1)/\sigma , \ \forall x \in {\mathbb {R}}, \ \forall \epsilon > 0, \end{aligned}$$

which leads to the first assertion of the theorem.

On the other hand, consider the case where \(\underline{\sigma } < \overline{\sigma }\). Suppose first that \(0 < \epsilon \le \underline{\sigma }\). By the Gaussian concentration inequality (Lemma 7), for \(|x|\ge \epsilon +\overline{\sigma }(a_p+\sqrt{2\log (\underline{\sigma }/\epsilon )})\), we have

$$\begin{aligned}&{\text {P}}\left( \left| \max _{1 \le j \le p} X_{j}-x\right| \le \epsilon \right) \le {\text {P}}\left( \max _{1 \le j \le p}X_{j} \ge |x|-\epsilon \right) \nonumber \\&\quad \le {\text {P}}\left( \max _{1 \le j \le p}X_{j}\ge {\text {E}}\left[ \max _{1 \le j \le p}X_{j}\right] +\overline{\sigma }\sqrt{2\log (\underline{\sigma }/\epsilon )}\right) \le \epsilon /\underline{\sigma }. \end{aligned}$$
(18)

For \(|x| \le \epsilon +\overline{\sigma }(a_p+\sqrt{2\log (\underline{\sigma }/\epsilon )})\), by (17) and using \(\epsilon \le \underline{\sigma }\), we have

$$\begin{aligned}&{\text {P}}\left( \left| \max _{1 \le j \le p}X_{j} - x \right| \le \epsilon \right) \nonumber \\&\quad \le 4 \epsilon \{ (\overline{\sigma }/\underline{\sigma }) a_{p} + (\overline{\sigma }/\underline{\sigma }-1)\sqrt{2\log (\underline{\sigma }/\epsilon )} + 2 - \underline{\sigma }/\overline{\sigma }\}/\underline{\sigma }. \end{aligned}$$
(19)

Combining (18) and (19), we obtain the inequality in (ii) for \(0 < \epsilon \le \underline{\sigma }\) (with a suitable choice of \(C\)). If \(\epsilon > \underline{\sigma }\), the inequality in (ii) trivially follows by taking \(C \ge 1/\underline{\sigma }\). This completes the proof. \(\square \)

Proof of Lemma 5

Let \(M := \max _{1 \le j \le p} W_{j}\). The absolute continuity of the distribution of \(M\) is deduced from the fact that \({\text {P}}(M \in A) \le \sum _{j=1}^{p} {\text {P}}(W_{j} \in A)\) for every Borel measurable subset \(A\) of \({\mathbb {R}}\). Hence, to show that a version of the density of \(M\) is given by (13), it is enough to show that \(\lim _{\epsilon \downarrow 0} \epsilon ^{-1} {\text {P}}(x < M \le x+\epsilon )\) equals the right side of (13) for a.e. \(x \in {\mathbb {R}}\).

For every \(x \in {\mathbb {R}}\) and \(\epsilon > 0\), observe that

$$\begin{aligned}&\{ x < M \le x + \epsilon \} \\&\quad = \{ \exists i_{0}, W_{i_{0}} > x \ \text {and} \ \forall i, W_{i} \le x +\epsilon \} \\&\quad = \{ \exists i_{1}, x < W_{i_{1}} \le x + \epsilon \ \text {and} \ \forall i \ne i_{1}, W_{i} \le x \} \\&\quad \cup \{ \exists i_{1}, \exists i_{2}, x < W_{i_{1}} \le x + \epsilon , x < W_{i_{2}} \le x + \epsilon \ \text {and} \ \forall i \notin \{ i_{1},i_{2} \}, W_{i} \le x \} \\&\qquad \qquad \vdots \\&\quad \cup \{ \forall i, x < W_{i} \le x + \epsilon \} \\&\quad {=:} A^{x,\epsilon }_{1} \cup A^{x,\epsilon }_{2} \cup \cdots \cup A^{x,\epsilon }_{p}. \end{aligned}$$

Note that the events \(A^{x,\epsilon }_{1},A^{x,\epsilon }_{2},\dots ,A^{x,\epsilon }_{p}\) are disjoint. For \(A^{x,\epsilon }_{1}\), since

$$\begin{aligned} A^{x,\epsilon }_{1} = \mathop {\cup }_{i=1}^{p} \{ x < W_{i} \le x+\epsilon \ \text {and} \ W_{j} \le x, \forall j \ne i \}, \end{aligned}$$

where the events on the right side are disjoint, we have

$$\begin{aligned} {\text {P}}(A^{x,\epsilon }_{1})&= \sum _{i=1}^{p} {\text {P}}(x < W_{i} \le x+\epsilon \ \text {and} \ W_{j} \le x, \forall j \ne i) \\&= \sum _{i=1}^{p} \int \limits _{x}^{x+\epsilon } {\text {P}}(W_{j} \le x, \forall j \ne i \mid W_{i} = u) \phi (u-\mu _{i}) du, \end{aligned}$$

where \(\mu _{i} := {\text {E}}[ W_{i} ]\). We show that for every \(1 \le i \le p\) and a.e. \(x \in {\mathbb {R}}\), the map \(u \mapsto {\text {P}}(W_{j} \le x, \forall j \ne i \mid W_{i} = u)\) is right continuous at \(x\). Let \(X_{j}=W_{j}-\mu _{j}\) so that \(X_{j}\) are standard Gaussian random variables. Then

$$\begin{aligned} {\text {P}}(W_{j} \le x, \forall j \ne i \mid W_{i} = u)={\text {P}}(X_{j} \le x-\mu _{j}, \forall j \ne i \mid X_{i} = u-\mu _{i}). \end{aligned}$$

Pick \(i=1\). Let \(V_{j} = X_{j} - {\text {E}}[ X_{j} X_{1}] X_{1}\) be the residual from the orthogonal projection of \(X_{j}\) on \(X_{1}\). Note that the vector \((V_{j})_{2 \le j \le p}\) and \(X_{1}\) are jointly Gaussian and uncorrelated, and hence independent, by which we have

$$\begin{aligned}&{\text {P}}(X_{j} \le x-\mu _{j}, 2 \le \forall j \le p \mid X_{1} = u-\mu _{1}) \\&\quad = {\text {P}}(V_{j} \le x-\mu _{j}-{\text {E}}[ X_{j} X_{1}] (u-\mu _{1}), 2 \le \forall j \le p \mid X_{1} = u-\mu _{1}) \\&\quad = {\text {P}}(V_{j} \le x-\mu _{j}-{\text {E}}[ X_{j} X_{1}] (u-\mu _{1}), 2 \le \forall j \le p). \end{aligned}$$

Define \(J := \{ j \in \{ 2,\dots ,p \} : {\text {E}}[ X_{j} X_{1}] \le 0 \}\) and \(J^{c} := \{ 2,\dots , p\} \backslash J\). Then

$$\begin{aligned}&{\text {P}}(V_{j} \le x-\mu _{j}-{\text {E}}[ X_{j} X_{1}] (u-\mu _{1}), 2 \le \forall j \le p) \\&\quad \rightarrow {\text {P}}( V_{j} \le x_j, \forall j \in J, V_{j'} < x_{j'}, \forall j' \in J^{c}), \ \text {as} \ u \downarrow x, \end{aligned}$$

where \(x_j=x-\mu _{j}-{\text {E}}[X_{j}X_1](x-\mu _{1})\). Here each \(V_{j}\) either degenerates to \(0\) (which occurs only when \(X_{j}\) and \(X_{1}\) are perfectly negatively correlated, that is, \({\text {E}}[ X_{j} X_{1} ] = -1\)) or has a non-degenerate Gaussian distribution, and hence for every \(x \in {\mathbb {R}}\) except for at most \((p-1)\) points \((\mu _{1}+\mu _{j})/2, 2 \le j \le p\),

$$\begin{aligned} {\text {P}}( V_{j} \le x_j, \forall j \in J, V_{j'} < x_{j'}, \forall j' \in J^{c})&= {\text {P}}( V_{j} \le x_{j}, 2 \le \forall j \le p ) \\&= {\text {P}}(W_{j} \le x, 2 \le \forall j \le p \mid W_{1} = x). \end{aligned}$$

Hence for \(i=1\) and a.e. \(x \in {\mathbb {R}}\), the map \(u \mapsto {\text {P}}(W_{j} \le x, \forall j \ne i \mid W_{i} = u)\) is right continuous at \(x\). The same conclusion clearly holds for \(2 \le i \le p\). Therefore, we conclude that, for a.e. \(x \in {\mathbb {R}}\), as \(\epsilon \downarrow 0\),

$$\begin{aligned} \frac{1}{\epsilon } {\text {P}}(A_{1}^{x,\epsilon })&\rightarrow \sum _{i=1}^{p} {\text {P}}(W_{j} \le x, \forall j \ne i \mid W_{i} = x)\phi (x-\mu _{i})\\&= \phi (x)\sum _{i=1}^{p} e^{\mu _{i}x-\mu _{i}^2/2} {\text {P}}(W_{j} \le x, \forall j \ne i \mid W_{i} = x). \end{aligned}$$

In the rest of the proof, we show that, for every \(2 \le i \le p\) and \(x \in {\mathbb {R}}\), \({\text {P}}(A_{i}^{x,\epsilon }) = o(\epsilon )\) as \(\epsilon \downarrow 0\), which leads to the desired conclusion. Fix any \(2 \le i \le p\). The probability \({\text {P}}(A_{i}^{x,\epsilon })\) is bounded by a sum of terms of the form \({\text {P}}(x < W_{j} \le x+\epsilon , x < W_{k} \le x+\epsilon )\) with \(j \ne k\). Recall that \({{\mathrm{Corr}}}(W_{j},W_{k})<1\). If \({{\mathrm{Corr}}}(W_{j},W_{k})=-1\), then for every \(x \in {\mathbb {R}}\), \({\text {P}}(x < W_{j} \le x+\epsilon , x < W_{k} \le x+\epsilon )\) is zero for sufficiently small \(\epsilon \). Otherwise, \((W_{j},W_{k})^{T}\) obeys a two-dimensional, non-degenerate Gaussian distribution, and hence \({\text {P}}(x < W_{j} \le x+\epsilon , x < W_{k} \le x+\epsilon ) = O(\epsilon ^{2}) = o(\epsilon )\) as \(\epsilon \downarrow 0\) for every \(x \in {\mathbb {R}}\). This completes the proof.\(\square \)

Proof of Lemma 6

Since \({\text {E}}[ W_{0} ] \ge 0\), the map \(x \mapsto \exp \{ \mathrm {E}[W_{0}]x - (\mathrm {E}[W_{0}])^{2}/2 \}\) is non-decreasing. Thus it suffices to show that the map

$$\begin{aligned} x \mapsto {\text {P}}( W_{1} \le x, \dots , W_{p} \le x \mid W_{0} =x) \end{aligned}$$
(20)

is non-decreasing. As in the proof of Lemma 5, let \(X_{j}=W_{j}-{\text {E}}[ W_{j} ]\) and let \(V_{j} = X_{j} - {\text {E}}[ X_{j} X_{0}] X_{0}\) be the residual from the orthogonal projection of \(X_{j}\) on \(X_{0}\). Note that the vector \((V_{j})_{1 \le j \le p}\) and \(X_{0}\) are independent. Hence the probability in (20) equals

$$\begin{aligned}&{\text {P}}(V_{j} \le x-\mu _{j} - {\text {E}}[X_{j}X_{0}](x-{\text {E}}[ W_{0} ] ), 1 \le \forall j \le p \mid X_{0}=x-{\text {E}}[ W_{0} ])\\&\quad ={\text {P}}(V_{j} \le x-\mu _{j} - {\text {E}}[X_{j}X_{0}](x-{\text {E}}[ W_{0} ]), 1 \le \forall j \le p), \end{aligned}$$

where the latter is non-decreasing in \(x\) on \({\mathbb {R}}\) since \({\text {E}}[X_{j}X_{0}] \le 1\).\(\square \)