Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions

doi:10.1016/j.jeconom.2015.03.010

Journal of Econometrics

Volume 188, Issue 2, October 2015, Pages 447-465

https://doi.org/10.1016/j.jeconom.2015.03.010

Abstract

We show that spline and wavelet series regression estimators for weakly dependent regressors attain the optimal uniform (i.e., sup-norm) convergence rate $(n / \log n)^{-p/(2p+d)}$ of Stone (1982), where $d$ is the number of regressors and $p$ is the smoothness of the regression function. The optimal rate is achieved even for heavy-tailed martingale difference errors with finite $(2 + d/p)$th absolute moment for $d/p < 2$. We also establish the asymptotic normality of $t$ statistics for possibly nonlinear, irregular functionals of the conditional mean function under weak conditions. The results are proved by deriving a new exponential inequality for sums of weakly dependent random matrices, which is of independent interest.

Introduction

We study the nonparametric regression model
$$Y_{i} = h_{0}(X_{i}) + ϵ_{i}, \qquad E[ϵ_{i} \mid X_{i}] = 0,$$
where $Y_{i} \in R$ is a scalar response variable, $X_{i} \in X \subseteq R^{d}$ is a $d$-dimensional regressor (predictor variable), and the conditional mean function $h_{0}(x) = E[Y_{i} \mid X_{i} = x]$ belongs to a Hölder space of smoothness $p > 0$. We are interested in series least squares (LS) estimation¹ of $h_{0}$ under sup-norm loss and in inference on possibly nonlinear functionals of $h_{0}$, allowing for weakly dependent regressors and heavy-tailed errors $ϵ_{i}$.²
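To make the estimator concrete, the following is a minimal sketch of series LS estimation in plain Python (standard library only). It forms the normal equations $(B'B/n)\,\hat{c} = B'Y/n$ for a user-supplied basis $x \mapsto b^{K}(x)$ and returns $\hat{h}(x) = b^{K}(x)'\hat{c}$. The helper names `solve_linear`, `series_ls` and `piecewise_constant` are hypothetical, and the piecewise-constant (order-1 B-spline) basis is chosen only for brevity. Unlike the estimator in the text, which uses a generalized inverse, this sketch raises an error if the sample Gram matrix is singular.

```python
from typing import Callable, List, Sequence

Basis = Callable[[float], List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("Gram matrix is (numerically) singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def series_ls(xs: Sequence[float], ys: Sequence[float], basis: Basis) -> Callable[[float], float]:
    """Return x -> b^K(x)' c_hat, where c_hat solves (B'B/n) c = B'Y/n."""
    rows = [basis(x) for x in xs]
    n, k = len(rows), len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) / n for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) / n for i in range(k)]
    coef = solve_linear(gram, rhs)
    return lambda x: sum(c * b for c, b in zip(coef, basis(x)))


def piecewise_constant(k: int) -> Basis:
    """Indicators of the cells [j/k, (j+1)/k), with x = 1 assigned to the last cell."""
    def basis(x: float) -> List[float]:
        j = min(int(x * k), k - 1)
        return [1.0 if i == j else 0.0 for i in range(k)]
    return basis
```

For example, `series_ls(xs, ys, piecewise_constant(8))` returns the cell-mean fit on eight equal-width cells, while passing `lambda x: [1.0, x, x * x]` as the basis gives a quadratic power-series fit.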

For i.i.d. data, Stone (1982) shows that ${(n / log n)}^{- p / (2 p + d)}$ is the minimax lower bound in sup-norm risk for estimation of $h_{0}$ over a Hölder ball of smoothness $p > 0$ . For strictly stationary beta-mixing regressors, we show that spline and wavelet series LS estimators $\hat{h}$ of $h_{0}$ attain the optimal uniform rate of Stone (1982) under a mild unconditional moment condition $E [{| ϵ_{i} |}^{2 + (d / p)}] < \infty$ imposed on the martingale difference errors.
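The arithmetic behind these rates can be checked directly. The sketch below uses a hypothetical helper, `stone_rates`, to evaluate Stone's sup-norm rate $(n/\log n)^{-p/(2p+d)}$, the corresponding $L^{2}$ rate $n^{-p/(2p+d)}$ and the moment order $2 + d/p$ required of the errors. The ratio of the first two rates is $(\log n)^{p/(2p+d)}$, the extra factor paid for uniform rather than $L^{2}$ accuracy.

```python
import math


def stone_rates(n: int, p: float, d: int) -> dict:
    """Stone (1982) optimal rates and the error moment order 2 + d/p."""
    a = p / (2 * p + d)
    return {
        "sup_norm": (n / math.log(n)) ** (-a),  # (n / log n)^{-p/(2p+d)}
        "l2_norm": n ** (-a),                   # n^{-p/(2p+d)}
        "moment_order": 2 + d / p,              # E|eps|^{2+d/p} < inf, stated for d/p < 2
    }


for p, d in [(1, 1), (2, 1), (2, 2)]:
    r = stone_rates(10_000, p, d)
    print(p, d, round(r["sup_norm"], 4), round(r["l2_norm"], 4), r["moment_order"])
```

Note that for $d/p < 2$ the moment order $2 + d/p$ is below 4, which is why infinite fourth moments are allowed.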

More generally, we assume the error process $\{ϵ_{i}\}_{i = -\infty}^{\infty}$ is a martingale difference sequence but impose no explicit weak dependence condition on the regressor process $\{X_{i}\}_{i = -\infty}^{\infty}$. Rather, weak dependence of the regressor process is formulated in terms of convergence of a certain random matrix. We verify this condition for absolutely regular (beta-mixing) sequences by deriving a new exponential inequality for sums of weakly dependent random matrices. This new inequality then leads to a sharp upper bound on the sup-norm variance term of series LS estimators with an arbitrary basis. When combined with a general upper bound on the sup-norm bias term of series LS estimators, the sharp sup-norm variance bound immediately leads to a general upper bound on the sup-norm convergence rate of series LS estimators with an arbitrary basis and weakly dependent data.
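As an illustration of the random-matrix formulation, consider a hypothetical design in which $X_{i} = Φ(Z_{i}/s)$ for a stationary Gaussian AR(1) process $Z_{i} = ρZ_{i-1} + u_{i}$ with stationary standard deviation $s$, a standard example of a geometrically beta-mixing sequence with uniform marginals on $[0, 1]$. For the $L^{2}(X)$-orthonormal piecewise-constant basis $\sqrt{K}\,1\{x \in \text{cell } j\}$, the matrix $B'B/n$ is diagonal, so $‖B'B/n - I_{K}‖$ is the largest relative deviation of the cell frequencies from $1/K$. The design, the basis and the function names are assumptions made for this sketch; the paper's condition is stated for a general basis.

```python
import math
import random


def ar1_uniform_design(n: int, rho: float, seed: int) -> list:
    """X_i = Phi(Z_i / sd) for a stationary Gaussian AR(1) Z_i (uniform marginals)."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(1.0 - rho ** 2)   # stationary standard deviation of Z_i
    z = rng.gauss(0.0, sd)                 # start the chain in its stationary law
    xs = []
    for _ in range(n):
        z = rho * z + rng.gauss(0.0, 1.0)
        xs.append(0.5 * (1.0 + math.erf(z / sd / math.sqrt(2.0))))  # standard normal cdf
    return xs


def gram_deviation(xs: list, k: int) -> float:
    """Spectral norm of B'B/n - I_K for the orthonormal basis sqrt(k) * 1{cell j}."""
    counts = [0] * k
    for x in xs:
        counts[min(int(x * k), k - 1)] += 1
    n = len(xs)
    # B'B/n is diagonal with entries k * n_j / n, so the norm is a max over cells.
    return max(abs(k * c / n - 1.0) for c in counts)
```

Tracking `gram_deviation(ar1_uniform_design(n, 0.5, seed), 10)` as $n$ grows should show it shrinking towards zero, which is the kind of behaviour the weak dependence condition asks for.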

In our sup-norm bias and variance decomposition of series LS estimators, the bound on the sup-norm bias term depends on the sup norm of the empirical $L^{2}$ projection onto the linear sieve space, which varies with the choice of the (linear sieve) basis. For spline regression with i.i.d. data, Huang (2003b) shows that the sup norm of the empirical $L^{2}$ projection onto splines is bounded with probability approaching one (wpa1). Using our new exponential inequality for sums of weakly dependent random matrices, we easily extend his bound to spline regression with weakly dependent regressors. In addition, we show in Theorem 5.2 that, for either i.i.d. or weakly dependent regressors, the sup norm of the empirical $L^{2}$ projection onto compactly supported wavelet bases is also bounded wpa1 (this property is called sup-norm stability of the empirical $L^{2}$ projection). These tight bounds lead to sharp sup-norm bias control for spline and wavelet series LS estimators, and allow us to show that spline and wavelet series LS estimators achieve the optimal sup-norm convergence rate with weakly dependent data and heavy-tailed errors (e.g., $E[ϵ_{i}^{4}] = \infty$ is allowed).
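The quantity being bounded here can be computed for a given sample. The empirical $L^{2}$ projection maps the data values $(f(X_{1}), \dots, f(X_{n}))$ to $x \mapsto b^{K}(x)'(B'B/n)^{-1}B'f/n$, and its norm from data values (in sup norm) to $L^{\infty}$ is $\sup_{x} n^{-1}\sum_{i}|b^{K}(x)'(B'B/n)^{-1}b^{K}(X_{i})|$. The sketch below approximates the supremum on a finite grid (an assumption) for a hypothetical deterministic design; all helper names are illustrative. For piecewise constants the value is exactly 1; for hat functions (order-2 B-splines on equally spaced knots) it is a moderate constant, at least 1 because constants lie in the span.

```python
from typing import Callable, List, Sequence

Basis = Callable[[float], List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def projection_sup_norm(xs: Sequence[float], basis: Basis, grid: Sequence[float]) -> float:
    """Grid approximation of sup_x (1/n) sum_i |b(x)' G^{-1} b(X_i)|, with G = B'B/n."""
    rows = [basis(x) for x in xs]
    n, k = len(rows), len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) / n for j in range(k)] for i in range(k)]
    best = 0.0
    for x in grid:
        w = solve_linear(gram, basis(x))  # G^{-1} b(x); G is symmetric
        val = sum(abs(sum(wi * bi for wi, bi in zip(w, r))) for r in rows) / n
        best = max(best, val)
    return best


def piecewise_constant(k: int) -> Basis:
    def basis(x: float) -> List[float]:
        j = min(int(x * k), k - 1)
        return [1.0 if i == j else 0.0 for i in range(k)]
    return basis


def hat(k: int) -> Basis:
    """Order-2 B-splines (hat functions) on k equally spaced knots in [0, 1]."""
    h = 1.0 / (k - 1)
    def basis(x: float) -> List[float]:
        return [max(0.0, 1.0 - abs(x - j * h) / h) for j in range(k)]
    return basis
```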

Sup-norm (uniform) convergence rates of series LS estimators have been studied previously by Newey (1997) and de Jong (2002) for i.i.d. data and Lee and Robinson (2013) for spatially dependent data. But the uniform convergence rates obtained in these papers are slower than the optimal rate of Stone (1982).³ In an important paper on series LS regression with i.i.d. data, Belloni et al. (2014) establish the attainability of the optimal sup-norm rates of series LS estimators using spline, local polynomial partition, wavelet and other series possessing the property of sup-norm stability of the $L^{2}$ projection (or bounded Lebesgue constant) under the conditional moment condition ${sup}_{x} E [{| ϵ_{i} |}^{2 + δ} | X_{i} = x] < \infty$ for some $δ > d / p$ . Our Theorem 5.1 on the sup-norm stability of $L^{2}$ projection of the wavelet basis is used by Belloni et al. (2014) to show that the wavelet series LS estimator achieves the optimal sup-norm rate under their conditional moment requirement. Our paper contributes to the literature by showing that spline and wavelet series LS estimators attain the optimal sup-norm rate with strictly stationary beta-mixing regressors under the weaker unconditional moment requirement $E [{| ϵ_{i} |}^{2 + (d / p)}] < \infty$ .

As another application of our new exponential inequality, we obtain sharp $L^{2}$ convergence rates for series LS estimators with weakly dependent regressors under very weak conditions. For example, under the minimal bounded conditional second moment restriction ($\sup_{x} E[|ϵ_{i}|^{2} \mid X_{i} = x] < \infty$), our $L^{2}$-norm rates for trigonometric polynomial, spline or wavelet series LS estimators attain Stone (1982)’s optimal $L^{2}$-norm rate of $n^{-p/(2p+d)}$ with strictly stationary, exponentially beta-mixing (respectively algebraically beta-mixing at rate $γ$) regressors with $p > 0$ (resp. $p > d/(2γ)$), while the power series LS estimator attains the same optimal rate with exponentially (resp. algebraically) beta-mixing regressors for $p > d/2$ (resp. $p > d(2 + γ)/(2γ)$). Notably, for a smooth conditional mean function we obtain the optimal $L^{2}$ convergence rates for these commonly used series LS estimators with weakly dependent regressors without requiring the existence of higher-than-second unconditional moments of the error terms. Previously, Newey (1997) derived the optimal $L^{2}$ convergence rates of series LS estimators under i.i.d. data and the restriction $K^{2}/n = o(1)$ for spline and trigonometric series (and $K^{3}/n = o(1)$ for power series), where $K$ is the number of series terms. The restriction on $K$ is relaxed to $K(\log K)/n = o(1)$ in Huang (2003a) for splines and in Belloni et al. (2014) for wavelets, trigonometric and other series under i.i.d. data. We show that the optimal $L^{2}$ convergence rates are still attainable for splines, wavelets, trigonometric and other series under exponentially beta-mixing regressors and $K(\log K)^{2}/n = o(1)$.
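The growth conditions above can be compared numerically at the rate-optimal choice of $K$. Under the usual bias-variance heuristic, the $L^{2}$ bias of order $K^{-p/d}$ balances the standard deviation of order $\sqrt{K/n}$ at $K ≍ n^{d/(2p+d)}$, where both equal $n^{-p/(2p+d)}$. The sketch below (hypothetical helper names, constants ignored) evaluates that $K$ and the ratios $K^{2}/n$, $K^{3}/n$, $K(\log K)/n$ and $K(\log K)^{2}/n$ quoted in the text.

```python
import math


def rate_optimal_k(n: int, p: float, d: int) -> float:
    """K ~ n^{d/(2p+d)}: balances the bias K^{-p/d} against sqrt(K/n) (constants ignored)."""
    return n ** (d / (2 * p + d))


def growth_ratios(k: float, n: int) -> dict:
    """Growth quantities compared in the text; each is required to be o(1)."""
    return {
        "K^2/n": k ** 2 / n,                        # Newey (1997), splines / trigonometric
        "K^3/n": k ** 3 / n,                        # Newey (1997), power series
        "K log K/n": k * math.log(k) / n,           # i.i.d.: Huang (2003a), Belloni et al. (2014)
        "K (log K)^2/n": k * math.log(k) ** 2 / n,  # exponentially beta-mixing regressors
    }


n, p, d = 100_000, 2, 1
k = rate_optimal_k(n, p, d)
bias, sd = k ** (-p / d), math.sqrt(k / n)
print(f"K = {k:.2f}, bias = {bias:.4g}, sd = {sd:.4g}, n^(-p/(2p+d)) = {n ** (-p / (2 * p + d)):.4g}")
for name, value in growth_ratios(k, n).items():
    print(f"{name}: {value:.2e}")
```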

We also show that feasible asymptotic inference can be performed on a possibly nonlinear functional $f(h_{0})$ using the plug-in series LS estimator $f(\hat{h})$. We establish the asymptotic normality of $f(\hat{h})$ and of the corresponding Student $t$ statistic for weakly dependent data under mild low-level conditions. When specializing to general irregular (i.e., slower than $\sqrt{n}$-estimable) but sup-norm bounded linear functionals of spline or wavelet series LS estimators with i.i.d. data, we obtain the asymptotic normality
$$\frac{\sqrt{n}\,(f(\hat{h}) - f(h_{0}))}{V_{K}^{1/2}} \to_{d} N(0, 1)$$
under remarkably mild conditions: (1) uniform integrability ($\sup_{x \in X} E[ϵ_{i}^{2}\,\mathbf{1}\{|ϵ_{i}| > ℓ(n)\} \mid X_{i} = x] \to 0$ for any $ℓ(n) \to \infty$ as $n \to \infty$), and (2) $K^{-p/d}\sqrt{n/V_{K}} = o(1)$ and $(K \log K)/n = o(1)$, where $V_{K}$ is the sieve variance, which grows with $K$ for irregular functionals. These conditions coincide with the weakest known conditions in Huang (2003b) for the pointwise asymptotic normality of spline LS estimators, except that we also allow for other irregular linear functionals of spline or wavelet LS estimators.⁴ When specializing to general irregular but sup-norm bounded nonlinear functionals of spline or wavelet series LS estimators with i.i.d. data, we obtain asymptotic normality of $f(\hat{h})$ (and of its $t$ statistic) under condition (1) and (3) $K^{-p/d}\sqrt{n/V_{K}} = o(1)$ and $K^{(2+δ)/δ}(\log n)/n ≲ 1$ (and $K^{(2+δ)/δ}(\log n)/n = o(1)$ for the $t$ statistic) for $δ \in (0, 2)$ such that $E[|ϵ_{i}|^{2+δ}] < \infty$. These conditions are much weaker than the well-known conditions in Newey (1997) for the asymptotic normality of a nonlinear functional of a spline LS estimator and of its $t$ statistic, namely $K^{-p/d}\sqrt{n} = o(1)$, $K^{4}/n = o(1)$ and $\sup_{x} E[|ϵ_{i}|^{4} \mid X_{i} = x] < \infty$.
Moreover, under a slightly more restrictive growth condition on $K$ but without the need to increase $δ$ , we show that our mild sufficient conditions for the i.i.d. case extend naturally to the weakly dependent case. Our conditions for the weakly dependent case relax the higher-order-moment requirement in Chen et al. (2014) for sieve $t$ inference on nonlinear functionals of linear sieve time series LS regressions.
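For the simplest irregular functional, pointwise evaluation $f(h) = h(x_{0})$, the plug-in estimate and its $t$ statistic can be sketched as follows. We assume a piecewise-constant basis on $K$ equal cells, so that $B'B/n = \mathrm{diag}(\hat{p}_{j})$ and the fit at $x_{0}$ is a cell mean, and we estimate $V_{K}$ by a White-type sandwich $b^{K}(x_{0})'(B'B/n)^{-1}\hat{Ω}(B'B/n)^{-1}b^{K}(x_{0})$ with $\hat{Ω} = n^{-1}\sum_{i} b^{K}(X_{i})b^{K}(X_{i})'\hat{ϵ}_{i}^{2}$. Omitting autocovariance terms is natural when the errors are martingale differences, but this particular variance estimator and the function name `pointwise_t` are our assumptions rather than a transcription of Section 3. Under a uniform design $\hat{p}_{j} ≈ 1/K$, so $\hat{V}_{K}$ is roughly $K$ times the conditional error variance at $x_{0}$, which illustrates why $V_{K}$ grows with $K$ for irregular functionals.

```python
import math
from typing import Sequence, Tuple


def pointwise_t(xs: Sequence[float], ys: Sequence[float], k: int,
                x0: float, h0_x0: float) -> Tuple[float, float, float]:
    """Return (h_hat(x0), V_hat_K, t) for f(h) = h(x0) with k equal-width indicator cells.

    Assumes the cell containing x0 holds at least one observation.
    """
    n = len(xs)

    def cell(x: float) -> int:
        return min(int(x * k), k - 1)

    j0 = cell(x0)
    ys_cell = [y for x, y in zip(xs, ys) if cell(x) == j0]
    p_hat = len(ys_cell) / n                            # (B'B/n) entry for cell j0
    h_hat = sum(ys_cell) / len(ys_cell)                 # series LS fit = cell mean
    omega = sum((y - h_hat) ** 2 for y in ys_cell) / n  # Omega_hat entry for cell j0
    v_hat = omega / p_hat ** 2                          # sandwich with b(x0) = e_{j0}
    t_stat = math.sqrt(n) * (h_hat - h0_x0) / math.sqrt(v_hat)
    return h_hat, v_hat, t_stat
```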

Our paper improves upon the existing results on the asymptotic normality of $t$ statistics of possibly nonlinear functionals of a series LS estimator under dependent data by allowing for heavy-tailed errors $ϵ_{i}$ and relaxing the growth rate of the number of series terms $K$, while maintaining the bounded conditional error variance assumption. For i.i.d. data, Hansen (2014) derives pointwise asymptotic normality for linear functionals of a series LS estimator allowing for unbounded regressors and unbounded conditional error variance, but requiring $E[|ϵ_{i}|^{4+η}] < \infty$ for some $η > 0$. In addition to pointwise limiting distribution results, Belloni et al. (2014) also provide uniform limit theory and uniform confidence intervals for linear functionals of a series LS estimator with i.i.d. data.

Our paper, Belloni et al. (2014) and Hansen (2014) all employ tools from recent random matrix theory to derive various new results for series LS estimation. Belloni et al. (2014) are the first to apply the non-commutative Khinchin random matrix inequality for i.i.d. data. Instead, we apply an exponential inequality for sums of independent random matrices due to Tropp (2012). Our results for series LS with weakly dependent data rely crucially on our extension of Tropp’s matrix exponential inequality from i.i.d. data to weakly dependent data. See Hansen (2014) for other applications of Tropp’s inequality to series LS estimators with i.i.d. data.

Since economic and financial time series data often have infinite fourth moments, the improved rates and inference results in our paper should be very useful to the literature on nonparametric estimation and testing of nonlinear time series models (see, e.g., Robinson, 1989, Li et al., 2003, Fan and Yao, 2003, Chen, 2013). Moreover, our new exponential inequality for sums of weakly dependent random matrices should be useful in series LS estimation of spatially dependent models and in other contexts as well.⁵

The rest of the paper is organized as follows. Section 2 first derives general upper bounds on the sup-norm convergence rates of series LS estimators with an arbitrary basis. It then shows that spline and wavelet series LS estimators attain the optimal sup-norm rates, allowing for weakly dependent data and heavy-tailed error terms. It also presents general sharp $L^{2}$-norm convergence rates of series LS estimators with an arbitrary basis under very mild conditions. Section 3 establishes the asymptotic normality of sieve $t$ statistics for possibly nonlinear functionals of $h_{0}$. Section 4 provides new exponential inequalities for sums of weakly dependent random matrices, and reinterprets the equivalence of the theoretical and empirical $L^{2}$ norms as a criterion on the convergence of a certain random matrix. Section 5 shows the sup-norm stability of the empirical $L^{2}$ projections onto compactly supported wavelet bases, which provides a tight upper bound on the sup-norm bias term for the wavelet series LS estimator. The results in Sections 4 and 5 are of independent interest. Section 6 contains a brief review of spline and wavelet sieve bases. Proofs and ancillary results are presented in Section 7.

Notation: Let $λ_{\min}(\cdot)$ and $λ_{\max}(\cdot)$ denote the smallest and largest eigenvalues, respectively, of a matrix. The exponent $^{-}$ denotes the Moore–Penrose generalized inverse. $‖\cdot‖$ denotes the Euclidean norm when applied to vectors and the matrix spectral norm (i.e., largest singular value) when applied to matrices, and $‖\cdot‖_{ℓ^{p}}$ denotes the $ℓ^{p}$ norm when applied to vectors and its induced operator norm when applied to matrices (thus $‖\cdot‖ = ‖\cdot‖_{ℓ^{2}}$). If $\{a_{n} : n \geq 1\}$ and $\{b_{n} : n \geq 1\}$ are two sequences of non-negative numbers, $a_{n} ≲ b_{n}$ means there exists a finite positive $C$ such that $a_{n} \leq C b_{n}$ for all $n$ sufficiently large, and $a_{n} ≍ b_{n}$ means $a_{n} ≲ b_{n}$ and $b_{n} ≲ a_{n}$. $\#S$ denotes the cardinality of a set $S$ of finitely many elements. Given a strictly stationary process $\{X_{i}\}$ and $1 \leq p < \infty$, we let $L^{p}(X)$ denote the function space consisting of all (equivalence classes of) measurable functions $f$ for which the $L^{p}(X)$ norm $‖f‖_{L^{p}(X)} \equiv E[|f(X_{i})|^{p}]^{1/p}$ is finite, and we let $L^{\infty}(X)$ denote the space of bounded functions under the sup norm $‖\cdot‖_{\infty}$, i.e., if $f : X \to R$ then $‖f‖_{\infty} \equiv \sup_{x \in X} |f(x)|$.
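The matrix norms in this notation are easy to compute for small matrices. The sketch below (hypothetical helper names) evaluates the induced $ℓ^{1}$ norm (largest absolute column sum), the induced $ℓ^{\infty}$ norm (largest absolute row sum) and the spectral norm $‖\cdot‖$ via power iteration on $A'A$. The spectral norm is bounded by $\sqrt{‖A‖_{ℓ^{1}}‖A‖_{ℓ^{\infty}}}$, a standard interpolation inequality. The power iteration assumes the starting vector is not orthogonal to the top right singular vector.

```python
import math
from typing import List

Matrix = List[List[float]]


def l1_operator_norm(a: Matrix) -> float:
    """Induced l^1 norm: the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def linf_operator_norm(a: Matrix) -> float:
    """Induced l^infinity norm: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)


def spectral_norm(a: Matrix, iters: int = 500) -> float:
    """Largest singular value via power iteration on A'A, starting from a vector of ones."""
    m, k = len(a), len(a[0])
    v = [1.0] * k
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(k)) for i in range(m)]
        atav = [sum(a[i][j] * av[i] for i in range(m)) for j in range(k)]
        s = math.sqrt(sum(x * x for x in atav))
        v = [x / s for x in atav]  # keep the iterate on the unit sphere
    av = [sum(a[i][j] * v[j] for j in range(k)) for i in range(m)]
    return math.sqrt(sum(x * x for x in av))  # ||A v|| with ||v|| = 1
```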


Uniform convergence rates

In this section we present some general results on uniform convergence properties of nonparametric series LS estimators with weakly dependent data.

Inference on possibly nonlinear functionals

We now study inference on possibly nonlinear functionals $f : L^{2} (X) \cap L^{\infty} (X) \to R$ of the regression function $h_{0}$ . Examples of functionals include, but are not limited to, the pointwise evaluation functional, the partial mean functional, and consumer surplus (see, e.g., Newey, 1997, for examples). The functional $f (h_{0})$ may be estimated using the plug-in series LS estimator $f (\hat{h})$ , for which we now establish feasible limit theory.

As with Newey (1997) and Chen et al. (2014), our results allow researchers

An exponential inequality for sums of weakly dependent random matrices

In this section we derive a new Bernstein-type inequality for sums of random matrices formed from absolutely regular ( $β$ -mixing) sequences, where the dimension, norm, and variance measure of the random matrices are allowed to grow with the sample size. This inequality is particularly useful for establishing sharp convergence rates for semi/nonparametric sieve estimators with weakly dependent data. We first recall an inequality of Tropp (2012) for independent random matrices.

Theorem 4.1 (Tropp, 2012)

Let $\{Ξ_{i}\}_{i = 1}^{n}$ be a

Sup-norm stability of $L^{2} (X)$ projection onto wavelet sieves

In this section we show that the $L^{2} (X)$ orthogonal projection onto (tensor product) compactly supported wavelet bases is stable in sup norm as the dimension of the space increases. Consider the orthogonal projection operator $P_{K}$ defined in expression (16) where the elements of $b^{K}$ span the tensor products of $d$ univariate wavelet spaces $Wav (K_{0}, [0, 1])$ . We show that its $L^{\infty}$ operator norm ${‖ P_{K} ‖}_{\infty}$ (see expression (17)) is stable, in the sense that ${‖ P_{K} ‖}_{\infty} ≲ 1$ as $K \to \infty$ . We also show that the empirical $L^{2}$

Brief review of B-spline and wavelet sieve spaces

We first outline univariate B-spline and wavelet sieve spaces on $[0, 1]$ , then deal with the multivariate case by constructing a tensor-product sieve basis.

B-splines. B-splines are defined by their order $r \geq 1$ (or degree $r - 1 \geq 0$) and number of interior knots $m \geq 0$. Define the knot set $0 = t_{-(r-1)} = \dots = t_{0} \leq t_{1} \leq \dots \leq t_{m} \leq t_{m+1} = \dots = t_{m+r} = 1$. We generate an $L^{\infty}$-normalized B-spline basis recursively using the de Boor relation (see, e.g., Chapter 5 of DeVore and Lorentz, 1993) and then appropriately rescale the basis functions.
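A minimal sketch of the recursion for the order-$r$ B-splines on the knot set above (often called the Cox-de Boor recursion) follows. It returns the $m + r$ basis values at a point; they are nonnegative and sum to one, which is the $L^{\infty}$ normalization before any rescaling. The function name, the convention $0/0 := 0$ and the assignment of $x = 1$ to the last nondegenerate knot interval are our assumptions, and the subsequent rescaling used in the paper is not reproduced.

```python
from typing import List, Sequence


def bspline_basis(x: float, interior: Sequence[float], r: int) -> List[float]:
    """Values at x in [0, 1] of the m + r B-splines of order r on the clamped knot set
    t_{-(r-1)} = ... = t_0 = 0 <= t_1 <= ... <= t_m <= t_{m+1} = ... = t_{m+r} = 1."""
    knots = [0.0] * r + list(interior) + [1.0] * r  # m + 2r knots in total
    # Order 1: indicators of the half-open knot intervals [t_j, t_{j+1}).
    vals = [1.0 if knots[j] <= x < knots[j + 1] else 0.0 for j in range(len(knots) - 1)]
    if x == knots[-1]:
        # Put the right endpoint in the last nondegenerate interval.
        last = max(j for j in range(len(knots) - 1) if knots[j] < knots[j + 1])
        vals[last] = 1.0
    # Raise the order one step at a time; terms with a zero denominator are dropped (0/0 := 0).
    for k in range(2, r + 1):
        new = []
        for j in range(len(knots) - k):
            v = 0.0
            left = knots[j + k - 1] - knots[j]
            if left > 0:
                v += (x - knots[j]) / left * vals[j]
            right = knots[j + k] - knots[j + 1]
            if right > 0:
                v += (knots[j + k] - x) / right * vals[j + 1]
            new.append(v)
        vals = new
    return vals
```

For instance, `bspline_basis(0.6, [0.25, 0.5, 0.75], 4)` evaluates the seven cubic B-splines with three interior knots at $x = 0.6$.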

Proofs for Section 2

Proof of Lemma 2.1

Follows from Corollary 4.1 by setting $Ξ_{i, n} = n^{- 1} ({\tilde{b}}_{w}^{K} (X_{i}) {\tilde{b}}_{w}^{K} {(X_{i})}^{'} - I_{K})$ and noting that $R_{n} \leq n^{- 1} (ζ_{K, n}^{2} λ_{K, n}^{2} + 1)$ , and $σ_{n}^{2} \leq n^{- 1} (ζ_{K, n}^{2} λ_{K, n}^{2} + 1)$ . ■
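To see what such a bound delivers, the sketch below uses the independent-data matrix Bernstein inequality of Tropp (2012) for symmetric $K \times K$ matrices, $P(‖\sum_{i}Ξ_{i,n}‖ \geq t) \leq 2K\exp\{-(t^{2}/2)/(σ_{n}^{2} + R_{n}t/3)\}$, and solves for the deviation $t$ at which the bound equals a chosen level $δ$. Plugging in $R_{n} = σ_{n}^{2} = n^{-1}(ζ_{K,n}^{2}λ_{K,n}^{2} + 1)$, as in the proof above, gives a deviation of order $\sqrt{ζ_{K,n}^{2}λ_{K,n}^{2}\log K/n}$ when that quantity is small. This is only the i.i.d. building block: the weakly dependent version in Corollary 4.1 is not reproduced, and the numerical value used for $ζ_{K,n}^{2}λ_{K,n}^{2}$ is hypothetical.

```python
import math


def matrix_bernstein_tail(t: float, k: int, r: float, sigma2: float) -> float:
    """Tropp (2012) bound on P(||sum_i Xi_i|| >= t) for independent, mean-zero,
    symmetric K x K matrices with ||Xi_i|| <= r and ||sum_i E[Xi_i^2]|| <= sigma2."""
    return 2 * k * math.exp(-(t * t / 2) / (sigma2 + r * t / 3))


def bernstein_deviation(k: int, r: float, sigma2: float, delta: float) -> float:
    """Positive root t of t^2/2 = L (sigma2 + r t / 3) with L = log(2K/delta),
    i.e. the deviation at which the tail bound above equals delta."""
    big_l = math.log(2 * k / delta)
    return r * big_l / 3 + math.sqrt((r * big_l / 3) ** 2 + 2 * sigma2 * big_l)


k = 20
zeta_lambda_sq = float(k)  # hypothetical value of zeta_{K,n}^2 * lambda_{K,n}^2
for n in (10**3, 10**4, 10**5):
    r = sigma2 = (zeta_lambda_sq + 1) / n  # bounds on R_n and sigma_n^2 from the proof
    print(n, round(bernstein_deviation(k, r, sigma2, delta=0.05), 4))
```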

Proof of Lemma 2.2

Follows from Corollary 4.2 by setting $Ξ_{i, n} = n^{- 1} ({\tilde{b}}_{w}^{K} (X_{i}) {\tilde{b}}_{w}^{K} {(X_{i})}^{'} - I_{K})$ and noting that $R_{n} \leq n^{- 1} (ζ_{K, n}^{2} λ_{K, n}^{2} + 1)$ , and $σ_{n}^{2} \leq n^{- 2} (ζ_{K, n}^{2} λ_{K, n}^{2} + 1)$ . ■

Proof of Lemma 2.3

By rotational invariance, we may rescale $\hat{h}$ and $\tilde{h}$ to yield $\hat{h} (x) - \tilde{h} (x) = {\tilde{b}}_{w}^{K} {(x)}^{'} {({\tilde{B}}_{w}^{'} {\tilde{B}}_{w} / n)}^{-} {\tilde{B}}_{w}^{'} e / n$ where $e = {(ϵ_{1}, \dots, ϵ_{n})}^{'}$ .

Let $\check{h} = \hat{h} - \tilde{h}$ to simplify notation. By the mean value theorem and Assumptions 3(i) and 4(i)(iii), for any $(x, x^{*}) \in D_{n}^{2}$

References (45)

M.D. Cattaneo et al. (2013). Optimal convergence rates, Bahadur representation, and asymptotic normality of partitioning estimators. J. Econometrics.
X. Chen. Large sample sieve estimation of semi-nonparametric models.
X. Chen et al. (2014). Sieve M inference on irregular parameters. J. Econometrics.
X. Chen et al. (2014). Sieve inference on possibly misspecified semi-nonparametric time series models. J. Econometrics.
A. Cohen et al. (1993). Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal.
R.M. de Jong (2002). A note on convergence rates and asymptotic normality for series estimators: Uniform convergence rates. J. Econometrics.
J.Z. Huang (2003). Asymptotics for polynomial spline regression under weak conditions. Statist. Probab. Lett.
Q. Li et al. (2003). Consistent specification tests for semiparametric/nonparametric models based on series estimation methods. J. Econometrics.
W.K. Newey (1997). Convergence rates and asymptotic normality for series estimators. J. Econometrics.
D.W.K. Andrews (1991). Asymptotic normality of series estimators for nonparametric and semiparametric regression models. Econometrica.
A. Belloni, V. Chernozhukov, D. Chetverikov, K. Kato (2014). Some new asymptotic theory for least squares series:...
H. Berbee (1987). Convergence rates in the strong law for bounded mixing sequences. Probab. Theory Related Fields.
R.C. Bradley (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv.
X. Chen. Penalized sieve estimation and inference of semi-nonparametric dynamic models: A selective review.
X. Chen, T.M. Christensen (2013). Optimal Uniform Convergence Rates for Sieve Nonparametric Instrumental Variables...
X. Chen et al. (2003). Sup Norm Convergence Rate and Asymptotic Normality for a Class of Linear Sieve Estimators. Tech. Report.
X. Chen et al. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica.
X. Chen et al. (2011). On rate optimality for ill-posed inverse problems in econometrics. Econometric Theory.
X. Chen et al. (1998). Sieve extremum estimates for weakly dependent data. Econometrica.
C. de Boor (2001). A Practical Guide to Splines.
S. Demko et al. (1984). Decay rates for inverses of band matrices. Math. Comp.
R.A. DeVore et al. (1993). Constructive Approximation.


☆ We thank the guest coeditor, two anonymous referees, Bruce Hansen, Jianhua Huang, Stefan Schneeberger, and conference participants of SETA2013 in Seoul and AMES2013 in Singapore for useful comments. This paper is an expanded version of Sections 4 and 5 of our Cowles Foundation Discussion Paper No. 1923 (Chen and Christensen, 2013), the remaining parts of which are currently undergoing a major revision. Support from the Cowles Foundation is gratefully acknowledged. Any errors are the responsibility of the authors.
