Elsevier

Journal of Econometrics

Volume 188, Issue 2, October 2015, Pages 447-465
Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions

https://doi.org/10.1016/j.jeconom.2015.03.010

Abstract

We show that spline and wavelet series regression estimators for weakly dependent regressors attain the optimal uniform (i.e. sup-norm) convergence rate $(n/\log n)^{-p/(2p+d)}$ of Stone (1982), where $d$ is the number of regressors and $p$ is the smoothness of the regression function. The optimal rate is achieved even for heavy-tailed martingale difference errors with finite $(2+(d/p))$th absolute moment for $d/p<2$. We also establish the asymptotic normality of t statistics for possibly nonlinear, irregular functionals of the conditional mean function under weak conditions. The results are proved by deriving a new exponential inequality for sums of weakly dependent random matrices, which is of independent interest.

Introduction

We study the nonparametric regression model $Y_i=h_0(X_i)+\epsilon_i$, $E[\epsilon_i|X_i]=0$, where $Y_i\in\mathbb{R}$ is a scalar response variable, $X_i\in\mathcal{X}\subseteq\mathbb{R}^d$ is a $d$-dimensional regressor (predictor variable), and the conditional mean function $h_0(x)=E[Y_i|X_i=x]$ belongs to a Hölder space of smoothness $p>0$. We are interested in series least squares (LS) estimation of $h_0$ under sup-norm loss and inference on possibly nonlinear functionals of $h_0$ allowing for weakly dependent regressors and heavy-tailed errors $\epsilon_i$.
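As a rough illustration of series LS estimation under sup-norm loss, the following sketch regresses $Y_i$ on the first $K$ basis functions evaluated at $X_i$ and measures the sup-norm error on a grid. It rests on several assumptions that are not in the paper: a cosine basis is swapped in for the spline and wavelet bases studied here, the data are i.i.d. rather than weakly dependent, and the function `h0`, the sample size, and `K` are hypothetical choices.

```python
import numpy as np

def cosine_basis(x, K):
    """First K cosine basis functions on [0, 1], evaluated at x.

    b_0(x) = 1 and b_j(x) = sqrt(2) * cos(j * pi * x) for j >= 1.
    (Assumption: stand-in for the paper's spline/wavelet bases.)
    """
    x = np.asarray(x, dtype=float).reshape(-1)
    B = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(K)))
    B[:, 0] = 1.0
    return B

def series_ls_fit(x, y, K):
    """Series LS coefficients: least squares of y on the K basis functions."""
    B = cosine_basis(x, K)
    # lstsq returns the minimum-norm solution if B'B happens to be singular.
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    return beta

def series_ls_predict(beta, x):
    """Evaluate the fitted series estimator at the points x."""
    return cosine_basis(x, len(beta)) @ beta

# Hypothetical design: uniform regressor, heavy-tailed (Student t) errors.
rng = np.random.default_rng(0)
n, K = 2000, 8
x = rng.uniform(size=n)
h0 = lambda u: np.cos(2 * np.pi * u) + u      # hypothetical regression function
y = h0(x) + rng.standard_t(df=5, size=n)

beta = series_ls_fit(x, y, K)
grid = np.linspace(0.0, 1.0, 201)
sup_err = np.max(np.abs(series_ls_predict(beta, grid) - h0(grid)))
print(f"sup-norm error on grid: {sup_err:.3f}")
```

The grid maximum only approximates the sup norm; a finer grid tightens the approximation.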

For i.i.d. data, Stone (1982) shows that $(n/\log n)^{-p/(2p+d)}$ is the minimax lower bound in sup-norm risk for estimation of $h_0$ over a Hölder ball of smoothness $p>0$. For strictly stationary beta-mixing regressors, we show that spline and wavelet series LS estimators $\hat h$ of $h_0$ attain the optimal uniform rate of Stone (1982) under a mild unconditional moment condition $E[|\epsilon_i|^{2+(d/p)}]<\infty$ imposed on the martingale difference errors.

More generally, we assume the error process $\{\epsilon_i\}_{i=-\infty}^{\infty}$ is a martingale difference sequence but impose no explicit weak dependence condition on the regressor process $\{X_i\}_{i=-\infty}^{\infty}$. Rather, weak dependence of the regressor process is formulated in terms of convergence of a certain random matrix. We verify this condition for absolutely regular (beta-mixing) sequences by deriving a new exponential inequality for sums of weakly dependent random matrices. This new inequality then leads to a sharp upper bound on the sup-norm variance term of series LS estimators with an arbitrary basis. When combined with a general upper bound on the sup-norm bias term of series LS estimators, the sharp sup-norm variance bound immediately leads to a general upper bound on the sup-norm convergence rate of series LS estimators with an arbitrary basis and weakly dependent data.

In our sup-norm bias and variance decomposition of series LS estimators, the bound on the sup-norm bias term depends on the sup norm of the empirical $L^2$ projection onto the linear sieve space. The sup norm of the empirical $L^2$ projection varies with the choice of the (linear sieve) basis. For spline regression with i.i.d. data, Huang (2003b) shows that the sup norm of the empirical $L^2$ projection onto splines is bounded with probability approaching one (wpa1). Using our new exponential inequality for sums of weakly dependent random matrices, his bound is easily extended to spline regression with weakly dependent regressors. In addition, we show in Theorem 5.2 that, for either i.i.d. or weakly dependent regressors, the sup norm of the empirical $L^2$ projection onto compactly supported wavelet bases is also bounded wpa1 (this property is called sup-norm stability of empirical $L^2$ projection). These tight bounds lead to sharp sup-norm bias control for spline and wavelet series LS estimators. They allow us to show that spline and wavelet series LS estimators achieve the optimal sup-norm convergence rate with weakly dependent data and heavy-tailed errors (e.g., $E[\epsilon_i^4]=\infty$ is allowed).

Sup-norm (uniform) convergence rates of series LS estimators have been studied previously by Newey (1997) and de Jong (2002) for i.i.d. data and Lee and Robinson (2013) for spatially dependent data. But the uniform convergence rates obtained in these papers are slower than the optimal rate of Stone (1982). In an important paper on series LS regression with i.i.d. data, Belloni et al. (2014) establish the attainability of the optimal sup-norm rates of series LS estimators using spline, local polynomial partition, wavelet and other series possessing the property of sup-norm stability of the $L^2$ projection (or bounded Lebesgue constant) under the conditional moment condition $\sup_x E[|\epsilon_i|^{2+\delta}|X_i=x]<\infty$ for some $\delta>d/p$. Our Theorem 5.1 on the sup-norm stability of $L^2$ projection of the wavelet basis is used by Belloni et al. (2014) to show that the wavelet series LS estimator achieves the optimal sup-norm rate under their conditional moment requirement. Our paper contributes to the literature by showing that spline and wavelet series LS estimators attain the optimal sup-norm rate with strictly stationary beta-mixing regressors under the weaker unconditional moment requirement $E[|\epsilon_i|^{2+(d/p)}]<\infty$.

As another application of our new exponential inequality, under very weak conditions we obtain sharp $L^2$ convergence rates for series LS estimators with weakly dependent regressors. For example, under the minimal bounded conditional second moment restriction ($\sup_x E[|\epsilon_i|^2|X_i=x]<\infty$), our $L^2$-norm rates for trigonometric polynomial, spline or wavelet series LS estimators attain Stone (1982)'s optimal $L^2$-norm rate of $n^{-p/(2p+d)}$ with strictly stationary, exponentially beta-mixing (respectively algebraically beta-mixing at rate $\gamma$) regressors with $p>0$ (resp. $p>d/(2\gamma)$), while the power series LS estimator attains the same optimal rate with exponentially (resp. algebraically) beta-mixing regressors for $p>d/2$ (resp. $p>d(2+\gamma)/(2\gamma)$). It is interesting to note that for a smooth conditional mean function, we obtain the optimal $L^2$ convergence rates for these commonly used series LS estimators with weakly dependent regressors without requiring the existence of higher-than-second unconditional moments of the error terms. Previously, Newey (1997) derived the optimal $L^2$ convergence rates of series LS estimators under i.i.d. data and the restriction $K^2/n=o(1)$ for spline and trigonometric series (and $K^3/n=o(1)$ for power series), where $K$ is the number of series terms. The restriction on $K$ is relaxed to $K(\log K)/n=o(1)$ in Huang (2003a) for splines and in Belloni et al. (2014) for wavelets, trigonometric and other series under i.i.d. data. We show that the optimal $L^2$ convergence rates are still attainable for splines, wavelets, trigonometric and other series under exponentially beta-mixing and $K(\log K)^2/n=o(1)$.
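To make the two benchmark rates concrete, the short sketch below evaluates the optimal sup-norm rate $(n/\log n)^{-p/(2p+d)}$ and the optimal $L^2$-norm rate $n^{-p/(2p+d)}$ for given $n$, $p$ and $d$, together with the error moment order $2+(d/p)$ from the abstract. The function names are hypothetical, and the rates are reported without constants.

```python
import math

def sup_norm_rate(n, p, d):
    """Optimal sup-norm rate (n / log n)^(-p / (2p + d)), up to constants."""
    return (n / math.log(n)) ** (-p / (2 * p + d))

def l2_norm_rate(n, p, d):
    """Optimal L2-norm rate n^(-p / (2p + d)), up to constants."""
    return n ** (-p / (2 * p + d))

def error_moment_order(p, d):
    """Order 2 + d/p of the absolute error moment assumed finite (for d/p < 2)."""
    if d / p >= 2:
        raise ValueError("the sup-norm result is stated for d/p < 2")
    return 2 + d / p

# Hypothetical example: one regressor, twice-smooth regression function.
for n in (10**3, 10**4, 10**5):
    print(n, sup_norm_rate(n, p=2, d=1), l2_norm_rate(n, p=2, d=1))
```

The sup-norm rate is slower than the $L^2$-norm rate only through the $\log n$ factor, which is the price of uniformity.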

We also show that feasible asymptotic inference can be performed on a possibly nonlinear functional $f(h_0)$ using the plug-in series LS estimator $f(\hat h)$. We establish the asymptotic normality of $f(\hat h)$ and of the corresponding Student t statistic for weakly dependent data under mild low-level conditions. When specializing to general irregular (i.e., slower than $\sqrt{n}$-estimable) but sup-norm bounded linear functionals of spline or wavelet series LS estimators with i.i.d. data, we obtain the asymptotic normality of $f(\hat h)$, namely $\sqrt{n}\,(f(\hat h)-f(h_0))/V_K^{1/2}\to_d N(0,1)$, under remarkably mild conditions of (1) uniform integrability ($\sup_{x\in\mathcal{X}}E[\epsilon_i^2\{|\epsilon_i|>\ell(n)\}|X_i=x]\to 0$ for any $\ell(n)\to\infty$ as $n\to\infty$), and (2) $K^{-p/d}\sqrt{n/V_K}=o(1)$, $(K\log K)/n=o(1)$, where $V_K$ is the sieve variance that grows with $K$ for irregular functionals. These conditions coincide with the weakest known conditions in Huang (2003b) for the pointwise asymptotic normality of spline LS estimators, except we also allow for other irregular linear functionals of spline or wavelet LS estimators. When specializing to general irregular but sup-norm bounded nonlinear functionals of spline or wavelet series LS estimators with i.i.d. data, we obtain asymptotic normality of $f(\hat h)$ (and of its t statistic) under conditions (1) and (3) $K^{-p/d}\sqrt{n/V_K}=o(1)$, $K^{(2+\delta)/\delta}(\log n)/n\lesssim 1$ (and $K^{(2+\delta)/\delta}(\log n)/n=o(1)$ for the t statistic) for $\delta\in(0,2)$ such that $E[|\epsilon_i|^{2+\delta}]<\infty$. These conditions are much weaker than the well-known conditions in Newey (1997) for the asymptotic normality of a nonlinear functional of the spline LS estimator and of its t statistic, namely $K^{-p/d}\sqrt{n}=o(1)$, $K^4/n=o(1)$ and $\sup_x E[|\epsilon_i|^4|X_i=x]<\infty$. Moreover, under a slightly more restrictive growth condition on $K$ but without the need to increase $\delta$, we show that our mild sufficient conditions for the i.i.d. case extend naturally to the weakly dependent case. Our conditions for the weakly dependent case relax the higher-order-moment requirement in Chen et al. (2014) for sieve t inference on nonlinear functionals of linear sieve time series LS regressions.
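For the simplest irregular functional, pointwise evaluation $f(h)=h(x_0)$, a sieve t statistic can be sketched as follows. This is only an illustration under stated assumptions: it uses a cosine basis instead of splines or wavelets, i.i.d. data, and the usual heteroskedasticity-robust sandwich estimate of the sieve variance; the paper's own variance estimator is not shown in this excerpt. All function names and design choices are hypothetical.

```python
import numpy as np

def cosine_basis(x, K):
    """First K cosine basis functions on [0, 1] (assumed stand-in basis)."""
    x = np.asarray(x, dtype=float).reshape(-1)
    B = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(K)))
    B[:, 0] = 1.0
    return B

def sieve_t_stat(x, y, K, x0, f0):
    """t statistic sqrt(n) * (h_hat(x0) - f0) / sqrt(V_hat) for f(h) = h(x0)."""
    n = len(y)
    B = cosine_basis(x, K)
    G_inv = np.linalg.pinv(B.T @ B / n)            # generalized inverse of the Gram matrix
    beta = G_inv @ (B.T @ y / n)                   # series LS coefficients
    resid = y - B @ beta
    Omega = (B * resid[:, None] ** 2).T @ B / n    # estimate of E[eps^2 b b']
    b0 = cosine_basis([x0], K)[0]
    V_hat = b0 @ G_inv @ Omega @ G_inv @ b0        # sandwich estimate of the sieve variance
    est = b0 @ beta
    return np.sqrt(n) * (est - f0) / np.sqrt(V_hat), est, V_hat

# Hypothetical design with heavy-tailed errors.
rng = np.random.default_rng(1)
n, K, x0 = 2000, 6, 0.3
h0 = lambda u: 1.0 + np.cos(2 * np.pi * u)         # hypothetical, lies in the sieve span
x = rng.uniform(size=n)
y = h0(x) + rng.standard_t(df=5, size=n)
t_stat, est, V_hat = sieve_t_stat(x, y, K, x0, h0(x0))
print(t_stat, est, V_hat)
```

Because the hypothetical `h0` lies in the span of the basis, there is no approximation bias here; in general the undersmoothing condition on $K$ is what makes the bias negligible relative to the standard error.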

Our paper improves upon the existing results on the asymptotic normality of t statistics of possibly nonlinear functionals of a series LS estimator under dependent data by allowing for heavy-tailed errors $\epsilon_i$ and relaxing the growth rates of the series term $K$ but maintaining the bounded conditional error variance assumption. For i.i.d. data, Hansen (2014) derives pointwise asymptotic normality for linear functionals of a series LS estimator allowing for unbounded regressors, unbounded conditional error variance, but requiring $E[|\epsilon_i|^{4+\eta}]<\infty$ for some $\eta>0$. In addition to pointwise limiting distribution results, Belloni et al. (2014) also provide uniform limit theory and uniform confidence intervals for linear functionals of a series LS estimator with i.i.d. data.

Our paper,  Belloni et al. (2014) and  Hansen (2014) all employ tools from recent random matrix theory to derive various new results for series LS estimation.  Belloni et al. (2014) are the first to apply the non-commutative Khinchin random matrix inequality for i.i.d. data. Instead, we apply an exponential inequality for sums of independent random matrices due to  Tropp (2012). Our results for series LS with weakly dependent data rely crucially on our extension of Tropp’s matrix exponential inequality from i.i.d. data to weakly dependent data. See  Hansen (2014) for other applications of Tropp’s inequality to series LS estimators with i.i.d. data.

Since economic and financial time series data often have infinite fourth moments, the new improved rates and inference results in our paper should be very useful to the literatures on nonparametric estimation and testing of nonlinear time series models (see, e.g., Robinson, 1989, Li et al., 2003, Fan and Yao, 2003, Chen, 2013). Moreover, our new exponential inequality for sums of weakly dependent random matrices should be useful in series LS estimation of spatially dependent models and in other contexts as well.

The rest of the paper is organized as follows. Section 2 first derives general upper bounds on the sup-norm convergence rates of series LS estimators with an arbitrary basis. It then shows that spline and wavelet series LS estimators attain the optimal sup-norm rates, allowing for weakly dependent data and heavy-tailed error terms. It also presents general sharp $L^2$-norm convergence rates of series LS estimators with an arbitrary basis under very mild conditions. Section 3 provides the asymptotic normality of sieve t statistics for possibly nonlinear functionals of $h_0$. Section 4 provides new exponential inequalities for sums of weakly dependent random matrices, and a reinterpretation of equivalence of the theoretical and empirical $L^2$ norms as a criterion regarding convergence of a certain random matrix. Section 5 shows the sup-norm stability of the empirical $L^2$ projections onto compactly supported wavelet bases, which provides a tight upper bound on the sup-norm bias term for the wavelet series LS estimator. The results in Sections 4 and 5 are of independent interest. Section 6 contains a brief review of spline and wavelet sieve bases. Proofs and ancillary results are presented in Section 7.

Notation: Let $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the smallest and largest eigenvalues, respectively, of a matrix. The exponent $^-$ denotes the Moore–Penrose generalized inverse. $\|\cdot\|$ denotes the Euclidean norm when applied to vectors and the matrix spectral norm (i.e., largest singular value) when applied to matrices, and $\|\cdot\|_{\ell^p}$ denotes the $\ell^p$ norm when applied to vectors and its induced operator norm when applied to matrices (thus $\|\cdot\|=\|\cdot\|_{\ell^2}$). If $\{a_n:n\ge 1\}$ and $\{b_n:n\ge 1\}$ are two sequences of non-negative numbers, $a_n\lesssim b_n$ means there exists a finite positive $C$ such that $a_n\le Cb_n$ for all $n$ sufficiently large, and $a_n\asymp b_n$ means $a_n\lesssim b_n$ and $b_n\lesssim a_n$. $\#S$ denotes the cardinality of a set $S$ of finitely many elements. Given a strictly stationary process $\{X_i\}$ and $1\le p<\infty$, we let $L^p(X)$ denote the function space consisting of all (equivalence classes of) measurable functions $f$ for which the $L^p(X)$ norm $\|f\|_{L^p(X)}\equiv E[|f(X_i)|^p]^{1/p}$ is finite, and we let $L^\infty(X)$ denote the space of bounded functions under the sup norm $\|\cdot\|_\infty$, i.e., if $f:\mathcal{X}\to\mathbb{R}$ then $\|f\|_\infty\equiv\sup_{x\in\mathcal{X}}|f(x)|$.

Uniform convergence rates

In this section we present some general results on uniform convergence properties of nonparametric series LS estimators with weakly dependent data.

Inference on possibly nonlinear functionals

We now study inference on possibly nonlinear functionals $f:L^2(X)\cap L^\infty(X)\to\mathbb{R}$ of the regression function $h_0$. Examples of functionals include, but are not limited to, the pointwise evaluation functional, the partial mean functional, and consumer surplus (see, e.g., Newey, 1997, for examples). The functional $f(h_0)$ may be estimated using the plug-in series LS estimator $f(\hat h)$, for which we now establish feasible limit theory.

As with  Newey (1997) and  Chen et al. (2014), our results allow researchers

An exponential inequality for sums of weakly dependent random matrices

In this section we derive a new Bernstein-type inequality for sums of random matrices formed from absolutely regular (β-mixing) sequences, where the dimension, norm, and variance measure of the random matrices are allowed to grow with the sample size. This inequality is particularly useful for establishing sharp convergence rates for semi/nonparametric sieve estimators with weakly dependent data. We first recall an inequality of  Tropp (2012) for independent random matrices.
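For orientation, the matrix Bernstein inequality of Tropp (2012) for independent, mean-zero $d_1\times d_2$ random matrices is commonly stated as $P(\|\sum_i\Xi_i\|\ge t)\le(d_1+d_2)\exp\left(\frac{-t^2/2}{\sigma^2+Rt/3}\right)$, where $R$ bounds each $\|\Xi_i\|$ and $\sigma^2$ bounds the norm of the summed second moments. The sketch below simply evaluates that tail bound; the function name and the numerical inputs are hypothetical, and the statement is taken from Tropp's formulation rather than from the truncated theorem text in this excerpt.

```python
import math

def matrix_bernstein_bound(t, R, sigma2, d1, d2):
    """Tail bound (d1 + d2) * exp(-(t^2 / 2) / (sigma2 + R * t / 3)), capped at 1.

    R      : almost-sure bound on the spectral norm of each summand
    sigma2 : bound on max(||sum E[Xi Xi']||, ||sum E[Xi' Xi]||)
    d1, d2 : dimensions of the random matrices
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    bound = (d1 + d2) * math.exp(-0.5 * t**2 / (sigma2 + R * t / 3.0))
    return min(1.0, bound)

# Hypothetical inputs: 5 x 5 matrices, R = 0.1, sigma2 = 1.
for t in (1.0, 5.0, 10.0):
    print(t, matrix_bernstein_bound(t, R=0.1, sigma2=1.0, d1=5, d2=5))
```

The dimension enters only through the prefactor $d_1+d_2$, which is why such bounds remain useful when the matrix dimension grows with the sample size.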

Theorem 4.1 (Tropp, 2012)

Let $\{\Xi_i\}_{i=1}^n$ be a

Sup-norm stability of $L^2(X)$ projection onto wavelet sieves

In this section we show that the $L^2(X)$ orthogonal projection onto (tensor product) compactly supported wavelet bases is stable in sup norm as the dimension of the space increases. Consider the orthogonal projection operator $P_K$ defined in expression (16) where the elements of $b^K$ span the tensor products of $d$ univariate wavelet spaces $\mathrm{Wav}(K_0,[0,1])$. We show that its $L^\infty$ operator norm $\|P_K\|_\infty$ (see expression (17)) is stable, in the sense that $\|P_K\|_\infty\lesssim 1$ as $K\to\infty$. We also show that the empirical $L^2$

Brief review of B-spline and wavelet sieve spaces

We first outline univariate B-spline and wavelet sieve spaces on [0,1], then deal with the multivariate case by constructing a tensor-product sieve basis.

B-splines. B-splines are defined by their order $r\ge 1$ (or degree $r-1\ge 0$) and number of interior knots $m\ge 0$. Define the knot set $0=t_{-(r-1)}=\cdots=t_0\le t_1\le\cdots\le t_m\le t_{m+1}=\cdots=t_{m+r}=1$. We generate an $L^\infty$-normalized B-spline basis recursively using the De Boor relation (see, e.g., Chapter 5 of DeVore and Lorentz, 1993), then appropriately rescale the basis functions.
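The recursive construction can be sketched with the standard Cox-de Boor recursion on the extended knot set above, which yields $m+r$ basis functions of order $r$. This is a minimal illustration, assuming interior knots strictly inside $(0,1)$ and the convention $0/0=0$; the final rescaling step mentioned in the text is omitted, and the function name is hypothetical.

```python
import numpy as np

def bspline_basis(x, r, interior_knots):
    """Evaluate the m + r B-splines of order r on [0, 1] at the points x.

    Knots: r copies of 0, the m interior knots, then r copies of 1.
    """
    x = np.asarray(x, dtype=float).reshape(-1)
    interior = np.sort(np.asarray(interior_knots, dtype=float))
    t = np.concatenate([np.zeros(r), interior, np.ones(r)])
    n_knots = len(t)

    # Order 1: indicators of the half-open knot intervals [t_j, t_{j+1}).
    B = np.zeros((len(x), n_knots - 1))
    for j in range(n_knots - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (x >= t[j]) & (x < t[j + 1])
    # Assign the right endpoint x = 1 to the last non-empty interval.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0

    # Orders 2, ..., r: Cox-de Boor recursion, skipping zero-width denominators.
    for k in range(2, r + 1):
        B_new = np.zeros((len(x), n_knots - k))
        for j in range(n_knots - k):
            left = t[j + k - 1] - t[j]
            right = t[j + k] - t[j + 1]
            if left > 0:
                B_new[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (t[j + k] - x) / right * B[:, j + 1]
        B = B_new
    return B

# Cubic splines (order r = 4) with m = 3 interior knots give 7 basis functions.
grid = np.linspace(0.0, 1.0, 101)
basis = bspline_basis(grid, r=4, interior_knots=[0.25, 0.5, 0.75])
print(basis.shape, basis.sum(axis=1).min(), basis.sum(axis=1).max())
```

The basis functions produced this way are non-negative and sum to one at every point of $[0,1]$, so each is bounded by one in sup norm before any further rescaling.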

Proofs for Section  2

Proof of Lemma 2.1

Follows from Corollary 4.1 by setting $\Xi_{i,n}=n^{-1}(\tilde b^K_w(X_i)\tilde b^K_w(X_i)'-I_K)$ and noting that $R_n\le n^{-1}(\zeta_{K,n}^2\lambda_{K,n}^2+1)$, and $\sigma_n^2\le n^{-1}(\zeta_{K,n}^2\lambda_{K,n}^2+1)$.

Proof of Lemma 2.2

Follows from Corollary 4.2 by setting $\Xi_{i,n}=n^{-1}(\tilde b^K_w(X_i)\tilde b^K_w(X_i)'-I_K)$ and noting that $R_n\le n^{-1}(\zeta_{K,n}^2\lambda_{K,n}^2+1)$, and $\sigma_n^2\le n^{-2}(\zeta_{K,n}^2\lambda_{K,n}^2+1)$.
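Summing the matrices $\Xi_{i,n}$ used in these two proofs gives the deviation of the empirical Gram matrix of the orthonormalized basis from $I_K$, so the lemmas control its spectral norm. The simulation sketch below tracks that norm as $n$ grows. It is an illustration under assumptions not taken from the paper: a cosine basis that is orthonormal under a uniform marginal, no weighting, and regressors generated as $X_i=\Phi(Z_i)$ from a stationary Gaussian AR(1) process $Z_i$, which is a standard example of a geometrically beta-mixing sequence.

```python
import math
import numpy as np

def cosine_basis(x, K):
    """First K cosine functions, orthonormal under the uniform law on [0, 1]."""
    x = np.asarray(x, dtype=float).reshape(-1)
    B = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(K)))
    B[:, 0] = 1.0
    return B

def gram_deviation(x, K):
    """Spectral norm of B'B/n - I_K, i.e. the norm of the sum of the Xi_{i,n}."""
    B = cosine_basis(x, K)
    return np.linalg.norm(B.T @ B / len(x) - np.eye(K), ord=2)

def ar1_uniform(n, rho, rng):
    """Dependent regressors with uniform marginal: X_i = Phi(Z_i), Z Gaussian AR(1)."""
    z = np.empty(n)
    z[0] = rng.standard_normal()
    innov = rng.standard_normal(n) * math.sqrt(1.0 - rho**2)
    for i in range(1, n):
        z[i] = rho * z[i - 1] + innov[i]       # keeps the N(0, 1) stationary marginal
    normal_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    return normal_cdf(z)

rng = np.random.default_rng(0)
for n in (200, 2000, 20000):
    x = ar1_uniform(n, rho=0.5, rng=rng)
    print(n, gram_deviation(x, K=6))
```

With $K$ held fixed the deviation shrinks as $n$ grows; the lemmas concern the harder case where $K$ is allowed to increase with $n$.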

Proof of Lemma 2.3

By rotational invariance, we may rescale $\hat h$ and $\tilde h$ to yield $\hat h(x)-\tilde h(x)=\tilde b^K_w(x)'(\tilde B_w'\tilde B_w/n)^-\tilde B_w'e/n$ where $e=(\epsilon_1,\ldots,\epsilon_n)'$.

Let $\check h=\hat h-\tilde h$ to simplify notation. By the mean value theorem, Assumptions 3(i) and 4(i)(iii), for any $(x,x^*)\in D_n^2$

References (45)

  • Belloni, A., Chernozhukov, V., Chetverikov, D., Kato, K., 2014, Some new asymptotic theory for least squares series:...
  • Berbee, H., 1987, Convergence rates in the strong law for bounded mixing sequences, Probab. Theory Related Fields
  • Bradley, R.C., 2005, Basic properties of strong mixing conditions. A survey and some open questions, Probab. Surv.
  • Chen, X., Penalized sieve estimation and inference of semi-nonparametric dynamic models: A selective review
  • Chen, X., Christensen, T.M., 2013, Optimal Uniform Convergence Rates for Sieve Nonparametric Instrumental Variables...
  • Chen, X., et al., 2003, Sup Norm Convergence Rate and Asymptotic Normality for a Class of Linear Sieve Estimators, Tech. Report
  • Chen, X., et al., 2012, Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals, Econometrica
  • Chen, X., et al., 2011, On rate optimality for ill-posed inverse problems in econometrics, Econometric Theory
  • Chen, X., et al., 1998, Sieve extremum estimates for weakly dependent data, Econometrica
  • De Boor, C., 2001, A Practical Guide to Splines
  • Demko, S., et al., 1984, Decay rates for inverses of band matrices, Math. Comp.
  • DeVore, R.A., et al., 1993, Constructive Approximation

We thank the guest coeditor, two anonymous referees, Bruce Hansen, Jianhua Huang, Stefan Schneeberger, and conference participants of SETA2013 in Seoul and AMES2013 in Singapore for useful comments. This paper is an expanded version of Sections 4 and 5 of our Cowles Foundation Discussion Paper No. 1923 (Chen and Christensen, 2013), the remaining parts of which are currently undergoing a major revision. Support from the Cowles Foundation is gratefully acknowledged. Any errors are the responsibility of the authors.