Some new asymptotic theory for least squares series: Pointwise and uniform results

https://doi.org/10.1016/j.jeconom.2015.02.014

Abstract

In econometric applications it is common that the exact form of a conditional expectation is unknown, and flexible functional forms can lead to improvements over a pre-specified functional form, especially if they nest some successful parametric economically-motivated forms. The series method offers exactly that by approximating the unknown function based on k basis functions, where k is allowed to grow with the sample size n to balance the trade-off between variance and bias. In this work we consider series estimators for the conditional mean in light of four new ingredients: (i) sharp LLNs for matrices derived from the non-commutative Khinchin inequalities, (ii) bounds on the Lebesgue factor that controls the ratio between the $L^\infty$ and $L^2$-norms of approximation errors, (iii) maximal inequalities for processes whose entropy integrals diverge at some rate, and (iv) strong approximations to series-type processes.

These technical tools allow us to contribute to the series literature, specifically the seminal work of Newey (1997), as follows. First, we weaken considerably the condition on the number k of approximating functions used in series estimation from the typical $k^2/n \to 0$ to $k/n \to 0$, up to log factors, which was available only for spline series before. Second, under the same weak conditions we derive $L^2$ rates and pointwise central limit theorems when the approximation error vanishes. Under an incorrectly specified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates and functional central limit theorems that hold whether or not the approximation error vanishes. That is, we derive the strong approximation for the entire estimate of the nonparametric function.

Finally and most importantly, from a point of view of practice, we derive uniform rates, Gaussian approximations, and uniform confidence bands for a wide collection of linear functionals of the conditional expectation function, for example, the function itself, the partial derivative function, the conditional average partial derivative function, and other similar quantities. All of these results are new.

Introduction

Series estimators have been playing a central role in various fields. In econometric applications it is common that the exact form of a conditional expectation is unknown, and a flexible functional form can lead to improvements over a pre-specified functional form, especially if it nests some successful parametric economic models. Series estimation offers exactly that by approximating the unknown function based on k basis functions, where k is allowed to grow with the sample size n to balance the trade-off between variance and bias. Moreover, series modeling allows for convenient nesting of some theory-based models, by simply using the corresponding terms as the first $k_0 \le k$ basis functions. For instance, our series could contain linear and quadratic functions to nest the canonical Mincer equations in the context of wage equation modeling or the canonical translog demand and production functions in the context of demand and supply modeling; see, for example, Wasserman (2006) for a textbook level introduction to series estimators.
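The idea of regressing on the first k basis functions can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the polynomial basis, the simulated regression function `g`, the noise level, and the helper names `poly_basis`, `series_fit`, and `series_predict` are all assumptions made for this example.

```python
import numpy as np

def poly_basis(x, k):
    """Return the n-by-k matrix with columns 1, x, ..., x^(k-1)."""
    return np.vander(np.asarray(x, dtype=float), N=k, increasing=True)

def series_fit(x, y, k):
    """Least squares coefficients of y on the first k basis functions."""
    P = poly_basis(x, k)
    beta, *_ = np.linalg.lstsq(P, y, rcond=None)
    return beta

def series_predict(x, beta):
    """Evaluate the fitted series at the points x."""
    return poly_basis(x, len(beta)) @ beta

# Simulated data (hypothetical design, chosen only for illustration).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 1.0, size=n)
g = lambda t: np.sin(2.0 * np.pi * t)
y = g(x) + 0.3 * rng.standard_normal(n)

# Compare the sup-norm error on a grid for a few choices of k.
grid = np.linspace(0.0, 1.0, 201)
for k in (3, 6, 9):
    beta = series_fit(x, y, k)
    err = np.max(np.abs(series_predict(grid, beta) - g(grid)))
    print(k, float(err))
```

With k = 3 the basis contains exactly the constant, linear, and quadratic terms, so a quadratic parametric model is nested as the first columns of any larger basis.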

Several asymptotic properties of series estimators have been investigated in the literature. The focus has been on convergence rates and asymptotic normality results (see van de Geer, 1990, Andrews, 1991, Eastwood and Gallant, 1991, Gallant and Souza, 1991, Newey, 1997, van de Geer, 2002, Huang, 2003b, Chen, 2007, Cattaneo and Farrell, 2013, and the references therein).

This work revisits the topic by making use of new critical ingredients:

  1. The sharp LLNs for matrices derived from the non-commutative Khinchin inequalities.

  2. The sharp bounds on the Lebesgue factor that controls the ratio between the $L^\infty$ and $L^2$-norms of the least squares approximation of functions (which is bounded or grows like $\log k$ in many cases).

  3. Sharp maximal inequalities for processes whose entropy integrals diverge at some rate.

  4. Strong approximations to empirical processes of series types.

To the best of our knowledge, our results are the first applications of the first ingredient to statistical estimation problems. Since its use in this work, some recent working papers have also used related matrix inequalities and extended some results in different directions, e.g. Chen and Christensen (2013) allow β-mixing dependence, and Hansen (2014) handles unbounded regressors and also characterizes a trade-off between the number of finite moments and the allowable rate of expansion of the number of series terms. Regarding the second ingredient, it has already been used by Huang (2003a) but for splines only. All of these ingredients are critical for generating sharp results.

This approach allows us to contribute to the series literature in several directions. First, we weaken considerably the condition on the number k of approximating functions used in series estimation from the typical $k^2/n \to 0$ (see Newey, 1997) to $k/n \to 0$ (up to logs) for bounded or local bases, which was previously available only for spline series (Huang, 2003a, Stone, 1994), and recently established for local polynomial partition series (Cattaneo and Farrell, 2013). An example of a bounded basis is Fourier series; examples of local bases are spline, wavelet, and local polynomial partition series. To be more specific, for such bases we require $k \log k / n \to 0$. Note that the last condition is similar to the condition on the bandwidth value required for local polynomial (kernel) regression estimators ($h^{-d} \log(1/h)/n \to 0$ where $h = 1/k^{1/d}$ is the bandwidth value). Second, under the same weak conditions we derive $L^2$ rates and pointwise central limit theorems when the approximation error vanishes. Under a misspecified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates that hold whether or not the approximation error vanishes. An important contribution here is that we show that the series estimator achieves the optimal uniform rate of convergence under quite general conditions. Previously, the same result was shown only for the local polynomial partition series estimator (Cattaneo and Farrell, 2013). In addition, we derive a functional central limit theorem. By the functional central limit theorem we mean here that the entire estimate of the nonparametric function is uniformly close to a Gaussian process that can change with n. That is, we derive the strong approximation for the entire estimate of the nonparametric function.

Perhaps the most important contribution of the paper is a set of completely new results that provide estimation and inference methods for entire linear functionals $\theta(\cdot)$ of the conditional mean function $g: \mathcal{X} \to \mathbb{R}$. Examples of linear functionals $\theta(\cdot)$ of interest include

  1. the partial derivative function: $x \mapsto \theta(x) = \partial_j g(x)$;

  2. the average partial derivative: $\theta = \int \partial_j g(x) \, d\mu(x)$;

  3. the conditional average partial derivative: $x_s \mapsto \theta(x_s) = \int \partial_j g(x) \, d\mu(x \mid x_s)$

where $\partial_j g(x)$ denotes the partial derivative of $g(x)$ with respect to the jth component of x, $x_s$ is a subvector of x, and the measure μ entering the definitions above is taken as known; the result can be extended to include estimated measures. We derive uniform (in x) rates of convergence, large sample distributional approximations, and inference methods for the functions above based on the Gaussian approximation. To the best of our knowledge all these results are new, especially the distributional and inferential results. For example, using these results we can now perform inference on the entire partial derivative function. The only other reference that provides analogous results, but for the quantile series estimator, is Belloni et al. (2011). Before doing uniform analysis, we also update the pointwise results of Newey (1997) to weaker, more general conditions.
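Because these functionals are linear in g, a plug-in estimate can be obtained by applying the functional to the basis functions and combining with the least squares coefficients. The sketch below illustrates this for a scalar regressor, where the partial derivative is just the ordinary derivative. The polynomial basis, the simulated function g(x) = exp(x), the uniform measure μ on [0, 1], and all helper names are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def poly_basis(x, k):
    """Columns 1, x, ..., x^(k-1) evaluated at x."""
    return np.vander(np.asarray(x, dtype=float), N=k, increasing=True)

def poly_basis_deriv(x, k):
    """Columns 0, 1, 2x, ..., (k-1) x^(k-2): derivatives of the basis above."""
    x = np.asarray(x, dtype=float)
    D = np.zeros((x.size, k))
    for j in range(1, k):
        D[:, j] = j * x ** (j - 1)
    return D

# Simulated data with g(x) = exp(x) (hypothetical design).
rng = np.random.default_rng(1)
n, k = 5000, 5
x = rng.uniform(0.0, 1.0, size=n)
y = np.exp(x) + 0.1 * rng.standard_normal(n)

beta_hat, *_ = np.linalg.lstsq(poly_basis(x, k), y, rcond=None)

# Midpoint grid on [0, 1]; averaging over it approximates integration
# against the uniform measure mu, which is treated as known here.
m = 1000
grid = (np.arange(m) + 0.5) / m
theta_hat = poly_basis_deriv(grid, k) @ beta_hat  # derivative function estimate
avg_deriv_hat = float(np.mean(theta_hat))         # average derivative estimate

print(float(np.max(np.abs(theta_hat - np.exp(grid)))))  # sup error on the grid
print(avg_deriv_hat, np.e - 1.0)                        # estimate vs. true value
```

In this simulated design the true derivative function is exp(x) and the true average derivative is e − 1, which gives a reference point for both estimates.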

Notation

In what follows, all parameter values are indexed by the sample size n, but we omit the index whenever this does not cause confusion. We use the notation $(a)_+ = \max\{a, 0\}$, $a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$. The $\ell^2$-norm of a vector v is denoted by $\|v\|$, while for a matrix Q the operator norm is denoted by $\|Q\|$. We also use standard notation in the empirical process literature,
$$\mathbb{E}_n[f] = \mathbb{E}_n[f(w_i)] = \frac{1}{n} \sum_{i=1}^n f(w_i) \quad \text{and} \quad \mathbb{G}_n[f] = \mathbb{G}_n[f(w_i)] = \frac{1}{\sqrt{n}} \sum_{i=1}^n \bigl(f(w_i) - E[f(w_i)]\bigr),$$
and we use the notation $a \lesssim b$ to denote $a \le cb$ for some constant $c > 0$ that does not depend on n; and $a \lesssim_P b$ to denote $a = O_P(b)$. Moreover, for two random variables X, Y we say that $X =_d Y$ if they have the same probability distribution. Finally, $S^{k-1}$ denotes the space of vectors α in $\mathbb{R}^k$ with unit Euclidean norm: $\|\alpha\| = 1$.
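The two empirical process operators are simple to compute once the population mean of f is available. The following toy sketch uses made-up data and a made-up population mean; the function names are assumptions for this example only.

```python
import math

def empirical_mean(f, sample):
    """E_n[f]: the sample average of f(w_i)."""
    return sum(f(w) for w in sample) / len(sample)

def empirical_process(f, sample, population_mean):
    """G_n[f]: the centered sum of f(w_i), scaled by 1/sqrt(n)."""
    n = len(sample)
    return sum(f(w) - population_mean for w in sample) / math.sqrt(n)

sample = [1.0, 2.0, 3.0, 4.0]   # toy data, not from the paper
identity = lambda w: w

print(empirical_mean(identity, sample))            # 2.5
print(empirical_process(identity, sample, 2.0))    # 1.0
```

The two quantities are linked by G_n[f] = sqrt(n) (E_n[f] − E[f(w_i)]), which the toy numbers satisfy: sqrt(4) × (2.5 − 2.0) = 1.0.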

Section snippets

Set-up

Throughout the paper, we consider a sequence of models, indexed by the sample size n,
$$y_i = g(x_i) + \epsilon_i, \quad E[\epsilon_i \mid x_i] = 0, \quad x_i \in \mathcal{X} \subseteq \mathbb{R}^d, \quad i = 1, \dots, n,$$
where $y_i$ is a response variable, $x_i$ a vector of covariates (basic regressors), $\epsilon_i$ noise, and $x \mapsto g(x) = E[y_i \mid x_i = x]$ a regression (conditional mean) function; that is, we consider a triangular array of models with $y_i = y_{i,n}$, $x_i = x_{i,n}$, $\epsilon_i = \epsilon_{i,n}$, and $g = g_n$. We assume that $g \in \mathcal{G}$ where $\mathcal{G}$ is some class of functions. Since we consider a sequence of models indexed by n, we allow the

Approximation properties of least squares

Next we consider approximation properties of the least squares estimator. Not surprisingly, approximation properties must rely on the particular choice of approximating functions. At this point it is instructive to consider particular examples of relevant bases used in the literature. For each example, we state a bound on the following quantity: $\xi_k := \sup_{x \in \mathcal{X}} \|p(x)\|$. This quantity will play a key role in our analysis.
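The quantity ξ_k can be approximated numerically by maximizing the Euclidean norm of the basis vector over a grid. The sketch below does this for two bases on [0, 1] that are orthonormal under the uniform distribution: a Fourier basis and a Legendre polynomial basis. The normalization, the grid, and the helper names are assumptions of this example, and a grid maximum is only a lower bound on the true supremum in general.

```python
import numpy as np
from numpy.polynomial import legendre

def fourier_basis(x, k):
    """First k (odd) Fourier functions on [0, 1], orthonormal under the uniform law."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    for j in range(1, (k - 1) // 2 + 1):
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * j * x))
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * j * x))
    return np.column_stack(cols)

def legendre_basis(x, k):
    """First k Legendre polynomials shifted to [0, 1] and scaled to unit L2 norm."""
    x = np.asarray(x, dtype=float)
    V = legendre.legvander(2.0 * x - 1.0, k - 1)
    return V * np.sqrt(2.0 * np.arange(k) + 1.0)

def xi_k(basis, k, grid):
    """Grid approximation of sup_x ||p(x)|| for the given basis."""
    return float(np.max(np.linalg.norm(basis(grid, k), axis=1)))

grid = np.linspace(0.0, 1.0, 2001)
for k in (5, 9, 17):
    print(k, xi_k(fourier_basis, k, grid), xi_k(legendre_basis, k, grid))
```

For these normalized bases the computed values agree, up to floating point rounding, with sqrt(k) for the Fourier basis and k for the polynomial basis, which illustrates why bounded bases permit faster growth of k than polynomials.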

$L^2$ limit theory

After we have established the set-up, we proceed to derive our results. We start with a result on the $L^2$ rate of convergence. Recall that $\bar{\sigma}^2 = \sup_{x \in \mathcal{X}} E[\epsilon_i^2 \mid x_i = x]$. In the theorem below, we assume that $\bar{\sigma}^2 \lesssim 1$. This is a mild regularity condition.

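Theorem 4.1

$L^2$ Rate of Convergence

Assume that Condition A.1, Condition A.2, and Condition A.3 are satisfied. In addition, assume that $\xi_k^2 \log k / n \to 0$ and $\bar{\sigma}^2 \lesssim 1$. Then under $c_k \to 0$,
$$\|\hat{g} - g\|_{F,2} \lesssim_P \sqrt{k/n} + c_k,$$
and under $c_k \not\to 0$,
$$\|\hat{g} - p'\beta\|_{F,2} \lesssim_P \sqrt{k/n} + \bigl(\ell_k c_k \sqrt{k/n}\bigr) \wedge \bigl(\xi_k c_k / \sqrt{n}\bigr),$$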

Comment 4.1

(i) This is our first main result in this

Rates and inference on linear functionals

In this section, we derive rates and inference results for linear functionals $\theta(w)$, $w \in \mathcal{I}$, of the conditional expectation function such as its derivative, average derivative, or conditional average derivative. To a large extent, with the exception of Theorem 5.6, the results presented in this section can be considered as an extension of results presented in Section 4, and so similar comments can be applied as those given in Section 4. Theorem 5.6 deals with construction of uniform confidence

Tools: maximal inequalities for matrices and empirical processes

In this section we collect the main technical tools that our analysis relies upon, namely Khinchin Inequalities for Matrices and Data Dependent Maximal Inequalities.

Acknowledgments

This paper was presented and first circulated in a series of lectures given by Victor Chernozhukov at “Stats in the Château” Statistics Summer School on “Inverse Problems and High-Dimensional Statistics” in 2009 near Paris. Participants, especially Xiaohong Chen, and one of several referees made numerous helpful suggestions. We also thank Bruce Hansen for extremely useful comments.

References (41)

  • Angrist, J., et al., 2006. Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica.
  • Belloni, A., Chernozhukov, V., Fernández-Val, I., 2011. Conditional Quantile Processes Based on Series or Many...
  • Chen, X. Large sample sieve estimation of semi-nonparametric models.
  • Chen, X., Christensen, T., 2013. Optimal uniform convergence rates for sieve nonparametric instrumental variables...
  • Chernozhukov, V., Chetverikov, D., Kato, K., 2012. Gaussian approximation of suprema of empirical processes....
  • Chernozhukov, V., et al., 2014. Anti-concentration and honest, adaptive confidence bands. Ann. Statist.
  • Chernozhukov, V., et al., 2013. Intersection bounds: estimation and inference. Econometrica.
  • De Boor, C., 2001. A Practical Guide to Splines (Revised Edition).
  • DeVore, R.A., et al., 1993. Constructive Approximation.
  • Eastwood, B.J., et al., 1991. Adaptive rules for seminonparametric estimation that achieve asymptotic normality. Econometric Theory.