Some new asymptotic theory for least squares series: Pointwise and uniform results
Introduction
Series estimators have been playing a central role in various fields. In econometric applications it is common that the exact form of a conditional expectation is unknown, and a flexible functional form can improve on a pre-specified one, especially if it nests some successful parametric economic models. Series estimation offers exactly that by approximating the unknown function with $k$ basis functions, where $k$ is allowed to grow with the sample size $n$ to balance the trade-off between variance and bias. Moreover, series modeling allows for convenient nesting of some theory-based models, simply by using the corresponding terms as the first basis functions. For instance, our series could contain linear and quadratic functions to nest the canonical Mincer equations in the context of wage equation modeling, or the canonical translog demand and production functions in the context of demand and supply modeling; see, for example, Wasserman (2006) for a textbook-level introduction to series estimators.
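To make the idea concrete, the following is a minimal sketch of a least squares series estimator, assuming a single covariate and a power-series basis. The function names (`poly_basis`, `series_fit`, `series_predict`), the target function $\sin(2x)$, and the noise level are hypothetical choices made only for illustration; they are not taken from the paper.

```python
import numpy as np

def poly_basis(x, k):
    """Return the n-by-k matrix of power-series terms 1, x, ..., x^(k-1)."""
    return np.vander(x, N=k, increasing=True)

def series_fit(x, y, k):
    """Least squares coefficients of y on the first k basis functions."""
    P = poly_basis(x, k)
    beta, *_ = np.linalg.lstsq(P, y, rcond=None)
    return beta

def series_predict(x_new, beta):
    """Evaluate the fitted series at new points."""
    return poly_basis(x_new, len(beta)) @ beta

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
g = lambda t: np.sin(2.0 * t)                 # hypothetical "unknown" function
y = g(x) + 0.1 * rng.standard_normal(n)       # hypothetical noise level

grid = np.linspace(-0.9, 0.9, 50)
errors = {}
for k in (2, 6):
    beta = series_fit(x, y, k)
    errors[k] = np.max(np.abs(series_predict(grid, beta) - g(grid)))
    print(k, errors[k])                       # max error on the grid for each k
```

With $k = 2$ the fit is a straight line and the approximation error dominates; with $k = 6$ the bias is much smaller at the cost of more estimated coefficients, which is the trade-off that the choice of $k$ is meant to balance.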
Several asymptotic properties of series estimators have been investigated in the literature. The focus has been on convergence rates and asymptotic normality results (see van de Geer, 1990; Andrews, 1991; Eastwood and Gallant, 1991; Gallant and Souza, 1991; Newey, 1997; van de Geer, 2002; Huang, 2003b; Chen, 2007; Cattaneo and Farrell, 2013; and the references therein).
This work revisits the topic by making use of new critical ingredients:
1. The sharp LLNs for matrices derived from the non-commutative Khinchin inequalities.
2. The sharp bounds on the Lebesgue factor that controls the ratio between the $L^\infty$- and $L^2$-norms of the least squares approximation of functions (which is bounded or grows like $\log k$ in many cases).
3. Sharp maximal inequalities for processes whose entropy integrals diverge at some rate.
4. Strong approximations to empirical processes of series types.
This approach allows us to contribute to the series literature in several directions. First, we weaken considerably the condition on the number $k$ of approximating functions used in series estimation from the typical $k^2/n \to 0$ (see Newey, 1997) to $k/n \to 0$ (up to logarithmic factors) for bounded or local bases, a condition that was previously available only for spline series (Huang, 2003a; Stone, 1994) and was recently established for local polynomial partition series (Cattaneo and Farrell, 2013). An example of a bounded basis is Fourier series; examples of local bases are spline, wavelet, and local polynomial partition series. To be more specific, for such bases we require $k \log k / n \to 0$. Note that the last condition is similar to the condition on the bandwidth value required for local polynomial (kernel) regression estimators ($h^{-d} \log(1/h) / n \to 0$, where $h = 1/k^{1/d}$ is the bandwidth value). Second, under the same weak conditions we derive $L^2$ rates and pointwise central limit theorem results when the approximation error vanishes. Under a misspecified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates that hold whether or not the approximation error vanishes. An important contribution here is that we show that the series estimator achieves the optimal uniform rate of convergence under quite general conditions. Previously, the same result was shown only for the local polynomial partition series estimator (Cattaneo and Farrell, 2013). In addition, we derive a functional central limit theorem. By the functional central limit theorem we mean here that the entire estimate of the nonparametric function is uniformly close to a Gaussian process that can change with $n$. That is, we derive the strong approximation for the entire estimate of the nonparametric function.
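Perhaps the most important contribution of the paper is a set of completely new results that provide estimation and inference methods for the entire linear functionals $\theta(\cdot)$ of the conditional mean function $g$. Examples of linear functionals of interest include
1. the partial derivative function: $x \mapsto \theta(x) = \partial_j g(x)$;
2. the average partial derivative: $\theta = E[\partial_j g(x_i)]$;
3. the conditional average partial derivative: $x^s \mapsto \theta(x^s) = E[\partial_j g(x_i) \mid x_i^s = x^s]$,

where $\partial_j g(x)$ denotes the partial derivative of $g$ with respect to the $j$th component of $x$, and $x^s$ denotes a subvector of $x$.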
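The first two functionals can be illustrated with a plug-in sketch: once the series coefficients are estimated, a derivative functional is obtained by differentiating the basis functions and reusing the same coefficients. The code below assumes a single covariate (so the partial derivative is the ordinary derivative), a power-series basis, and a hypothetical regression function $\sin(2x)$; the helper names are illustrative and this is not the paper's inference procedure, only the point estimate.

```python
import numpy as np

def poly_basis(x, k):
    """n-by-k matrix with columns 1, x, ..., x^(k-1)."""
    return np.vander(x, N=k, increasing=True)

def poly_basis_deriv(x, k):
    """Column-wise derivatives of the basis: 0, 1, 2x, ..., (k-1) x^(k-2)."""
    D = np.zeros((x.size, k))
    for j in range(1, k):
        D[:, j] = j * x ** (j - 1)
    return D

rng = np.random.default_rng(1)
n, k = 2000, 6
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(n)   # hypothetical g(x) = sin(2x)

beta, *_ = np.linalg.lstsq(poly_basis(x, k), y, rcond=None)

# Derivative function evaluated at one point (here x = 0).
deriv_at_zero = (poly_basis_deriv(np.array([0.0]), k) @ beta)[0]

# Average derivative, averaging over the observed covariates.
avg_deriv = np.mean(poly_basis_deriv(x, k) @ beta)

print(deriv_at_zero, 2.0)           # estimate next to the true value 2 cos(0) = 2
print(avg_deriv, np.sin(2.0))       # estimate next to the true value sin(2)
```

Because both functionals are linear in $g$, each estimate is a linear combination of the estimated coefficients, which is what makes a common asymptotic treatment possible.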
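Notation. In what follows, all parameter values are indexed by the sample size $n$, but we omit the index whenever this does not cause confusion. We use the notation $(a)_+ = \max\{a, 0\}$, $a \vee b = \max\{a, b\}$, and $a \wedge b = \min\{a, b\}$. The $\ell^2$-norm of a vector $v$ is denoted by $\|v\|$, while for a matrix $Q$ the operator norm is denoted by $\|Q\|$. We also use standard notation in the empirical process literature, $E_n[f] = E_n[f(w_i)] = \frac{1}{n}\sum_{i=1}^n f(w_i)$ and $G_n[f] = G_n[f(w_i)] = \frac{1}{\sqrt{n}}\sum_{i=1}^n (f(w_i) - E[f(w_i)])$, and we use the notation $a \lesssim b$ to denote $a \le c b$ for some constant $c > 0$ that does not depend on $n$; and $a \lesssim_P b$ to denote $a = O_P(b)$. Moreover, for two random variables $X, Y$ we say that $X =_d Y$ if they have the same probability distribution. Finally, $S^{k-1}$ denotes the space of vectors $\alpha$ in $\mathbb{R}^k$ with unit Euclidean norm: $\|\alpha\| = 1$.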
Section snippets
Set-up
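Throughout the paper, we consider a sequence of models, indexed by the sample size $n$, $y_i = g(x_i) + \epsilon_i$, $E[\epsilon_i \mid x_i] = 0$, $x_i \in \mathcal{X} \subseteq \mathbb{R}^d$, $i = 1, \dots, n$, where $y_i$ is a response variable, $x_i$ a vector of covariates (basic regressors), $\epsilon_i$ noise, and $x \mapsto g(x) = E[y_i \mid x_i = x]$ a regression (conditional mean) function; that is, we consider a triangular array of models with $y_i = y_{i,n}$, $x_i = x_{i,n}$, $\epsilon_i = \epsilon_{i,n}$, and $g = g_n$. We assume that $g \in \mathcal{G}$ where $\mathcal{G}$ is some class of functions. Since we consider a sequence of models indexed by $n$, we allow the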
Approximation properties of least squares
Next we consider approximation properties of the least squares estimator. Not surprisingly, approximation properties must rely on the particular choice of approximating functions. At this point it is instructive to consider particular examples of relevant bases used in the literature. For each example, we state a bound on the following quantity: $\xi_k := \sup_{x \in \mathcal{X}} \|p(x)\|$, where $p(x)$ is the vector of $k$ approximating functions evaluated at $x$. This quantity will play a key role in our analysis.
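As a numerical illustration, assume the quantity in question is the largest Euclidean norm of the vector of basis functions over the support, $\sup_{x} \|p(x)\|$. The sketch below evaluates it on a grid for two bases normalized to be orthonormal under a uniform covariate distribution: Legendre polynomials on $[-1, 1]$ and a Fourier basis on $[0, 1]$. The function names, the grid size, and the normalization are assumptions made for this example.

```python
import numpy as np
from numpy.polynomial import legendre

def xi_legendre(k, grid_size=2001):
    """Grid approximation of sup_x ||p(x)|| for k orthonormal Legendre terms."""
    x = np.linspace(-1.0, 1.0, grid_size)
    P = legendre.legvander(x, k - 1)             # columns P_0, ..., P_{k-1}
    P = P * np.sqrt(2.0 * np.arange(k) + 1.0)    # orthonormal under Uniform[-1, 1]
    return np.max(np.linalg.norm(P, axis=1))

def xi_fourier(k, grid_size=2001):
    """Same quantity for 1, sqrt(2) cos(2 pi j x), sqrt(2) sin(2 pi j x); k odd."""
    if k % 2 == 0:
        raise ValueError("k must be odd: one constant plus cosine/sine pairs")
    x = np.linspace(0.0, 1.0, grid_size)
    cols = [np.ones_like(x)]
    for j in range(1, (k - 1) // 2 + 1):
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * j * x))
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * j * x))
    P = np.column_stack(cols)
    return np.max(np.linalg.norm(P, axis=1))

for k in (5, 11, 21):
    print(k, xi_legendre(k), xi_fourier(k))
```

Under this normalization the polynomial value grows linearly in $k$ (the maximum is attained at the endpoints), while the Fourier value grows like $\sqrt{k}$, which is one way to see why bounded bases admit weaker conditions on $k$.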
Limit theory
After we have established the set-up, we proceed to derive our results. We start with a result on the rate of convergence. Recall that . In the theorem below, we assume that . This is a mild regularity condition.
Theorem 4.1 (Rate of Convergence). Assume that Conditions A.1, A.2, and A.3 are satisfied. In addition, assume that and . Then under , and under ,
Comment 4.1 (i) This is our first main result in this
Rates and inference on linear functionals
In this section, we derive rates and inference results for linear functionals of the conditional expectation function such as its derivative, average derivative, or conditional average derivative. To a large extent, with the exception of Theorem 5.6, the results presented in this section can be considered as an extension of results presented in Section 4, and so similar comments can be applied as those given in Section 4. Theorem 5.6 deals with construction of uniform confidence
Tools: maximal inequalities for matrices and empirical processes
In this section we collect the main technical tools that our analysis relies upon, namely Khinchin Inequalities for Matrices and Data Dependent Maximal Inequalities.
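The role of a law of large numbers for matrices can be illustrated by simulation: the empirical Gram matrix of the basis functions should be close, in operator norm, to its population counterpart when $n$ is large relative to $k$. The sketch below assumes a Fourier basis that is orthonormal under a uniform covariate on $[0, 1]$, so that the population Gram matrix is the identity. It does not reproduce the Khinchin-type bounds themselves; all names and parameter values are hypothetical.

```python
import numpy as np

def fourier_basis(x, k):
    """n-by-k Fourier design on [0, 1] (k odd), orthonormal under Uniform[0, 1]."""
    cols = [np.ones_like(x)]
    for j in range(1, (k - 1) // 2 + 1):
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * j * x))
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * j * x))
    return np.column_stack(cols)

def gram_deviation(n, k, rng):
    """Operator norm of the empirical Gram matrix minus the identity."""
    x = rng.uniform(0.0, 1.0, size=n)
    P = fourier_basis(x, k)
    Q_hat = P.T @ P / n
    return np.linalg.norm(Q_hat - np.eye(k), 2)   # ord=2 is the spectral norm

rng = np.random.default_rng(2)
k = 11
deviations = {n: gram_deviation(n, k, rng) for n in (200, 20000)}
for n, dev in deviations.items():
    print(n, dev)
```

The deviation shrinks as $n$ grows with $k$ held fixed; how fast $k$ may grow with $n$ while keeping this deviation small is the kind of question the matrix inequalities address.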
Acknowledgments
This paper was presented and first circulated in a series of lectures given by Victor Chernozhukov at “Stats in the Château” Statistics Summer School on “Inverse Problems and High-Dimensional Statistics” in 2009 near Paris. Participants, especially Xiaohong Chen, and one of several referees made numerous helpful suggestions. We also thank Bruce Hansen for extremely useful comments.
References (41)
- et al. Optimal convergence rates, Bahadur representation, and asymptotic normality of partitioning estimators. J. Econometrics (2013)
- et al. Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal. (1993)
- The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. (1967)
- et al. On the asymptotic normality of Fourier flexible functional form estimates. J. Econometrics (1991)
- et al. $L_p$-moments of random vectors via majorizing measures. Adv. Math. (2007)
- Asymptotics for polynomial spline regression under weak conditions. Statist. Probab. Lett. (2003)
- Convergence rates and asymptotic normality for series estimators. J. Econometrics (1997)
- Random vectors in the isotropic position. J. Funct. Anal. (1999)
- M-estimation using penalties or sieves. J. Statist. Plann. Inference (2002)
- Asymptotic normality of series estimators for nonparametric and semiparametric models. Econometrica (1991)
- Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica
- Large sample sieve estimation of semi-nonparametric models
- Anti-concentration and honest, adaptive confidence bands. Ann. Statist.
- Intersection bounds: estimation and inference. Econometrica
- A Practical Guide to Splines (Revised Edition)
- Constructive Approximation
- Adaptive rules for seminonparametric estimation that achieve asymptotic normality. Econometric Theory