Elsevier

Journal of Econometrics

Volume 188, Issue 1, September 2015, Pages 22-39

Nonparametric identification and estimation of transformation models

https://doi.org/10.1016/j.jeconom.2015.01.001

Abstract

This paper derives sufficient conditions for nonparametric transformation models to be identified and develops estimators of the identified components. Our nonparametric identification result is global, and allows for endogenous regressors. In particular, we show that a completeness assumption combined with conditional independence with respect to one of the regressors suffices for the model to be nonparametrically identified. The identification result is also constructive in the sense that it yields explicit expressions of the functions of interest. We show how natural estimators can be developed from these expressions, and analyze their theoretical properties. Importantly, it is demonstrated that different normalizations of the model lead to different asymptotic properties of the estimators, with one normalization in particular resulting in an estimator for the unknown transformation function that converges at a parametric rate. A test for whether a candidate regressor satisfies the conditional independence assumption required for identification is developed. A Monte Carlo experiment illustrates the performance of our method in the context of a duration model with endogenous regressors.

Introduction

A variety of structural econometric models come in the form of a transformation model, in which a scalar dependent variable $Y$ is related to a vector of regressors $X$ and a scalar unobservable $\epsilon$ through

$Y = T(g(X) + \epsilon)$. (1)

The model is characterized by a strictly monotonic transformation $T$, a regression function $g$, and a cumulative distribution function (cdf) $F_{\epsilon|X}$ of $\epsilon$ given $X$, all of which are unknown. An important economic application of model (1) is the study of duration data (see, e.g., Van den Berg, 2001, for a survey). In this context, dependence between $\epsilon$ and some components of $X$ is often a concern, and it can arise for a variety of reasons: the duration outcome may depend on another duration variable, with both durations affected by the same unobserved heterogeneity term (Abbring and van den Berg, 2003); duration data may be observed only for those individuals who comply with some treatment, with compliance being selective rather than random (Bijwaard and Ridder, 2005); durations of two or more players may interact with each other in a strategic environment (Honore and de Paula, 2010); or there may be reverse causality, as when duration represents time-to-default and defaults affect regressors such as prices (Palmer, 2014). More generally, omission of relevant regressors or the presence of measurement errors might give rise to endogeneity.
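To make the structure of the transformation model concrete, the following minimal sketch simulates data from it with one endogenous regressor. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the function name `simulate_transformation_model`, the choice $T = \exp$, the linear $g$, and the Gaussian design. The design is built so that the error is correlated with `x2` but independent of `x1` once `x2` is conditioned on.

```python
import numpy as np


def simulate_transformation_model(n, seed=0):
    """Draw (y, x1, x2, eps) from y = T(g(x) + eps) with x2 endogenous.

    Hypothetical design for illustration only:
      T = exp, g(x1, x2) = x1 + 0.5 * x2.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)                 # unobserved confounder
    x2 = v + rng.normal(size=n)            # endogenous regressor (depends on v)
    w = rng.normal(size=n)                 # noise independent of everything else
    x1 = 0.5 * x2 + w                      # exogenous given x2: varies only through w
    eps = 0.8 * v + rng.normal(size=n)     # error correlated with x2 through v
    g = x1 + 0.5 * x2                      # illustrative regression function
    y = np.exp(g + eps)                    # strictly increasing transformation T
    return y, x1, x2, eps
```

Because `w` is drawn independently of `(v, eps, x2)`, the error is independent of `x1` conditional on `x2`, while a naive regression that treats `x2` as exogenous would be contaminated by the shared component `v`.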

We develop novel nonparametric identification results for $(T, g, F_{\epsilon|X})$ when some of the regressors $X$ are correlated with $\epsilon$. Our identification strategy is constructive in the sense that we obtain explicit expressions of the components in terms of the cdf of $Y$ given $X$, $F_{Y|X}$. This in turn allows us to develop simple nonparametric estimators of $(T, g, F_{\epsilon|X})$, which we analyze. An important feature is that the convergence rate of the estimator of $T$ critically depends on the normalization conditions we impose: the “smoother” the normalization, the faster the estimator converges. To the best of our knowledge, our paper is the first to show that normalization conditions are not innocuous, with different normalization choices leading to nonparametric estimators with radically different properties. When the normalization used for identification of $T$ does not involve derivatives of $T$, our estimator attains the parametric rate. This in turn implies that for inference regarding $g$ and $F_{\epsilon|X}$ we can treat $T$ as known.

The identification argument proceeds in two steps. We first show that $\Theta \equiv T^{-1}$ is identified under the assumption that $X$ can be decomposed into $X = (X_I, X_{-I})$, where the subset of regressors $X_I$ is conditionally exogenous, $\epsilon \perp X_I \mid X_{-I}$. As such, $X_I$ plays a role similar to the “special regressor” of Lewbel (1998); however, in contrast to his study, we do not require $X_I$ to satisfy any “large-support” conditions. Once $\Theta$ has been identified, we can identify $g$ and $F_{\epsilon|X}$ using existing results on nonparametric instrumental variables (IV); see, e.g., Darolles et al. (2011) and references therein.
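A heuristic calculation suggests why conditional exogeneity of one regressor can pin down the inverse transformation. The sketch below is not the paper's formal argument: it assumes that $T$ is strictly increasing and that $\Theta = T^{-1}$, $g$, and the conditional error cdf are differentiable (with density $f_{\epsilon|X_{-I}}$), none of which is stated in this excerpt.

```latex
% Assumptions (for illustration): T strictly increasing, \Theta = T^{-1},
% \epsilon \perp X_I \mid X_{-I}, and all derivatives below exist.
\begin{align*}
F_{Y|X}(y \mid x)
  &= \Pr\{ g(X) + \epsilon \le \Theta(y) \mid X = x \}
   = F_{\epsilon|X_{-I}}\bigl(\Theta(y) - g(x) \mid x_{-I}\bigr), \\
\frac{\partial F_{Y|X}(y \mid x)}{\partial y}
  &= f_{\epsilon|X_{-I}}\bigl(\Theta(y) - g(x) \mid x_{-I}\bigr)\, \Theta'(y), \\
\frac{\partial F_{Y|X}(y \mid x)}{\partial x_i}
  &= -\, f_{\epsilon|X_{-I}}\bigl(\Theta(y) - g(x) \mid x_{-I}\bigr)\,
     \frac{\partial g(x)}{\partial x_i}, \qquad i \in I, \\
\frac{\Theta'(y)}{\partial g(x) / \partial x_i}
  &= -\, \frac{\partial F_{Y|X}(y \mid x) / \partial y}
              {\partial F_{Y|X}(y \mid x) / \partial x_i}.
\end{align*}
```

The right-hand side of the last line involves only the observable cdf of the outcome given the regressors, so at any point where the derivative of $g$ with respect to $x_i$ is nonzero, the shape of $\Theta'$ is determined up to a factor that does not depend on $y$; a normalization is then needed to fix scale and location.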

The estimation strategy builds upon our identification result, where we demonstrate that $\Theta$ can be expressed as a functional of $F_{Y|X}$. A pointwise estimator of $\Theta$ is then obtained by replacing $F_{Y|X}$ with a nonparametric estimator. Once $\Theta$ has been estimated, $g$ can be estimated using, for example, nonparametric IVs with $\hat{\Theta}(Y)$ replacing the unknown dependent variable $\Theta(Y)$. Given the parametric convergence rate of $\hat{\Theta}$, our nonparametric IV estimator of $g$ is asymptotically equivalent to the oracle estimator with $\Theta$ known. Having recovered $\Theta$ and $g$, we can compute residuals and use these to estimate $F_{\epsilon|X}$.
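The first ingredient of such a plug-in approach is a nonparametric estimate of the conditional cdf of the outcome given the regressors. A minimal sketch of one common choice, a Nadaraya-Watson kernel estimator with a scalar regressor, is given below. The function names `gaussian_kernel` and `cond_cdf`, the Gaussian kernel, and the bandwidth argument `h` are illustrative assumptions; the excerpt does not specify which estimator the paper uses.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def cond_cdf(y, x, y_obs, x_obs, h):
    """Kernel (Nadaraya-Watson) estimate of P(Y <= y | X = x), scalar X.

    y_obs, x_obs: 1-D arrays of observations; h: bandwidth (assumed > 0).
    """
    weights = gaussian_kernel((x_obs - x) / h)
    # Weighted share of observations with outcome at or below y.
    return float(np.sum(weights * (y_obs <= y)) / np.sum(weights))
```

The estimate is a weighted empirical cdf, so it is nondecreasing in `y` and lies in [0, 1] by construction; the bandwidth governs the usual bias-variance trade-off.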

The identification and estimation schemes critically rely on at least one regressor being conditionally exogenous. If, for a given choice of $X_I$, this assumption is violated, the proposed estimators are inconsistent. It is therefore important to be able to check the validity of a candidate regressor. As part of the identification argument, we derive a set of over-identifying restrictions implied by the conditional independence assumption, which in turn are used to develop a statistical test for it.

We investigate the finite-sample performance of our estimators in a Monte Carlo simulation study designed around a popular duration model. We find that the estimators perform well with moderate biases and variances. Moreover, they appear to be quite robust to the choice of the various smoothing parameters used in their implementation.

Our identification results are close in spirit to those obtained by Ridder (1990) and Ekeland et al. (2004), who focus on exogenous regressors. Fève and Florens (2010) allow for endogenous regressors when $g$ is linear or partially linear, using a so-called measurable separability assumption in place of our conditional exogeneity condition. More in line with our identification strategy, Vanhems and Van Keilegom (2013) allow for endogeneity in a semiparametric version of the model with a finitely parameterized transformation. Finally, Chernozhukov et al. (2007) and Chen et al. (2011) provide identification conditions that allow for endogeneity in a general class of models, including ours. These are, however, only local identification results and rely on high-level assumptions. We complement these papers by providing primitive conditions for global nonparametric identification.

Nonparametric estimators of $\Theta$ under exogeneity have been developed in, e.g., Horowitz (1996), Chen (2002) and Jochmans (2011). These require as input an initial parametric estimator of $g$ and are thus difficult to extend to the fully nonparametric case. Matzkin (1991) and Jacho-Chávez et al. (2010) develop fully nonparametric estimators. However, the asymptotic properties of the former are still not fully understood, and the latter only achieves a nonparametric convergence rate. None of the above papers allow for endogenous regressors. Finally, the sieve estimators developed in Chernozhukov et al. (2007) and Chen and Pouzo (2012) should in principle be applicable to our model.

The remainder of the paper is organized as follows. Section 2 contains the identification result, while estimators are proposed and analyzed in Section 3. The test for conditional independence is developed and analyzed in Section 4. Section 5 illustrates the performance of the proposed estimators and test through a Monte Carlo experiment. The last section concludes. Additional technical assumptions and proofs are relegated to the appendices: Appendix A (sieve IV assumptions), Appendix B (proofs), Appendix C (lemmas), Appendix D (identification without continuity), and Appendix E (integral normalization).

Section snippets

Model and assumptions

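We consider the model in (1), where $Y$ has support $\mathcal{Y} \subseteq \mathbb{R}$, $X = (X_1, \ldots, X_{d_x})$ has support $\mathcal{X} \subseteq \mathbb{R}^{d_x}$, and $\epsilon$ belongs to $\mathcal{E} \subseteq \mathbb{R}$. The variables $Y$ and $X$ are observed, while $\epsilon$ remains latent. We decompose the regressors into $X = (X_I, X_{-I})$, where the subvector $X_I \in \mathbb{R}^{|I|}$ is assumed to be exogenous while $X_{-I} \in \mathbb{R}^{d_x - |I|}$ contains the potentially endogenous components. The supports of $X_I$ and $X_{-I}$ are denoted by $\mathcal{X}_I$ and $\mathcal{X}_{-I}$, respectively.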

Assumption A1

For a.e. $x \in \mathcal{X}$, the conditional distribution $F_{\epsilon|X}(\cdot \mid x)$ of $\epsilon$ given $X = x$ is absolutely continuous (with

Estimation

We use the identification results of the previous section to derive explicit estimators of (T,g,Fϵ|X). We will for notational simplicity assume that XI has a continuous distribution which we then estimate using kernel smoothing techniques. If some of the regressors in XI have a discrete distribution, the corresponding kernel function used in the nonparametric smoothing should be replaced by an indicator function.
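The remark about discrete regressors can be illustrated with product weights that combine a kernel in a continuous regressor with an exact-match indicator in a discrete one. The sketch below is an assumption-laden illustration rather than the paper's estimator: the names `mixed_kernel_weights` and `cond_cdf_mixed`, the Gaussian kernel, and the restriction to one continuous and one discrete regressor are all hypothetical choices.

```python
import numpy as np


def mixed_kernel_weights(xc_obs, xd_obs, xc, xd, h):
    """Product weights: Gaussian kernel (continuous part) times indicator (discrete part)."""
    k_cont = np.exp(-0.5 * ((xc_obs - xc) / h) ** 2)
    k_disc = (xd_obs == xd).astype(float)   # indicator replaces the kernel
    return k_cont * k_disc


def cond_cdf_mixed(y, xc, xd, y_obs, xc_obs, xd_obs, h):
    """Estimate P(Y <= y | Xc = xc, Xd = xd) with the mixed weights above."""
    w = mixed_kernel_weights(xc_obs, xd_obs, xc, xd, h)
    total = w.sum()
    if total == 0.0:
        raise ValueError("no observations in the requested discrete cell")
    return float(np.sum(w * (y_obs <= y)) / total)
```

With the indicator, only observations in the same discrete cell contribute, so smoothing takes place within each cell and only over the continuous regressor.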

Testing exogeneity

The identification and estimation results developed in the two previous sections rest on two fundamental assumptions regarding the chosen “special” regressor $X_i$: first, $X_i$ needs to be relevant in the sense that $\partial g(x)/\partial x_i \neq 0$; and second, it needs to be exogenous in the sense that $H_0: \epsilon \perp X_i \mid X_{-i}$.

If either of these two restrictions is violated, the proposed estimator will in general be inconsistent. It is therefore of interest to develop tools to examine whether a candidate regressor indeed satisfies

Monte Carlo application to duration models

We here illustrate how the proposed identification and estimation strategy can be used in the study of duration models, and provide Monte Carlo results for estimators and tests in this context.
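As a reminder of how a duration model fits the transformation framework, consider the standard mixed proportional hazard specification. The derivation below is a textbook sketch under assumed notation ($\Lambda_0$ for the integrated baseline hazard, $\phi$ for the regressor effect, $V$ for unobserved heterogeneity); the specific Monte Carlo design used in the paper is not shown in this excerpt.

```latex
% Assumed notation: \Lambda_0 strictly increasing integrated baseline hazard,
% \phi(x) > 0 systematic part, V > 0 unobserved heterogeneity.
\begin{align*}
\Pr(Y > t \mid X = x, V = v) &= \exp\{ -\Lambda_0(t)\, \phi(x)\, v \}
  \;\Longrightarrow\;
  E \equiv \Lambda_0(Y)\, \phi(X)\, V \sim \mathrm{Exp}(1), \\
\log \Lambda_0(Y) &= -\log \phi(X) + \bigl( \log E - \log V \bigr), \\
Y &= T\bigl( g(X) + \epsilon \bigr), \qquad
  T = \Lambda_0^{-1} \circ \exp, \quad
  g = -\log \phi, \quad
  \epsilon = \log E - \log V .
\end{align*}
```

In this representation the inverse transformation is the log integrated baseline hazard, and endogeneity arises whenever the heterogeneity term $V$ is related to some of the regressors.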

Discussion and conclusion

We conclude by discussing possible extensions and applications of our results. First, note that additional instrumental variables are easily incorporated in our setup. Specifically, instead of assuming conditional independence between $\epsilon$ and $X_I$ given $X_{-I}$, we could assume that some instrument $W$ was available such that $\epsilon$ and $X_I$ were conditionally independent given $(X_{-I}, W)$, i.e., $\epsilon \perp X_I \mid (X_{-I}, W)$. This would amount to considering the conditional distribution $F_{Y|X,W}$ of $Y$ given $(X, W)$ which now satisfies: F

Acknowledgments

We would like to thank the Editor, Jianqing Fan, an Associate Editor, and three anonymous Referees for their comments and suggestions, which greatly improved the manuscript. Earlier versions of this paper were presented at the CAM workshop 2010 at University of Copenhagen, the (EC)$^2$ conference 2010 in Toulouse, the 2010 CIRANO-CIREQ conference on “Revealed Preferences and Partial Identification” in Montreal, the 2014 conference on non- and semiparametric econometrics at University of York, and seminars

References (33)

  • Chen, X., Chernozhukov, V., Lee, S., Newey, W. (2011). Local Identification of Nonparametric and Semiparametric...
  • X. Chen et al. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica.
  • S. Darolles et al. (2011). Nonparametric instrumental regression. Econometrica.
  • I. Ekeland et al. (2004). Identification and estimation of hedonic models. J. Political Economy.
  • F. Fève et al. (2010). The practice of non-parametric estimation by solving inverse problems: the example of transformation models. J. Econometrics.
  • P. Hall et al. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Statist.