Nonparametric identification and estimation of transformation models
Introduction
A variety of structural econometric models comes in the form of a transformation model, in which a scalar dependent variable is related to a vector of regressors and a scalar unobservable through The model is characterized by a strictly monotonic transformation , a regression function , and a cumulative distribution function (cdf) of given , all of which are unknown. An important economic application of the model (1) is to the study of duration data (see, e.g., Van den Berg, 2001, for a survey). In this context, dependence between and some components of is often a concern. Such dependence can arise for several reasons: the duration outcome may depend on another duration variable, with both durations affected by the same unobserved heterogeneity term (Abbring and van den Berg, 2003); duration data may be observed only for those individuals who comply with some treatment, with compliance being selective rather than random (Bijwaard and Ridder, 2005); the durations of two or more players may interact with each other in a strategic environment (Honoré and de Paula, 2010); or there may be reverse causality, as when duration represents time-to-default and defaults affect regressors such as prices (Palmer, 2014). More generally, the omission of relevant regressors or the presence of measurement errors might give rise to endogeneity.
We develop novel nonparametric identification results for when some of the regressors are correlated with . Our identification strategy is constructive in the sense that we obtain explicit expressions of the components in terms of the cdf of given , . This in turn allows us to develop simple nonparametric estimators of which we analyze. An important feature is that the convergence rate of the estimator of critically depends on the normalization conditions we impose: The “smoother” the normalization, the faster the estimator converges. To the best of our knowledge, our paper is the first to show that normalization conditions are not innocuous, with different normalization choices leading to nonparametric estimators with radically different properties. When the normalization used for identification of does not involve derivatives of , our estimator attains the parametric rate. This in turn implies that for inference regarding and we can treat as known.
The identification argument proceeds in two steps: We first show that is identified under the assumption that can be decomposed into where the subset of regressors is conditionally exogenous, . As such play a role similar to the “special regressor” of Lewbel (1998); however, in contrast to his study, we do not require to satisfy any “large-support” conditions. Once has been identified, we can identify and using existing results on nonparametric instrumental variables (IV); see, e.g., Darolles et al. (2011), and references therein.
The estimation strategy builds upon our identification result where we demonstrate that can be expressed as a functional of . A pointwise estimator of is then obtained by replacing with a nonparametric estimator. Once has been estimated, can be estimated using, for example, nonparametric IVs with replacing the unknown dependent variable . Given the parametric convergence rate of , our nonparametric IV estimator of is asymptotically equivalent to the oracle estimator with known. Having recovered and , we can compute residuals and use these to estimate .
The identification and estimation schemes critically rely on the availability of at least one conditionally exogenous regressor. If, for a given choice of , this assumption is violated, the proposed estimators are inconsistent. It is therefore important to be able to check the validity of a candidate regressor. As part of the identification argument, we derive a set of over-identifying restrictions implied by the conditional independence assumption, which in turn is used to develop a statistical test for it.
We investigate the finite-sample performance of our estimators in a Monte Carlo simulation study designed around a popular duration model. We find that the estimators perform well with moderate biases and variances. Moreover, they appear to be quite robust to the choice of the various smoothing parameters used in their implementation.
Our identification results are close in spirit to those obtained by Ridder (1990) and Ekeland et al. (2004), who focus on exogenous regressors. Fève and Florens (2010) allow for endogenous regressors when is linear or partially linear using a so-called measurable separability assumption in place of our conditional exogeneity condition. More in line with our identification strategy, Vanhems and Van Keilegom (2013) allow for endogeneity in a semiparametric version of the model with a finitely parameterized transformation. Finally, Chernozhukov et al. (2007) and Chen et al. (2011) provide identification conditions that allow for endogeneity in a general class of models, including ours. These are, however, only local identification results and rely on high-level assumptions. We complement these papers by providing primitive conditions for global nonparametric identification.
Nonparametric estimators of under exogeneity have been developed in, e.g., Horowitz (1996), Chen (2002) and Jochmans (2011). These require as input an initial parametric estimator of and are thus difficult to extend to the fully nonparametric case. Matzkin (1991) and Jacho-Chávez et al. (2010) develop fully nonparametric estimators. However, the asymptotic properties of the former are still not fully understood, and the latter achieves only a nonparametric convergence rate. None of the above papers allows for endogenous regressors. Finally, the sieve estimators developed in Chernozhukov et al. (2007) and Chen and Pouzo (2012) should in principle be applicable to our model.
The remainder of the paper is organized as follows. Section 2 contains the identification result, while estimators are proposed and analyzed in Section 3. The test for conditional independence is developed and analyzed in Section 4. Section 5 illustrates the performance of the proposed estimators and test through a Monte Carlo experiment. The last section concludes. Additional technical assumptions and proofs are relegated to the appendices: Appendix A (Sieve IV assumptions), Appendix B (Proofs), Appendix C (Lemmas), Appendix D (Identification without continuity), and Appendix E (Integral normalization).
Model and assumptions
We consider the model in (1) where has support , has support , and belongs to . The variables and are observed, while remains latent. We decompose the regressors into where the subvector is assumed to be exogenous while contains the potentially endogenous components. The supports of and are denoted by and , respectively.
Assumption A1 For a.e. , the conditional distribution of given is absolutely continuous (with
Estimation
We use the identification results of the previous section to derive explicit estimators of . We will for notational simplicity assume that has a continuous distribution which we then estimate using kernel smoothing techniques. If some of the regressors in have a discrete distribution, the corresponding kernel function used in the nonparametric smoothing should be replaced by an indicator function.
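The kernel-smoothing idea described above can be illustrated with a minimal sketch of a Nadaraya-Watson estimator of a conditional cdf. This is not the authors' estimator: the function names, the Gaussian product kernel, and the single common bandwidth are all assumptions made for illustration. Continuous regressors receive kernel weights, while discrete regressors receive indicator weights, as the paragraph suggests.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def conditional_cdf(y, x_cont, x_disc, Y, X_cont, X_disc, bandwidth):
    """Nadaraya-Watson estimate of P(Y <= y | X_cont = x_cont, X_disc = x_disc).

    Hypothetical helper: continuous regressors are smoothed with a product
    Gaussian kernel, discrete regressors are matched exactly (indicator weights).
    """
    Y = np.asarray(Y, dtype=float)
    X_cont = np.asarray(X_cont, dtype=float).reshape(len(Y), -1)
    X_disc = np.asarray(X_disc).reshape(len(Y), -1)
    # Kernel weights for the continuous regressors.
    w_cont = gaussian_kernel((X_cont - np.asarray(x_cont)) / bandwidth).prod(axis=1)
    # Indicator weights for the discrete regressors.
    w_disc = np.all(X_disc == np.asarray(x_disc), axis=1).astype(float)
    w = w_cont * w_disc
    total = w.sum()
    if total == 0.0:
        return np.nan  # no observations in the requested discrete cell
    return float(np.sum(w * (Y <= y)) / total)

# Illustrative data only: one continuous and one binary regressor.
rng = np.random.default_rng(0)
n = 500
Xc = rng.normal(size=n)
Xd = rng.integers(0, 2, size=n)
Y = Xc + Xd + rng.normal(size=n)
F_hat = conditional_cdf(1.0, 0.0, 1, Y, Xc, Xd, bandwidth=0.3)
```

In practice the bandwidth would be chosen by a data-driven rule; the fixed value above is only a placeholder.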
Testing exogeneity
The identification and estimation results developed in the two previous sections rest on two fundamental assumptions regarding the chosen “special” regressor : First, needs to be relevant in the sense that ; and second, it needs to be exogenous in the sense that:
If either of these two restrictions is violated, the proposed estimator will in general be inconsistent. It is therefore of interest to develop tools to examine whether a candidate regressor indeed satisfies
Monte Carlo application to duration models
We here illustrate how the proposed identification and estimation strategy can be used in the study of duration models, and provide Monte Carlo results for estimators and tests in this context.
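As a rough illustration of how a duration model can take a transformation form, the sketch below simulates a proportional hazards model with a Weibull baseline hazard. It does not reproduce the paper's Monte Carlo design: the parameter values, the function names, and the single exogenous regressor are assumptions. With integrated baseline hazard t^alpha and hazard multiplier exp(beta * x), the log integrated hazard of the duration equals -beta * x + eps, where eps is the log of a unit exponential variable.

```python
import numpy as np

def simulate_weibull_ph(n, alpha=1.5, beta=1.0, seed=0):
    """Simulate durations from a Weibull proportional hazards model.

    Hypothetical design: hazard(t | x) = alpha * t**(alpha - 1) * exp(beta * x),
    so that alpha * log(T) = -beta * x + eps with eps = log(Exp(1)).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                 # exogenous regressor (assumption)
    eps = np.log(rng.exponential(size=n))  # log of a unit exponential draw
    duration = np.exp((-beta * x + eps) / alpha)
    return duration, x, eps

def log_integrated_hazard(t, alpha=1.5):
    """Log of the integrated Weibull baseline hazard, log(t**alpha)."""
    return alpha * np.log(t)

duration, x, eps = simulate_weibull_ph(1000)
# The transformed duration is additive in the regressor and the unobservable.
transformed = log_integrated_hazard(duration)
```

Endogeneity could be introduced in such a design by making the regressor depend on eps; that step is omitted here.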
Discussion and conclusion
We conclude by discussing possible extensions and applications of our results. First, note that additional instrumental variables are easily incorporated in our setup. Specifically, instead of assuming conditional independence between and given , we could assume that some instrument was available such that and were conditionally independent given , i.e. . This would amount to considering the conditional distribution of given which now satisfies:
Acknowledgments
We would like to thank the Editor, Jianqing Fan, an Associate Editor, and three anonymous Referees, for their comments and suggestions which greatly improved the manuscript. Earlier versions of this paper were presented at the CAM workshop 2010 at University of Copenhagen, the (EC)2 conference 2010 in Toulouse, 2010 CIRANO-CIREQ conference on “Revealed Preferences and Partial Identification” in Montreal, 2014 conference on non and semiparametric Econometrics at University of York, and seminars
References (33)
- et al. Correcting for selective compliance in a re-employment bonus experiment. J. Econometrics (2005)
- et al. Instrumental variable estimation of nonseparable models. J. Econometrics (2007)
- et al. Identification and nonparametric estimation of a transformed additively separable model. J. Econometrics (2010)
- Semi-nonparametric estimation and misspecification testing of diffusion models. J. Econometrics (2011)
- et al. Specification testing for transformation models with applications to generalized accelerated failure-time models. J. Econometrics (2015)
- Duration models: specification, identification and multiple durations
- et al. The nonparametric identification of treatment effects in duration models. Econometrica (2003)
- Belloni, A., Chen, X., Chernozhukov, V., Liao, Z., 2010, On Limiting Distributions of Possibly Unbounded Functionals of...
- et al. Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica (2007)
- Rank estimation of transformation models. Econometrica (2002)