Well-posedness of measurement error models for self-reported data

https://doi.org/10.1016/j.jeconom.2012.01.036

Abstract

This paper considers the widely admitted ill-posed inverse problem for measurement error models: estimating the distribution of a latent variable $X^*$ from an observed sample of $X$, a contaminated measurement of $X^*$. We show that the inverse problem is well-posed for self-reporting data under the assumption that the probability of truthful reporting is nonzero, which is supported by empirical evidence. Compared with ill-posedness, well-posedness generally translates into faster rates of convergence for nonparametric estimators of the latent distribution. Our optimistic result on well-posedness is therefore important in economic applications, and it suggests that researchers should not ignore the point mass at zero in the measurement error distribution when they model measurement errors in self-reported data. We also analyze the implications of our results for the estimation of classical measurement error models. Finally, through both a Monte Carlo study and an empirical application, we show that failing to account for the nonzero probability of truthful reporting can lead to significant bias in the estimation of the latent distribution.

Introduction

Empirical studies in microeconomics usually involve survey samples, in which personal information is reported by the interviewees themselves; the corresponding variables are therefore subject to measurement error. The measurement error problem can be summarized as estimating the distribution of a latent variable $X^*$, $f_{X^*}(\cdot)$, from an observed sample of $X$, a contaminated measurement of $X^*$, as follows:

$$f_X(x) = \int f_{X|X^*}(x|x^*)\, f_{X^*}(x^*)\, dx^*, \qquad (1)$$

where both $X$ and $X^*$ have continuous support.

The conditional density $f_{X|X^*}$ describes the behavior of the measurement error, defined as $X - X^*$. We focus on the estimation of the true model $f_{X^*}$ given the measurement error structure $f_{X|X^*}$ and a sample of $X$. A straightforward estimator solves Eq. (1) for $f_{X^*}$, with $f_X$ replaced by its sample counterpart. However, Eq. (1) is a Fredholm integral equation of the first kind, which is notoriously ill-posed.
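The ill-posedness of a first-kind equation can be seen numerically. The following is a minimal sketch, assuming purely for illustration that the errors are classical normal with standard deviation 0.3 and that the integral is discretized by the midpoint rule on [-1, 1]. The function name and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel_matrix(m, sigma=0.3, lo=-1.0, hi=1.0):
    """Midpoint-rule discretization of (K f)(x) = int f_{X|X*}(x|x*) f(x*) dx*.

    Illustrative assumption: classical N(0, sigma^2) errors, so that
    f_{X|X*}(x|x*) is the normal density evaluated at x - x*.
    """
    h = (hi - lo) / m                      # grid spacing
    grid = lo + h * (np.arange(m) + 0.5)   # cell midpoints
    diff = grid[:, None] - grid[None, :]   # x_i - x*_j
    dens = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return dens * h                        # quadrature weight h on each column

# Condition number of the discretized operator on successively finer grids.
cond_numbers = {m: np.linalg.cond(gaussian_kernel_matrix(m)) for m in (5, 10, 20)}
for m, c in cond_numbers.items():
    print(f"m = {m:2d}  condition number = {c:.3e}")
```

As the grid is refined, the condition number of the discretized operator grows rapidly, so small sampling errors in the estimate of $f_X$ are strongly amplified when the system is inverted directly. This is the practical face of ill-posedness that regularization methods are designed to control.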

Ill-posed inverse problems have been widely studied in the statistics literature, where most effort has gone into various regularization methods pioneered by Tikhonov (1963). In the econometrics literature, the focus has likewise been on constructing estimators and deriving their optimal convergence rates, based on various regularization methods, in a general setting such as Eq. (1) (see, e.g., Blundell et al. (2007), Chen and Reiss (2011), and Hall and Horowitz (2005)).

In this paper, however, we show that the widely admitted ill-posed problem above is actually well-posed for self-reporting data, under the condition that interviewees report truthfully with a nonzero probability. This truthful-reporting property can be observed in the validation studies of Bollinger (1998) and Chen et al. (2008). Based on this property, we prove that the inverse problem in Eq. (1) is in fact a Fredholm integral equation of the second kind, which is generally well-posed. We further use existing results in the literature to show that, compared with the ill-posed case, well-posedness generally translates into faster rates of convergence for estimators of $f_{X^*}(\cdot)$. Hence, the property of a positive truth-reporting probability can be a substantial advantage in estimating the unknown distribution $f_{X^*}(\cdot)$. We therefore advocate that economists exploit this property of self-reporting data when solving inverse problems in measurement error models with a generally ill-posed setup, such as Eq. (1).
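A sketch of why a point mass at zero error changes the type of the integral equation. It assumes, for illustration only, that the conditional density of the report splits into a truthful-reporting part with probability $\lambda(x^*) > 0$ and a continuous part with density $g(x|x^*)$. The symbols $\lambda$ and $g$ are notation introduced here, and the paper's formal conditions may differ.

```latex
f_{X|X^*}(x|x^*) = \lambda(x^*)\,\delta(x - x^*) + \bigl(1 - \lambda(x^*)\bigr)\,g(x|x^*),
\qquad
f_X(x) = \int f_{X|X^*}(x|x^*)\,f_{X^*}(x^*)\,dx^*
       = \lambda(x)\,f_{X^*}(x) + \int \bigl(1 - \lambda(x^*)\bigr)\,g(x|x^*)\,f_{X^*}(x^*)\,dx^* .
```

Under this assumption the unknown $f_{X^*}$ appears outside the integral as well as inside it, which is the defining feature of a Fredholm equation of the second kind.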

To further explore the implications of our results on well-posedness, we analyze the well-known classical measurement error case, where the error structure $f_{X|X^*}(x|x^*)$ reduces to $f_\epsilon(x - x^*)$. In this case, estimating the unknown density $f_{X^*}(\cdot)$ is a deconvolution problem. We provide sufficient conditions under which a general deconvolution problem is well-posed, and we also present the convergence rate of the deconvolution estimator $\hat{f}_{X^*}(\cdot)$. In general, this rate is faster than the existing ones in the literature (e.g., see Fan (1991)).

This paper points out that if self-reported errors are zero with nonzero probability, then the inverse problems in measurement error models are well-posed. In both the general and the classical measurement error cases, we show that for well-posed inverse problems, the achievable rates of convergence for estimating $f_{X^*}$ may be much faster than those available in the literature. These results imply that estimating the latent model $f_{X^*}$ from the observed sample of $X$ may not be as technically challenging as previously thought. In this sense, our findings are important in economic applications. They are also important because the theoretical framework of Eq. (1) encompasses many other interesting problems in economics. For instance, estimating the nonparametric structural function from an instrumental variable model in Newey and Powell (2003) is equivalent to estimating $f_{X^*}$. The estimation of consumption-based asset pricing Euler equations in Lewbel and Linton (2010) can also be described in the same framework.

We organize the rest of the paper as follows. In Section 2, we present a general setup of the inverse problem in measurement error models. In Section 3, we show the well-posedness of measurement error models for self-reporting data, and discuss the rates of convergence for $\hat{f}_{X^*}$ when the problem is well-posed. In Section 4, we analyze well-posedness in the case of classical measurement errors and present the convergence rate for the deconvolution estimator. In Section 5, we provide Monte Carlo evidence on the improvement that the truthful-reporting property can make in estimating $f_{X^*}$. In Section 6, we present an empirical illustration, using the data-set that matches self-reported earnings from the CPS to employer-reported social security earnings (SSR) from 1978. Section 7 concludes. Proofs are in the Appendix.

Section snippets

A general setup

We are interested in the nonparametric estimation of the distribution of a latent variable $X^*$, $f_{X^*}(\cdot)$, given the known measurement error structure $f_{X|X^*}$ and a sample of $X$. The random sample $\{X_i\}_{i=1,\ldots,n}$ contains the contaminated measurements of the true values $X_i^*$ for each observation $i$. The estimation of $f_{X^*}(\cdot)$ is based on solving Eq. (1). Without loss of generality, we assume that the supports of $X$ and $X^*$ are the real line $\mathbb{R}$ and the inverse problem is defined on the $L^p$ ($p \ge 1$) space over the

Measurement error models for self-reporting data

In this section, we show the well-posedness of measurement error models for self-reporting data and discuss the convergence rate for the nonparametric estimator of the latent distribution, $\hat{f}_{X^*}$. We first present a property observed in validation studies: individuals report the true values with a nonzero probability. As a consequence, the problem (2) becomes a Fredholm equation of the second kind and is well-posed. Next, we discuss the rates of convergence for $\hat{f}_{X^*}$ in both well-posed and

A further discussion on the classical error case

In this section, we further explore the implications of Theorem 1 in a special case: the measurement error is classical, i.e., the measurement error ϵ is independent of the true value X.

For classical measurement errors, the error density $f_{X|X^*}(x|x^*)$ reduces to $f_\epsilon(x - x^*)$. Furthermore, it is known that the independence of $X^*$ and $\epsilon$ implies that the characteristic functions of $f_X$, $f_{X^*}$, and $f_\epsilon$ (denoted by $\phi_X(\cdot)$, $\phi_{X^*}(\cdot)$, and $\phi_\epsilon(\cdot)$, respectively) have the following relationship: $\phi_X(t) = \phi_{X^*}(t)\,\phi_\epsilon(t)$.
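A short worked step shows how a point mass at zero affects this relationship. It assumes, for illustration, that the error density is the mixture $f_\epsilon(e) = \lambda\,\delta(e) + (1 - \lambda)\,g(e)$ with a constant $\lambda \in (0, 1]$ and a density $g$ whose characteristic function is $\phi_g$. The condition $\lambda > 1/2$ used below is only one simple sufficient condition and is not necessarily the one stated in the paper.

```latex
\phi_\epsilon(t) = \lambda + (1 - \lambda)\,\phi_g(t),
\qquad
|\phi_\epsilon(t)| \;\ge\; \lambda - (1 - \lambda)\,|\phi_g(t)| \;\ge\; 2\lambda - 1,
\qquad
\phi_{X^*}(t) = \frac{\phi_X(t)}{\phi_\epsilon(t)} \quad \text{whenever } \phi_\epsilon(t) \neq 0 .
```

Because $\phi_g(t) \to 0$ as $|t| \to \infty$ for any density $g$ (the Riemann-Lebesgue lemma), $\phi_\epsilon(t)$ tends to $\lambda$ rather than to zero. When $\lambda > 1/2$, the bound above keeps $|\phi_\epsilon(t)|$ away from zero at every $t$, so dividing by $\phi_\epsilon$ does not amplify high-frequency sampling noise without bound, in contrast to the usual deconvolution setting.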

Simulation studies: deconvolution with normal error

In this section, we conduct a simulation study to investigate the performance of various deconvolution estimators when the distribution of errors has a mass point at zero.

We consider $X = X^* + \epsilon$, where $X^*$ is distributed according to a standard normal truncated to the interval $[-1, 1]$. In this study, we estimate the density of $X^*$ from a sample of $X$ and the known density of errors $f_\epsilon(\cdot)$. Following our discussions in previous sections, the density $f_\epsilon(x - x^*)$ is assumed to be $\lambda\,\delta(x - x^*) + (1 - \lambda)\,g(x - x^*)$, where $\lambda$
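A design of this kind can be sketched in a few lines. The code below is a minimal illustration, not the paper's Monte Carlo setup: the values of `lam`, `sigma`, `n`, the frequency cutoff, and the choice of a normal density for $g$ are all hypothetical, and the estimator is a plain Fourier-inversion deconvolution estimator with a hard frequency cutoff, swapped in here for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings, chosen for illustration only.
lam, sigma, n = 0.5, 0.5, 2000

def draw_truncated_normal(size):
    """Standard normal truncated to [-1, 1], drawn by rejection sampling."""
    out = np.empty(0)
    while out.size < size:
        z = rng.standard_normal(2 * size)
        out = np.concatenate([out, z[np.abs(z) <= 1.0]])
    return out[:size]

x_star = draw_truncated_normal(n)
truthful = rng.random(n) < lam                        # error is exactly zero
eps = np.where(truthful, 0.0, sigma * rng.standard_normal(n))
x = x_star + eps                                      # observed reports

def phi_eps(t):
    """Characteristic function of lam * delta_0 + (1 - lam) * N(0, sigma^2)."""
    return lam + (1.0 - lam) * np.exp(-0.5 * (sigma * t) ** 2)

def deconvolution_density(points, sample, cutoff=8.0, n_freq=401):
    """Invert phi_hat_X(t) / phi_eps(t) over |t| <= cutoff with a Riemann sum."""
    t = np.linspace(-cutoff, cutoff, n_freq)
    dt = t[1] - t[0]
    # Empirical characteristic function of the observed sample.
    phi_x_hat = np.exp(1j * t[:, None] * sample[None, :]).mean(axis=1)
    ratio = phi_x_hat / phi_eps(t)
    integrand = np.exp(-1j * points[:, None] * t[None, :]) * ratio[None, :]
    return (integrand.sum(axis=1) * dt).real / (2.0 * np.pi)

grid = np.linspace(-2.0, 2.0, 81)
f_hat = deconvolution_density(grid, x)
print(f"share of exact zero errors: {truthful.mean():.3f}")
```

With a normal $g$, `phi_eps` never falls below `lam`, so the division inside `deconvolution_density` stays stable at every frequency. Replacing `phi_eps` with the characteristic function of a purely continuous error would be one way to mimic an analysis that ignores the point mass at zero.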

Empirical illustration

In this section, we illustrate our method empirically using the data-set we analyzed in Section 3. In addition to Chen et al. (2008) and Bollinger (1998), the data-set has been used by Bound and Krueger (1991) to study the extent of measurement error in earnings, and by Chen et al. (2005) to study parameter inference in econometric models when the data are measured with error. A full description of the data-set can be found in Bound and Krueger (1991).

For this data-set, Chen

Conclusions

In this paper, we consider the widely admitted ill-posed inverse problem for measurement error models. We show that measurement error models for self-reporting data are well-posed under the assumption that the probability of reporting truthfully is nonzero, which is supported by empirical evidence. This optimistic result suggests that researchers should not ignore the point mass at zero in the measurement error distribution when they model measurement errors in self-reported data. In fact,

References (25)

  • Chen, X., et al., 2005. Measurement error models with auxiliary data. Review of Economic Studies.
  • Chen, X., Hong, H., Tarozzi, A., 2008. Semiparametric efficiency in GMM models of nonclassical measurement errors,...

    We are grateful to an associate editor, two anonymous referees, Chris Bollinger, Arthur Lewbel, Tong Li, Susanne Schennach, Stephen Shore, Richard Spady, Tiemen Woutersen, and seminar participants at the ESWC 2010 for helpful comments or discussions. We also thank Han Hong for sharing the dataset and Wendy Chi for proofreading the draft. All errors remain our own.
