Elsevier

Journal of Econometrics

Volume 161, Issue 2, 1 April 2011, Pages 182-202
Large panels with common factors and spatial correlation

https://doi.org/10.1016/j.jeconom.2010.12.003

Abstract

This paper considers methods for estimating the slope coefficients in large panel data models that are robust to the presence of various forms of error cross-section dependence. It introduces a general framework where error cross-section dependence may arise because of unobserved common effects and/or error spill-over effects due to spatial or other forms of local dependencies. Initially, this paper focuses on a panel regression model where the idiosyncratic errors are spatially dependent and possibly serially correlated, and derives the asymptotic distributions of the mean group and pooled estimators under heterogeneous and homogeneous slope coefficients, and for these estimators proposes non-parametric variance matrix estimators. The paper then considers the more general case of a panel data model with a multifactor error structure and spatial error correlations. Under this framework, the Common Correlated Effects (CCE) estimator, recently advanced by Pesaran (2006), continues to yield estimates of the slope coefficients that are consistent and asymptotically normal. Small sample properties of the estimators under various patterns of cross-section dependence, including spatial forms, are investigated by Monte Carlo experiments. Results show that the CCE approach works well in the presence of weak and/or strong cross-sectionally correlated errors.

Introduction

Over the past few years there has been a growing literature, both empirical and theoretical, on econometric analysis of panel data models with cross-sectionally dependent error processes. Such cross-correlations can arise for a variety of reasons, such as omitted common factors, spatial spill-overs, and interactions within socioeconomic networks. Conditioning on variables specific to the cross-section units alone does not deliver cross-section error independence, an assumption required by the standard literature on panel data models. In the presence of such dependence, conventional panel estimators such as fixed or random effects can result in misleading inference and even inconsistent estimators (Phillips and Sul, 2003). Further, conventional panel estimators may be inconsistent if regressors are correlated with unobserved common factors that might be causing the error cross-section dependence (Andrews, 2005).

Currently, there are two main strands in the literature for dealing with error cross-section dependence in panels where N is large relative to T, namely the residual multifactor and the spatial econometric approaches. The multifactor approach assumes that the cross-dependence can be characterized by a finite number of unobserved common factors, possibly due to economy-wide shocks that affect all units, albeit with different intensities. Under this framework, the error term is a linear combination of a few common time-specific effects with heterogeneous factor loadings plus an idiosyncratic (individual-specific) error term. Estimation of a panel with such a multifactor residual structure can be addressed by using statistical techniques commonly adopted in factor analysis, such as maximum likelihood (Robertson and Symons, 2000, Robertson and Symons, 2007) and principal components procedures (Coakley et al., 2002, Bai, 2009). Recently, Pesaran (2006) has suggested an estimation method, referred to as Common Correlated Effects (CCE), that consists of approximating the linear combinations of the unobserved factors by cross-section averages of the dependent and explanatory variables and then running standard panel regressions augmented with these cross-section averages. An advantage of this approach is that it yields consistent estimates under a variety of situations, such as serial correlation in errors, unit roots in the factors and possible contemporaneous dependence of the observed regressors with the unobserved factors (Coakley et al., 2006, Kapetanios and Pesaran, 2007, Kapetanios et al., 2011).
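The CCE idea described above can be illustrated with a minimal sketch of a mean group version: each unit's regression is augmented with the cross-section averages of the dependent and explanatory variables, and the unit-specific slopes are then averaged. The function names, the one-factor simulated design, and the use of an intercept as the only observed common effect are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def cce_mean_group(y, x):
    """Sketch of a CCE mean group estimator (illustrative, not the paper's code).

    y : (N, T) array of the dependent variable
    x : (N, T, k) array of unit-specific regressors
    Returns the average slope vector and the (N, k) unit-specific slopes.
    """
    N, T, k = x.shape
    # Cross-section averages at each t proxy the unobserved common factors.
    ybar = y.mean(axis=0)                      # (T,)
    xbar = x.mean(axis=0)                      # (T, k)
    # Assumption: the only observed common effect is an intercept.
    H = np.column_stack([np.ones(T), ybar, xbar])
    # Annihilator of H; partialling out H is equivalent to adding its
    # columns as extra regressors in each unit's regression.
    M = np.eye(T) - H @ np.linalg.pinv(H.T @ H) @ H.T
    b = np.empty((N, k))
    for i in range(N):
        b[i] = np.linalg.lstsq(M @ x[i], M @ y[i], rcond=None)[0]
    return b.mean(axis=0), b

def simulate_panel(N, T, rng):
    """Hypothetical one-factor design: the factor drives both x and y."""
    f = rng.standard_normal(T)                      # unobserved common factor
    gam_y = 1.0 + 0.5 * rng.standard_normal(N)      # loadings in the errors
    gam_x = 1.0 + 0.5 * rng.standard_normal(N)      # loadings in the regressor
    beta = 1.0 + 0.1 * rng.standard_normal(N)       # heterogeneous slopes, mean 1
    x = gam_x[:, None] * f[None, :] + rng.standard_normal((N, T))
    y = beta[:, None] * x + gam_y[:, None] * f[None, :] + rng.standard_normal((N, T))
    return y, x[:, :, None]

rng = np.random.default_rng(0)
y, x = simulate_panel(N=100, T=50, rng=rng)
b_mg, b_units = cce_mean_group(y, x)
print("CCE mean group slope:", b_mg)
```

In this design the regressor is correlated with the unobserved factor, so a regression that ignores the factor would be biased; the augmented regression should recover an average slope close to the true mean of 1.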

The spatial approach assumes that the structure of the cross-section correlation is related to location and distance among units, defined according to a pre-specified metric. Proximity need not be measured in terms of physical space, but can be defined using other types of metrics, such as economic (Conley, 1999, Pesaran et al., 2004), policy, or social distance (Conley and Topa, 2002). Hence, cross-section correlation is represented by means of a spatial process, which explicitly relates each unit to its neighbours (Whittle, 1954). Estimation of panels with spatially correlated errors can be based on maximum likelihood (ML) techniques (Lee, 2004), or on the generalized method of moments (GMM) (Kelejian and Prucha, 1999, Lee, 2007, Kelejian and Prucha, 2009). Recently, non-parametric methods based on heteroskedasticity and autocorrelation consistent estimators applied to spatial models have also been proposed (Conley, 1999, Kelejian and Prucha, 2007, Bester et al., 2009).
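As a minimal sketch of such a spatial process, the code below generates errors from a first-order spatial autoregressive (SAR) scheme, e_t = δ W e_t + ε_t, so that e_t = (I − δW)^{-1} ε_t. The ring-shaped weight matrix, the coefficient name `delta`, and the function names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def ring_weights(N):
    """Row-standardised weights on a circle (assumes N >= 3).

    Each unit has two neighbours, the units immediately before and after it,
    each with weight 1/2.
    """
    W = np.zeros((N, N))
    for i in range(N):
        W[i, (i - 1) % N] = 0.5
        W[i, (i + 1) % N] = 0.5
    return W

def sar_errors(W, delta, T, rng):
    """Draw an (N, T) panel of SAR errors: e_t = delta * W e_t + eps_t."""
    N = W.shape[0]
    eps = rng.standard_normal((N, T))
    # Solve (I - delta W) e_t = eps_t for all t at once.
    return np.linalg.solve(np.eye(N) - delta * W, eps)

rng = np.random.default_rng(1)
W = ring_weights(50)
e = sar_errors(W, delta=0.4, T=2000, rng=rng)
corr = np.corrcoef(e)
print("correlation with a neighbour:   ", round(corr[0, 1], 3))
print("correlation with a distant unit:", round(corr[0, 25], 3))
```

Neighbouring units come out clearly correlated while distant units are close to uncorrelated, which is the sense in which this kind of dependence is local.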

In this paper we build on the existing literature and consider a general panel data model where error cross-section dependence is due to unobserved common factors and/or spatial dependence, whilst at the same time allowing the errors to be serially correlated. We focus on estimation and inference procedures that are robust to the presence of various forms of cross-sectional and temporal dependencies in the error processes. Robust methods are needed because the source and extent of error cross-section dependence are often unknown. The error cross-section dependence can take many different forms and its nature could differ at micro and macro levels. For instance, at a micro level, individual consumption behaviour can be influenced by economy-wide factors, such as changes in taxation and interest rates, and by local neighbourhood effects such as keeping up with the Joneses (Cowan et al., 2004). In macroeconomics, several studies have argued that business cycle fluctuations could be the result of both strategic interactions and aggregate technological shocks (Cooper and Haltiwanger, 1996). Our econometric specification, by allowing for the presence of both sources of contemporaneous error correlations, is sufficiently general and includes the models proposed in the literature as special cases.

We focus on estimation of slope coefficients in the case of a number of different specifications. Initially, we concentrate on a panel data model without unobserved factors where the errors are spatially dependent and possibly serially correlated, and derive the asymptotic distribution of the mean group and pooled estimators, under alternative assumptions regarding the slope coefficients. In the presence of heterogeneous slopes, we show that the non-parametric approach advanced by Pesaran (2006) continues to be applicable and can be used to obtain standard errors that are robust to both spatial and serial error correlations. However, in the case of homogeneous slopes the CCE procedure will not be applicable. In this case we propose a non-parametric variance matrix estimator that adapts the Newey and West (1987) heteroskedasticity and autocorrelation consistent (HAC) procedure to allow for spatial effects along the lines recently advanced by Kelejian and Prucha (2007). We refer to this variance estimator as the spatial, heteroskedasticity, autocorrelation (SHAC) estimator. We then consider the more general case where the error term in the panel data model is composed of a multifactor structure and a spatial process, and show that Pesaran’s CCE approach continues to be valid and yields consistent estimates of the slope coefficients and their standard errors. We also show how to obtain consistent estimates of the errors in the panel to be used in tests of cross-section independence, and for further analysis of the underlying spatial processes.
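For the heterogeneous-slope case, the non-parametric variance estimator of a mean group estimator is commonly written as the sample covariance of the unit-specific slope estimates divided by N, that is, Σ_i (b_i − b_MG)(b_i − b_MG)' / (N(N − 1)). The sketch below computes it from a given array of unit-level estimates; the function name and the toy inputs are assumptions for illustration, and the exact conditions under which this estimator is valid are those set out in the paper's theorems, not reproduced here.

```python
import numpy as np

def mean_group_with_se(b):
    """Mean group estimate and non-parametric standard errors.

    b : (N, k) array of unit-specific slope estimates (e.g. from unit-by-unit OLS).
    Returns the mean group estimate, its estimated variance matrix, and the
    standard errors (square roots of the diagonal).
    """
    N = b.shape[0]
    b_mg = b.mean(axis=0)
    dev = b - b_mg                          # deviations from the mean group estimate
    var_mg = dev.T @ dev / (N * (N - 1))    # sum of outer products / (N (N - 1))
    return b_mg, var_mg, np.sqrt(np.diag(var_mg))

# Toy input: three units, one slope each.
b_mg, var_mg, se = mean_group_with_se(np.array([[1.0], [2.0], [3.0]]))
print(b_mg, var_mg, se)
```

The estimator uses only the dispersion of the unit-level estimates, so it does not require an explicit model of the spatial or serial correlation in the errors.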

Using Monte Carlo techniques, we investigate the small sample performance of the estimators under various patterns of error cross-section dependence, with and without error serial correlation, under both cases of heterogeneous and homogeneous slopes. We examine the performance of the alternative estimators when the errors only display spatial dependence, when they are subject to unobserved common factors as well as spatial dependence, and in the case where the source of cross-section dependence changes over time. Our results indicate that the mean group and pooled estimators with robust standard errors do work well under certain regularity conditions outlined in our theorems. However, under slope homogeneity or in the presence of unobserved common factors these estimators fail to provide the correct inference. The results also document the tendency of the tests based on HAC type standard errors to over-reject the null hypothesis in small samples even in the case of error cross-section dependence which is purely spatial. In contrast, our Monte Carlo experiments clearly show that the augmentation of panel regressions with cross-section averages, as formulated by the CCE procedure, eliminates the effects of all forms of spatial and temporal correlations, irrespective of whether these are due to spatial and/or unobserved common factors. The small sample properties of CCE estimators do not seem to be affected by the heterogeneity assumptions on slope coefficients, or by the presence of error serial correlations. It is this level of robustness of the CCE estimator which particularly commends it for use in empirical analysis.

The plan of the remainder of the paper is as follows: Section 2 sets out a panel regression model with unobserved common factors and general spatial and temporal error processes. Section 3 develops the asymptotic distribution of the mean group and pooled estimators in the presence of spatial error dependence and error serial correlation. Section 4 considers the more general case where the errors also contain unobserved common factors, and establishes the validity of the CCE estimators for this class of models. Consistent estimation of the residuals from such models is considered in Section 5, where the necessary identification conditions are stated. Section 6 describes the Monte Carlo experiments and reports the results. Section 7 ends with some concluding remarks.

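Notation: $\lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_n(A)$ are the eigenvalues of a matrix $A \in \mathbb{M}^{n \times n}$, where $\mathbb{M}^{n \times n}$ is the space of real $n \times n$ matrices. $A^{-}$ denotes a generalized inverse of $A$. The column norm of $A \in \mathbb{M}^{n \times n}$ is $\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^{n} |a_{ij}|$. The row norm of $A$ is $\|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^{n} |a_{ij}|$. The Euclidean norm of $A$ is $\|A\|_2 = [\mathrm{Tr}(AA')]^{1/2}$. $K$ is used for a fixed positive constant. $(N,T) \xrightarrow{j} \infty$ denotes $N$ and $T$ tending to infinity jointly but in no particular order.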
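The three matrix norms in the notation paragraph can be checked numerically. The short sketch below computes the column norm (largest absolute column sum), the row norm (largest absolute row sum), and the Euclidean norm (square root of the trace of AA') directly from their definitions and compares them with NumPy's built-in `np.linalg.norm`; the example matrix is arbitrary.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# Column norm: maximum over columns j of the sum over rows i of |a_ij|.
col_norm = np.abs(A).sum(axis=0).max()      # max(1 + 3, 2 + 4) = 6
# Row norm: maximum over rows i of the sum over columns j of |a_ij|.
row_norm = np.abs(A).sum(axis=1).max()      # max(1 + 2, 3 + 4) = 7
# Euclidean norm as defined in the text: [Tr(A A')]^(1/2).
euclid_norm = np.sqrt(np.trace(A @ A.T))    # sqrt(1 + 4 + 9 + 16) = sqrt(30)

# The same quantities via NumPy's matrix norms.
assert np.isclose(col_norm, np.linalg.norm(A, 1))
assert np.isclose(row_norm, np.linalg.norm(A, np.inf))
assert np.isclose(euclid_norm, np.linalg.norm(A, "fro"))
print(col_norm, row_norm, euclid_norm)
```

Note that the Euclidean norm as defined here coincides with what NumPy calls the Frobenius norm, not with the spectral norm returned by `np.linalg.norm(A, 2)`.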

Section snippets

Heterogeneous panels with unobserved common factors and spatial error correlation

We begin with a general specification where the dependent variable is a function of a set of individual-specific regressors, a linear combination of common observed and unobserved factors, and includes errors that are serially and spatially correlated. Let $y_{it}$ be the observation on the $i$th cross-section unit at time $t$ for $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$, and suppose that it is generated as $y_{it} = \alpha_i' d_t + \beta_i' x_{it} + \gamma_i' f_t + e_{it}$, where $d_t = (d_{1t}, d_{2t}, \ldots, d_{nt})'$ is a $n \times 1$ vector of observed common effects, and $x_{it}$ is a $k \times 1$

Estimating panels with spatial error correlation

The literature on spatial econometrics typically considers the problem of spatial dependence under strong assumptions of homogeneity and temporal independence. Only recently, a strand of literature in spatial econometrics has considered the incorporation of unobserved heterogeneity in spatial panel data models, where N is usually assumed to be large relative to T. Baltagi et al. (2003) and Kapoor et al. (2007) have focused on ML and GMM estimation of panels where the error term is the sum of an

Estimating panels with unobserved common factors and spatial error correlation

We now turn to the estimation of the slope coefficients in the context of panels with both common factors and spatial error dependence. We restrict our attention to the CCE approach since, as compared to other existing methods, it is simple to apply and has been shown to be robust to the choice of m (the number of common factors), the temporal dynamics of unobserved common factors, and the idiosyncratic error. The idea underlying this approach is that, as far as estimation of the slope

Residuals from CCE regression

We now consider the consistent estimation of regression errors $u_{it} = y_{it} - \alpha_i' d_t - \beta_i' x_{it}$ in model (1). Estimation of $u_{it}$ is needed for computing tests of error cross-section independence, or when the objects of interest are the coefficients of the spatial process, $e_{it}$. Before continuing, without loss of generality, we specify some further assumptions on the observed and unobserved common factors. In particular:

Assumption 11

$E(f_t) = 0$, for $t = 1, \ldots, T$, and the $n \times 1$ vector of observed common factors, $d_t$, is distributed

Monte Carlo design

This section provides Monte Carlo evidence on the small sample properties of our estimators, under a range of assumptions on the stochastic process generating the error terms. The study comprises three sets of experiments. In the first set, we consider a panel where the error term is generated by a SAR process with no common factors. In the second set, we assume that the error process is the orthogonal sum of a factor structure and a spatial process, and allow the dependent variable

Concluding remarks

The main aim of this paper has been to consider estimation of a panel regression model under a number of different specifications of cross-section error correlations, such as spatial and/or common factor models. We have derived the asymptotic distributions of the mean group and pooled estimators for a panel regression model where the source of error cross-section dependence is purely spatial or results from omitted unobserved factors, or both. In each case we have distinguished between panels

References (48)

  • Robertson, D., et al., 2007. Maximum likelihood factor analysis with rank-deficient sample covariance matrices. Journal of Multivariate Analysis.
  • Andrews, D., 2005. Cross section regression with common shocks. Econometrica.
  • Anselin, L., 1988. Spatial Econometrics: Methods and Models.
  • Bai, J., 2009. Panel data models with interactive fixed effects. Econometrica.
  • Bakirov, N.K., et al., 2006. Student’s t-test for Gaussian scale mixtures. Journal of Mathematical Sciences.
  • Baltagi, B.H., Egger, P., Pfaffermayr, M., 2009. A generalized spatial panel data model with random effects. Center for...
  • Bernstein, D.S., 2005. Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory.
  • Bester, C.A., Conley, T.G., Hansen, C.B., 2009. Inference with dependent data using cluster covariance estimators....
  • Chudik, A., Pesaran, M.H., Tosetti, E., 2010. Weak and strong cross section dependence and estimation of large panels....
  • Chung, K.L., 2001. A Course in Probability Theory.
  • Coakley, J., Fuertes, A.M., Smith, R., 2002. A principal components approach to cross-section dependence in panels....
  • Conley, T.G., et al., 2002. Socio-economic distance and spatial patterns in unemployment. Journal of Applied Econometrics.
  • Cooper, R., et al., 1996. Evidence on macroeconomic complementarities. The Review of Economics and Statistics.
  • Cowan, R., et al., 2004. Waves in consumption with interdependence among consumers. Canadian Journal of Economics.

    We are grateful to the Editor (Cheng Hsiao), an Associate Editor and three anonymous referees, Badi Baltagi, Alexander Chudik and George Kapetanios for helpful comments and suggestions. Elisa Tosetti acknowledges financial support from ESRC (Ref. no. RES-061-25-0317).
