9. Conclusion


Panel data are analyzed to investigate group and time effects using fixed effect and random effect models. The fixed effect model asks how group and/or time affect the intercept, while the random effect model analyzes error variance structures affected by group and/or time. Slopes are assumed to be constant across groups and time periods in both fixed effect and random effect models.
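
The contrast can be written out for the one-way group effect case. The notation below is an assumption chosen for illustration (u_i is the group effect, v_it the idiosyncratic error), not a quotation from the earlier sections:

```latex
% One-way group effect models; symbols are illustrative assumptions.
% Fixed effect: u_i shifts the intercept. Random effect: u_i is part of the error.
\begin{align*}
\text{Fixed effect:}  \quad & y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it} \\
\text{Random effect:} \quad & y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})
\end{align*}
```

In both lines the slope vector beta carries no group or time subscript, which is the constant-slope assumption stated above.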

A panel data set needs to be arranged in the long format as shown in Section 1.1. If the number of groups (subjects) or time periods is extremely large, panel data models may be less useful because the null hypothesis of the F-test is too strong. In that case, consider categorizing subjects to reduce the number of groups. If the data are severely unbalanced, read the output with caution and consider dropping subjects with many missing data points. This document assumes that data are balanced without missing values.
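
As a rough illustration of the long format and the balance assumption, the following Python sketch reshapes a small wide table into one row per subject and period, then checks whether every subject has the same set of periods. The function names, the column naming scheme (a stub followed by the year), and the numbers are all hypothetical.

```python
def wide_to_long(wide_rows, id_name, stub):
    """Turn one row per subject into one row per subject-period.

    Columns named <stub><year> (for example cost1970) become rows; this
    naming scheme is an assumption made for the example.
    """
    long_rows = []
    for row in wide_rows:
        for key, value in row.items():
            suffix = key[len(stub):]
            if key.startswith(stub) and suffix.isdigit():
                long_rows.append({id_name: row[id_name],
                                  "year": int(suffix),
                                  stub: value})
    long_rows.sort(key=lambda r: (r[id_name], r["year"]))
    return long_rows


def is_balanced(long_rows, id_name, time_name):
    """Return True when every subject is observed in the same periods."""
    periods = {}
    for r in long_rows:
        periods.setdefault(r[id_name], set()).add(r[time_name])
    all_periods = set().union(*periods.values())
    return all(p == all_periods for p in periods.values())


# Hypothetical wide data: one row per airline, one column per year.
wide = [
    {"airline": 1, "cost1970": 10.0, "cost1971": 12.0},
    {"airline": 2, "cost1970": 20.0, "cost1971": 21.0},
]
long_rows = wide_to_long(wide, "airline", "cost")
balanced = is_balanced(long_rows, "airline", "year")
```

Dropping any single row from long_rows makes is_balanced return False, which is the situation the paragraph above advises reading with caution.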

Fixed effect models are estimated by least squares dummy variable (LSDV) regression and by the within effect model. LSDV has three approaches to avoiding perfect multicollinearity: LSDV1 drops a dummy, LSDV2 suppresses the intercept, and LSDV3 includes all dummies and imposes a restriction instead. LSDV1 is commonly used since it produces correct statistics. LSDV2 provides actual parameter estimates of groups (Y-intercepts), but reports incorrect R² and F statistics. Note that the dummy parameters of the three LSDV approaches have different meanings, so their t-tests evaluate different hypotheses.
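
The different meanings of the dummy parameters can be seen by converting one parameterization into the others. The sketch below starts from hypothetical LSDV2 group intercepts and derives the corresponding LSDV1 and LSDV3 parameters; it assumes that the LSDV3 restriction is that the dummy coefficients sum to zero, and the function names and numbers are illustrative only.

```python
def lsdv1_from_lsdv2(group_intercepts, dropped):
    """LSDV1: the intercept is the dropped group's intercept and each
    dummy coefficient is a deviation from that reference group."""
    baseline = group_intercepts[dropped]
    deviations = {g: a - baseline
                  for g, a in group_intercepts.items() if g != dropped}
    return baseline, deviations


def lsdv3_from_lsdv2(group_intercepts):
    """LSDV3 (sum-to-zero restriction assumed): the intercept is the
    average group intercept and each dummy coefficient is a deviation
    from that average."""
    average = sum(group_intercepts.values()) / len(group_intercepts)
    deviations = {g: a - average for g, a in group_intercepts.items()}
    return average, deviations


# Hypothetical LSDV2 estimates: the actual Y-intercept of each group.
intercepts = {"A": 9.0, "B": 10.0, "C": 14.0}

b0_lsdv1, d_lsdv1 = lsdv1_from_lsdv2(intercepts, dropped="C")
b0_lsdv3, d_lsdv3 = lsdv3_from_lsdv2(intercepts)
```

A t-test on group A's coefficient therefore asks whether A's intercept is zero under LSDV2, whether A differs from the dropped group C under LSDV1, and whether A differs from the average intercept under LSDV3.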

The within effect model uses deviations from group means instead of dummy variables. This model is therefore useful when there are many groups and/or time periods in the panel data set, since it avoids the incidental parameter problem. The dummy parameter estimates need to be computed afterward. Because its error degrees of freedom are too large (the estimated group means are not counted), the within effect model produces incorrect MSE and standard errors of parameters. As a result, you need to adjust the standard errors to conduct correct t-tests.
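
A minimal sketch of these steps for a single regressor is given below. It assumes a balanced panel, takes the naive error degrees of freedom to be nT - k and the LSDV degrees of freedom to be nT - n - k (n groups, T periods, k regressors), and uses made-up numbers; the function name is hypothetical.

```python
from math import sqrt


def within_estimate(groups):
    """Within estimator for one regressor.

    groups maps a group id to a list of (x, y) pairs.
    Returns the slope, the naive and adjusted standard errors,
    and the group intercepts recovered afterward.
    """
    k = 1  # one regressor in this sketch
    x_dev, y_dev, means = [], [], {}
    for g, obs in groups.items():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        means[g] = (x_bar, y_bar)
        # Deviations from group means replace the dummy variables.
        x_dev += [x - x_bar for x, _ in obs]
        y_dev += [y - y_bar for _, y in obs]

    sxx = sum(x * x for x in x_dev)
    beta = sum(x * y for x, y in zip(x_dev, y_dev)) / sxx
    sse = sum((y - beta * x) ** 2 for x, y in zip(x_dev, y_dev))

    n, nT = len(groups), len(x_dev)
    # Naive standard error: OLS on demeaned data uses nT - k error df.
    se_naive = sqrt(sse / (nT - k) / sxx)
    # Adjustment: rescale to the LSDV error df, nT - n - k.
    se_adjusted = se_naive * sqrt((nT - k) / (nT - n - k))
    # Group intercepts computed afterward from the group means.
    intercepts = {g: y_bar - beta * x_bar
                  for g, (x_bar, y_bar) in means.items()}
    return beta, se_naive, se_adjusted, intercepts


# Hypothetical balanced panel: two groups, three periods each.
panel = {
    "A": [(1.0, 3.0), (2.0, 5.0), (3.0, 6.0)],
    "B": [(1.0, 8.0), (2.0, 9.0), (3.0, 12.0)],
}
beta, se_naive, se_adjusted, intercepts = within_estimate(panel)
```

The adjusted standard error is always larger than the naive one, so t statistics computed from unadjusted within-model output overstate significance.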

Random effect models are estimated by generalized least squares (GLS) and feasible generalized least squares (FGLS). When the variance structure is known, GLS is used. If it is unknown, FGLS first estimates theta from the data. Parameter estimates vary depending on the estimation method.
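
The role of theta can be sketched as follows, using the commonly used one-way expression theta = 1 - sqrt(sigma_v^2 / (T * sigma_u^2 + sigma_v^2)) for a balanced panel. The variance components are taken as given here; how FGLS estimates them differs across methods and packages, and the function names and inputs are illustrative assumptions.

```python
from math import sqrt


def theta(sigma2_v, sigma2_u, T):
    """Quasi-demeaning weight for a balanced one-way random effect model.

    sigma2_v: variance of the idiosyncratic error
    sigma2_u: variance of the group effect
    T: number of time periods per group
    """
    return 1.0 - sqrt(sigma2_v / (T * sigma2_u + sigma2_v))


def quasi_demean(values, th):
    """Subtract theta times the group mean from each observation."""
    mean = sum(values) / len(values)
    return [v - th * mean for v in values]


# Hypothetical variance components for a panel with T = 3.
th = theta(sigma2_v=1.0, sigma2_u=1.0, T=3)
y_star = quasi_demean([1.0, 2.0, 3.0], th)
```

When the group effect variance is zero, theta is zero and the transformation leaves the data unchanged (pooled OLS); as theta approaches one, the transformation approaches the full demeaning of the within effect model.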

Fixed effects are tested by the F-test and random effects by the Breusch-Pagan Lagrange multiplier test. The Hausman specification test compares a fixed effect model and a random effect model. If the null hypothesis that the individual effects are uncorrelated with the regressors is rejected, the fixed effect model is preferred. Poolability is tested by running group-by-group or time-by-time regressions.
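
The first two test statistics can be sketched for a balanced one-way group effect model. The F statistic compares the pooled OLS and LSDV residual sums of squares, and the LM statistic is computed from pooled OLS residuals and referred to a chi-squared distribution with one degree of freedom. The function names and inputs below are hypothetical, and k is assumed to count regressors excluding the intercept and dummies.

```python
def fixed_effect_f(sse_pooled, sse_lsdv, n, T, k):
    """F statistic for the null that all group dummy parameters are zero.

    Returns the statistic with its numerator and denominator df.
    """
    df1 = n - 1
    df2 = n * T - n - k
    f_stat = ((sse_pooled - sse_lsdv) / df1) / (sse_lsdv / df2)
    return f_stat, df1, df2


def breusch_pagan_lm(residuals_by_group):
    """Breusch-Pagan LM statistic from pooled OLS residuals.

    residuals_by_group is a list with one list of T residuals per group
    (balanced panel assumed).
    """
    n = len(residuals_by_group)
    T = len(residuals_by_group[0])
    sum_sq_totals = sum(sum(e) ** 2 for e in residuals_by_group)
    sum_sq = sum(x * x for e in residuals_by_group for x in e)
    return n * T / (2 * (T - 1)) * (sum_sq_totals / sum_sq - 1) ** 2


# Hypothetical inputs.
f_stat, df1, df2 = fixed_effect_f(100.0, 40.0, n=4, T=5, k=1)
lm = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]])
```

A large F favors the fixed effect model over pooled OLS, and a large LM favors the random effect model over pooled OLS; the choice between fixed and random effects is then left to the Hausman test.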

Among the four statistical packages addressed in this document, I would recommend SAS and Stata. In particular, PROC PANEL provides various ways of analyzing panel data and reports correct (adjusted) statistics (see Tables 4.1 and 7.1). Stata is very handy for manipulating panel data but reports incorrect F-test and R² statistics. LIMDEP is able to estimate various panel data models but is not good at data management. SPSS is least recommended for panel data models.

Extensions to these basic linear panel data models include dynamic models with autocorrelation, random coefficient models, hierarchical linear models, and logit/probit models.



Appendix: Data Sets


Data set 1: Data of the top 50 information technology firms presented in OECD Information Technology Outlook 2004 (http://thesius.sourceoecd.org/).

Download (UITS Stat/Math Center): CSV (ASCII) | Stata
  • firm = IT company name
  • type = type of IT firm
  • rnd = 2002 R&D investment in current USD millions
  • income = 2000 net income in current USD millions
  • d1 = 1 for equipment and software firms and 0 for telecommunication and electronics
The following output summarizes descriptive statistics of these variables. Note that there are many zero counts, which indicate an overdispersion problem.

. tab type d1

   Type of Firm |         0          1 |     Total
----------------+----------------------+----------
        Telecom |        18          0 |        18
    Electronics |        17          0 |        17
   IT Equipment |         0          6 |         6
Comm. Equipment |         0          5 |         5
  Service & S/W |         0          4 |         4
----------------+----------------------+----------
          Total |        35         15 |        50

. sum rnd income

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         rnd |        39    2023.564    1615.417          0       5490
      income |        50     2509.78    3104.585       -732      11797


Data set 2: Cost data for U.S. airlines (1970-1984) presented in Greene (2003).
URL: http://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm

Download (UITS Stat/Math Center): CSV (ASCII) | Stata | SAS | LIMDEP
  • airline = airline (six airlines)
  • year = year (fifteen years)
  • output0 = output in revenue passenger miles, index number
  • cost0 = total cost in $1,000
  • fuel0 = fuel price
  • load = load factor, the average capacity utilization of the fleet

. tsset

       panel variable:  airline, 1 to 6
        time variable:  year, 1 to 15

. sum output0 cost0 fuel0 load

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     output0 |        90    .5449946    .5335865    .037682    1.93646
       cost0 |        90     1122524     1192075      68978    4748320
       fuel0 |        90      471683    329502.9     103795    1015610
        load |        90    .5604602    .0527934    .432066    .676287



References

  • Baltagi, Badi H. 2001. Econometric Analysis of Panel Data. John Wiley & Sons.
  • Baltagi, Badi H., and Young-Jae Chang. 1994. "Incomplete Panels: A Comparative Study of Alternative Estimators for the Unbalanced One-way Error Component Regression Model." Journal of Econometrics, 62(2): 67-89.
  • Breusch, T. S., and A. R. Pagan. 1980. "The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics." Review of Economic Studies, 47(1):239-253.
  • Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge University Press.
  • Cameron, A. Colin, and Pravin K. Trivedi. 2009. Microeconometrics Using Stata. College Station, TX: Stata Press.
  • Freund, Rudolf J., and Ramon C. Littell. 2000. SAS System for Regression, 3rd ed. Cary, NC: SAS Institute.
  • Fuller, Wayne A. and George E. Battese. 1973. "Transformations for Estimation of Linear Models with Nested-Error Structure." Journal of the American Statistical Association, 68(343) (September): 626-632.
  • Fuller, Wayne A. and George E. Battese. 1974. "Estimation of Linear Models with Crossed-Error Structure." Journal of Econometrics, 2: 67-78.
  • Greene, William H. 2003. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.
  • Greene, William H. 2007. LIMDEP Version 9.0 Econometric Modeling Guide. Plainview, New York: Econometric Software.
  • Hausman, J. A. 1978. "Specification Tests in Econometrics." Econometrica, 46(6):1251-1271.
  • SAS Institute. 2004. SAS/ETS 9.1 User's Guide. Cary, NC: SAS Institute.
  • SAS Institute. 2004. SAS/STAT 9.1 User's Guide. Cary, NC: SAS Institute.
  • Stata Press. 2007. Stata Base Reference Manual, Release 10. College Station, TX: Stata Press.
  • Stata Press. 2007. Stata Longitudinal/Panel Data Reference Manual, Release 10. College Station, TX: Stata Press.
  • Stata Press. 2007. Stata Time-Series Reference Manual, Release 10. College Station, TX: Stata Press.
  • Suits, Daniel B. 1984. "Dummy Variables: Mechanics V. Interpretation." Review of Economics & Statistics, 66(1): 177-180.
  • Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.


Acknowledgements

I have to thank Dr. Heejoon Kang of the Kelley School of Business and Dr. David H. Good of the School of Public and Environmental Affairs, Indiana University at Bloomington. I am also grateful to Jeremy Albright and Kevin Wilhite at the UITS Center for Statistical and Mathematical Computing for comments and suggestions. A special thanks to many readers around the world who have eagerly provided constructive feedback and encouraged me to keep improving this document.


Revision History

  • 2005.11 First draft.
  • 2008.04, 11 Corrected some errors.
  • 2009.09 Second draft.

