Panel data are analyzed to investigate group and time effects using fixed effect and random effect models. The fixed effect model asks how
group and/or time affect the intercept, while the random effect model analyzes error variance structures affected by group and/or time.
Slopes are assumed unchanged in both fixed effect and random effect models.
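For reference, the two one-way group effect specifications can be written side by side. The symbols below follow common textbook notation and are assumed here for illustration, since this section does not define them: u_i is the group-specific effect, v_it is the idiosyncratic error, and beta is the common slope vector.

```latex
\begin{aligned}
\text{Fixed effect:}  \quad & y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it} \\
\text{Random effect:} \quad & y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})
\end{aligned}
```

In the fixed effect form u_i shifts the intercept of each group, while in the random effect form it is part of the composite error term. The slope vector is the same for all groups in both forms.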
A panel data set needs to be arranged in the long format as shown in Section 1.1. If the number of groups (subjects) or time periods is
extremely large, panel data models may be less useful because the null hypothesis of the F test (all dummy parameters are jointly zero) is too strong. Then, you may consider
categorizing subjects to reduce the number of groups. If data are severely unbalanced, read output with caution and consider dropping
subjects with many missing data points. This document assumes that data are balanced without missing values.
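As a rough illustration of the long format and of a balance check, the following Python sketch reshapes a small wide table into one record per group and period. The records, the column names (cost_1970, cost_1971), and both helper functions are hypothetical and are not taken from the data sets in this document.

```python
# Hypothetical wide-format records: one row per airline, one column per year.
wide = [
    {"airline": 1, "cost_1970": 10.0, "cost_1971": 11.5},
    {"airline": 2, "cost_1970": 20.0, "cost_1971": 19.0},
]


def wide_to_long(rows, id_key, prefix):
    """Return one record per (group, year) pair, sorted by group then year."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                year = int(key[len(prefix):])
                long_rows.append({id_key: row[id_key], "year": year, "cost": value})
    return sorted(long_rows, key=lambda r: (r[id_key], r["year"]))


def is_balanced(rows, id_key, time_key):
    """True when every group is observed in exactly the same set of periods."""
    periods = {}
    for r in rows:
        periods.setdefault(r[id_key], set()).add(r[time_key])
    sets = list(periods.values())
    return all(s == sets[0] for s in sets)


long_data = wide_to_long(wide, "airline", "cost_")
for record in long_data:
    print(record)
print("balanced:", is_balanced(long_data, "airline", "year"))
```

Dropping a single record from long_data makes is_balanced return False, which is the situation where output should be read with caution.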
Fixed effect models are estimated by the least squares dummy variable (LSDV) regression and the within effect model. LSDV has three approaches
to avoid perfect multicollinearity. LSDV1 drops a dummy, LSDV2 suppresses the intercept, and LSDV3 includes all dummies and imposes
restrictions instead. LSDV1 is commonly used since it produces correct statistics. LSDV2 provides actual parameter estimates of groups
(Y-intercepts), but reports incorrect R2 and F statistic. Notice that the dummy parameters of the three LSDV approaches have different
meanings and thus lead to different t-tests.
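The relationship between LSDV1 and LSDV2 can be checked numerically. The sketch below is a minimal pure-Python illustration on made-up data (three groups, four periods, one regressor); the data, the helper functions, and the choice to drop the first group's dummy in LSDV1 are all assumptions made for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def dummy(g, target):
    return 1.0 if g == target else 0.0


# Made-up balanced panel: group intercepts 1, 3, 6 and a common slope of 2.
groups = [1] * 4 + [2] * 4 + [3] * 4
x = [1, 2, 3, 4, 2, 3, 5, 6, 1, 3, 4, 7]
noise = [0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.1, -0.2, 0.0, 0.1, -0.1, 0.0]
alpha = {1: 1.0, 2: 3.0, 3: 6.0}
y = [alpha[g] + 2.0 * xi + e for g, xi, e in zip(groups, x, noise)]

# LSDV1: intercept plus dummies for groups 2 and 3 (group 1 is the baseline).
X1 = [[1.0, xi, dummy(g, 2), dummy(g, 3)] for g, xi in zip(groups, x)]
# LSDV2: no intercept, one dummy per group.
X2 = [[xi, dummy(g, 1), dummy(g, 2), dummy(g, 3)] for g, xi in zip(groups, x)]

b1 = ols(X1, y)  # [intercept, slope, d2, d3]
b2 = ols(X2, y)  # [slope, d1, d2, d3]
print("LSDV1:", b1)
print("LSDV2:", b2)
```

Both approaches return the same slope. Each LSDV2 dummy coefficient is a group intercept, whereas an LSDV1 dummy coefficient is the deviation of that group's intercept from the baseline group, which is why the two sets of t-tests answer different questions.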
The within effect model does not use dummy variables but deviations from group means. Thus, this model is useful when there are
many groups and/or time periods in the panel data set since it is able to avoid the incidental parameter problem. The dummy
parameter estimates need to be computed afterward. Because demeaning does not count the group intercepts as estimated parameters, the
within effect model reports error degrees of freedom that are too large and thus incorrect MSE and standard errors of parameters. As a
result, you need to adjust the standard errors to conduct correct t-tests.
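The degrees of freedom adjustment can be sketched for a single regressor as follows. This is an illustration under stated assumptions rather than the output of any package: the data are made up, the within regression is assumed to be run without an intercept (so its naive error degrees of freedom are nT - k), and the corrected value is nT - n - k, where n is the number of groups.

```python
from math import sqrt


def group_means(groups, values):
    """Return the mean of values within each group."""
    totals, counts = {}, {}
    for g, v in zip(groups, values):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


def within_estimator(groups, x, y):
    """One-regressor within estimator with naive and adjusted standard errors."""
    n_groups, n_obs, k = len(set(groups)), len(y), 1
    mx, my = group_means(groups, x), group_means(groups, y)
    # Deviations from group means replace the dummy variables.
    xd = [xi - mx[g] for g, xi in zip(groups, x)]
    yd = [yi - my[g] for g, yi in zip(groups, y)]
    sxx = sum(v * v for v in xd)
    slope = sum(a * b for a, b in zip(xd, yd)) / sxx
    sse = sum((b - slope * a) ** 2 for a, b in zip(xd, yd))
    # Naive standard error uses nT - k error degrees of freedom.
    se_naive = sqrt(sse / (n_obs - k) / sxx)
    # Adjusted standard error uses nT - n - k, as LSDV would.
    se_adjusted = se_naive * sqrt((n_obs - k) / (n_obs - n_groups - k))
    # Group intercepts are recovered afterward from the group means.
    intercepts = {g: my[g] - slope * mx[g] for g in mx}
    return slope, se_naive, se_adjusted, intercepts


# Made-up balanced panel: three groups, four periods.
groups = [1] * 4 + [2] * 4 + [3] * 4
x = [1, 2, 3, 4, 2, 3, 5, 6, 1, 3, 4, 7]
noise = [0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.1, -0.2, 0.0, 0.1, -0.1, 0.0]
alpha = {1: 1.0, 2: 3.0, 3: 6.0}
y = [alpha[g] + 2.0 * xi + e for g, xi, e in zip(groups, x, noise)]

slope, se_naive, se_adjusted, intercepts = within_estimator(groups, x, y)
print("slope:", slope)
print("naive se:", se_naive, "adjusted se:", se_adjusted)
print("group intercepts:", intercepts)
```

The adjusted standard error is always larger than the naive one, and the gap widens as the number of groups grows relative to the number of observations.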
Random effect models are estimated by the generalized least squares (GLS) and the feasible generalized least squares (FGLS). When the
variance structure is known, GLS is used. If it is unknown, FGLS estimates theta from the data. Parameter estimates vary depending on estimation methods.
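For a balanced one-way model, theta is commonly computed from the two variance components and then used to quasi-demean the data. The sketch below assumes the variance components are already given (how they are estimated is where methods differ) and uses made-up numbers; the function names are hypothetical.

```python
from math import sqrt


def theta(sigma_u2, sigma_v2, periods):
    """Quasi-demeaning weight for a balanced one-way random effect model."""
    return 1.0 - sqrt(sigma_v2 / (periods * sigma_u2 + sigma_v2))


def quasi_demean(groups, values, th):
    """Subtract theta times the group mean from every observation."""
    totals, counts = {}, {}
    for g, v in zip(groups, values):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - th * totals[g] / counts[g] for g, v in zip(groups, values)]


# Made-up example: two groups, three periods, assumed variance components.
groups = [1, 1, 1, 2, 2, 2]
y = [2.0, 4.0, 6.0, 10.0, 11.0, 12.0]
th = theta(sigma_u2=1.0, sigma_v2=1.0, periods=3)
y_star = quasi_demean(groups, y, th)
print("theta:", th)
print("transformed y:", y_star)
```

When the group variance component is zero, theta is zero and the transformation leaves the data unchanged, so the estimator reduces to pooled OLS. As theta approaches one, the transformation approaches the within transformation. The regressors and the intercept column are transformed in the same way before running OLS on the transformed data.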
Fixed effects are tested by the F-test and random effects by the Breusch-Pagan Lagrange multiplier test. The Hausman specification test
compares a fixed effect model and a random effect model. If the null hypothesis that the individual effects are uncorrelated with the regressors is rejected, the fixed effect model is
preferred. Poolability is tested by running group by group or time by time regressions.
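The first two test statistics can be sketched for a balanced panel with one regressor. The code below is an illustration on made-up data, not the routine of any package: the F statistic compares the pooled OLS and LSDV error sums of squares, and the Breusch-Pagan LM statistic is computed from pooled OLS residuals. P-values are omitted; the statistics would be compared with F(n-1, nT-n-k) and chi-squared(1) critical values, respectively.

```python
def group_sums(groups, values):
    """Return the sum of values within each group."""
    sums = {}
    for g, v in zip(groups, values):
        sums[g] = sums.get(g, 0.0) + v
    return sums


def pooled_residuals(x, y):
    """Residuals from pooled OLS of y on an intercept and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def within_sse(groups, x, y, periods):
    """Error sum of squares of the within (equivalently LSDV) regression."""
    sx, sy = group_sums(groups, x), group_sums(groups, y)
    xd = [a - sx[g] / periods for g, a in zip(groups, x)]
    yd = [b - sy[g] / periods for g, b in zip(groups, y)]
    slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    return sum((b - slope * a) ** 2 for a, b in zip(xd, yd))


def fixed_effect_f(groups, x, y, periods):
    """F statistic for the null that all group dummy parameters are zero."""
    n, k = len(set(groups)), 1  # k = 1 regressor in this sketch
    sse_pooled = sum(e * e for e in pooled_residuals(x, y))
    sse_lsdv = within_sse(groups, x, y, periods)
    df1, df2 = n - 1, n * periods - n - k
    return ((sse_pooled - sse_lsdv) / df1) / (sse_lsdv / df2), df1, df2


def breusch_pagan_lm(groups, x, y, periods):
    """LM statistic for the null that the group variance component is zero."""
    e = pooled_residuals(x, y)
    n = len(set(groups))
    ratio = (sum(s * s for s in group_sums(groups, e).values())
             / sum(v * v for v in e))
    return n * periods / (2.0 * (periods - 1)) * (ratio - 1.0) ** 2


# Made-up balanced panel: three groups, four periods.
groups = [1] * 4 + [2] * 4 + [3] * 4
x = [1, 2, 3, 4, 2, 3, 5, 6, 1, 3, 4, 7]
noise = [0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.1, -0.2, 0.0, 0.1, -0.1, 0.0]
alpha = {1: 1.0, 2: 3.0, 3: 6.0}
y = [alpha[g] + 2.0 * xi + e for g, xi, e in zip(groups, x, noise)]

f_stat, df1, df2 = fixed_effect_f(groups, x, y, periods=4)
lm = breusch_pagan_lm(groups, x, y, periods=4)
print("F =", f_stat, "with df", (df1, df2))
print("LM =", lm)
```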
Among the four statistical packages addressed in this document, I would recommend SAS and Stata. In particular, SAS PROC PANEL provides
various ways of analyzing panel data and reports correct (adjusted) statistics (see Tables 4.1 and 7.1). Stata is very handy for
manipulating panel data but reports incorrect F-test and R2. LIMDEP is able to estimate various panel data models but is not good at
data management. SPSS is least recommended for panel data models.
Extensions to these basic linear panel data models include dynamic models with autocorrelation, random coefficient models,
hierarchical linear models, and logit/probit models.
Appendix: Data Sets
Data set 1
: Data of the top 50 information technology firms presented in OECD Information Technology Outlook 2004 (http://thesius.sourceoecd.org/
Download (UITS Stat/Math Center): CSV (ASCII)
- firm = IT company name
- type = type of IT firm
- rnd = 2002 R&D investment in current USD millions
- income = 2000 net income in current USD millions
- d1 = 1 for equipment and software firms and 0 for telecommunication and electronics
The following output summarizes descriptive statistics of these variables. Note that there are many zero counts that indicate an overdispersion problem.
. tab type d1
(cross-tabulation output truncated in the source; the surviving row labels are "IT Equipment" and "Service & S/W")

. sum rnd income

    Variable |       Obs
-------------+----------
         rnd |        39
      income |        50
Data set 2
: Cost data for U.S. airlines (1970-1984) presented in Greene (2003).
- airline = airline (six airlines)
- year = year (fifteen years)
- output0 = output in revenue passenger miles, index number
- cost0 = total cost in $1,000
- fuel0 = fuel price
- load = load factor, the average capacity utilization of the fleet
panel variable: airline, 1 to 6
time variable: year, 1 to 15
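. sum output0 cost0 fuel0 load

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
     output0 |        90    .5449946    .5335865     .037682
       cost0 |        90
       fuel0 |                          329502.9      103795    1015610
        load |        90    .5604602    .0527934     .432066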
References
- Baltagi, Badi H. 2001. Econometric Analysis of Panel Data. John Wiley & Sons.
- Baltagi, Badi H., and Young-Jae Chang. 1994. "Incomplete Panels: A Comparative Study of Alternative Estimators for the Unbalanced One-way Error Component Regression Model." Journal of Econometrics, 62(2): 67-89.
- Breusch, T. S., and A. R. Pagan. 1980. "The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics." Review of Economic Studies, 47(1):239-253.
- Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge University Press.
- Cameron, A. Colin, and Pravin K. Trivedi. 2009. Microeconometrics Using Stata. College Station, TX: Stata Press.
- Freund, Rudolf J., and Ramon C. Littell. 2000. SAS System for Regression, 3rd ed. Cary, NC: SAS Institute.
- Fuller, Wayne A. and George E. Battese. 1973. "Transformations for Estimation of Linear Models with Nested-Error Structure." Journal of the American Statistical Association, 68(343) (September): 626-632.
- Fuller, Wayne A. and George E. Battese. 1974. "Estimation of Linear Models with Crossed-Error Structure." Journal of Econometrics, 2: 67-78.
- Greene, William H. 2003. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.
- Greene, William H. 2007. LIMDEP Version 9.0 Econometric Modeling Guide. Plainview, New York:
- Hausman, J. A. 1978. "Specification Tests in Econometrics." Econometrica, 46(6):1251-1271.
- SAS Institute. 2004. SAS/ETS 9.1 User's Guide. Cary, NC: SAS Institute.
- SAS Institute. 2004. SAS/STAT 9.1 User's Guide. Cary, NC: SAS Institute.
- Stata Press. 2007. Stata Base Reference Manual, Release 10. College Station, TX: Stata Press.
- Stata Press. 2007. Stata Longitudinal/Panel Data Reference Manual, Release 10. College Station, TX: Stata Press.
- Stata Press. 2007. Stata Time-Series Reference Manual, Release 10. College Station, TX: Stata Press.
- Suits, Daniel B. 1984. "Dummy Variables: Mechanics V. Interpretation." Review of Economics & Statistics, 66(1): 177-180.
- Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Acknowledgements
I have to thank Dr. Heejoon Kang of the Kelley School of Business and Dr. David H. Good of the School of Public and Environmental
Affairs, Indiana University at Bloomington. I am also grateful to Jeremy Albright and Kevin Wilhite at the UITS Center for Statistical
and Mathematical Computing for comments and suggestions. A special thanks to many readers around the world who have eagerly provided
constructive feedback and encouraged me to keep improving this document.
Revision History
- 2005.11 First draft.
- 2008.04, 11 Corrected some errors.
- 2009.09 Second draft.