1. Introduction


A panel data set contains n entities or subjects (e.g., firms or states), each of which has T observations measured at time periods 1 through T. Thus, the total number of observations is nT. Ideally, panel data are measured at regular time intervals (e.g., yearly, quarterly, or monthly); otherwise, they should be analyzed with caution. A short panel has many entities but few time periods (small T), while a long panel has many time periods (large T) but few entities (Cameron and Trivedi 2009: 230).

1.1 Data Arrangement

Panel data have a cross-sectional (or group) variable and a time-series variable. In Stata, this arrangement is called the long form (as opposed to the wide form). While the long form has both group (individual level) and time variables, the wide form includes either a group variable or a time variable, but not both. Look at the following data set to see how panel data are arranged. There are 6 groups (airlines) and 15 time periods (years). The .use command below loads a Stata data set over the Internet (TCP/IP), and the in 1/20 qualifier of the .list command displays the first 20 observations.

If data are structured in the wide form, you need to rearrange them first. Stata has the .reshape command to convert a data set back and forth between the long and wide forms. The following commands convert the data from the long form to the wide form, so that the wide form has only six observations (one per airline), each with the group variable and 60 other variables (4 variables × 15 years).

. keep airline year load cost output fuel

. reshape wide cost output fuel load, i(airline) j(year)

If you wish to rearrange the data set back to the long form, run the following command.

. reshape long cost output fuel load, i(airline) j(year)

In balanced panel data, all entities have measurements in all time periods. In a contingency table of the cross-sectional and time-series variables, each cell has a frequency of exactly one. When entities have different numbers of observations due to missing values, the panel data are unbalanced: some cells in the contingency table have zero frequency, and the total number of observations is less than nT. Unbalanced panel data entail some computational and estimation issues, although most software packages are able to handle both balanced and unbalanced data.
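
A quick way to check whether a panel is balanced in Stata is sketched below. It assumes that the airline data set described above is already in memory in the long form with the variables airline and year; the checks are illustrative and not part of the original analysis.

```stata
. * Assumption: the airline data (6 airlines, 15 years) are in memory in the long form
. * Stop with an error if any airline-year combination appears more than once
. isid airline year
. * Declare the panel structure; Stata reports whether the panel is balanced
. xtset airline year
. * Show the pattern of observed time periods for each airline
. xtdescribe
. * In a balanced panel the number of observations equals nT
. assert _N == 6*15
```

If the last command fails, the number of observations differs from nT, which usually means that some airline-year cells are empty.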

1.2 Fixed Effect versus Random Effect Models

Panel data models examine fixed and/or random effects of entity (individual or subject) or time. The core difference between fixed and random effect models lies in the role of dummy variables. If dummies are considered as a part of the intercept, this is a fixed effect model. In a random effect model, the dummies act as an error term (see Table 1.1).
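
In commonly used notation, which is an assumption here because this section does not write the equations out, let y_it be the outcome for entity i at time t, X_it the regressors, u_i the group (or time) specific effect, and v_it the idiosyncratic error. The two models can then be sketched as:

```latex
\begin{aligned}
\text{Fixed effect:}  \quad y_{it} &= (\alpha + u_i) + X_{it}'\beta + v_{it} \\
\text{Random effect:} \quad y_{it} &= \alpha + X_{it}'\beta + (u_i + v_{it})
\end{aligned}
```

The parentheses show where u_i enters: the intercept in the fixed effect model and the composite error term in the random effect model.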

A fixed group effect model examines group differences in intercepts, assuming the same slopes and constant variance across entities or subjects. Since the group (individual specific) effect u_i is time invariant and considered a part of the intercept, u_i is allowed to be correlated with the other regressors. Fixed effect models use least squares dummy variable (LSDV) and within effect estimation methods. Ordinary least squares (OLS) regressions with dummies are, in fact, fixed effect models.
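
The relationship between LSDV and within effect estimation can be checked with a short sketch. It assumes that the airline data are in memory with the variables cost, output, fuel, and load; treating cost as the dependent variable is an assumption made for illustration.

```stata
. * Assumption: airline data in memory; cost is treated as the dependent variable
. xtset airline year
. * LSDV: OLS with airline dummies (the first airline is the omitted baseline)
. regress cost output fuel load i.airline
. * Within effect estimation: the group effects are removed rather than estimated
. xtreg cost output fuel load, fe
```

The slope estimates for output, fuel, and load should agree across the two commands; only the way the group effects are reported differs.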

Table 1.1 Fixed Effect and Random Effect Models

A random effect model, by contrast, estimates variance components for groups (or times) and error, assuming the same intercept and slopes. The effect u_i is a part of the error term and thus should not be correlated with any regressor; otherwise, a core OLS assumption is violated. The difference among groups (or time periods) lies in the variance of their error terms, not in their intercepts. A random effect model is estimated by generalized least squares (GLS) when the omega matrix, a variance structure among groups, is known. The feasible generalized least squares (FGLS) method is used to estimate the variance structure when omega is not known. A typical example is the groupwise heteroscedastic regression model (Greene 2003). There are various estimation methods for FGLS, including the maximum likelihood method and simulation (Baltagi and Cheng 1994).
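
As a minimal sketch in Stata, a one-way random group effect model might be fit as follows. The airline data are assumed to be in memory, and cost is assumed to be the dependent variable.

```stata
. * Assumption: airline data in memory; cost is treated as the dependent variable
. xtset airline year
. * FGLS random effect estimates; the theta option also reports the estimated GLS weight
. xtreg cost output fuel load, re theta
. * Maximum likelihood estimates of the same random effect model
. xtreg cost output fuel load, mle
```

Both commands report the estimated variance components (sigma_u for the group effect and sigma_e for the remaining error).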

Fixed effects are tested by the (incremental) F test, while random effects are examined by the Lagrange multiplier (LM) test (Breusch and Pagan 1980). If the null hypothesis of a test is not rejected, the pooled OLS regression is favored over the corresponding effect model. The Hausman specification test (Hausman 1978) compares a fixed effect model with a random effect model. Table 1.1 compares the fixed effect and random effect models.
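
In Stata, these three tests can be sketched as follows. The airline data are assumed to be in memory with cost as the dependent variable, and fixed and random are arbitrary names chosen here for the stored estimates.

```stata
. * Assumption: airline data in memory; cost is treated as the dependent variable
. xtset airline year
. * Fixed effect model; the F test that all u_i = 0 is printed below the coefficient table
. xtreg cost output fuel load, fe
. estimates store fixed
. * Random effect model, followed by the Breusch-Pagan LM test
. xtreg cost output fuel load, re
. estimates store random
. xttest0
. * Hausman test comparing the fixed and random effect estimates
. hausman fixed random
```

With only six groups, the Hausman statistic may be unreliable, so the result should be read with caution.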

If only one cross-sectional or time-series variable is considered (e.g., country, firm, or race), the model is called a one-way fixed or random effect model. Two-way effect models have two sets of dummy variables for group and/or time variables (e.g., state and year).
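
One way to sketch the difference in Stata is to add year dummies to a one-way fixed group effect model. This is an illustration under the assumption that the airline data are in memory and cost is the dependent variable, not the only way to fit a two-way model.

```stata
. * Assumption: airline data in memory; cost is treated as the dependent variable
. xtset airline year
. * One-way fixed group effect model
. xtreg cost output fuel load, fe
. * Two-way fixed effect model: group effects absorbed, year dummies added explicitly
. xtreg cost output fuel load i.year, fe
. * Joint test of the year dummies
. testparm i.year
```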

1.3 Estimation and Software Issues

The LSDV regression, within effect model, between effect model (group or time mean model), GLS, and FGLS are all fundamentally based on OLS in terms of estimation. Thus, any procedure or command for OLS can be used to fit linear panel data models (Table 1.2).

The REG procedure of SAS/STAT, Stata .regress (.cnsreg), LIMDEP Regress$, and SPSS Regression commands all fit LSDV1 by dropping one dummy and have options to suppress the intercept (LSDV2). SAS, Stata, and LIMDEP can estimate OLS with restrictions (LSDV3), but SPSS cannot. In Stata, the .cnsreg command requires restrictions to be defined with the .constraint command (Table 1.2).
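
The three LSDV approaches can be sketched in Stata as follows. The sketch assumes that the airline data are in memory and that cost is the dependent variable; the dummy names g1 through g6 are created here for illustration, and the choice of g6 as the dropped dummy is arbitrary.

```stata
. * Assumption: airline data in memory; g1-g6 are dummy names created by this sketch
. tabulate airline, generate(g)
. * LSDV1: drop one dummy (g6) and keep the intercept
. regress cost output fuel load g1-g5
. * LSDV2: include all six dummies and suppress the intercept
. regress cost output fuel load g1-g6, noconstant
. * LSDV3: keep all dummies and the intercept, restricting the dummy coefficients to sum to zero
. constraint 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost output fuel load g1-g6, constraints(1)
```

The slopes of output, fuel, and load should be the same in all three; only the interpretation of the intercept and dummy coefficients changes.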

Table 1.2 Procedures and Commands in SAS, Stata, LIMDEP, and SPSS

Model | SAS 9.2 | Stata 11 | LIMDEP 9 | SPSS 17
Regression (OLS) | PROC REG | .regress | Regress$ | Regression
LSDV1 | w/o a dummy | w/o a dummy | w/o a dummy | w/o a dummy
LSDV2 | /NOINT | ,noconstant | w/o ONE | /Origin
LSDV3 | RESTRICT | .cnsreg | Cls: | N/A
One-way fixed (within effect) | PANEL /FIXONE | .xtreg, fe; .areg, abs | Regress;Panel;Str=; Fixed$ | N/A
Two-way fixed (within effect) | PANEL /FIXTWO | N/A | Regress;Panel;Str=; Period=;Fixed$ | N/A
Between effect | PANEL /BTWNG; PANEL /BTWNT | .xtreg, be | Regress;Panel;Str=; Means$ | N/A
One-way random effect | PANEL /RANONE; MIXED /RANDOM | .xtreg, re | Regress;Panel;Str=; Random$ | N/A
Two-way random effect | PANEL /RANTWO | .xtmixed | Regress;Panel;Str=;Period=; Random$ | N/A
Random coefficient model | MIXED /RANDOM | .xtmixed; .xtrc | Regress;PRM=;Str=;$ | N/A

SAS, Stata, and LIMDEP also provide procedures and commands that estimate panel data models in a convenient way (Table 1.2). SAS/ETS has the TSCSREG and PANEL procedures to estimate one-way and two-way fixed/random effect models. These procedures estimate the within effect model for a fixed effect model and, by default, employ the Fuller and Battese (1974) method to estimate variance components for group, time, and error for a random effect model. PROC TSCSREG and PROC PANEL also support other estimation methods such as the Parks (1967) autoregressive model and the Da Silva moving average method.

PROC TSCSREG can handle balanced data only, whereas PROC PANEL is able to deal with both balanced and unbalanced data. PROC PANEL requires that each entity (subject) have more than one observation. PROC TSCSREG provides one-way and two-way fixed and random effect models, while PROC PANEL supports the between effect model (/BTWNT and /BTWNG) and pooled OLS regression (/POOLED) as well. PROC PANEL has the BP and BP2 options to conduct the Breusch-Pagan LM test for random effects, while PROC TSCSREG does not. Despite the advanced features of PROC PANEL, the output of the two procedures is similar. PROC MIXED is also able to fit random effect and random coefficient (parameter) models and supports maximum likelihood estimation, which is not available in PROC PANEL and PROC TSCSREG.

The Stata .xtreg command estimates a within effect (fixed effect) model with the fe option, a between effect model with be, and a random effect model with re. This command, however, does not directly fit two-way fixed and random effect models. The .areg command with the absorb option, equivalent to .xtreg with the fe option, fits a one-way within effect model that has a large set of dummy variables. A random effect model can also be estimated using the .xtmixed command. Stata has .xtgls, which fits panel data models with heteroscedasticity across groups and/or autocorrelation within groups.
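
A few of these commands are sketched below. The airline data are assumed to be in memory with cost as the dependent variable, and the particular .xtgls options shown are one illustrative choice among several.

```stata
. * Assumption: airline data in memory; cost is treated as the dependent variable
. xtset airline year
. * Between effect model (regression on group means)
. xtreg cost output fuel load, be
. * One-way within effect model with the airline dummies absorbed
. areg cost output fuel load, absorb(airline)
. * FGLS allowing heteroscedasticity across airlines and AR(1) errors within them
. xtgls cost output fuel load, panels(heteroskedastic) corr(ar1)
```

With only six airlines, the between effect model has very few degrees of freedom, so its estimates are shown for syntax only.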

The LIMDEP Regress$ command with the Panel subcommand estimates panel data models. The Fixed effect subcommand fits a fixed effect model, Random effect estimates a random effect model, and Means is for a between effect model. SPSS has limited ability to analyze panel data.

1.4 Data Sets

This document uses two data sets. A cross-sectional data set contains research and development (R&D) expenditure data of the top 50 information technology firms presented in OECD Information Technology Outlook 2004. A panel data set has cost data for U.S. airlines (1970-1984), which are used in Econometric Analysis (Greene 2003). See the Appendix for the details.

