A panel data set contains n entities or subjects (e.g., firms or states), each of which includes T observations measured at time periods 1 through T. Thus, the total number of observations is nT. Ideally, panel data are measured at regular time intervals (e.g., years, quarters, or months); otherwise, they should be analyzed with caution. A short panel data set has many entities but few time periods (small T), while a long panel has many time periods (large T) but few entities (Cameron and Trivedi 2009: 230).
1.1 Data Arrangement
Panel data have a cross-sectional (or group) variable and a time-series variable. In Stata, this arrangement is called the long form (as opposed to the wide form). While the long form has both group (individual level) and time variables, the wide form includes only one of them (here, the group variable) and spreads the other across columns. Look at the following data set to see how panel data are arranged. There are 6 groups (airlines) and 15 time periods (years). The .use command loads a Stata data set over the Internet (TCP/IP), and the in 1/20 qualifier of the .list command displays the first 20 observations.
If data are structured in the wide form, you need to rearrange them first. Stata has the .reshape command to convert a data set back and forth between the long and wide forms. The following commands convert the long form to the wide form. The .keep command first retains only the variables to be reshaped; the resulting wide form has only six observations (one per airline), each holding the group variable and, for each of the four variables, one column per year (4 x 15 = 60 variables).
. keep airline year load cost output fuel
. reshape wide cost output fuel load, i(airline) j(year)
If you wish to rearrange the data set back to the long form, run the following command.
. reshape long cost output fuel load, i(airline) j(year)
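For readers working outside Stata, the round trip that .reshape performs can be sketched in plain Python. This is a minimal illustration of the idea, not Stata's implementation; the records are made up, only one variable (cost) is reshaped, and the helper names to_wide and to_long are hypothetical.

```python
# Long form: one record per (airline, year) pair; the values are made up.
long_rows = [
    {"airline": 1, "year": 1970, "cost": 1.1},
    {"airline": 1, "year": 1971, "cost": 1.3},
    {"airline": 2, "year": 1970, "cost": 2.0},
    {"airline": 2, "year": 1971, "cost": 2.4},
]

def to_wide(rows, i, j, var):
    """Long -> wide: one record per value of i, one column var+<time> per value of j."""
    wide = {}
    for r in rows:
        rec = wide.setdefault(r[i], {i: r[i]})
        rec[f"{var}{r[j]}"] = r[var]
    return list(wide.values())

def to_long(rows, i, j, var):
    """Wide -> long: split columns named var+<time> back into (i, j, var) records."""
    out = []
    for rec in rows:
        for col, val in rec.items():
            if col.startswith(var):
                out.append({i: rec[i], j: int(col[len(var):]), var: val})
    return sorted(out, key=lambda r: (r[i], r[j]))

wide_rows = to_wide(long_rows, "airline", "year", "cost")  # 2 rows, columns cost1970, cost1971
back = to_long(wide_rows, "airline", "year", "cost")
assert back == long_rows  # the round trip recovers the long form
```

Like the Stata example, the wide form ends up with one row per airline and one cost column per year.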
In balanced panel data, all entities have measurements in all time periods. In a contingency table of the cross-sectional and time-series variables, each cell has a frequency of exactly one. When entities have different numbers of observations, for example because of missing values, the panel data are unbalanced: some cells in the contingency table have zero frequency, and the total number of observations is less than nT. Unbalanced panel data entail some computational and estimation issues, although most software packages are able to handle both balanced and unbalanced data.
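The contingency-table check described above can be sketched in a few lines of Python. The function name is_balanced and the sample identifiers are hypothetical; the logic simply counts each (entity, time) cell.

```python
from collections import Counter

def is_balanced(entity_ids, time_ids):
    """True if every (entity, time) cell of the contingency table has frequency 1."""
    cells = Counter(zip(entity_ids, time_ids))
    n_entities, n_times = len(set(entity_ids)), len(set(time_ids))
    # Balanced: all n*T cells are present and none is duplicated.
    return len(cells) == n_entities * n_times and all(c == 1 for c in cells.values())

ids = [1, 1, 1, 2, 2, 2]
years = [1970, 1971, 1972, 1970, 1971, 1972]
print(is_balanced(ids, years))            # True: nT = 2 * 3 = 6 observations
print(is_balanced(ids[:-1], years[:-1]))  # False: entity 2 lacks 1972
```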
1.2 Fixed Effect versus Random Effect Models
Panel data models examine fixed and/or random effects of entity (individual or subject) or time. The core difference between fixed and random effect models lies in the role of dummy variables. If the dummies are considered part of the intercept, the model is a fixed effect model; in a random effect model, the dummies act as part of the error term (see Table 1.1).
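Because the body of Table 1.1 is not reproduced here, the two one-way models can be written out in common textbook notation. Apart from u_i, which the text uses, the symbols are an assumption: y_it is the outcome of entity i at time t, X_it the regressors, β the common slopes, α the intercept, and v_it the idiosyncratic error. In the fixed effect model u_i sits inside the intercept and may be correlated with X_it; in the random effect model it joins the error term and must be uncorrelated with X_it.

```latex
\begin{aligned}
\text{Fixed effect:}\quad  & y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it} \\
\text{Random effect:}\quad & y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})
\end{aligned}
```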
A fixed group effect model examines group differences in intercepts, assuming the same slopes and constant variance across entities or subjects. Since a group (individual specific) effect u_i is time invariant and considered part of the intercept, u_i is allowed to be correlated with other regressors. Fixed effect models are estimated by least squares dummy variable (LSDV) and within effect methods. Ordinary least squares (OLS) regressions with dummies, in fact, are fixed effect models.
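That LSDV and within estimation give the same slope can be checked numerically. The sketch below, in plain Python with one regressor and made-up data, fits LSDV2 (one dummy per group, no common intercept) by solving the normal equations, and compares its slope with the within estimator, which demeans y and x by group before running OLS. The helper names solve, ols, and group_means are hypothetical.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (no implicit intercept)."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

def group_means(groups, v):
    """Mean of v within each group."""
    sums, counts = {}, {}
    for g, vi in zip(groups, v):
        sums[g] = sums.get(g, 0.0) + vi
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 5.0, 1.0, 3.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 12.1, 14.2, 0.9, 5.2, 10.8]

# LSDV2: one dummy per group plus x, no common intercept.
X_lsdv = [[1.0 if g == j else 0.0 for j in range(3)] + [xi] for g, xi in zip(groups, x)]
b_lsdv = ols(X_lsdv, y)

# Within: demean y and x by group, then OLS through the origin.
mx, my = group_means(groups, x), group_means(groups, y)
xd = [xi - mx[g] for g, xi in zip(groups, x)]
yd = [yi - my[g] for g, yi in zip(groups, y)]
slope_within = sum(a * c for a, c in zip(xd, yd)) / sum(a * a for a in xd)

print(b_lsdv[-1], slope_within)  # the two slopes agree up to rounding
# Each dummy coefficient equals the group intercept ybar_i - slope * xbar_i.
print([my[j] - slope_within * mx[j] for j in range(3)], b_lsdv[:3])
```

The within estimator never builds the dummy columns, which is why it scales to many groups where LSDV would need a very large design matrix.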
Table 1.1 Fixed Effect and Random Effect Models
A random effect model, by contrast, estimates variance components for groups (or times) and error, assuming the same intercept and slopes. The group effect u_i is part of the error term and thus should not be correlated with any regressor; otherwise, a core OLS assumption is violated. The difference among groups (or time periods) lies in the variance of the error term, not in their intercepts. A random effect model is estimated by generalized least squares (GLS) when the omega (Ω) matrix, the variance structure among groups, is known. When Ω is not known, the feasible generalized least squares (FGLS) method first estimates the variance structure and then applies GLS. A typical example is the groupwise heteroscedastic regression model (Greene 2003). Various methods can be used for FGLS, including maximum likelihood and simulation (Baltagi and Cheng 1994).
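For a balanced one-way random effect model with known variance components σ_u² (group) and σ_v² (error), a standard result is that GLS reduces to OLS on quasi-demeaned data: subtract θ times each group mean, with θ = 1 - sqrt(σ_v² / (σ_v² + T σ_u²)), and use (1 - θ) as the intercept column. FGLS plugs in estimated variances instead. The sketch below only computes θ and the transformation; the function names are hypothetical and the variance values are assumed, not estimated.

```python
import math

def re_theta(sigma_u2, sigma_v2, T):
    """Quasi-demeaning weight for a balanced one-way random effect model."""
    return 1.0 - math.sqrt(sigma_v2 / (sigma_v2 + T * sigma_u2))

def quasi_demean(values, groups, theta):
    """Return v_it - theta * mean_i(v); theta = 0 leaves pooled OLS data, theta = 1 gives within data."""
    sums, counts = {}, {}
    for g, v in zip(groups, values):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - theta * sums[g] / counts[g] for g, v in zip(groups, values)]

T = 15
print(re_theta(0.0, 1.0, T))  # 0.0: no group variance, so GLS is pooled OLS
print(re_theta(4.0, 1.0, T))  # about 0.872: GLS moves toward the within estimator
print(quasi_demean([1.0, 3.0, 2.0, 4.0], [0, 0, 1, 1], 1.0))  # full demeaning
```

The weight θ makes the relationship among the estimators concrete: random effect GLS lies between pooled OLS (θ = 0) and the within estimator (θ = 1).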
Fixed effects are tested by the (incremental) F test, while random effects are examined by the Lagrange Multiplier (LM) test (Breusch and Pagan 1980). If the null hypothesis of no fixed (or random) effects is not rejected, the pooled OLS regression is favored. The Hausman specification test (Hausman 1978) compares fixed effect and random effect models. Table 1.1 summarizes the two models.
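Both test statistics can be computed from quantities that any OLS run provides. The sketch assumes a balanced panel with n entities, T periods, and k regressors (excluding the intercept). The F test compares the sum of squared errors (SSE) of pooled OLS with that of LSDV; the LM statistic uses pooled OLS residuals grouped by entity, following the usual textbook formulas (e.g., Greene 2003). The function names and input numbers are hypothetical.

```python
def fixed_effect_f(sse_pooled, sse_lsdv, n, T, k):
    """Incremental F statistic for H0: all group intercepts are equal.
    Returns (F, (df1, df2)) with df1 = n - 1 and df2 = n*T - n - k."""
    df1, df2 = n - 1, n * T - n - k
    return ((sse_pooled - sse_lsdv) / df1) / (sse_lsdv / df2), (df1, df2)

def breusch_pagan_lm(resid_by_group):
    """Breusch-Pagan LM statistic for H0: var(u_i) = 0, computed from pooled OLS
    residuals of a balanced panel; chi-squared with 1 df under H0."""
    n, T = len(resid_by_group), len(resid_by_group[0])
    sum_sq_of_sums = sum(sum(e) ** 2 for e in resid_by_group)
    sum_sq = sum(ei ** 2 for e in resid_by_group for ei in e)
    return n * T / (2 * (T - 1)) * (sum_sq_of_sums / sum_sq - 1) ** 2

F, df = fixed_effect_f(sse_pooled=100.0, sse_lsdv=40.0, n=6, T=15, k=3)
print(F, df)  # compare with the F(5, 81) critical value
lm = breusch_pagan_lm([[1.0, -1.0], [2.0, 1.0], [-2.0, -1.0]])
print(lm)     # compare with the chi-squared(1) critical value
```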
If only one cross-sectional or time-series variable is considered (e.g., country, firm, or race), the model is called a one-way fixed or random effect model. Two-way effect models have two sets of dummy variables, one for the group variable and one for the time variable (e.g., state and year).
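In a balanced panel, the two-way within transformation removes both sets of effects at once: subtract the entity mean and the period mean, then add back the grand mean. A minimal sketch, assuming the panel is stored as an n x T nested list (the layout and the function name are hypothetical); the simple formula applies to balanced data only.

```python
def two_way_demean(Y):
    """Y[i][t] for a balanced panel; returns y_it - ybar_i - ybar_t + ybar."""
    n, T = len(Y), len(Y[0])
    row = [sum(r) / T for r in Y]                                 # entity means
    col = [sum(Y[i][t] for i in range(n)) / n for t in range(T)]  # period means
    grand = sum(row) / n                                          # grand mean
    return [[Y[i][t] - row[i] - col[t] + grand for t in range(T)] for i in range(n)]

# Data made only of an entity effect plus a time effect vanish after the transformation.
entity = [1.0, 5.0, 2.0]
period = [0.0, 3.0, -1.0, 4.0]
Y = [[a + b for b in period] for a in entity]
D = two_way_demean(Y)
print(max(abs(v) for r in D for v in r))  # zero up to rounding
```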
1.3 Estimation and Software Issues
The LSDV regression, within effect model, between effect model (group or time mean model), GLS, and FGLS are fundamentally based on OLS in terms of estimation: each is OLS applied to the original data with dummies or to suitably transformed data. Thus, any procedure or command for OLS can be used for linear panel data models (Table 1.2).
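For example, the between effect (group mean) model is ordinary OLS run on one observation per group: the group means of y and x. The sketch below uses one regressor, made-up data, and a hypothetical helper name; it weights every group equally regardless of its number of observations.

```python
def between_estimator(groups, x, y):
    """OLS (with intercept) of group-mean y on group-mean x; returns (intercept, slope)."""
    def means(v):
        acc = {}
        for g, vi in zip(groups, v):
            s, c = acc.get(g, (0.0, 0))
            acc[g] = (s + vi, c + 1)
        return {g: s / c for g, (s, c) in acc.items()}

    mx, my = means(x), means(y)
    keys = sorted(mx)
    xb, yb = [mx[g] for g in keys], [my[g] for g in keys]
    xbar, ybar = sum(xb) / len(xb), sum(yb) / len(yb)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(xb, yb))
             / sum((a - xbar) ** 2 for a in xb))
    return ybar - slope * xbar, slope

groups = [1, 1, 2, 2, 3, 3]
x = [1.0, 3.0, 3.0, 5.0, 5.0, 7.0]     # group means 2, 4, 6
y = [3.0, 7.0, 7.0, 11.0, 11.0, 15.0]  # group means 5, 9, 13 = 1 + 2 * (x mean)
print(between_estimator(groups, x, y))  # (1.0, 2.0)
```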
The REG procedure of SAS/STAT, Stata .regress (.cnsreg), LIMDEP Regress$, and SPSS Regression commands all fit LSDV1 by dropping one dummy and have options to suppress the intercept (LSDV2). SAS, Stata, and LIMDEP can also estimate OLS with restrictions (LSDV3), which keeps the intercept and all dummies but restricts the dummy coefficients to sum to zero; SPSS cannot. In Stata, the .cnsreg command requires restrictions defined with the .constraint command (Table 1.2).
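The three LSDV parameterizations report the same group intercepts in different ways, and their slope coefficients are identical. Starting from the LSDV2 dummy coefficients (the group intercepts themselves), LSDV1 reports a reference group's intercept plus differences from it, and LSDV3 reports the average intercept plus deviations that sum to zero. A minimal sketch with hypothetical intercept values and helper names:

```python
def lsdv1_from_lsdv2(intercepts, reference=0):
    """LSDV1: intercept = reference group's intercept; dummies = differences from it."""
    base = intercepts[reference]
    dummies = {g: a - base for g, a in enumerate(intercepts) if g != reference}
    return base, dummies

def lsdv3_from_lsdv2(intercepts):
    """LSDV3: intercept = mean group intercept; dummies = deviations summing to zero."""
    mean = sum(intercepts) / len(intercepts)
    return mean, [a - mean for a in intercepts]

group_intercepts = [9.0, 11.0, 10.0]       # hypothetical LSDV2 dummy coefficients
print(lsdv1_from_lsdv2(group_intercepts))  # (9.0, {1: 2.0, 2: 1.0})
print(lsdv3_from_lsdv2(group_intercepts))  # (10.0, [-1.0, 1.0, 0.0])
```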
Table 1.2 Procedures and Commands in SAS, Stata, LIMDEP, and SPSS
| Model | SAS | Stata | LIMDEP | SPSS |
| LSDV1 | w/o a dummy | w/o a dummy | w/o a dummy | w/o a dummy |
| One-way fixed (within effect) | PANEL /FIXONE | .xtreg, fe; .areg, abs | Regress;Panel;Str=; Fixed$ | N/A |
| Two-way fixed (within effect) | PANEL /FIXTWO | N/A | Regress;Panel;Str=; Period=;Fixed$ | N/A |
| Between effect | PANEL /BTWNG; PANEL /BTWNT | .xtreg, be | Regress;Panel;Str=; Means$ | N/A |
| One-way random effect | | .xtreg, re | Regress;Panel;Str=; Random$ | N/A |
| Two-way random effect | PANEL /RANTWO | .xtmixed | Regress;Panel;Str=;Period=; Random$ | N/A |
| Random coefficient model | | | | |
SAS, Stata, and LIMDEP also provide procedures and commands that estimate panel data models conveniently (Table 1.2). SAS/ETS has the TSCSREG and PANEL procedures to estimate one-way and two-way fixed and random effect models. These procedures estimate the within effect model for a fixed effect model and, by default, employ the Fuller-Battese (1974) method to estimate variance components for group, time, and error for a random effect model. PROC TSCSREG and PROC PANEL also support other estimation methods, such as the Parks (1967) autoregressive model and the Da Silva moving average method.
PROC TSCSREG can handle balanced data only, whereas PROC PANEL is able to deal with both balanced and unbalanced data. PROC PANEL requires that each entity (subject) have more than one observation. PROC TSCSREG provides one-way and two-way fixed and random effect models, while PROC PANEL supports the between effect model (/BTWNT and /BTWNG) and pooled OLS regression (/POOLED) as well. PROC PANEL has BP and BP2 options to conduct the Breusch-Pagan LM test for random effects, while PROC TSCSREG does not. Despite the advanced features of PROC PANEL, the output of the two procedures is similar. PROC MIXED is also able to fit random effect and random coefficient (parameter) models and supports maximum likelihood estimation, which is not available in PROC PANEL and TSCSREG.
The Stata .xtreg command estimates a within effect (fixed effect) model with the fe option, a between effect model with be, and a random effect model with re. This command, however, does not directly fit two-way fixed and random effect models. The .areg command with the absorb option, equivalent to .xtreg with the fe option, fits a one-way within effect model that has a large set of dummy variables. A random effect model can also be estimated using the .xtmixed command. Stata also has .xtgls, which fits panel data models with heteroscedasticity across groups and/or autocorrelation within groups.
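The idea behind FGLS for groupwise heteroscedasticity, one of the error structures .xtgls handles, can be sketched in two steps: fit OLS, estimate each group's error variance from its residuals, then rerun least squares weighting each observation by 1/σ_i². This is a minimal single-regressor sketch of the technique under stated assumptions (variances estimated with divisor T_i, no autocorrelation), not a reproduction of .xtgls; the function names and data are hypothetical.

```python
def wls(x, y, w):
    """Weighted least squares of y on an intercept and one regressor; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b

def groupwise_fgls(groups, x, y):
    """Two-step FGLS for groupwise heteroscedasticity; returns (coefficients, variances)."""
    a, b = wls(x, y, [1.0] * len(x))  # step 1: OLS is WLS with equal weights
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    ss, cnt = {}, {}
    for g, e in zip(groups, resid):
        ss[g] = ss.get(g, 0.0) + e * e
        cnt[g] = cnt.get(g, 0) + 1
    var = {g: ss[g] / cnt[g] for g in ss}  # sigma_i^2 estimates (divisor T_i, an assumption)
    coef = wls(x, y, [1.0 / var[g] for g in groups])  # step 2: weight by 1/sigma_i^2
    return coef, var

groups = ["A"] * 4 + ["B"] * 4
x = [0.0, 0.0, 2.0, 2.0] * 2
y = [2.0, 0.0, 6.0, 4.0, 4.0, -2.0, 8.0, 2.0]  # 1 + 2x plus noise; group B is noisier
coef, var = groupwise_fgls(groups, x, y)
print(coef, var)
```

Here group B's estimated error variance is nine times group A's. Because the made-up residuals are orthogonal to x within each group, the weighted fit returns the same line as OLS; with real data, down-weighting the noisier group generally changes the estimates.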
The LIMDEP Regress$ command with the Panel subcommand estimates panel data models. The Fixed subcommand fits a fixed effect model, Random fits a random effect model, and Means fits a between effect model. SPSS has limited ability to analyze panel data.
1.4 Data Sets
This document uses two data sets. The cross-sectional data set contains research and development (R&D) expenditure data of the top 50 information technology firms presented in OECD Information Technology Outlook 2004. The panel data set contains cost data for U.S. airlines (1970-1984), which are used in Econometric Analysis (Greene 2003). See the Appendix for details.