## 4. The Fixed Group Effect Model

The one-way fixed group effect model examines group differences in the intercepts. The least squares dummy variable (LSDV) approach to this model requires as many dummy variables as there are groups or subjects. When many dummies would be needed, the within effect model is useful because it transforms the variables using group means and thus avoids dummies. The between effect model, in contrast, is a regression on the group means of the variables.

### 4.1 The Pooled OLS Regression Model

Let us first consider the pooled model without dummy variables.

```
. regress cost output fuel load // pooled model

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    86) = 2419.34
       Model |  112.705452     3  37.5684839           Prob > F      =  0.0000
    Residual |  1.33544153    86   .01552839           R-squared     =  0.9883
       Total |  114.040893    89  1.28135835           Root MSE      =  .12461

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8827385   .0132545    66.60   0.000     .8563895    .9090876
        fuel |    .453977   .0203042    22.36   0.000     .4136136    .4943404
        load |   -1.62751    .345302    -4.71   0.000    -2.313948   -.9410727
       _cons |   9.516923   .2292445    41.51   0.000       9.0612    9.972645
------------------------------------------------------------------------------
```

cost = 9.517 + .883*output + .454*fuel - 1.628*load

This model fits the data well (p<.0001 and R2=.9883). We may, however, suspect fixed group effects that produce different intercepts across groups. As discussed in Chapter 2, there are three equivalent approaches to LSDV. They report identical parameter estimates for the regressors other than the dummies. Let us begin with LSDV1.

### 4.2 LSDV1 without a Dummy

LSDV1 drops one dummy variable to identify the model. LSDV1 produces correct ANOVA information, goodness of fit, parameter estimates, and standard errors; for this reason, this approach is commonly used in practice. LSDV1 produces six regression equations for the six groups (airlines), which share the same slopes and differ only in their intercepts.

```
Group1: cost = 9.706 + .919*output + .417*fuel - 1.070*load
Group2: cost = 9.665 + .919*output + .417*fuel - 1.070*load
Group3: cost = 9.497 + .919*output + .417*fuel - 1.070*load
Group4: cost = 9.891 + .919*output + .417*fuel - 1.070*load
Group5: cost = 9.730 + .919*output + .417*fuel - 1.070*load
Group6: cost = 9.793 + .919*output + .417*fuel - 1.070*load
```

In SAS, the REG procedure fits the OLS regression model. Here we drop the last dummy, g6, so that group 6 serves as the reference point.

```
PROC REG DATA=masil.airline;
MODEL cost = g1-g5 output fuel load;
RUN;
```

```
The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Used          90

Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     8      113.74827       14.21853    3935.79    <.0001
Error                    81        0.29262        0.00361
Corrected Total          89      114.04089

Root MSE              0.06011    R-Square     0.9974
Dependent Mean       13.36561    Adj R-Sq     0.9972
Coeff Var             0.44970

Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        9.79300        0.26366      37.14      <.0001
g1            1       -0.08706        0.08420      -1.03      0.3042
g2            1       -0.12830        0.07573      -1.69      0.0941
g3            1       -0.29598        0.05002      -5.92      <.0001
g4            1        0.09749        0.03301       2.95      0.0041
g5            1       -0.06301        0.02389      -2.64      0.0100
output        1        0.91928        0.02989      30.76      <.0001
fuel          1        0.41749        0.01520      27.47      <.0001
load          1       -1.07040        0.20169      -5.31      <.0001
```

Note that the intercept (9.793) is the intercept of group 6, the reference group. The other dummy parameter estimates are deviations from this reference point. The actual intercept of group 1, for example, is computed as 9.706 = 9.793 + (-.087)*1 + (-.1283)*0 + (-.2960)*0 + (.0975)*0 + (-.0630)*0, where 9.793 is the reference point.
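
The same arithmetic applies to every group. As a minimal sketch (Python is used only for illustration, and the helper `group_intercepts` is hypothetical), the six intercepts can be recovered from the LSDV1 estimates reported by Stata; they should match the LSDV2 estimates in 4.3 up to rounding.

```python
# Minimal sketch: recover group intercepts from LSDV1 estimates.
# Coefficients are copied from the Stata LSDV1 output (g6 is the omitted reference group).
LSDV1_CONS = 9.793004
LSDV1_DUMMIES = {1: -0.0870617, 2: -0.1282976, 3: -0.2959828, 4: 0.097494, 5: -0.063007}


def group_intercepts(cons, dummies, n_groups, reference):
    """Intercept of group g = reference-group intercept + coefficient of dummy g.

    The omitted (reference) group has an implicit dummy coefficient of zero.
    """
    return {
        g: cons + (0.0 if g == reference else dummies[g])
        for g in range(1, n_groups + 1)
    }


intercepts = group_intercepts(LSDV1_CONS, LSDV1_DUMMIES, n_groups=6, reference=6)
for g, a in intercepts.items():
    print(f"Group{g}: intercept = {a:.3f}")
```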

Stata has the .regress command for OLS regression (LSDV).

```
. regress cost g1-g5 output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0870617   .0841995    -1.03   0.304    -.2545924     .080469
          g2 |  -.1282976   .0757281    -1.69   0.094    -.2789728    .0223776
          g3 |  -.2959828   .0500231    -5.92   0.000     -.395513   -.1964526
          g4 |    .097494   .0330093     2.95   0.004     .0318159    .1631721
          g5 |   -.063007   .0238919    -2.64   0.010    -.1105443   -.0154697
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
------------------------------------------------------------------------------
```

Now, run the LIMDEP Regress\$ command to fit LSDV1. Do not forget to include ONE (the constant term) in the Rhs; list.

```
+-----------------------------------------------------------------------+
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared=  .997434, Adjusted R-squared =          .99718 |
| Model test: F[  8,     81] = 3935.82,    Prob value =          .00000 |
| Diagnostic: Log-L =    130.0865, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=     -2.691 |
| Autocorrel: Durbin-Watson Statistic =   1.02645,   Rho =       .48677 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant     9.793021272       .26366104   37.142   .0000
G1       -.8707201949E-01  .84199161E-01   -1.034   .3042     .16666667
G2          -.1283060033   .75727781E-01   -1.694   .0940     .16666667
G3          -.2959885994   .50022855E-01   -5.917   .0000     .16666667
G4        .9749253376E-01  .33009146E-01    2.954   .0041     .16666667
G5       -.6300770422E-01  .23891796E-01   -2.637   .0100     .16666667
OUTPUT       .9192881432   .29889967E-01   30.756   .0000    -1.1743092
FUEL         .4174910457   .15199071E-01   27.468   .0000     12.770359
LOAD        -1.070395015       .20168924   -5.307   .0000     .56046016
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
```

What if you drop a different dummy variable, say g1, instead of g6? Because a different reference point is used, you will get different dummy coefficients. Other statistics, such as the goodness-of-fit measures and the slope estimates, remain unchanged.

```
. regress cost g2-g6 output fuel load // LSDV1 dropping g1

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
          g3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
          g4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
          g5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
          g6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------
```
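
The dummy coefficients of the two LSDV1 fits are linked by a simple shift: the new intercept is the old intercept plus the old g1 coefficient, and every other dummy coefficient moves by minus that amount. A small sketch, assuming the estimates are copied from the g6-dropped Stata output (the helper `rereference` is hypothetical), reproduces the g1-dropped coefficients:

```python
# Sketch: move the LSDV1 reference group from airline 6 to airline 1.
# Estimates copied from the Stata output that drops g6 (g6's coefficient is implicitly 0).
cons_ref6 = 9.793004
dummies_ref6 = {1: -0.0870617, 2: -0.1282976, 3: -0.2959828,
                4: 0.097494, 5: -0.063007, 6: 0.0}


def rereference(cons, dummies, new_ref):
    """Return (intercept, dummy coefficients) when `new_ref` becomes the omitted group."""
    shift = dummies[new_ref]
    new_cons = cons + shift
    new_dummies = {g: d - shift for g, d in dummies.items() if g != new_ref}
    return new_cons, new_dummies


cons_ref1, dummies_ref1 = rereference(cons_ref6, dummies_ref6, new_ref=1)
print(f"_cons = {cons_ref1:.6f}")
for g in sorted(dummies_ref1):
    print(f"g{g} = {dummies_ref1[g]:.7f}")
```

The slopes are untouched by this shift, which is why only the dummy coefficients and the intercept change between the two outputs.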

If you have not created dummy variables, use the .xi prefix command. Note that Stata by default drops the first dummy variable, whereas the SAS TSCSREG and PANEL procedures in 4.5.2 drop the last dummy.

```
. xi: regress cost i.airline output fuel load

i.airline         _Iairline_1-6       (naturally coded; _Iairline_1 omitted)

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Iairline_2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
 _Iairline_3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
 _Iairline_4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
 _Iairline_5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
 _Iairline_6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------
```
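
Conceptually, xi expands a categorical variable into 0/1 indicator variables and omits one level as the reference. The following rough sketch illustrates that idea only; it is not Stata's implementation, and the helper `make_dummies` and the `g<level>` naming are hypothetical (Stata itself names the variables _Iairline_2 and so on).

```python
def make_dummies(groups, omit):
    """Build 0/1 dummy columns for every level of `groups` except `omit`.

    Returns the dummy names and one row of dummy values per observation.
    """
    levels = sorted(set(groups))
    kept = [lv for lv in levels if lv != omit]
    names = [f"g{lv}" for lv in kept]
    rows = [[1 if g == lv else 0 for lv in kept] for g in groups]
    return names, rows


# Illustrative airline identifiers for a few observations (hypothetical data).
airline = [1, 1, 2, 3, 4, 5, 6]

# Omitting the first level mirrors Stata's xi default ...
names_first, rows_first = make_dummies(airline, omit=min(airline))
# ... while omitting the last level mirrors dropping g6 as in the SAS examples.
names_last, rows_last = make_dummies(airline, omit=max(airline))

print(names_first, rows_first[0])
print(names_last, rows_last[-1])
```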

### 4.3 LSDV2 without the Intercept

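LSDV2 suppresses the intercept and includes all the dummies, so the dummy coefficients are the actual group intercepts. Because there is no intercept, SAS and Stata compute R2 and F from the uncorrected total sum of squares (the sum of squared values of cost rather than of its deviations from the mean), and these statistics are incorrect.

In the SAS REG procedure, use the /NOINT option to suppress the intercept. Note that the F value of 497,985 and the R2 of 1 are implausible.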

```
PROC REG DATA=masil.airline;
MODEL cost = g1-g6 output fuel load /NOINT;
RUN;
```

```
The REG Procedure
Model: MODEL1
Dependent Variable: cost

Number of Observations Used          90

NOTE: No intercept in model. R-Square is redefined.

Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     9          16191     1799.03381     497985    <.0001
Error                    81        0.29262        0.00361
Uncorrected Total        90          16192

Root MSE              0.06011    R-Square     1.0000
Dependent Mean       13.36561    Adj R-Sq     1.0000
Coeff Var             0.44970

Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

g1            1        9.70594        0.19312      50.26      <.0001
g2            1        9.66471        0.19898      48.57      <.0001
g3            1        9.49702        0.22496      42.22      <.0001
g4            1        9.89050        0.24176      40.91      <.0001
g5            1        9.73000        0.26094      37.29      <.0001
g6            1        9.79300        0.26366      37.14      <.0001
output        1        0.91928        0.02989      30.76      <.0001
fuel          1        0.41749        0.01520      27.47      <.0001
load          1       -1.07040        0.20169      -5.31      <.0001
```
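
To see why the LSDV2 statistics from SAS and Stata are misleading, compare R2 computed from the uncorrected total sum of squares with R2 and F computed from the corrected total of the LSDV1 ANOVA table. A minimal sketch, with the sums of squares copied from the Stata outputs in 4.2 and 4.3:

```python
# Sums of squares copied from the Stata LSDV1/LSDV2 output.
SSE = 0.292622872             # residual sum of squares, identical in LSDV1 and LSDV2
SST_CORRECTED = 114.040893    # sum of (cost - mean(cost))^2, used when an intercept is present
SST_UNCORRECTED = 16191.5969  # sum of cost^2, used when the intercept is suppressed
N_OBS, N_PARAMS = 90, 9       # 6 group dummies + output, fuel, load

r2_uncorrected = 1 - SSE / SST_UNCORRECTED
r2_corrected = 1 - SSE / SST_CORRECTED

# One of the six dummies plays the role of the intercept, so the model df is 9 - 1 = 8.
df_model = N_PARAMS - 1
df_error = N_OBS - N_PARAMS
f_corrected = ((SST_CORRECTED - SSE) / df_model) / (SSE / df_error)

print(f"R2 (uncorrected total) = {r2_uncorrected:.4f}")
print(f"R2 (corrected total)   = {r2_corrected:.4f}")
print(f"F({df_model}, {df_error}) = {f_corrected:.2f}")
```

The corrected values agree with the LSDV1 output (R2 = .9974, F = 3935.79), which is what LIMDEP reports for LSDV2.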

Stata uses the noconstant option to suppress the intercept. Note that noc is its abbreviation.

```
. regress cost g1-g6 output fuel load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  9,    81) =       .
       Model |  16191.3043     9  1799.03381           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  1.0000
       Total |  16191.5969    90  179.906633           Root MSE      =  .06011

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   9.705942    .193124    50.26   0.000     9.321686     10.0902
          g2 |   9.664706    .198982    48.57   0.000     9.268794    10.06062
          g3 |   9.497021   .2249584    42.22   0.000     9.049424    9.944618
          g4 |   9.890498   .2417635    40.91   0.000     9.409464    10.37153
          g5 |   9.729997   .2609421    37.29   0.000     9.210804    10.24919
          g6 |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
------------------------------------------------------------------------------
```

In LIMDEP, remove ONE from the Rhs; list to suppress the intercept. Unlike SAS and Stata, LIMDEP reports the correct R2 and F even in LSDV2.

```
+-----------------------------------------------------------------------+
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared=  .997434, Adjusted R-squared =          .99718 |
| Model test: F[  8,     81] = 3935.82,    Prob value =          .00000 |
| Diagnostic: Log-L =    130.0865, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=     -2.691 |
| Model does not contain ONE. R-squared and F can be negative!          |
| Autocorrel: Durbin-Watson Statistic =   1.02645,   Rho =       .48677 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
G1           9.705949253       .19312325   50.258   .0000     .16666667
G2           9.664715269       .19898117   48.571   .0000     .16666667
G3           9.497032673       .22495746   42.217   .0000     .16666667
G4           9.890513806       .24176245   40.910   .0000     .16666667
G5           9.730013568       .26094094   37.288   .0000     .16666667
G6           9.793021272       .26366104   37.142   .0000     .16666667
OUTPUT       .9192881432   .29889967E-01   30.756   .0000    -1.1743092
FUEL         .4174910457   .15199071E-01   27.468   .0000     12.770359
LOAD        -1.070395015       .20168924   -5.307   .0000     .56046016
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
```

### 4.4 LSDV3 with Restrictions

LSDV3 imposes a restriction that the sum of the dummy parameters is zero. The SAS REG procedure uses the RESTRICT statement to impose restrictions.

```
PROC REG DATA=masil.airline;
MODEL cost = g1-g6 output fuel load;
RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;
```

```
The REG Procedure
Model: MODEL1
Dependent Variable: cost

NOTE: Restrictions have been applied to parameter estimates.

Number of Observations Used          90

Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     8      113.74827       14.21853    3935.79    <.0001
Error                    81        0.29262        0.00361
Corrected Total          89      114.04089

Root MSE              0.06011    R-Square     0.9974
Dependent Mean       13.36561    Adj R-Sq     0.9972
Coeff Var             0.44970

Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        9.71353        0.22964      42.30     <.0001
g1            1       -0.00759        0.04562      -0.17     0.8683
g2            1       -0.04882        0.03798      -1.29     0.2023
g3            1       -0.21651        0.01606     -13.48     <.0001
g4            1        0.17697        0.01942       9.11     <.0001
g5            1        0.01647        0.03669       0.45     0.6547
g6            1        0.07948        0.04050       1.96     0.0532
output        1        0.91928        0.02989      30.76     <.0001
fuel          1        0.41749        0.01520      27.47     <.0001
load          1       -1.07040        0.20169      -5.31     <.0001
RESTRICT     -1    3.01674E-15    1.51088E-10       0.00     1.0000*

* Probability computed using beta distribution.
```

The dummy coefficients are deviations from the average group effect, which is the intercept (9.714). The actual intercept of group 2, for example, is 9.665 = 9.714 + (-.049). Note that the RESTRICT estimate of 3.01674E-15 above is virtually zero.
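
The relationship between LSDV2 and LSDV3 can be checked with a few lines of arithmetic. This sketch assumes the LSDV2 group intercepts reported by Stata in 4.3: the LSDV3 intercept is their average, and each LSDV3 dummy coefficient is a group intercept minus that average, so the dummy coefficients sum to zero.

```python
# LSDV2 group intercepts for airlines 1-6 (copied from the Stata output in 4.3).
lsdv2_intercepts = [9.705942, 9.664706, 9.497021, 9.890498, 9.729997, 9.793004]

# LSDV3 intercept: the average group effect.
lsdv3_cons = sum(lsdv2_intercepts) / len(lsdv2_intercepts)

# LSDV3 dummy coefficients: deviations of each group intercept from the average.
lsdv3_dummies = [a - lsdv3_cons for a in lsdv2_intercepts]

print(f"_cons = {lsdv3_cons:.6f}")
for g, d in enumerate(lsdv3_dummies, start=1):
    print(f"g{g} = {d:.7f}")
print(f"sum of dummies = {sum(lsdv3_dummies):.2e}")
```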

In Stata, you have to use the .cnsreg command rather than .regress. This command, however, does not provide an ANOVA table or R2.

```
. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 output fuel load, constraint(1)

Constrained linear regression                          Number of obs =      90
                                                       F(  8,    81) = 3935.79
                                                       Prob > F      =  0.0000
                                                       Root MSE      =  .06011

 ( 1)  g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0075859   .0456178    -0.17   0.868    -.0983509    .0831792
          g2 |  -.0488218   .0379787    -1.29   0.202    -.1243875    .0267439
          g3 |  -.2165069   .0160624   -13.48   0.000    -.2484661   -.1845478
          g4 |   .1769698   .0194247     9.11   0.000     .1383208    .2156189
          g5 |   .0164689   .0366904     0.45   0.655    -.0565335    .0894712
          g6 |   .0794759   .0405008     1.96   0.053     -.001108    .1600597
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.713528    .229641    42.30   0.000     9.256614    10.17044
------------------------------------------------------------------------------
```

LIMDEP has the Cls\$ subcommand to impose restrictions. Again, do not forget to include ONE in the Rhs; list.

```
Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)=0$
```

```
+-----------------------------------------------------------------------+
| Linearly restricted regression                                        |
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared=  .997434, Adjusted R-squared =          .99718 |
|             (Note:  Not using OLS.  R-squared is not bounded in [0,1] |
| Model test: F[  8,     81] = 3935.82,    Prob value =          .00000 |
| Diagnostic: Log-L =    130.0865, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=     -2.691 |
| Note, when restrictions are imposed, R-squared can be less than zero. |
| F[ 1,    80] for the restrictions =       .0000, Prob =  1.0000       |
| Autocorrel: Durbin-Watson Statistic =   1.02645,   Rho =       .48677 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant     12.12205614       .27886962   43.469   .0000
G1          -2.416106889   .89836871E-01  -26.894   .0000     .16666667
G2          -2.457340873   .82929154E-01  -29.632   .0000     .16666667
G3          -2.625023469   .56175656E-01  -46.729   .0000     .16666667
G4          -2.231542336   .41557714E-01  -53.697   .0000     .16666667
G5          -2.392042574   .29995908E-01  -79.746   .0000     .16666667
G6          -2.329034870   .33569388E-01  -69.380   .0000     .16666667
OUTPUT       .9192881432   .29889967E-01   30.756   .0000    -1.1743092
FUEL         .4174910457   .15199071E-01   27.468   .0000     12.770359
LOAD        -1.070395015       .20168924   -5.307   .0000     .56046016
```

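LSDV3 in LIMDEP reports dummy coefficients that differ from those of SAS and Stata. Judging from the estimates, the restriction was applied to the first six coefficients as they appear in the output, the Constant and G1 through G5 (12.122 - 2.416 - 2.457 - 2.625 - 2.232 - 2.392 = 0), rather than to G1 through G6. You can nevertheless recover the actual group intercepts as you would in SAS and Stata. The actual intercept of group 3, for example, is 9.497 = 12.122 + (-2.625).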
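
A quick arithmetic check on the LIMDEP estimates (copied from the output above) shows which linear restriction the coefficients actually satisfy and recovers the group intercepts. This is only a sketch of the arithmetic, not a LIMDEP feature.

```python
# LIMDEP LSDV3 estimates copied from the output above.
limdep = {
    "Constant": 12.12205614,
    "G1": -2.416106889, "G2": -2.457340873, "G3": -2.625023469,
    "G4": -2.231542336, "G5": -2.392042574, "G6": -2.329034870,
}

# Sum of the first six coefficients as listed in the output (Constant, G1-G5).
first_six = limdep["Constant"] + sum(limdep[f"G{g}"] for g in range(1, 6))
# Sum of all six dummy coefficients (the LSDV3 restriction used in SAS and Stata).
all_dummies = sum(limdep[f"G{g}"] for g in range(1, 7))

# Group intercepts: Constant + dummy coefficient, as in SAS and Stata.
intercepts = {g: limdep["Constant"] + limdep[f"G{g}"] for g in range(1, 7)}

print(f"Constant + G1..G5 = {first_six:.2e}")
print(f"G1 + ... + G6     = {all_dummies:.4f}")
for g, a in intercepts.items():
    print(f"group {g}: {a:.3f}")
```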

### 4.5 Within Group Effect Model

The within effect model does not use dummy variables. Consequently, an OLS regression on the transformed data uses nT - k error degrees of freedom (87 here) instead of the nT - n - k of LSDV (81 here), and therefore reports a smaller MSE and smaller standard errors than LSDV. These statistics need to be adjusted. This model does not report individual dummy coefficients either. The SAS TSCSREG procedure and the LIMDEP Regress\$ command report the adjusted (correct) MSE, SEE (Root MSE), R2, and standard errors.

#### 4.5.1 Estimating the Within Effect Model

First, let us estimate the within group effect model manually in Stata. Compute the group means, then transform the dependent and independent variables into deviations from those group means (the Stata log is omitted here).

```
. egen gm_cost=mean(cost), by(airline) // compute group means
. egen gm_output=mean(output), by(airline)
. egen gm_fuel=mean(fuel), by(airline)
. egen gm_load=mean(load), by(airline)
```

You will get the following group means of variables.

```
+------------------------------------------------------+
| airline    gm_cost   gm_output    gm_fuel    gm_load |
|------------------------------------------------------|
|       1   14.67563    .3192696    12.7318   .5971917 |
|       2   14.37247    -.033027   12.75171   .5470946 |
|       3   13.37231   -.9122626   12.78972   .5845358 |
|       4    13.1358   -1.635174   12.77803   .5476773 |
|       5   12.36304   -2.285681    12.7921   .5664859 |
|       6   12.27441    -2.49898    12.7788   .5197756 |
+------------------------------------------------------+
```

```
. gen gw_cost = cost - gm_cost // compute deviations from the group means
. gen gw_output = output - gm_output
. gen gw_fuel = fuel - gm_fuel
. gen gw_load = load - gm_load
```
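
The demeaning step is what makes the within estimator work. The sketch below illustrates it on a small hypothetical panel with a single regressor: after subtracting group means, the group-specific intercepts disappear, and OLS through the origin recovers the common slope. The data and the helpers `demean_by_group` and `within_slope` are made up for illustration.

```python
from collections import defaultdict


def demean_by_group(groups, values):
    """Subtract each group's mean from the values in that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [v - means[g] for g, v in zip(groups, values)]


def within_slope(groups, x, y):
    """OLS slope through the origin on group-demeaned data (one regressor)."""
    xt = demean_by_group(groups, x)
    yt = demean_by_group(groups, y)
    return sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)


# Hypothetical toy panel: three groups with different intercepts and a common slope of 0.9.
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [0.0, 1.0, 2.0, 0.5, 1.5, 3.0, -1.0, 0.0, 2.5]
intercept = {1: 9.7, 2: 9.5, 3: 9.9}
y = [intercept[g] + 0.9 * xi for g, xi in zip(groups, x)]

print(within_slope(groups, x, y))
```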

Now we are ready to fit the within effect model. Keep in mind that the intercept must be suppressed, because the demeaned variables have a mean of zero. Compare the MSE, SEE, R2, and standard errors carefully with those of LSDV1.

```
. regress gw_cost gw_output gw_fuel gw_load, noc // within effect

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    87) = 3871.82
       Model |  39.0683861     3  13.0227954           Prob > F      =  0.0000
    Residual |  .292622861    87  .003363481           R-squared     =  0.9926
       Total |   39.361009    90  .437344544           Root MSE      =    .058

------------------------------------------------------------------------------
     gw_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gw_output |   .9192846    .028841    31.87   0.000       .86196    .9766092
     gw_fuel |   .4174918   .0146657    28.47   0.000     .3883422    .4466414
     gw_load |  -1.070396   .1946109    -5.50   0.000    -1.457206   -.6835858
------------------------------------------------------------------------------
```

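You may also compute the group intercepts: the intercept of a group is the group mean of cost minus the sum of each slope estimate times the group mean of the corresponding regressor. For example, the intercept of airline 5 is computed as 9.730 = 12.363 - {.9193*(-2.286) + .4175*12.792 + (-1.0704)*.566}. To get the correct standard errors, multiply the reported standard errors by the square root of the ratio of the error degrees of freedom of the within effect model (87) to those of LSDV (81). For example, the standard error of the logged output is computed as .0299 = .0288*sqrt(87/81).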
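
Both calculations are easy to script. A minimal sketch, assuming the within estimates and the airline 5 group means shown earlier in this section (the helper `adjust_se` is hypothetical):

```python
import math

# Within-model slope estimates (from the regression on demeaned data).
b = {"output": 0.9192846, "fuel": 0.4174918, "load": -1.070396}

# Group means for airline 5 (from the table of group means).
ybar_5 = 12.36304
xbar_5 = {"output": -2.285681, "fuel": 12.7921, "load": 0.5664859}

# Intercept of a group = mean of cost in the group minus slopes times regressor means.
intercept_5 = ybar_5 - sum(b[k] * xbar_5[k] for k in b)


def adjust_se(se_within, df_within, df_lsdv):
    """Rescale a within-model standard error to the LSDV degrees of freedom."""
    return se_within * math.sqrt(df_within / df_lsdv)


# 90 obs - 3 slopes = 87 df in the within regression; 90 - 6 - 3 = 81 df in LSDV.
se_output = adjust_se(0.028841, df_within=87, df_lsdv=81)
se_load = adjust_se(0.1946109, df_within=87, df_lsdv=81)

print(f"intercept of airline 5 = {intercept_5:.3f}")
print(f"adjusted SE(output) = {se_output:.4f}, SE(load) = {se_load:.4f}")
```

The adjusted standard errors match those of LSDV1 (.0299 for output and .2017 for load).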

#### 4.5.2 Using the SAS TSCSREG and PANEL Procedures

The TSCSREG and PANEL procedures of SAS/ETS allow users to fit the within effect model conveniently. The procedures in fact report LSDV1, but you need neither create dummy variables nor compute deviations from the group means. These procedures report correct MSE, SEE, R2, and standard errors, and they also conduct the F test for the fixed group effects.

```
PROC SORT DATA=masil.airline;
BY airline year;
RUN;

PROC TSCSREG DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXONE;
RUN;
```

```
The TSCSREG Procedure

Dependent Variable: cost

Model Description

Estimation Method             FixOne
Number of Cross Sections           6
Time Series Length                15

Fit Statistics

SSE              0.2926    DFE                  81
MSE              0.0036    Root MSE         0.0601
R-Square         0.9974

F Test for No Fixed Effects

Num DF      Den DF    F Value    Pr > F

     5          81      57.73    <.0001

Parameter Estimates

                                  Standard
Variable        DF    Estimate       Error    t Value    Pr > |t|    Label

CS1              1    -0.08706      0.0842      -1.03      0.3042    Cross Sectional
                                                                     Effect    1
CS2              1     -0.1283      0.0757      -1.69      0.0941    Cross Sectional
                                                                     Effect    2
CS3              1    -0.29598      0.0500      -5.92      <.0001    Cross Sectional
                                                                     Effect    3
CS4              1    0.097494      0.0330       2.95      0.0041    Cross Sectional
                                                                     Effect    4
CS5              1    -0.06301      0.0239      -2.64      0.0100    Cross Sectional
                                                                     Effect    5
Intercept        1    9.793004      0.2637      37.14      <.0001    Intercept
output           1    0.919285      0.0299      30.76      <.0001
fuel             1    0.417492      0.0152      27.47      <.0001
load             1     -1.0704      0.2017      -5.31      <.0001
```

Note that the data set must be sorted in advance by the variables that appear in the ID statement of the TSCSREG and PANEL procedures. The following PANEL procedure returns the same output.

```
PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /FIXONE;
RUN;
```

#### 4.5.3 Using Stata

The Stata .xtreg command fits the within group effect model without creating dummy variables. The command reports correct standard errors and the F test for the fixed group effects. It does not, however, provide an analysis of variance (ANOVA) table or the correct R2 and F statistics. The .xtreg command must be preceded by the .tsset command, which specifies the grouping and time variables.

```
. tsset airline year

       panel variable:  airline, 1 to 6
        time variable:  year, 1 to 15
```

The fe option of .xtreg requests the within effect model, and the i(airline) option specifies airline as the group (panel) variable.

```
. xtreg cost output fuel load, fe i(airline) // within group effect

Fixed-effects (within) regression               Number of obs      =        90
Group variable (i): airline                     Number of groups   =         6

R-sq:  within  = 0.9926                         Obs per group: min =        15
       between = 0.9856                                        avg =      15.0
       overall = 0.9873                                        max =        15

                                                F(3,81)            =   3604.80
corr(u_i, Xb)  = -0.3475                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.713528    .229641    42.30   0.000     9.256614    10.17044
-------------+----------------------------------------------------------------
     sigma_u |   .1320775
     sigma_e |  .06010514
         rho |  .82843653   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(5, 81) =    57.73               Prob > F = 0.0000
```

The last line of the output tests the null hypothesis that all dummy parameters in LSDV1 are zero (i.e., g1=0, g2=0, g3=0, g4=0, and g5=0). Note that the intercept of 9.714 is that of LSDV3.
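
This F statistic (57.73, also reported by TSCSREG) compares the residual sums of squares of the pooled model and LSDV1: F(n-1, nT-n-k) = [(SSE_pooled - SSE_LSDV)/(n-1)] / [SSE_LSDV/(nT-n-k)]. A minimal sketch of the calculation, using the SSEs copied from the Stata regressions in 4.1 and 4.2 (the helper `fixed_effects_f` is hypothetical):

```python
def fixed_effects_f(sse_pooled, sse_lsdv, n_groups, n_obs, n_regressors):
    """F test that all group effects are zero (pooled OLS vs. LSDV)."""
    df_num = n_groups - 1                     # n - 1 restricted dummy coefficients
    df_den = n_obs - n_groups - n_regressors  # LSDV error df: nT - n - k
    f = ((sse_pooled - sse_lsdv) / df_num) / (sse_lsdv / df_den)
    return f, df_num, df_den


# Residual sums of squares from the pooled OLS (4.1) and LSDV1 (4.2) Stata regressions.
f_stat, df1, df2 = fixed_effects_f(1.33544153, 0.292622872,
                                   n_groups=6, n_obs=90, n_regressors=3)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```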

#### 4.5.4 Using LIMDEP

In LIMDEP, you have to specify the panel data model and a stratification or time variable. The Panel\$ and Fixed\$ subcommands together request a fixed effect panel data model, and the Str\$ subcommand specifies the stratification variable.

```
+-----------------------------------------------------------------------+
| OLS Without Group Dummy Variables                                     |
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   4, Deg.Fr.=     86 |
| Residuals:  Sum of squares= 1.335449522    , Std.Dev.=         .12461 |
| Fit:        R-squared=  .988290, Adjusted R-squared =          .98788 |
| Model test: F[  3,     86] = 2419.33,    Prob value =          .00000 |
| Diagnostic: Log-L =     61.7699, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -4.122, Akaike Info. Crt.=     -1.284 |
| Panel Data Analysis of COST       [ONE way]                           |
|           Unconditional ANOVA (No regressors)                         |
| Source      Variation        Deg. Free.     Mean Square               |
| Between       74.6799                5.         14.9360               |
| Residual      39.3611               84.         .468584               |
| Total         114.041               89.         1.28136               |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
OUTPUT       .8827386341   .13254552E-01   66.599   .0000    -1.1743092
FUEL         .4539777119   .20304240E-01   22.359   .0000     12.770359
LOAD        -1.627507797       .34530293   -4.713   .0000     .56046016
Constant     9.516912231       .22924522   41.514   .0000
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

+-----------------------------------------------------------------------+
| Least Squares with Group Dummy Variables                              |
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared=  .997434, Adjusted R-squared =          .99718 |
| Model test: F[  8,     81] = 3935.82,    Prob value =          .00000 |
| Diagnostic: Log-L =    130.0865, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=     -2.691 |
| Estd. Autocorrelation of e(i,t)     .573531                           |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
OUTPUT       .9192881432   .29889967E-01   30.756   .0000    -1.1743092
FUEL         .4174910457   .15199071E-01   27.468   .0000     12.770359
LOAD        -1.070395015       .20168924   -5.307   .0000     .56046016
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
```

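LIMDEP reports both the pooled OLS regression and the fixed effect estimates (labeled "Least Squares with Group Dummy Variables"), whose slopes are identical to those of the within effect model. Like the SAS TSCSREG procedure, LIMDEP reports the correct MSE, SEE, R2, and standard errors, because its residual degrees of freedom (81 = 90 - 6 - 3) account for the six group intercepts.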
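The degrees-of-freedom point can be checked by hand. The sketch below recomputes the SEE ("Std.Dev." in LIMDEP) of both models from their residual sums of squares. It also computes the factor by which standard errors from a naive OLS fit on group-demeaned data would have to be rescaled. It assumes that naive fit keeps an intercept (nT - k - 1 residual df). The function names are ours, for illustration only.

```python
import math

def see(sse, n_obs, n_params):
    """Standard error of the estimate: sqrt(SSE / residual degrees of freedom)."""
    return math.sqrt(sse / (n_obs - n_params))

def within_se_factor(n_obs, n_groups, k):
    """Rescaling factor for standard errors from a naive OLS fit on group-demeaned
    data. Assumes that naive fit keeps an intercept, so it uses nT - k - 1 residual
    df instead of the nT - n - k df implied by the n group intercepts."""
    df_naive = n_obs - k - 1
    df_correct = n_obs - n_groups - k
    return math.sqrt(df_naive / df_correct)

n_obs, n_groups, k = 90, 6, 3                       # airline panel: nT = 90, n = 6, k = 3
see_pooled = see(1.335449522, n_obs, k + 1)         # pooled OLS: 3 slopes + 1 constant
see_lsdv = see(0.2926207777, n_obs, n_groups + k)   # LSDV: 6 intercepts + 3 slopes
factor = within_se_factor(n_obs, n_groups, k)       # multiply naive SEs by this
print(f"{see_pooled:.5f} {see_lsdv:.5f} {factor:.4f}")  # prints: 0.12461 0.06010 1.0304
```

The two SEE values match the .12461 and .06010 that LIMDEP reports above.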


4.6 Between Group Effect Model: Group Mean Regression

The between effect model uses aggregate information, namely the group means of the variables. The unit of analysis is therefore the group or subject rather than the individual observation, and the number of observations drops from nT to n (from 90 to 6 in the airline data). This group mean regression produces goodness-of-fit measures and parameter estimates that differ from those of LSDV and the within effect model.
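In symbols, the group mean regression can be sketched as follows. The notation is ours: y_it and x_{j,it} denote the dependent variable and the j-th of k regressors for group i in period t, and a bar with a dot marks an average over t.

```latex
\bar{y}_{i\cdot} = \alpha + \beta_1 \bar{x}_{1,i\cdot} + \cdots + \beta_k \bar{x}_{k,i\cdot} + \bar{\varepsilon}_{i\cdot},
\qquad \bar{y}_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad i = 1, \dots, n
```

With n = 6 airlines and k = 3 regressors, this regression has n - k - 1 = 2 residual degrees of freedom, which is the df of 2 reported for the residual in the outputs of this section.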

Let us compute the group means and run the OLS regression on them. The .collapse command replaces the data in memory with a data set of summary statistics, here one row of means per group. Note that /// continues a command onto the next line.

. collapse (mean) gm_cost=cost (mean) gm_output=output (mean) gm_fuel=fuel (mean) ///
gm_load=load, by(airline)

. regress gm_cost gm_output gm_fuel gm_load

Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  3,     2) =  104.12
Model |  4.94698124     3  1.64899375           Prob > F      =  0.0095
Residual |  .031675926     2  .015837963           R-squared     =  0.9936
Total |  4.97865717     5  .995731433           Root MSE      =  .12585

------------------------------------------------------------------------------
gm_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
gm_output |   .7824568   .1087646     7.19   0.019     .3144803    1.250433
gm_fuel |  -5.523904   4.478718    -1.23   0.343    -24.79427    13.74647
gm_load |  -1.751072   2.743167    -0.64   0.589    -13.55397    10.05182
_cons |    85.8081   56.48199     1.52   0.268    -157.2143    328.8305
------------------------------------------------------------------------------
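The two Stata steps above (collapse to group means, then regress) can be sketched without any statistics package. The data below are a hypothetical toy panel (3 groups by 2 periods, one regressor), and `group_means` and `ols` are illustrative helpers of ours, not Stata or library functions.

```python
from collections import defaultdict

def group_means(rows, key, cols):
    """Rough analogue of Stata's `collapse (mean) ..., by(key)`:
    returns {group: [mean of each column in cols]}."""
    sums = defaultdict(lambda: [0.0] * len(cols))
    counts = defaultdict(int)
    for row in rows:
        g = row[key]
        counts[g] += 1
        for j, c in enumerate(cols):
            sums[g][j] += row[c]
    return {g: [s / counts[g] for s in sums[g]] for g in sums}

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting. Each row of X already includes a 1 for the intercept."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical toy panel: 3 groups x 2 periods, one regressor x.
rows = [
    {"g": "A", "x": 1.0, "y": 4.0}, {"g": "A", "x": 3.0, "y": 6.0},
    {"g": "B", "x": 0.0, "y": 3.5}, {"g": "B", "x": 2.0, "y": 2.5},
    {"g": "C", "x": 4.0, "y": 8.0}, {"g": "C", "x": 4.0, "y": 10.0},
]
means = group_means(rows, "g", ["x", "y"])   # step 1: collapse to n = 3 rows
X = [[1.0, m[0]] for m in means.values()]    # intercept + group mean of x
y = [m[1] for m in means.values()]           # group mean of y
b = ols(X, y)                                # step 2: OLS on the group means
print(b)
```

The toy group means (2, 5), (1, 3), and (4, 9) lie exactly on y = 1 + 2x, so `b` comes out as [1.0, 2.0] up to floating-point rounding, even though the individual observations scatter around that line.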

The SAS PANEL procedure has the /BTWNG (between groups) and /BTWNT (between time periods) options to estimate between effect models. The TSCSREG procedure does not have these options.

PROC PANEL DATA=masil.airline;
ID airline year;
MODEL cost = output fuel load /BTWNG;
RUN;

The PANEL Procedure
Between Groups Estimates

Dependent Variable: cost

Model Description

Estimation Method            BtwGrps
Number of Cross Sections           6
Time Series Length                15

Fit Statistics

SSE              0.0317    DFE                   2
MSE              0.0158    Root MSE         0.1258
R-Square         0.9936

Parameter Estimates

Standard
Variable        DF    Estimate       Error    t Value    Pr > |t|    Label

Intercept        1    85.80901     56.4830       1.52      0.2681    Intercept
output           1    0.782455      0.1088       7.19      0.0188
fuel             1    -5.52398      4.4788      -1.23      0.3427
load             1    -1.75102      2.7432      -0.64      0.5886

The Stata .xtreg command has the be option to fit the between effect model. This command, however, does not report the ANOVA table.

. xtreg cost output fuel load, be i(airline)

Between regression (regression on group means)  Number of obs      =        90
Group variable (i): airline                     Number of groups   =         6

R-sq:  within  = 0.8808                         Obs per group: min =        15
between = 0.9936                                        avg =      15.0
overall = 0.1371                                        max =        15

F(3,2)             =    104.12
sd(u_i + avg(e_i.))=  .1258491                  Prob > F           =    0.0095

------------------------------------------------------------------------------
cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
output |   .7824552   .1087663     7.19   0.019     .3144715    1.250439
fuel |  -5.523978   4.478802    -1.23   0.343    -24.79471    13.74675
load |  -1.751016    2.74319    -0.64   0.589    -13.55401    10.05198
_cons |   85.80901   56.48302     1.52   0.268    -157.2178    328.8358
------------------------------------------------------------------------------
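The three R-sq lines in this output can look contradictory (between = .9936, overall = .1371). As we understand the Methods and formulas section of Stata's xtreg documentation, each is a squared correlation between fitted and actual values on a different scale: deviations from group means (within), group means (between), and raw observations (overall). The sketch below illustrates this on a hypothetical toy panel with a single regressor. `corr2` and `xt_r2` are our own names, not Stata's.

```python
from collections import defaultdict

def corr2(a, b):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab * sab / (saa * sbb)

def xt_r2(groups, x, y, slope):
    """Within, between, and overall R-squared as squared correlations,
    simplified to a single regressor x with estimated slope `slope`."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    xbar = {g: sum(x[i] for i in ix) / len(ix) for g, ix in members.items()}
    ybar = {g: sum(y[i] for i in ix) / len(ix) for g, ix in members.items()}
    within = corr2([(x[i] - xbar[g]) * slope for i, g in enumerate(groups)],
                   [y[i] - ybar[g] for i, g in enumerate(groups)])
    between = corr2([xbar[g] * slope for g in members], [ybar[g] for g in members])
    overall = corr2([xi * slope for xi in x], y)
    return within, between, overall

# Hypothetical toy panel: 3 groups x 2 periods; slope 2.0 from a between fit.
groups = ["A", "A", "B", "B", "C", "C"]
x = [1.0, 3.0, 0.0, 2.0, 4.0, 4.0]
y = [4.0, 6.0, 3.5, 2.5, 8.0, 10.0]
w, bt, ov = xt_r2(groups, x, y, 2.0)
print(f"within = {w:.4f}, between = {bt:.4f}, overall = {ov:.4f}")
```

Because the group means fit a line exactly, the between value is 1 while the within and overall values are much lower. This is the same kind of gap seen in the airline output. With several regressors, the fitted index x'b would replace `x * slope`.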

LIMDEP has the Means; subcommand to fit the between effect model.

+-----------------------------------------------------------------------+
| Group Means Regression                                                |
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = YBAR(i.) Mean=   13.36560933    , S.D.=   .9978636346     |
| Model size: Observations =       6, Parameters =   4, Deg.Fr.=      2 |
| Residuals:  Sum of squares= .3167277206E-01, Std.Dev.=         .12584 |
| Fit:        R-squared=  .993638, Adjusted R-squared =          .98410 |
| Model test: F[  3,      2] =  104.13,    Prob value =          .00953 |
| Diagnostic: Log-L =      7.2185, Restricted(b=0) Log-L =      -7.9538 |
|             LogAmemiyaPrCrt.=   -3.635, Akaike Info. Crt.=     -1.073 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
OUTPUT       .7824472689       .10876126    7.194   .0000  .23025612E-11
FUEL        -5.524437466       4.4786519   -1.234   .2174     .18642891
LOAD        -1.750947653       2.7430470    -.638   .5233     .32541105
Constant     85.81483169       56.481148    1.519   .1287
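LIMDEP's coefficients agree with those of SAS and Stata, but its p-values do not (for example, .2174 versus .343 for FUEL). Its column headers, b/St.Er. and P[|Z|>z], indicate that it compares the ratio with the standard normal distribution. SAS and Stata use Student's t with 2 degrees of freedom. The sketch below reproduces both sets of p-values from the LIMDEP coefficients and standard errors. It uses the closed-form CDF of the t distribution with 2 df. The function names are ours.

```python
import math

def p_normal(z):
    """Two-sided p-value under the standard normal, P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def p_t2(t):
    """Two-sided p-value under Student's t with 2 df, from the closed-form CDF
    F(t) = 1/2 + t / (2 * sqrt(2 + t**2))."""
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)

# Coefficients and standard errors copied from the LIMDEP group-means output
for name, coef, se in [("FUEL", -5.524437466, 4.4786519), ("LOAD", -1.750947653, 2.7430470)]:
    ratio = coef / se
    print(f"{name}: b/se = {ratio:.3f}, normal p = {p_normal(ratio):.4f}, t(2) p = {p_t2(ratio):.4f}")
```

The normal-based values land on LIMDEP's .2174 and .5233, and the t(2) values on the .3427 and .5886 reported by SAS. With only two residual degrees of freedom, the t-based p-values are the appropriate ones here.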

4.7 Testing Fixed Group Effects (F-test)

How do we know whether there are fixed group effects? The null hypothesis is that all n - 1 dummy parameters (one group serves as the reference) are zero, that is, that all groups share the same intercept.

To conduct an F-test, let us take the SSE (e'e) of 1.3354 from the pooled OLS regression and .2926 from the LSDVs (LSDV1 through LSDV3) or the within effect model. Alternatively, you may take the R2 of .9974 from LSDV1 or LSDV3 and .9883 from the pooled OLS. Do not, however, use the R2 of LSDV2 or of the within effect model, because those R2 values are not computed on the same basis as the pooled model's.

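The F statistic, with n = 6 groups, T = 15 periods, and k = 3 regressors, is computed as

$$F(n-1,\; nT-n-k) = \frac{(e'e_{Pooled} - e'e_{LSDV})/(n-1)}{e'e_{LSDV}/(nT-n-k)} = \frac{(1.3354 - .2926)/(6-1)}{.2926/(90-6-3)} \approx 57.73$$

or, equivalently, in terms of R2:

$$F(n-1,\; nT-n-k) = \frac{(R^2_{LSDV} - R^2_{Pooled})/(n-1)}{(1 - R^2_{LSDV})/(nT-n-k)}$$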

The large F statistic of 57.73 with (5, 81) degrees of freedom rejects the null hypothesis in favor of the fixed group effect model (p < .0001).
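The arithmetic of this F-test is easy to script. The sketch below implements both the SSE form and the R-squared form. The inputs are copied from the pooled OLS and LSDV outputs, and the function names are ours, for illustration.

```python
def fixed_effects_f(sse_pooled, sse_lsdv, n_obs, n_groups, k):
    """F statistic for H0: all n-1 group dummy coefficients are zero (SSE form)."""
    df1, df2 = n_groups - 1, n_obs - n_groups - k
    return ((sse_pooled - sse_lsdv) / df1) / (sse_lsdv / df2), df1, df2

def fixed_effects_f_r2(r2_lsdv, r2_pooled, n_obs, n_groups, k):
    """Same test written with R-squared; r2_lsdv must come from a model with an
    intercept (LSDV1 or LSDV3), as noted in the text."""
    df1, df2 = n_groups - 1, n_obs - n_groups - k
    return ((r2_lsdv - r2_pooled) / df1) / ((1.0 - r2_lsdv) / df2), df1, df2

# SSEs and R-squared values from the pooled OLS and LSDV outputs; nT = 90, n = 6, k = 3
f_sse, df1, df2 = fixed_effects_f(1.33544153, 0.2926207777, 90, 6, 3)
f_r2, _, _ = fixed_effects_f_r2(0.997434, 0.988290, 90, 6, 3)
print(f"F({df1}, {df2}) = {f_sse:.2f} (SSE form), {f_r2:.2f} (R-squared form)")
```

Both forms give about 57.73. The R-squared form is sensitive to rounding: plugging in the four-decimal values .9974 and .9883 gives about 56.7, so use as many digits as the output provides.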

The SAS TSCSREG and PANEL procedures and the Stata .xtreg command conduct this F test by default when fitting the fixed effect model. Alternatively, you may conduct the same test with LSDV1. In SAS, add a TEST statement to the REG procedure and run the procedure again (other output is omitted).

PROC REG DATA=masil.airline;
MODEL cost = g1-g5 output fuel load;
TEST g1 = g2 = g3 = g4 = g5 = 0;
RUN;

The REG Procedure
Model: MODEL1

Test 1 Results for Dependent Variable cost

Mean
Source             DF         Square    F Value    Pr > F

Numerator           5        0.20856      57.73    <.0001
Denominator        81        0.00361

In Stata, run the .test command, a follow-up command for the Wald test, right after estimating the model.

. quietly regress cost g1-g5 output fuel load // LSDV1
. test g1 g2 g3 g4 g5

( 1)  g1 = 0
( 2)  g2 = 0
( 3)  g3 = 0
( 4)  g4 = 0
( 5)  g5 = 0

F(  5,    81) =   57.73
Prob > F =    0.0000

4.8 Summary

Table 6 summarizes the estimation of panel data models in SAS, Stata, and LIMDEP. The SAS REG and TSCSREG procedures are generally preferred to Stata and LIMDEP commands.

Table 6 Comparison of the Fixed Effect Model in SAS, Stata, LIMDEP*