4. The Fixed Group Effect Model

The one-way fixed group effect model examines group differences in the intercepts. The least squares dummy variable (LSDV) approach to this model needs as many dummy variables as there are groups or subjects. When many dummies are needed, the within effect model is useful because it transforms variables using group means and so avoids dummies. The between effect model, by contrast, uses the group means of the variables.

4.1 The Pooled OLS Regression Model

Let us first consider the pooled model without dummy variables.

. regress cost output fuel load // pooled model

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    86) = 2419.34
       Model |  112.705452     3  37.5684839           Prob > F      =  0.0000
    Residual |  1.33544153    86   .01552839           R-squared     =  0.9883
-------------+------------------------------           Adj R-squared =  0.9879
       Total |  114.040893    89  1.28135835           Root MSE      =  .12461
 
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8827385   .0132545    66.60   0.000     .8563895    .9090876
        fuel |    .453977   .0203042    22.36   0.000     .4136136    .4943404
        load |   -1.62751    .345302    -4.71   0.000    -2.313948   -.9410727
       _cons |   9.516923   .2292445    41.51   0.000       9.0612    9.972645
------------------------------------------------------------------------------
cost = 9.517 + .883*output +.454*fuel -1.628*load.

This model fits the data well (p<.0001 and R2=.9883). We may, however, suspect fixed group effects that produce different intercepts across groups. As discussed in Chapter 2, there are three equivalent approaches to LSDV. They report identical parameter estimates for the regressors other than the dummies. Let us begin with LSDV1.


4.2 LSDV1 without a Dummy

LSDV1 drops one dummy variable to identify the model. It produces correct ANOVA information, goodness-of-fit statistics, parameter estimates, and standard errors, so this approach is commonly used in practice. For the six groups (airlines), LSDV1 yields the following six regression equations. They share the same slopes but have different intercepts.

Group1: cost = 9.706 + .919*output +.417*fuel -1.070*load
Group2: cost = 9.665 + .919*output +.417*fuel -1.070*load
Group3: cost = 9.497 + .919*output +.417*fuel -1.070*load
Group4: cost = 9.891 + .919*output +.417*fuel -1.070*load
Group5: cost = 9.730 + .919*output +.417*fuel -1.070*load
Group6: cost = 9.793 + .919*output +.417*fuel -1.070*load
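The sketch below shows how LSDV1 works. It is a minimal Python sketch, not part of the SAS/STATA/LIMDEP workflow. It uses a made-up panel of 3 groups, 4 periods, and one regressor with no error term, so OLS should recover the true values exactly. The helper names `solve` and `ols` are hypothetical.

The code builds a constant plus dummies for all groups except the reference group (group 3). It then reads off each group's intercept as the constant plus that group's dummy coefficient.

```python
# Hypothetical panel (NOT the airline data): 3 groups x 4 periods,
# one regressor, no error term, so OLS should recover the true values.

def solve(a, rhs):
    """Solve the square system a @ x = rhs by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]              # partial pivoting
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

true_alpha = {1: 2.0, 2: 1.5, 3: 3.0}   # hypothetical group intercepts
true_beta = 0.5                         # hypothetical common slope
groups, xs, ys = [], [], []
for g, a in true_alpha.items():
    for t in range(4):
        x = float(t + g)                # arbitrary regressor values
        groups.append(g)
        xs.append(x)
        ys.append(a + true_beta * x)

# LSDV1: constant + dummies g1 and g2; group 3 is the dropped reference.
X = [[1.0, float(g == 1), float(g == 2), x] for g, x in zip(groups, xs)]
const, d1, d2, beta = ols(X, ys)

# Actual intercept of each group = constant + its dummy coefficient.
alpha_hat = {1: const + d1, 2: const + d2, 3: const}
print(alpha_hat, beta)
```

Solving the normal equations directly keeps the sketch dependency-free. Real software uses numerically safer decompositions.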

In SAS, the REG procedure fits the OLS regression model. Let us drop the last dummy, g6, which serves as the reference group.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 output fuel load;
RUN;

                                       The REG Procedure
                                         Model: MODEL1
                                   Dependent Variable: cost
 
                            Number of Observations Read          90
                            Number of Observations Used          90
 
 
                                      Analysis of Variance
 
                                             Sum of           Mean
         Source                   DF        Squares         Square    F Value    Pr > F
 
         Model                     8      113.74827       14.21853    3935.79    <.0001
         Error                    81        0.29262        0.00361
         Corrected Total          89      114.04089
 
 
                      Root MSE              0.06011    R-Square     0.9974
                      Dependent Mean       13.36561    Adj R-Sq     0.9972
                      Coeff Var             0.44970
 
 
                                      Parameter Estimates
 
                                   Parameter       Standard
              Variable     DF       Estimate          Error    t Value    Pr > |t|
 
              Intercept     1        9.79300        0.26366      37.14      <.0001
              g1            1       -0.08706        0.08420      -1.03      0.3042
              g2            1       -0.12830        0.07573      -1.69      0.0941
              g3            1       -0.29598        0.05002      -5.92      <.0001
              g4            1        0.09749        0.03301       2.95      0.0041
              g5            1       -0.06301        0.02389      -2.64      0.0100
              output        1        0.91928        0.02989      30.76      <.0001
              fuel          1        0.41749        0.01520      27.47      <.0001
              load          1       -1.07040        0.20169      -5.31      <.0001

Note that the intercept (9.793) is the actual intercept of group 6, the reference group whose dummy was dropped. The other dummy coefficients are measured relative to this reference point. The actual intercept of group 1, for example, is computed as 9.706 = 9.793 + (-.087)*1 + (-.1283)*0 + (-.2960)*0 + (.0975)*0 + (-.0630)*0, where 9.793 is the reference point.
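The arithmetic above can be applied to all six groups at once. This is an illustrative Python sketch using the SAS estimates as printed. Group 6 gets a coefficient of 0 because its dummy was dropped.

```python
# LSDV1 estimates from the SAS output above; g6 is the dropped reference group.
reference = 9.79300                      # Intercept = group 6's own intercept
dummy = {1: -0.08706, 2: -0.12830, 3: -0.29598,
         4: 0.09749, 5: -0.06301, 6: 0.0}  # g6 has no dummy, so 0

# Actual intercept of group j = reference intercept + coefficient of g_j.
intercepts = {g: reference + d for g, d in dummy.items()}
for g, a in intercepts.items():
    print(f"Group{g}: intercept = {a:.3f}")
```

Small differences in the third decimal place can arise from rounding in the printed estimates.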

STATA has the .regress command for OLS regression (LSDV).

. regress cost g1-g5 output fuel load

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011
 
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0870617   .0841995    -1.03   0.304    -.2545924     .080469
          g2 |  -.1282976   .0757281    -1.69   0.094    -.2789728    .0223776
          g3 |  -.2959828   .0500231    -5.92   0.000     -.395513   -.1964526
          g4 |    .097494   .0330093     2.95   0.004     .0318159    .1631721
          g5 |   -.063007   .0238919    -2.64   0.010    -.1105443   -.0154697
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
------------------------------------------------------------------------------

Now, run the LIMDEP Regress$ command to fit LSDV1. Do not forget to include ONE in the Rhs; list for the intercept.

--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,OUTPUT,FUEL,LOAD$

+-----------------------------------------------------------------------+
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared=  .997434, Adjusted R-squared =          .99718 |
| Model test: F[  8,     81] = 3935.82,    Prob value =          .00000 |
| Diagnostic: Log-L =    130.0865, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=     -2.691 |
| Autocorrel: Durbin-Watson Statistic =   1.02645,   Rho =       .48677 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant     9.793021272       .26366104   37.142   .0000
 G1       -.8707201949E-01  .84199161E-01   -1.034   .3042     .16666667
 G2          -.1283060033   .75727781E-01   -1.694   .0940     .16666667
 G3          -.2959885994   .50022855E-01   -5.917   .0000     .16666667
 G4        .9749253376E-01  .33009146E-01    2.954   .0041     .16666667
 G5       -.6300770422E-01  .23891796E-01   -2.637   .0100     .16666667
 OUTPUT       .9192881432   .29889967E-01   30.756   .0000    -1.1743092
 FUEL         .4174910457   .15199071E-01   27.468   .0000     12.770359
 LOAD        -1.070395015       .20168924   -5.307   .0000     .56046016
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

What if you drop a different dummy variable, say g1, instead of g6? Because a different reference group is used, you will get different dummy coefficients. Other statistics, such as the goodness-of-fit measures and the slope estimates, remain unchanged.

. regress cost g2-g6 output fuel load // LSDV1 dropping g1

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011
 
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
          g3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
          g4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
          g5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
          g6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------
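The new dummy coefficients can be derived from the g6-reference estimates without refitting, which shows that only the reference point has moved. This is an illustrative Python sketch; `change_reference` is a hypothetical helper.

The new constant is the old intercept of the new reference group. Every dummy coefficient then shifts by the same amount.

```python
# LSDV1 estimates with g6 as the reference (STATA output earlier in 4.2).
cons_ref6 = 9.793004
d_ref6 = {1: -0.0870617, 2: -0.1282976, 3: -0.2959828,
          4: 0.097494, 5: -0.063007, 6: 0.0}

def change_reference(cons, dummies, new_ref):
    """Re-express LSDV1 coefficients relative to a different reference group."""
    shift = dummies[new_ref]                 # old coefficient of the new reference
    new_cons = cons + shift                  # new constant = new reference's intercept
    new_dummies = {g: d - shift for g, d in dummies.items() if g != new_ref}
    return new_cons, new_dummies

cons_ref1, d_ref1 = change_reference(cons_ref6, d_ref6, new_ref=1)
print(round(cons_ref1, 6), {g: round(d, 7) for g, d in d_ref1.items()})
```

The results should agree with the g1-reference output above, for example _cons = 9.705942 and g2 = -.0412359.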

If you have not created dummy variables, take advantage of the .xi prefix command. Note that STATA by default drops the first dummy variable, while the SAS TSCSREG and PANEL procedures in 4.5.2 drop the last dummy.

. xi: regress cost i.airline output fuel load

i.airline         _Iairline_1-6       (naturally coded; _Iairline_1 omitted)
 
      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  8,    81) = 3935.79
       Model |   113.74827     8  14.2185338           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  0.9974
-------------+------------------------------           Adj R-squared =  0.9972
       Total |  114.040893    89  1.28135835           Root MSE      =  .06011
 
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Iairline_2 |  -.0412359   .0251839    -1.64   0.105    -.0913441    .0088722
 _Iairline_3 |  -.2089211   .0427986    -4.88   0.000    -.2940769   -.1237652
 _Iairline_4 |   .1845557   .0607527     3.04   0.003     .0636769    .3054345
 _Iairline_5 |   .0240547   .0799041     0.30   0.764    -.1349293    .1830387
 _Iairline_6 |   .0870617   .0841995     1.03   0.304     -.080469    .2545924
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.705942    .193124    50.26   0.000     9.321686     10.0902
------------------------------------------------------------------------------


4.3 LSDV2 without the Intercept

LSDV2 reports the actual intercepts of the groups as the dummy coefficients. Because LSDV2 suppresses the intercept, however, SAS and STATA report incorrect F and R2 statistics.

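In the SAS REG procedure, use the /NOINT option to suppress the intercept. Note that the F value of 497,985 and the R2 of 1.0000 are implausible. Without an intercept, SAS computes them from the uncorrected total sum of squares (16,192) rather than the corrected total (114.04).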

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 output fuel load /NOINT;
RUN;

                                       The REG Procedure
                                         Model: MODEL1
                                   Dependent Variable: cost
 
                            Number of Observations Read          90
                            Number of Observations Used          90
 
 
                      NOTE: No intercept in model. R-Square is redefined.
 
                                      Analysis of Variance
 
                                             Sum of           Mean
         Source                   DF        Squares         Square    F Value    Pr > F
 
         Model                     9          16191     1799.03381     497985    <.0001
         Error                    81        0.29262        0.00361
         Uncorrected Total        90          16192
 
 
                      Root MSE              0.06011    R-Square     1.0000
                      Dependent Mean       13.36561    Adj R-Sq     1.0000
                      Coeff Var             0.44970
 
 
                                      Parameter Estimates
 
                                   Parameter       Standard
              Variable     DF       Estimate          Error    t Value    Pr > |t|
 
              g1            1        9.70594        0.19312      50.26      <.0001
              g2            1        9.66471        0.19898      48.57      <.0001
              g3            1        9.49702        0.22496      42.22      <.0001
              g4            1        9.89050        0.24176      40.91      <.0001
              g5            1        9.73000        0.26094      37.29      <.0001
              g6            1        9.79300        0.26366      37.14      <.0001
              output        1        0.91928        0.02989      30.76      <.0001
              fuel          1        0.41749        0.01520      27.47      <.0001
              load          1       -1.07040        0.20169      -5.31      <.0001

STATA uses the noconstant option to suppress the intercept. Note that noc is its abbreviation.

. regress cost g1-g6 output fuel load, noc

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  9,    81) =       .
       Model |  16191.3043     9  1799.03381           Prob > F      =  0.0000
    Residual |  .292622872    81  .003612628           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  16191.5969    90  179.906633           Root MSE      =  .06011
 
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |   9.705942    .193124    50.26   0.000     9.321686     10.0902
          g2 |   9.664706    .198982    48.57   0.000     9.268794    10.06062
          g3 |   9.497021   .2249584    42.22   0.000     9.049424    9.944618
          g4 |   9.890498   .2417635    40.91   0.000     9.409464    10.37153
          g5 |   9.729997   .2609421    37.29   0.000     9.210804    10.24919
          g6 |   9.793004   .2636622    37.14   0.000     9.268399    10.31761
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
------------------------------------------------------------------------------
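The inflated statistics in the SAS and STATA output can be reproduced from the sums of squares shown there. This is an illustrative Python sketch. It assumes the usual definitions R2 = 1 - SSE/TSS and F = (model SS/model df)/MSE. The only difference between the two versions is whether TSS is corrected for the mean.

```python
# Sums of squares copied from the LSDV1/LSDV2 output in this section.
sse = 0.292622872              # residual SS (identical in LSDV1 and LSDV2)
tss_corrected = 114.040893     # sum of (cost - mean cost)^2
tss_uncorrected = 16191.5969   # sum of cost^2, used when the intercept is suppressed
df_model, df_resid = 8, 81     # 5 dummies + 3 regressors; 90 - 9 residual df
mse = sse / df_resid

# Statistics relative to the corrected total (LSDV1, LIMDEP's LSDV2).
r2 = 1 - sse / tss_corrected
f = ((tss_corrected - sse) / df_model) / mse

# Statistics relative to the uncorrected total (SAS NOINT / STATA noconstant):
# all 9 coefficients count as model df.
r2_noint = 1 - sse / tss_uncorrected
f_noint = ((tss_uncorrected - sse) / (df_model + 1)) / mse

print(f"{r2:.4f} {f:.2f} {r2_noint:.4f} {f_noint:.0f}")
```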

In LIMDEP, drop ONE from the Rhs; list to suppress the intercept. Unlike SAS and STATA, LIMDEP reports the correct R2 and F even in LSDV2.

--> REGRESS;Lhs=COST;Rhs=G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD$

+-----------------------------------------------------------------------+
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared=  .997434, Adjusted R-squared =          .99718 |
| Model test: F[  8,     81] = 3935.82,    Prob value =          .00000 |
| Diagnostic: Log-L =    130.0865, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=     -2.691 |
| Model does not contain ONE. R-squared and F can be negative!          |
| Autocorrel: Durbin-Watson Statistic =   1.02645,   Rho =       .48677 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 G1           9.705949253       .19312325   50.258   .0000     .16666667
 G2           9.664715269       .19898117   48.571   .0000     .16666667
 G3           9.497032673       .22495746   42.217   .0000     .16666667
 G4           9.890513806       .24176245   40.910   .0000     .16666667
 G5           9.730013568       .26094094   37.288   .0000     .16666667
 G6           9.793021272       .26366104   37.142   .0000     .16666667
 OUTPUT       .9192881432   .29889967E-01   30.756   .0000    -1.1743092
 FUEL         .4174910457   .15199071E-01   27.468   .0000     12.770359
 LOAD        -1.070395015       .20168924   -5.307   .0000     .56046016
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)


4.4 LSDV3 with Restrictions

LSDV3 imposes a restriction that the sum of the dummy parameters is zero. The SAS REG procedure uses the RESTRICT statement to impose restrictions.

PROC REG DATA=masil.airline;
   MODEL cost = g1-g6 output fuel load;
   RESTRICT g1 + g2 + g3 + g4 + g5 + g6 = 0;
RUN;

                                       The REG Procedure
                                         Model: MODEL1
                                   Dependent Variable: cost
 
NOTE: Restrictions have been applied to parameter estimates.
 
 
                            Number of Observations Read          90
                            Number of Observations Used          90
 
 
                                      Analysis of Variance
 
                                             Sum of           Mean
         Source                   DF        Squares         Square    F Value    Pr > F
 
         Model                     8      113.74827       14.21853    3935.79    <.0001
         Error                    81        0.29262        0.00361
         Corrected Total          89      114.04089
 
 
                      Root MSE              0.06011    R-Square     0.9974
                      Dependent Mean       13.36561    Adj R-Sq     0.9972
                      Coeff Var             0.44970
 
 
                                      Parameter Estimates
 
                                   Parameter       Standard
              Variable     DF       Estimate          Error    t Value    Pr > |t|
 
              Intercept     1        9.71353        0.22964      42.30     <.0001
              g1            1       -0.00759        0.04562      -0.17     0.8683
              g2            1       -0.04882        0.03798      -1.29     0.2023
              g3            1       -0.21651        0.01606     -13.48     <.0001
              g4            1        0.17697        0.01942       9.11     <.0001
              g5            1        0.01647        0.03669       0.45     0.6547
              g6            1        0.07948        0.04050       1.96     0.0532
              output        1        0.91928        0.02989      30.76     <.0001
              fuel          1        0.41749        0.01520      27.47     <.0001
              load          1       -1.07040        0.20169      -5.31     <.0001
              RESTRICT     -1    3.01674E-15    1.51088E-10       0.00     1.0000*
 
                        * Probability computed using beta distribution.

The dummy coefficients represent deviations from the average group effect (9.714). The actual intercept of group 2, for example, is 9.665 = 9.714 + (-.049). Note that the RESTRICT estimate of 3.01674E-15 in the output above is virtually zero.
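LSDV3 can also be derived from LSDV2. The LSDV3 intercept is the mean of the group intercepts, and each dummy coefficient is a group's deviation from that mean. This is an illustrative Python sketch using the STATA LSDV2 estimates from 4.3.

```python
# LSDV2 group intercepts from the STATA output in 4.3.
a = [9.705942, 9.664706, 9.497021, 9.890498, 9.729997, 9.793004]

mean_a = sum(a) / len(a)          # LSDV3 intercept: the average group effect
d = [ai - mean_a for ai in a]     # LSDV3 dummy coefficients (deviations)

print(round(mean_a, 6), [round(di, 6) for di in d], round(sum(d), 12))
```

The deviations sum to zero by construction, which is exactly the restriction LSDV3 imposes.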

In STATA, you have to use the .cnsreg command rather than .regress. The command, however, does not provide an ANOVA table or goodness-of-fit statistics such as R2.

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 output fuel load, constraint(1)

Constrained linear regression                          Number of obs =      90
                                                       F(  8,    81) = 3935.79
                                                       Prob > F      =  0.0000
                                                       Root MSE      =  .06011
 ( 1)  g1 + g2 + g3 + g4 + g5 + g6 = 0
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          g1 |  -.0075859   .0456178    -0.17   0.868    -.0983509    .0831792
          g2 |  -.0488218   .0379787    -1.29   0.202    -.1243875    .0267439
          g3 |  -.2165069   .0160624   -13.48   0.000    -.2484661   -.1845478
          g4 |   .1769698   .0194247     9.11   0.000     .1383208    .2156189
          g5 |   .0164689   .0366904     0.45   0.655    -.0565335    .0894712
          g6 |   .0794759   .0405008     1.96   0.053     -.001108    .1600597
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.713528    .229641    42.30   0.000     9.256614    10.17044
------------------------------------------------------------------------------

LIMDEP uses the Cls: subcommand to impose linear restrictions. Again, do not forget to include ONE in the Rhs; list.

--> REGRESS;Lhs=COST;Rhs=ONE,G1,G2,G3,G4,G5,G6,OUTPUT,FUEL,LOAD;
   Cls:b(1)+b(2)+b(3)+b(4)+b(5)+b(6)=0$

+-----------------------------------------------------------------------+
| Linearly restricted regression                                        |
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared=  .997434, Adjusted R-squared =          .99718 |
|             (Note:  Not using OLS.  R-squared is not bounded in [0,1] |
| Model test: F[  8,     81] = 3935.82,    Prob value =          .00000 |
| Diagnostic: Log-L =    130.0865, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=     -2.691 |
| Note, when restrictions are imposed, R-squared can be less than zero. |
| F[ 1,    80] for the restrictions =       .0000, Prob =  1.0000       |
| Autocorrel: Durbin-Watson Statistic =   1.02645,   Rho =       .48677 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant     12.12205614       .27886962   43.469   .0000
 G1          -2.416106889   .89836871E-01  -26.894   .0000     .16666667
 G2          -2.457340873   .82929154E-01  -29.632   .0000     .16666667
 G3          -2.625023469   .56175656E-01  -46.729   .0000     .16666667
 G4          -2.231542336   .41557714E-01  -53.697   .0000     .16666667
 G5          -2.392042574   .29995908E-01  -79.746   .0000     .16666667
 G6          -2.329034870   .33569388E-01  -69.380   .0000     .16666667
 OUTPUT       .9192881432   .29889967E-01   30.756   .0000    -1.1743092
 FUEL         .4174910457   .15199071E-01   27.468   .0000     12.770359
 LOAD        -1.070395015       .20168924   -5.307   .0000     .56046016

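LSDV3 in LIMDEP appears to report different dummy coefficients. The reason is that b(1) through b(6) in the Cls: restriction refer to the first six coefficients in the Rhs; list. These are ONE and G1 through G5, not G1 through G6. The estimates confirm this: 12.122 - 2.416 - 2.457 - 2.625 - 2.232 - 2.392 = 0. Writing the restriction as b(2)+b(3)+b(4)+b(5)+b(6)+b(7)=0 should reproduce the SAS and STATA estimates. Even so, you may derive the actual intercepts of groups as you would in SAS and STATA. The actual intercept of group 3, for example, is 9.497 = 12.122 + (-2.625).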
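The restriction that LIMDEP actually imposed can be checked from the reported estimates. This is an illustrative Python sketch. It assumes, as the numbers suggest, that b(k) numbers the coefficients in Rhs; order starting with ONE.

```python
# LIMDEP estimates from the output above, in Rhs; order: ONE, G1, ..., G6.
b = [12.12205614, -2.416106889, -2.457340873, -2.625023469,
     -2.231542336, -2.392042574, -2.329034870]

# Assumption: b(1)...b(6) in the Cls: line number coefficients in Rhs; order,
# so the restriction covers ONE and G1-G5 (Python indices 0-5).
restricted_sum = sum(b[0:6])
dummy_sum = sum(b[1:7])               # G1 + ... + G6 (not restricted)

# Actual group intercepts = constant + dummy coefficient, as in SAS and STATA.
intercepts = [b[0] + bj for bj in b[1:7]]
mean_intercept = sum(intercepts) / 6  # comparable to the SAS/STATA LSDV3 intercept
print(restricted_sum, dummy_sum, [round(v, 3) for v in intercepts], mean_intercept)
```

The restricted sum is zero up to rounding, while the six dummy coefficients clearly do not sum to zero. The mean of the recovered intercepts is close to the 9.714 reported by SAS and STATA.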


4.5 Within Group Effect Model

The within effect model does not use dummy variables. When it is fitted by OLS on the transformed data, the software therefore counts more residual degrees of freedom than LSDV does (87 instead of 81 in this example). This yields a smaller MSE and smaller standard errors than those of LSDV, so you need to adjust the MSE and standard errors. This model does not report individual dummy coefficients either. The SAS TSCSREG procedure and the LIMDEP Regress$ command report the adjusted (correct) MSE, SEE (Root MSE), R2, and standard errors.

4.5.1 Estimating the Within Effect Model

First, let us manually estimate the within group effect model in STATA. You need to compute group means and transform dependent and independent variables using group means (log is skipped here).

. egen gm_cost=mean(cost), by(airline) // compute group means
. egen gm_output=mean(output), by(airline)
. egen gm_fuel=mean(fuel), by(airline)
. egen gm_load=mean(load), by(airline)

You will get the following group means of variables.
  +------------------------------------------------------+
  | airline    gm_cost   gm_output    gm_fuel    gm_load |
  |------------------------------------------------------|
  |       1   14.67563    .3192696    12.7318   .5971917 |
  |       2   14.37247    -.033027   12.75171   .5470946 |
  |       3   13.37231   -.9122626   12.78972   .5845358 |
  |       4    13.1358   -1.635174   12.77803   .5476773 |
  |       5   12.36304   -2.285681    12.7921   .5664859 |
  |       6   12.27441    -2.49898    12.7788   .5197756 |
  +------------------------------------------------------+

. gen gw_cost = cost - gm_cost // compute deviations from the group means
. gen gw_output = output - gm_output
. gen gw_fuel = fuel - gm_fuel
. gen gw_load = load - gm_load
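The group-mean transformation and the degrees-of-freedom issue can be illustrated on a small example. This is a minimal Python sketch on a made-up panel of 3 groups, 5 periods, and one regressor, not the airline data. The helpers `group_means` and `within` are hypothetical stand-ins for the .egen and .gen steps above.

The sketch computes the naive standard error with N - k residual degrees of freedom, as a plain OLS fit on the demeaned data would. It then rescales to the LSDV degrees of freedom, N - n - k.

```python
from collections import defaultdict
from math import sqrt

def group_means(groups, values):
    """Mean of values within each group (like egen ... = mean(), by(group))."""
    total, count = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        total[g] += v
        count[g] += 1
    return {g: total[g] / count[g] for g in total}

def within(groups, values):
    """Deviations of each value from its own group mean."""
    m = group_means(groups, values)
    return [v - m[g] for g, v in zip(groups, values)]

# Hypothetical panel (NOT the airline data): 3 groups x 5 periods, one regressor.
groups = [g for g in (1, 2, 3) for _ in range(5)]
x = [1, 2, 4, 5, 7, 2, 3, 3.5, 6, 8, 0.5, 1.5, 2, 4.5, 5]
noise = [0.1, -0.2, 0.05, 0.0, 0.05] * 3          # made-up disturbances
alpha = {1: 2.0, 2: 1.5, 3: 3.0}                   # made-up group intercepts
y = [alpha[g] + 0.5 * xi + e for g, xi, e in zip(groups, x, noise)]

# Within transformation, then OLS without an intercept (one regressor).
xw, yw = within(groups, x), within(groups, y)
sxx = sum(v * v for v in xw)
slope = sum(a * b for a, b in zip(xw, yw)) / sxx
sse = sum((b - slope * a) ** 2 for a, b in zip(xw, yw))

N, n, k = len(y), len(alpha), 1                    # obs, groups, regressors
se_naive = sqrt(sse / (N - k) / sxx)               # df as a plain OLS fit counts them
se_adjusted = se_naive * sqrt((N - k) / (N - n - k))  # LSDV residual df

# Group intercepts recovered from group means: ybar_g - slope * xbar_g.
mx, my = group_means(groups, x), group_means(groups, y)
alpha_hat = {g: my[g] - slope * mx[g] for g in mx}
print(round(slope, 6), round(se_naive, 6), round(se_adjusted, 6), alpha_hat)
```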

Now we are ready to fit the within effect model. Keep in mind that you have to suppress the intercept. Carefully compare the MSE, SEE, R2, and standard errors with those of LSDV.

. regress gw_cost gw_output gw_fuel gw_load, noc // within effect

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  3,    87) = 3871.82
       Model |  39.0683861     3  13.0227954           Prob > F      =  0.0000
    Residual |  .292622861    87  .003363481           R-squared     =  0.9926
-------------+------------------------------           Adj R-squared =  0.9923
       Total |   39.361009    90  .437344544           Root MSE      =    .058
 
------------------------------------------------------------------------------
     gw_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gw_output |   .9192846    .028841    31.87   0.000       .86196    .9766092
     gw_fuel |   .4174918   .0146657    28.47   0.000     .3883422    .4466414
     gw_load |  -1.070396   .1946109    -5.50   0.000    -1.457206   -.6835858
------------------------------------------------------------------------------

You may compute the group intercepts from the group means. For example, the intercept of airline 5 is computed (using unrounded values) as 9.730 = 12.363 - {.919*(-2.286) + .417*12.792 + (-1.070)*.566}. To get the correct standard errors, multiply them by the square root of the ratio of the within effect model's degrees of freedom (87) to those of LSDV (81). For example, the standard error of the logged output is computed as .0299 = .0288*sqrt(87/81).
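Both calculations can be reproduced with the unrounded numbers from the group-mean table and the within-model output. This is an illustrative Python sketch only.

```python
from math import sqrt

# Airline 5 group means (table above) and within-model slopes (output above).
gm = {"cost": 12.36304, "output": -2.285681, "fuel": 12.7921, "load": 0.5664859}
b = {"output": 0.9192846, "fuel": 0.4174918, "load": -1.070396}

# Intercept of airline 5 = mean cost - sum of slope * mean regressor.
intercept_5 = gm["cost"] - sum(b[v] * gm[v] for v in b)

# Rescale the within-model SE from 87 residual df to LSDV's 81 residual df.
se_output = 0.028841 * sqrt(87 / 81)
print(f"{intercept_5:.3f} {se_output:.4f}")  # prints 9.730 0.0299
```

Using the rounded three-digit values instead gives about 9.735, which is why the unrounded estimates are used here.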


4.5.2 Using the SAS TSCSREG and PANEL Procedures

The TSCSREG and PANEL procedures of SAS/ETS allow users to fit the within effect model conveniently. The procedures in fact report LSDV1, but you do not need to create dummy variables or compute deviations from the group means. These procedures report the correct MSE, SEE, R2, and standard errors, and they also conduct the F test for the fixed group effect.

PROC SORT DATA=masil.airline;
   BY airline year;
RUN;

PROC TSCSREG DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /FIXONE;
RUN;

                                     The TSCSREG Procedure
 
Dependent Variable: cost
 
                                       Model Description
 
                              Estimation Method             FixOne
                              Number of Cross Sections           6
                              Time Series Length                15
 
 
                                         Fit Statistics
 
                       SSE              0.2926    DFE                  81
                       MSE              0.0036    Root MSE         0.0601
                       R-Square         0.9974
 
 
                                  F Test for No Fixed Effects
 
                             Num DF      Den DF    F Value    Pr > F
 
                                  5          81      57.73    <.0001
 
 
                                      Parameter Estimates
 
                                     Standard
   Variable        DF    Estimate       Error    t Value    Pr > |t|    Label
 
   CS1              1    -0.08706      0.0842      -1.03      0.3042    Cross Sectional
                                                                        Effect    1
   CS2              1     -0.1283      0.0757      -1.69      0.0941    Cross Sectional
                                                                        Effect    2
   CS3              1    -0.29598      0.0500      -5.92      <.0001    Cross Sectional
                                                                        Effect    3
   CS4              1    0.097494      0.0330       2.95      0.0041    Cross Sectional
                                                                        Effect    4
   CS5              1    -0.06301      0.0239      -2.64      0.0100    Cross Sectional
                                                                        Effect    5
   Intercept        1    9.793004      0.2637      37.14      <.0001    Intercept
   output           1    0.919285      0.0299      30.76      <.0001
   fuel             1    0.417492      0.0152      27.47      <.0001
   load             1     -1.0704      0.2017      -5.31      <.0001

Note that the data set must be sorted in advance by the variables that appear in the ID statement of the TSCSREG and PANEL procedures. The following PANEL procedure returns the same output.

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /FIXONE;
RUN;


4.5.3 Using STATA

The STATA .xtreg command fits the within group effect model without creating dummy variables. The command reports correct standard errors and the F test for fixed group effects. It does not, however, provide an analysis of variance (ANOVA) table or the LSDV R2 and F statistics. The .xtreg command should be preceded by the .tsset command, which specifies the grouping and time variables.

. tsset airline year

      panel variable:  airline, 1 to 6
       time variable:  year, 1 to 15

The fe option of .xtreg requests the within effect model, and i(airline) specifies airline as the group (panel) variable. Note that this command reports adjusted (correct) standard errors.
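The within transformation behind the fe option can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical toy panel (three groups, four periods, one regressor, made-up intercepts). It is not the airline data and not STATA's implementation. Subtracting group means from x and y removes the group intercepts, so OLS on the demeaned data recovers the slope without any dummy variables. The pooled slope, which ignores the groups, is badly biased here.

```python
from collections import defaultdict

# Hypothetical toy panel: 3 groups x 4 periods, y = alpha_i + 2.0*x + e
groups = [1] * 4 + [2] * 4 + [3] * 4
x = [1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6]
alpha = {1: 10.0, 2: 5.0, 3: -3.0}        # hypothetical group intercepts
noise = [0.1, -0.1, 0.1, -0.1] * 3         # small deterministic disturbance
y = [alpha[g] + 2.0 * xi + e for g, xi, e in zip(groups, x, noise)]


def group_means(groups, v):
    """Return {group: mean of v within that group}."""
    total, count = defaultdict(float), defaultdict(int)
    for g, vi in zip(groups, v):
        total[g] += vi
        count[g] += 1
    return {g: total[g] / count[g] for g in total}


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(groups, x, y):
    """Slope from OLS on group-demeaned data (the within transformation)."""
    gx, gy = group_means(groups, x), group_means(groups, y)
    xd = [xi - gx[g] for g, xi in zip(groups, x)]
    yd = [yi - gy[g] for g, yi in zip(groups, y)]
    # Demeaned variables have mean zero, so no intercept is needed.
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


print(round(ols_slope(x, y), 4))             # pooled OLS, ignores group effects
print(round(within_slope(groups, x, y), 4))  # within (fixed group effect) slope
```

With these toy numbers, the pooled slope is about -0.287 and the within slope is 1.96. The same logic explains the adjusted standard errors. Demeaning implicitly estimates n group means, so the residual degrees of freedom are nT - n - k rather than nT - k. For the airline data that is 90 - 6 - 3 = 81, which matches the F(3,81) that .xtreg reports.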

. xtreg cost output fuel load, fe i(airline) // within group effect

Fixed-effects (within) regression               Number of obs      =        90
Group variable (i): airline                     Number of groups   =         6
 
R-sq:  within  = 0.9926                         Obs per group: min =        15
       between = 0.9856                                        avg =      15.0
       overall = 0.9873                                        max =        15
 
                                                F(3,81)            =   3604.80
corr(u_i, Xb)  = -0.3475                        Prob > F           =    0.0000
 
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .9192846   .0298901    30.76   0.000     .8598126    .9787565
        fuel |   .4174918   .0151991    27.47   0.000     .3872503    .4477333
        load |  -1.070396     .20169    -5.31   0.000    -1.471696   -.6690963
       _cons |   9.713528    .229641    42.30   0.000     9.256614    10.17044
-------------+----------------------------------------------------------------
     sigma_u |   .1320775
     sigma_e |  .06010514
         rho |  .82843653   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(5, 81) =    57.73               Prob > F = 0.0000

The last line of the output tests the null hypothesis that all dummy parameters in LSDV1 are zero (i.e., g1=0, g2=0, g3=0, g4=0, and g5=0). Note that the intercept of 9.714 is that of LSDV3.
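The sigma_u, sigma_e, and rho lines of the .xtreg output are related. rho = sigma_u^2 / (sigma_u^2 + sigma_e^2) is the share of the error variance attributable to the group effects u_i. sigma_e is the root mean squared error with nT - n - k residual degrees of freedom. The quick check below uses the printed .xtreg values and the LSDV residual sum of squares that LIMDEP reports in the next subsection. It is a sketch of the arithmetic, not STATA's code.

```python
import math

# Values reported by .xtreg, fe for the airline data (n = 6, T = 15, k = 3)
sigma_u = 0.1320775
sigma_e = 0.06010514
n, T, k = 6, 15, 3
sse_lsdv = 0.2926207777   # residual SS of LSDV / within model (LIMDEP output)

# Fraction of the error variance due to the group effects u_i
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

# sigma_e is the root MSE with nT - n - k residual degrees of freedom
sigma_e_from_sse = math.sqrt(sse_lsdv / (n * T - n - k))

print(round(rho, 4), round(sigma_e_from_sse, 5))
```

Both values reproduce the printed .82843653 and .06010514 up to rounding.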


4.5.4 Using LIMDEP

In LIMDEP, you have to specify the panel data model and the stratification or time variable. The Panel$ and Fixed$ subcommands request a fixed effect panel data model, and the Str$ subcommand specifies the stratification (group) variable.

--> REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Fixed$

+-----------------------------------------------------------------------+
| OLS Without Group Dummy Variables                                     |
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   4, Deg.Fr.=     86 |
| Residuals:  Sum of squares= 1.335449522    , Std.Dev.=         .12461 |
| Fit:        R-squared=  .988290, Adjusted R-squared =          .98788 |
| Model test: F[  3,     86] = 2419.33,    Prob value =          .00000 |
| Diagnostic: Log-L =     61.7699, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -4.122, Akaike Info. Crt.=     -1.284 |
| Panel Data Analysis of COST       [ONE way]                           |
|           Unconditional ANOVA (No regressors)                         |
| Source      Variation        Deg. Free.     Mean Square               |
| Between       74.6799                5.         14.9360               |
| Residual      39.3611               84.         .468584               |
| Total         114.041               89.         1.28136               |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 OUTPUT       .8827386341   .13254552E-01   66.599   .0000    -1.1743092
 FUEL         .4539777119   .20304240E-01   22.359   .0000     12.770359
 LOAD        -1.627507797       .34530293   -4.713   .0000     .56046016
 Constant     9.516912231       .22924522   41.514   .0000
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
 
+-----------------------------------------------------------------------+
| Least Squares with Group Dummy Variables                              |
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = COST     Mean=   13.36560933    , S.D.=   1.131971444     |
| Model size: Observations =      90, Parameters =   9, Deg.Fr.=     81 |
| Residuals:  Sum of squares= .2926207777    , Std.Dev.=         .06010 |
| Fit:        R-squared=  .997434, Adjusted R-squared =          .99718 |
| Model test: F[  8,     81] = 3935.82,    Prob value =          .00000 |
| Diagnostic: Log-L =    130.0865, Restricted(b=0) Log-L =    -138.3581 |
|             LogAmemiyaPrCrt.=   -5.528, Akaike Info. Crt.=     -2.691 |
| Estd. Autocorrelation of e(i,t)     .573531                           |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 OUTPUT       .9192881432   .29889967E-01   30.756   .0000    -1.1743092
 FUEL         .4174910457   .15199071E-01   27.468   .0000     12.770359
 LOAD        -1.070395015       .20168924   -5.307   .0000     .56046016
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

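LIMDEP reports both the pooled OLS regression and the within effect model (the latter labeled "Least Squares with Group Dummy Variables"). Like the SAS TSCSREG procedure, LIMDEP provides the correct MSE, SEE (standard error of the estimate), R2, and standard errors.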
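These goodness-of-fit statistics follow directly from the sums of squares. The Python check below takes the SSEs from the two LIMDEP tables and the total sum of squares (114.040893) from the pooled ANOVA table. The number of parameters K counts the intercept and the three regressors in the pooled model, and adds the five group dummies for LSDV. Dividing by N - K with the dummies counted is what makes the LSDV figures correct.

```python
import math

N = 90                 # observations (6 airlines x 15 years)
SST = 114.040893       # total sum of squares (pooled ANOVA table)


def fit_stats(sse, n_params):
    """R-squared, adjusted R-squared, and standard error of the estimate."""
    r2 = 1 - sse / SST
    adj_r2 = 1 - (1 - r2) * (N - 1) / (N - n_params)
    see = math.sqrt(sse / (N - n_params))
    return r2, adj_r2, see


pooled = fit_stats(1.335449522, 4)   # intercept + output, fuel, load
lsdv = fit_stats(0.2926207777, 9)    # intercept + 5 dummies + 3 regressors

for name, (r2, adj, see) in [("pooled", pooled), ("LSDV", lsdv)]:
    print(f"{name}: R2={r2:.6f} adjR2={adj:.5f} SEE={see:.5f}")
```

The results match LIMDEP's R-squared (.988290 and .997434), adjusted R-squared (.98788 and .99718), and Std.Dev. (.12461 and .06010).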


4.6 Between Group Effect Model: Group Mean Regression

The between effect model uses aggregate information: the group means of the variables. In other words, the unit of analysis is the group (subject) rather than the individual observation, so the number of observations drops from nT to n. This group mean regression produces goodness-of-fit statistics and parameter estimates that differ from those of LSDV and the within effect model.

Let us compute the group means and run the OLS regression on them. The .collapse command computes aggregate statistics (here, means by airline) and replaces the data set in memory with them. Note that /// continues a command on the next line.

. collapse (mean) gm_cost=cost (mean) gm_output=output (mean) gm_fuel=fuel (mean) ///
  gm_load=load, by(airline)

. regress gm_cost gm_output gm_fuel gm_load

      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  3,     2) =  104.12
       Model |  4.94698124     3  1.64899375           Prob > F      =  0.0095
    Residual |  .031675926     2  .015837963           R-squared     =  0.9936
-------------+------------------------------           Adj R-squared =  0.9841
       Total |  4.97865717     5  .995731433           Root MSE      =  .12585
 
------------------------------------------------------------------------------
     gm_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gm_output |   .7824568   .1087646     7.19   0.019     .3144803    1.250433
     gm_fuel |  -5.523904   4.478718    -1.23   0.343    -24.79427    13.74647
     gm_load |  -1.751072   2.743167    -0.64   0.589    -13.55397    10.05182
       _cons |    85.8081   56.48199     1.52   0.268    -157.2143    328.8305
------------------------------------------------------------------------------
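The .collapse and .regress steps can be sketched in Python on a hypothetical toy panel (three groups, four periods, one regressor, made-up intercepts). This is an illustration only, not the airline data. The collapse_mean helper is a hypothetical, rough analogue of collapse (mean) ..., by(group). Its output has one row per group, and the regression then runs on those n rows.

```python
from collections import defaultdict

# Hypothetical toy panel: 3 groups x 4 periods, y = alpha_i + 2.0*x + e
groups = [1] * 4 + [2] * 4 + [3] * 4
x = [1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6]
alpha = {1: 10.0, 2: 5.0, 3: -3.0}        # hypothetical group intercepts
noise = [0.1, -0.1, 0.1, -0.1] * 3
y = [alpha[g] + 2.0 * xi + e for g, xi, e in zip(groups, x, noise)]


def collapse_mean(groups, *cols):
    """Hypothetical helper: group means of each column, one row per group."""
    sums, count = defaultdict(lambda: [0.0] * len(cols)), defaultdict(int)
    for i, g in enumerate(groups):
        count[g] += 1
        for j, col in enumerate(cols):
            sums[g][j] += col[i]
    return {g: [s / count[g] for s in sums[g]] for g in sorted(sums)}


def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


means = collapse_mean(groups, x, y)   # n = 3 rows instead of nT = 12
gm_x = [m[0] for m in means.values()]
gm_y = [m[1] for m in means.values()]
intercept, slope = ols(gm_x, gm_y)
print(len(gm_x), round(intercept, 4), round(slope, 4))
```

Here the group-level slope is -4.5, far from the coefficient 2.0 used to generate the toy data. In this toy panel, the group intercepts are correlated with the group means of x. This is one way the between model can disagree with LSDV and the within model.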

The SAS PANEL procedure has the /BTWNG and /BTWNT options to estimate the between group and between time effect models, respectively. The TSCSREG procedure does not have these options.

PROC PANEL DATA=masil.airline;
   ID airline year;
   MODEL cost = output fuel load /BTWNG;
RUN;

                                      The PANEL Procedure
                                    Between Groups Estimates
 
Dependent Variable: cost
 
                                       Model Description
 
                              Estimation Method            BtwGrps
                              Number of Cross Sections           6
                              Time Series Length                15
 
 
                                         Fit Statistics
 
                       SSE              0.0317    DFE                   2
                       MSE              0.0158    Root MSE         0.1258
                       R-Square         0.9936
 
 
                                      Parameter Estimates
 
                                     Standard
   Variable        DF    Estimate       Error    t Value    Pr > |t|    Label
 
   Intercept        1    85.80901     56.4830       1.52      0.2681    Intercept
   output           1    0.782455      0.1088       7.19      0.0188
   fuel             1    -5.52398      4.4788      -1.23      0.3427
   load             1    -1.75102      2.7432      -0.64      0.5886

The STATA .xtreg command has the be option to fit the between effect model. This command, however, does not report the ANOVA table.

. xtreg cost output fuel load, be i(airline)

Between regression (regression on group means)  Number of obs      =        90
Group variable (i): airline                     Number of groups   =         6
 
R-sq:  within  = 0.8808                         Obs per group: min =        15
       between = 0.9936                                        avg =      15.0
       overall = 0.1371                                        max =        15
 
                                                F(3,2)             =    104.12
sd(u_i + avg(e_i.))=  .1258491                  Prob > F           =    0.0095
 
------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .7824552   .1087663     7.19   0.019     .3144715    1.250439
        fuel |  -5.523978   4.478802    -1.23   0.343    -24.79471    13.74675
        load |  -1.751016    2.74319    -0.64   0.589    -13.55401    10.05198
       _cons |   85.80901   56.48302     1.52   0.268    -157.2178    328.8358
------------------------------------------------------------------------------

LIMDEP has the Means; subcommand to fit the between effect model.

--> REGRESS;Lhs=COST;Rhs=ONE,OUTPUT,FUEL,LOAD;Panel;Str=AIRLINE;Means$

+-----------------------------------------------------------------------+
| Group Means Regression                                                |
| Ordinary    least squares regression    Weighting variable = none     |
| Dep. var. = YBAR(i.) Mean=   13.36560933    , S.D.=   .9978636346     |
| Model size: Observations =       6, Parameters =   4, Deg.Fr.=      2 |
| Residuals:  Sum of squares= .3167277206E-01, Std.Dev.=         .12584 |
| Fit:        R-squared=  .993638, Adjusted R-squared =          .98410 |
| Model test: F[  3,      2] =  104.13,    Prob value =          .00953 |
| Diagnostic: Log-L =      7.2185, Restricted(b=0) Log-L =      -7.9538 |
|             LogAmemiyaPrCrt.=   -3.635, Akaike Info. Crt.=     -1.073 |
+-----------------------------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 OUTPUT       .7824472689       .10876126    7.194   .0000  .23025612E-11
 FUEL        -5.524437466       4.4786519   -1.234   .2174     .18642891
 LOAD        -1.750947653       2.7430470    -.638   .5233     .32541105
 Constant     85.81483169       56.481148    1.519   .1287
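LIMDEP's group means table reports b/St.Er. with P[|Z|>z]. This suggests that its p-values come from the standard normal distribution, whereas SAS and STATA use the t distribution with 2 residual degrees of freedom. That would explain why LIMDEP's p-values for FUEL (.2174) and LOAD (.5233) are much smaller than the .343 and .589 reported above, even though the coefficients agree. The check below needs only the Python standard library, because the t distribution with 2 degrees of freedom has a closed-form CDF.

```python
import math


def p_normal(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_t2(t):
    """Two-sided p-value from the t distribution with 2 degrees of freedom.

    For df = 2 the CDF is 1/2 + t / (2 * sqrt(2 + t**2)), so the two-sided
    tail probability simplifies to 1 - |t| / sqrt(2 + t**2).
    """
    return 1 - abs(t) / math.sqrt(2 + t * t)


# Coefficients and standard errors from the LIMDEP group means regression
estimates = {
    "OUTPUT": (0.7824472689, 0.10876126),
    "FUEL": (-5.524437466, 4.4786519),
    "LOAD": (-1.750947653, 2.7430470),
}
for name, (b, se) in estimates.items():
    t = b / se
    print(f"{name}: t={t:.3f}  normal p={p_normal(t):.4f}  t(2) p={p_t2(t):.4f}")
```

The normal-based column reproduces LIMDEP's p-values, and the t(2) column reproduces the SAS values (.0188, .3427, .5886).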

4.7 Testing Fixed Group Effects (F-test)

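How do we know whether there are fixed group effects? The null hypothesis is that all dummy parameters except the one dropped are zero. In LSDV1 this means the n-1 dummy parameters are jointly zero (g1 = g2 = g3 = g4 = g5 = 0 for the airline data).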

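In order to conduct an F-test, let us take the SSE (e’e) of 1.3354 from the pooled OLS regression and .2926 from the LSDVs (LSDV1 through LSDV3) or the within effect model. Alternatively, you may take the R2 of .9974 from LSDV1 or LSDV3 and .9883 from the pooled OLS. Do not, however, use the R2 of LSDV2 or the within effect model. LSDV2 suppresses the intercept and the within effect model uses transformed (demeaned) variables, so their R2 values are not comparable to that of the pooled OLS.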

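The F statistic is computed as

$$F(n-1,\ nT-n-k) = \frac{(e'e_{\text{Pooled}} - e'e_{\text{LSDV}})/(n-1)}{e'e_{\text{LSDV}}/(nT-n-k)} = \frac{(1.3354 - .2926)/(6-1)}{.2926/(90-6-3)} \approx 57.73$$

where n is the number of groups (6), T is the number of time periods (15), and k is the number of regressors (3). Equivalently, in terms of R2,

$$F(n-1,\ nT-n-k) = \frac{(R^2_{\text{LSDV}} - R^2_{\text{Pooled}})/(n-1)}{(1-R^2_{\text{LSDV}})/(nT-n-k)}$$

The large F statistic rejects the null hypothesis in favor of the fixed group effect model (p<.0001).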
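The F test arithmetic can be checked in a few lines of Python. The inputs are the SSEs from the pooled OLS and LSDV outputs and the total sum of squares (114.040893) from the pooled ANOVA table. With unrounded inputs, the SSE form and the R2 form are algebraically identical, because SST cancels. Rounding R2 to four digits first (.9974 and .9883) would instead give about 56.7.

```python
# F test for fixed group effects (airline data: n = 6 groups, T = 15, k = 3)
n, T, k = 6, 15, 3
sse_pooled = 1.33544153      # e'e of the pooled OLS regression
sse_lsdv = 0.2926207777      # e'e of LSDV1-LSDV3 or the within effect model
sst = 114.040893             # total sum of squares

df1, df2 = n - 1, n * T - n - k
f_sse = ((sse_pooled - sse_lsdv) / df1) / (sse_lsdv / df2)

# Same test from R-squared (pooled OLS and LSDV1/LSDV3 only)
r2_pooled = 1 - sse_pooled / sst
r2_lsdv = 1 - sse_lsdv / sst
f_r2 = ((r2_lsdv - r2_pooled) / df1) / ((1 - r2_lsdv) / df2)

print(f"F({df1}, {df2}) = {f_sse:.2f} (SSE)  {f_r2:.2f} (R-squared)")
```

Both forms give F(5, 81) = 57.73. This matches the SAS TEST statement and STATA's .test output, whose numerator and denominator mean squares are .20856 and .00361.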

The SAS TSCSREG and PANEL procedures and the STATA .xtreg command conduct this F test by default. Alternatively, you may conduct the same test with LSDV1. In SAS, add the TEST statement to the REG procedure and run the procedure again (other output is omitted).

PROC REG DATA=masil.airline;
   MODEL cost = g1-g5 output fuel load;
   TEST g1 = g2 = g3 = g4 = g5 = 0;
RUN;

                                       The REG Procedure
                                         Model: MODEL1
 
                          Test 1 Results for Dependent Variable cost
 
                                                   Mean
                   Source             DF         Square    F Value    Pr > F
 
                   Numerator           5        0.20856      57.73    <.0001
                   Denominator        81        0.00361

In STATA, run the .test command, a follow-up command for the Wald test, right after estimating the model.

. quietly regress cost g1-g5 output fuel load // LSDV1
. test g1 g2 g3 g4 g5

 ( 1)  g1 = 0
 ( 2)  g2 = 0
 ( 3)  g3 = 0
 ( 4)  g4 = 0
 ( 5)  g5 = 0
 
       F(  5,    81) =   57.73
            Prob > F =    0.0000

4.8 Summary

Table 6 summarizes the estimation of panel data models in SAS, STATA, and LIMDEP. The SAS REG and TSCSREG procedures are generally preferred to STATA and LIMDEP commands.

Table 6 Comparison of the Fixed Effect Model in SAS, STATA, LIMDEP*


