2. The Poisson Regression Model (PRM)

The SAS GENMOD procedure, STATA .poisson command, and LIMDEP Poisson$ command estimate the Poisson regression model (PRM).


2.1 PRM in SAS

SAS has the GENMOD procedure for the PRM. The /DIST=POISSON option tells SAS to use the Poisson distribution.


PROC GENMOD DATA = masil.accident;
   MODEL accident=emps strict /DIST=POISSON LINK=LOG;
RUN;

                                      The GENMOD Procedure
 
                                       Model Information
 
                               Data Set              COUNT.WASTE
                               Distribution              Poisson
                               Link Function                 Log
                               Dependent Variable       Accident
                               Observations Used             778
 
 
                             Criteria For Assessing Goodness Of Fit
 
                  Criterion                 DF           Value        Value/DF
 
                  Deviance                 775       2827.2079          3.6480
                  Scaled Deviance          775       2827.2079          3.6480
                  Pearson Chi-Square       775       4944.9473          6.3806
                  Scaled Pearson X2        775       4944.9473          6.3806
                  Log Likelihood                     -667.2291
 
          Algorithm converged.
 
 
                                Analysis Of Parameter Estimates
 
                                   Standard     Wald 95% Confidence       Chi-
    Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
 
    Intercept     1      0.3901      0.0467      0.2986      0.4816      69.84        <.0001
    Emps          1      0.0054      0.0007      0.0040      0.0069      53.13        <.0001
    Strict        1     -0.7042      0.0668     -0.8350     -0.5733     111.25        <.0001
    Scale         0      1.0000      0.0000      1.0000      1.0000
 
NOTE: The scale parameter was held fixed.

To conduct the likelihood ratio test for goodness-of-fit, you need to run a restricted model without regressors. The chi-squared statistic is 124.8218 = 2 * [-667.2291 - (-729.6400)] with 2 degrees of freedom (p<.0001).

PROC GENMOD DATA = masil.accident;
   MODEL accident= /DIST=POISSON LINK=LOG;
RUN;

                                      The GENMOD Procedure
 
                                       Model Information
 
                              Data Set              MASIL.ACCIDENT
                              Distribution                 Poisson
                              Link Function                    Log
                              Dependent Variable          accident
 
 
                            Number of Observations Read         778
                            Number of Observations Used         778
 
 
                             Criteria For Assessing Goodness Of Fit
 
                  Criterion                 DF           Value        Value/DF
 
                  Deviance                 777       2952.0297          3.7993
                  Scaled Deviance          777       2952.0297          3.7993
                  Pearson Chi-Square       777       4919.9745          6.3320
                  Scaled Pearson X2        777       4919.9745          6.3320
                  Log Likelihood                     -729.6400
 
          Algorithm converged.
 
 
                                Analysis Of Parameter Estimates
 
                                   Standard     Wald 95% Confidence       Chi-
    Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
 
    Intercept     1      0.3168      0.0306      0.2568      0.3768     107.20        <.0001
    Scale         0      1.0000      0.0000      1.0000      1.0000
 
NOTE: The scale parameter was held fixed.
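
The likelihood ratio statistic can be reproduced by hand from the two log likelihoods that GENMOD reports. The following is a minimal Python sketch, not SAS code; the variable names are illustrative. It relies on the fact that, for 2 degrees of freedom, the upper-tail probability of the chi-squared distribution has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Log likelihoods copied from the two GENMOD runs above.
ll_full = -667.2291        # model with emps and strict
ll_restricted = -729.6400  # intercept-only model
df = 2                     # number of regressors dropped in the restricted model

# Likelihood ratio statistic: twice the gain in log likelihood.
lr_stat = 2 * (ll_full - ll_restricted)

# For df = 2 only, the chi-squared upper-tail probability equals exp(-x/2).
p_value = math.exp(-lr_stat / 2)

print(f"LR chi2({df}) = {lr_stat:.4f}, p = {p_value:.3g}")
```

The p-value is far below .0001, so the model with emps and strict fits significantly better than the intercept-only model.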


2.2 PRM in STATA

STATA has the .poisson command for the PRM. This command provides likelihood ratio and Pseudo R2 statistics.

. poisson accident emps strict

Iteration 0:   log likelihood = -1821.5112 
Iteration 1:   log likelihood = -1821.5101 
Iteration 2:   log likelihood = -1821.5101 
 
Poisson regression                                Number of obs   =        778
                                                  LR chi2(2)      =     124.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -1821.5101                       Pseudo R2       =     0.0331
 
------------------------------------------------------------------------------
    accident |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        emps |   .0054186   .0007434     7.29   0.000     .0039615    .0068757
      strict |  -.7041664   .0667619   -10.55   0.000    -.8350174   -.5733154
       _cons |   .3900961   .0466787     8.36   0.000     .2986076    .4815846
------------------------------------------------------------------------------

Let us run a restricted model and then use the .display command to double-check that the likelihood ratio statistic for goodness-of-fit is 124.8218.

. poisson accident

Iteration 0:   log likelihood =  -1883.921 
Iteration 1:   log likelihood =  -1883.921 
 
Poisson regression                                Number of obs   =        778
                                                  LR chi2(0)      =       0.00
                                                  Prob > chi2     =          .
Log likelihood =  -1883.921                       Pseudo R2       =     0.0000
 
------------------------------------------------------------------------------
    accident |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .3168165   .0305995    10.35   0.000     .2568426    .3767904
------------------------------------------------------------------------------

. display 2 * (-1821.5101 - (-1883.921))
124.8218
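
The Pseudo R2 of 0.0331 in the .poisson output is McFadden's measure, one minus the ratio of the full-model log likelihood to the intercept-only log likelihood. As a quick check, here is a small Python sketch (illustrative variable names, not STATA code) that recomputes it from the two log likelihoods shown above.

```python
# Log likelihoods copied from the two .poisson runs above.
ll_full = -1821.5101       # accident on emps and strict
ll_restricted = -1883.921  # intercept only

# McFadden's pseudo R2: proportional improvement in log likelihood.
pseudo_r2 = 1 - ll_full / ll_restricted

print(f"Pseudo R2 = {pseudo_r2:.4f}")
```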



2.3 Using the SPost Module in STATA

The SPost module provides useful commands for follow-up analyses of various categorical dependent variable models. The .fitstat command calculates various goodness-of-fit statistics such as log likelihood, McFadden’s R2 (or Pseudo R2), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC).

. quietly poisson accident emps strict

. fitstat

Measures of Fit for poisson of accident
 
Log-Lik Intercept Only:    -1883.921     Log-Lik Full Model:        -1821.510
D(775):                     3643.020     LR(2):                       124.822
                                         Prob > LR:                     0.000
McFadden's R2:                 0.033     McFadden's Adj R2:             0.032
Maximum Likelihood R2:         0.148     Cragg & Uhler's R2:            0.149
AIC:                           4.690     AIC*n:                      3649.020
BIC:                       -1515.943     BIC':                       -111.508
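
Most of the .fitstat measures can be recomputed from the two log likelihoods, the sample size, and the number of parameters. The Python sketch below uses the formulas as I understand them from the SPost documentation; treat them as an assumption to verify rather than as the module's source code. The variable names are illustrative.

```python
import math

n = 778                # number of observations
k = 3                  # estimated parameters: intercept, emps, strict
ll_full = -1821.5101   # log likelihood, full model
ll_null = -1883.921    # log likelihood, intercept only

deviance = -2 * ll_full              # D(775)
lr = 2 * (ll_full - ll_null)         # LR(2)

mcfadden = 1 - ll_full / ll_null
mcfadden_adj = 1 - (ll_full - k) / ll_null
ml_r2 = 1 - math.exp(-lr / n)                           # Maximum Likelihood R2
cragg_uhler = ml_r2 / (1 - math.exp(2 * ll_null / n))   # rescaled to a maximum of 1

aic = (deviance + 2 * k) / n          # per-observation AIC
aic_n = aic * n                       # AIC*n
bic = deviance - (n - k) * math.log(n)
bic_prime = -lr + (k - 1) * math.log(n)

for name, value in [("D", deviance), ("LR", lr), ("McFadden", mcfadden),
                    ("McFadden Adj", mcfadden_adj), ("ML R2", ml_r2),
                    ("Cragg-Uhler", cragg_uhler), ("AIC", aic),
                    ("AIC*n", aic_n), ("BIC", bic), ("BIC'", bic_prime)]:
    print(f"{name:>13}: {value:10.3f}")
```

Because the log likelihoods are entered to four decimals, the results agree with the .fitstat output up to rounding.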

The .listcoef command lists unstandardized coefficients (parameter estimates), factor and percent changes, and standardized coefficients to help interpret regression results.

. listcoef, help

poisson (N=778): Factor Change in Expected Count
 
 Observed SD: 2.9482675
 
----------------------------------------------------------------------
    accident |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
        emps |   0.00542    7.289   0.000   1.0054   1.2297    38.1548
      strict |  -0.70417  -10.547   0.000   0.4945   0.7031     0.5003
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in expected count for unit increase in X
 e^bStdX = exp(b*SD of X) = change in expected count for SD increase in X
   SDofX = standard deviation of X
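
The factor changes in the .listcoef table follow directly from the legend: e^b is exp(b) and e^bStdX is exp(b times the standard deviation of X). The Python sketch below recomputes them from the coefficients reported by .poisson and the SDofX column; the percent change, 100 * (exp(b) - 1), is added as a convenient reading of the same quantity. The function name is hypothetical.

```python
import math

# (coefficient, standard deviation of X) taken from the outputs above.
coefs = {
    "emps": (0.0054186, 38.1548),
    "strict": (-0.7041664, 0.5003),
}

def factor_changes(b, sd_x):
    """Return (e^b, e^bStdX, percent change in expected count per unit of X)."""
    factor = math.exp(b)
    factor_sd = math.exp(b * sd_x)
    percent = 100 * (factor - 1)
    return factor, factor_sd, percent

results = {name: factor_changes(b, sd) for name, (b, sd) in coefs.items()}

for name, (factor, factor_sd, percent) in results.items():
    print(f"{name:>6}: e^b = {factor:.4f}  e^bStdX = {factor_sd:.4f}  % = {percent:.1f}")
```

Read this way, the strict policy multiplies the expected number of accidents by about 0.49, that is, roughly a 51 percent decrease, holding emps constant.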

The .prtab command constructs a table of predicted values (events) for all combinations of categorical variables listed. The following example shows that the predicted number of accidents under the strict policy is .9172 at the mean waste quota (emps=42.0129).

. prtab strict

poisson: Predicted rates for accident
 
----------------------
   strict | Prediction
----------+-----------
        0 |     1.8547
        1 |     0.9172
----------------------
 
         emps     strict
x=  42.012853  .50771208
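
The predictions in the .prtab table are the exponentiated linear predictor, exp(xb), with emps held at its mean and strict set to 0 or 1. A short Python sketch that reproduces them from the .poisson coefficients (the function name is hypothetical):

```python
import math

# Coefficients from the .poisson output above.
b_cons, b_emps, b_strict = 0.3900961, 0.0054186, -0.7041664
emps_mean = 42.012853

def predicted_rate(emps, strict):
    """Expected count under the log link: exp(b0 + b1*emps + b2*strict)."""
    return math.exp(b_cons + b_emps * emps + b_strict * strict)

rate_lenient = predicted_rate(emps_mean, 0)
rate_strict = predicted_rate(emps_mean, 1)

print(f"strict=0: {rate_lenient:.4f}")
print(f"strict=1: {rate_strict:.4f}")
```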

The .prvalue command lists predicted values for a given set of values of the independent variables. For example, the predicted probability of a zero count is .3996 at the mean waste quota under the strict policy (strict=1). Note that the predicted rate of .917 is the same as the .9172 reported by .prtab above.

. prvalue, x(strict=1) maxcnt(5)

poisson: Predictions for accident
 
Predicted rate: .917     95% CI [.827   ,   1.02]
 
Predicted probabilities:
 
  Pr(y=0|x):   0.3996  Pr(y=1|x):   0.3665
  Pr(y=2|x):   0.1681  Pr(y=3|x):   0.0514
  Pr(y=4|x):   0.0118  Pr(y=5|x):   0.0022
 
         emps     strict
x=  42.012853          1
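
The predicted probabilities follow from the Poisson probability mass function, Pr(y=k) = exp(-mu) * mu^k / k!, evaluated at the predicted rate. The Python sketch below plugs in the rate .9172 from the .prtab table; it is an arithmetic check with illustrative names, not a reimplementation of .prvalue.

```python
import math

mu = 0.9172   # predicted rate at mean emps under strict=1
max_count = 5

def poisson_pmf(k, rate):
    """Probability of observing exactly k events when the expected count is rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)

probs = [poisson_pmf(k, mu) for k in range(max_count + 1)]

for k, p in enumerate(probs):
    print(f"Pr(y={k}|x) = {p:.4f}")
```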

The most useful command is .prchange, which calculates marginal effects (changes) and discrete changes. For instance, a one standard deviation increase in waste quota centered on its mean increases the expected number of accidents by .3841 under the lenient policy (strict=0).

. prchange, x(strict=0)

poisson: Changes in Predicted Rate for accident
 
        min->max      0->1     -+1/2    -+sd/2  MargEfct
  emps    2.3070    0.0080    0.0101    0.3841    0.0101
strict   -0.9375   -0.9375   -1.3332   -0.6568   -1.3060
 
exp(xb):   1.8547
 
           emps   strict
    x=  42.0129        0
sd(x)=  38.1548  .500262
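
The entries of the .prchange table can be traced back to the fitted coefficients. In a Poisson model with a log link, the marginal effect of a variable is its coefficient times the predicted rate, and a discrete change is a difference between two predicted rates. The Python sketch below assumes that the centered changes (-+1/2 and -+sd/2) run from half a step below to half a step above the chosen value, which is how the numbers in the table work out; the function names are hypothetical.

```python
import math

b_cons, b_emps, b_strict = 0.3900961, 0.0054186, -0.7041664
emps_mean, emps_sd = 42.012853, 38.1548

def rate(emps, strict):
    """Expected count: exp(b0 + b1*emps + b2*strict)."""
    return math.exp(b_cons + b_emps * emps + b_strict * strict)

base = rate(emps_mean, 0)   # exp(xb) at mean emps, strict=0

# Marginal effects: coefficient times the predicted rate.
marg_emps = b_emps * base
marg_strict = b_strict * base

# Centered discrete change for emps: sd/2 below the mean to sd/2 above it.
sd_change_emps = rate(emps_mean + emps_sd / 2, 0) - rate(emps_mean - emps_sd / 2, 0)

# Discrete change for strict: switching the policy from 0 to 1.
switch_strict = rate(emps_mean, 1) - rate(emps_mean, 0)

print(f"exp(xb)           = {base:.4f}")
print(f"MargEfct emps     = {marg_emps:.4f}")
print(f"MargEfct strict   = {marg_strict:.4f}")
print(f"-+sd/2 emps       = {sd_change_emps:.4f}")
print(f"0->1 strict       = {switch_strict:.4f}")
```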

SPost also includes the .prgen command, which computes a series of predictions by holding all variables but one constant and allowing that variable to vary (Long and Freese 2003). These SPost commands work with most categorical and count data models such as .logit, .probit, .poisson, .nbreg, .zip, and .zinb.


2.4 PRM in LIMDEP

The LIMDEP Poisson$ command estimates the PRM. LIMDEP reports the log likelihoods of both the unrestricted and restricted models. Keep in mind that you must include ONE in the Rhs list to estimate the intercept.

POISSON;
   Lhs=ACCIDENT;
   Rhs=ONE,EMPS,STRICT$

+---------------------------------------------+
| Poisson Regression                          |
| Maximum Likelihood Estimates                |
| Model estimated: Aug 24, 2005 at 04:56:45PM.|
| Dependent variable             ACCIDENT     |
| Weighting variable                 None     |
| Number of observations              778     |
| Iterations completed                  8     |
| Log likelihood function       -1821.510     |
| Restricted log likelihood     -1883.921     |
| Chi squared                    124.8218     |
| Degrees of freedom                    2     |
| Prob[ChiSqd > value] =         .0000000     |
| Chi- squared =  4944.94781  RsqP=  -.0051   |
| G  - squared =  2827.20794  RsqD=   .0423   |
| Overdispersion tests: g=mu(i)  :  4.720     |
| Overdispersion tests: g=mu(i)^2:  4.253     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant     .3900961420   .46678663E-01    8.357   .0000
 EMPS      .5418599057E-02  .74341923E-03    7.289   .0000     42.012853
 STRICT      -.7041663804   .66761926E-01  -10.547   .0000     .50771208
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

SAS, STATA, and LIMDEP produce almost the same parameter estimates and standard errors (Table 3). The log likelihood in SAS differs from that of STATA and LIMDEP (-667.2291 versus -1821.5101). This difference seems to come from the generalized linear model formulation that the GENMOD procedure uses. The log likelihoods are, however, equivalent in the sense that they yield the same likelihood ratio statistic (124.8218).
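
The equivalence can be checked with simple arithmetic. In the Python sketch below, the SAS log likelihoods differ from the STATA and LIMDEP values by the same constant in both the full and the restricted model, so the constant cancels in the likelihood ratio. One reading consistent with this pattern is that GENMOD leaves out an additive term that does not depend on the parameters (in a Poisson log likelihood, the sum of log(y!) is such a term), but that explanation is an assumption here, not something the outputs confirm.

```python
# Log likelihoods reported above: (full model, intercept-only model).
sas = (-667.2291, -729.6400)
stata = (-1821.5101, -1883.921)   # LIMDEP reports the same values

# Gap between the packages in each model.
offset_full = sas[0] - stata[0]
offset_restricted = sas[1] - stata[1]

# Likelihood ratio statistic from each package.
lr_sas = 2 * (sas[0] - sas[1])
lr_stata = 2 * (stata[0] - stata[1])

print(f"offset (full)       = {offset_full:.4f}")
print(f"offset (restricted) = {offset_restricted:.4f}")
print(f"LR from SAS         = {lr_sas:.4f}")
print(f"LR from STATA       = {lr_stata:.4f}")
```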

