5. Ordered Logit/Probit Regression Models

Suppose we have an ordinal dependent variable such as the degree of illegal parking (0=none, 1=sometimes, and 2=often). The ordered logit and probit models rest on the parallel regression assumption, that is, the same slope coefficients apply at every cut point. This assumption is often violated in practice.

5.1 Ordered Logit/Probit in STATA (.ologit and .oprobit)

STATA has the .ologit and .oprobit commands to estimate the ordered logit and probit models, respectively.

. ologit parking income age male

Iteration 0:   log likelihood = -103.78713
Iteration 1:   log likelihood = -92.739147
Iteration 2:   log likelihood = -90.036393
Iteration 3:   log likelihood = -89.861679
Iteration 4:   log likelihood = -89.860105
Iteration 5:   log likelihood = -89.860105
 
Ordered logistic regression                       Number of obs   =        437
                                                  LR chi2(3)      =      27.85
                                                  Prob > chi2     =     0.0000
Log likelihood = -89.860105                       Pseudo R2       =     0.1342
 
------------------------------------------------------------------------------
     parking |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.5140709   1.283192    -0.40   0.689    -3.029082     2.00094
         age |  -.7362588   .1894339    -3.89   0.000    -1.107542   -.3649752
        male |  -1.227092   .4705859    -2.61   0.009    -2.149423   -.3047605
-------------+----------------------------------------------------------------
       /cut1 |  -12.74479   3.787616                     -20.16839   -5.321203
       /cut2 |  -10.83295   3.801685                     -18.28412   -3.381786
------------------------------------------------------------------------------

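STATA estimates the threshold parameters (taus), reported as /cut1 and /cut2, assuming beta0=0, that is, a model without an intercept (Long and Freese 2003). This parameterization is different from that of SAS and LIMDEP, which assume tau0=0 and estimate an intercept instead.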
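As a rough illustration of how the cut points enter the model, the sketch below computes the three outcome probabilities under the beta0=0 parameterization, P(y<=j) = F(cut_j - xb), using the .ologit estimates above. The covariate profile and the helper names are assumptions made up for this example; they come from neither STATA nor the data set.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates copied from the .ologit output above
b = {"income": -0.5140709, "age": -0.7362588, "male": -1.227092}
cut1, cut2 = -12.74479, -10.83295

# Hypothetical covariate profile (an assumption, not from the data set)
x = {"income": 0.6, "age": 20, "male": 1}

# Linear predictor without an intercept (beta0 = 0)
xb = sum(b[k] * x[k] for k in b)

# Cumulative probabilities, then differences for each category
p_le0 = logistic_cdf(cut1 - xb)
p_le1 = logistic_cdf(cut2 - xb)
probs = [p_le0, p_le1 - p_le0, 1.0 - p_le1]  # P(y=0), P(y=1), P(y=2)
print([round(p, 4) for p in probs])
```

The three probabilities sum to one by construction, since they are successive differences of the cumulative distribution.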

. oprobit parking income age male

Iteration 0:   log likelihood = -103.78713
Iteration 1:   log likelihood = -90.990455
Iteration 2:   log likelihood = -89.496288
Iteration 3:   log likelihood = -89.430915
Iteration 4:   log likelihood = -89.430754
 
Ordered probit regression                         Number of obs   =        437
                                                  LR chi2(3)      =      28.71
                                                  Prob > chi2     =     0.0000
Log likelihood = -89.430754                       Pseudo R2       =     0.1383
 
------------------------------------------------------------------------------
     parking |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.1869839   .6116037    -0.31   0.760    -1.385705    1.011737
         age |  -.3594853   .0924817    -3.89   0.000     -.540746   -.1782246
        male |  -.5867871   .2205253    -2.66   0.008    -1.019009   -.1545655
-------------+----------------------------------------------------------------
       /cut1 |  -6.000986   1.869046                     -9.664248   -2.337724
       /cut2 |  -5.118676   1.862909                     -8.769911   -1.467442
------------------------------------------------------------------------------


5.2 The Parallel Assumption and the Generalized Ordered Logit Model

The .brant command of SPost tests the parallel regression assumption of the ordinal regression model. It is available only after the .ologit command. The output is skipped here.

. quietly ologit parking income male

. brant

The parallel regression assumption is often violated. If this is the case, you may use the multinomial regression model or estimate the generalized ordered logit model (GOLM) using either the .gologit command written by Fu (1998) or the .gologit2 command by Williams (2005).

. gologit2 parking income age male, autofit

------------------------------------------------------------------------------
Testing parallel lines assumption using the .05 level of significance...
 
Step  1:  male meets the pl assumption (P Value = 0.9901)
Step  2:  income meets the pl assumption (P Value = 0.8958)
Step  3:  age meets the pl assumption (P Value = 0.7964)
Step  4:  All explanatory variables meet the pl assumption
 
Wald test of parallel lines assumption for the final model:
 
 ( 1)  [0]male - [1]male = 0
 ( 2)  [0]income - [1]income = 0
 ( 3)  [0]age - [1]age = 0
 
           chi2(  3) =    0.04
         Prob > chi2 =    0.9982
 
An insignificant test statistic indicates that the final model
does not violate the proportional odds/ parallel lines assumption
 
If you re-estimate this exact same model with gologit2, instead
of autofit you can save time by using the parameter
 
pl(male income age)
 
------------------------------------------------------------------------------
 
Generalized Ordered Logit Estimates               Number of obs   =        437
                                                  Wald chi2(3)    =      21.74
                                                  Prob > chi2     =     0.0001
Log likelihood = -89.860105                       Pseudo R2       =     0.1342
 
 ( 1)  [0]male - [1]male = 0
 ( 2)  [0]income - [1]income = 0
 ( 3)  [0]age - [1]age = 0
------------------------------------------------------------------------------
     parking |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |
      income |  -.5140709   1.283192    -0.40   0.689    -3.029082     2.00094
         age |  -.7362588   .1894339    -3.89   0.000    -1.107543   -.3649752
        male |  -1.227092   .4705859    -2.61   0.009    -2.149423   -.3047605
       _cons |   12.74479   3.787616     3.36   0.001     5.321202    20.16839
-------------+----------------------------------------------------------------
1            |
      income |  -.5140709   1.283192    -0.40   0.689    -3.029082     2.00094
         age |  -.7362588   .1894339    -3.89   0.000    -1.107543   -.3649752
        male |  -1.227092   .4705859    -2.61   0.009    -2.149423   -.3047605
       _cons |   10.83295   3.801686     2.85   0.004     3.381785    18.28412
------------------------------------------------------------------------------


5.3 Ordered Logit in SAS

The QLIM, LOGISTIC, and PROBIT procedures estimate ordered logit and probit models. As shown in Tables 3 and 4, the QLIM procedure is the most recommended. The DIST=LOGISTIC option requests the logit model.

PROC QLIM DATA=masil.students;
   MODEL parking = income age male /DISCRETE (DIST=LOGISTIC);
RUN;

                                       The QLIM Procedure
 
                              Discrete Response Profile of parking
 
                       Index         Value           Frequency    Percent
 
                         1             0                   413      94.51
                         2             1                    20       4.58
                         3             2                     4       0.92
 
 
                                       Model Fit Summary
 
                          Number of Endogenous Variables             1
                          Endogenous Variable                  parking
                          Number of Observations                   437
                          Log Likelihood                     -89.86011
                          Maximum Absolute Gradient         8.14046E-7
                          Number of Iterations                      23
                          AIC                                189.72021
                          Schwarz Criterion                  210.11988
 
 
                                    Goodness-of-Fit Measures
 
           Measure                      Value    Formula
 
           Likelihood Ratio (R)        27.854    2 * (LogL - LogL0)
           Upper Bound of R (U)        207.57    - 2 * LogL0
           Aldrich-Nelson              0.0599    R / (R+N)
           Cragg-Uhler 1               0.0618    1 - exp(-R/N)
           Cragg-Uhler 2               0.1633    (1-exp(-R/N)) / (1-exp(-U/N))
           Estrella                    0.0662    1 - (1-R/U)^(U/N)
           Adjusted Estrella           0.0418    1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
           McFadden's LRI              0.1342    R / U
           Veall-Zimmermann            0.1861    (R * (U+N)) / (U * (R+N))
           McKelvey-Zavoina            0.6462
 
           N = # of observations, K = # of regressors
 
Algorithm converged.
 
 
                                      Parameter Estimates
 
                                                 Standard                 Approx
                Parameter        Estimate           Error    t Value    Pr > |t|
 
                Intercept       12.744794        3.787615       3.36      0.0008
                income          -0.514071        1.283192      -0.40      0.6887
                age             -0.736259        0.189434      -3.89      0.0001
                male            -1.227092        0.470586      -2.61      0.0091
                _Limit2          1.911842        0.468050       4.08      <.0001
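The Goodness-of-Fit Measures table above lists a formula for most measures, so the values can be reproduced by hand. The sketch below does this for the measures whose formulas are fully spelled out. It assumes LogL0 = -103.78713, the iteration 0 log likelihood reported by STATA, and it skips Adjusted Estrella and McKelvey-Zavoina because the output does not give enough detail to recompute them here.

```python
import math

ll_model = -89.86011   # Log Likelihood from the Model Fit Summary
ll_null = -103.78713   # assumed LogL0 (STATA iteration 0 value)
n = 437                # number of observations

r = 2.0 * (ll_model - ll_null)  # Likelihood Ratio (R)
u = -2.0 * ll_null              # Upper Bound of R (U)

cragg_uhler_1 = 1.0 - math.exp(-r / n)
measures = {
    "Aldrich-Nelson": r / (r + n),
    "Cragg-Uhler 1": cragg_uhler_1,
    "Cragg-Uhler 2": cragg_uhler_1 / (1.0 - math.exp(-u / n)),
    "Estrella": 1.0 - (1.0 - r / u) ** (u / n),
    "McFadden's LRI": r / u,
    "Veall-Zimmermann": (r * (u + n)) / (u * (r + n)),
}

for name, value in measures.items():
    print(f"{name:<18}{value:.4f}")
```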

The SAS QLIM procedure estimates the intercept and one threshold (_Limit2), fixing the other threshold at zero instead of the intercept. The estimated intercept of SAS is equivalent to (0-/cut1) in STATA. The _Limit2 of SAS is the difference between the cut points of STATA, 1.91184=-10.83295-(-12.74479).

The SAS LOGISTIC and PROBIT procedures are also used to estimate the ordered logit and probit models. These procedures recognize binary or ordinal response models by examining the dependent variable.

PROC LOGISTIC DATA = masil.students DESC;
   MODEL parking = income age male /LINK=LOGIT;
RUN;

Like the STATA .ologit command, the LOGISTIC procedure fits the model assuming the intercept is zero, so Intercept 1 and Intercept 2 in its output correspond to the STATA cut points with reversed signs. The parameter estimates and standard errors are slightly different from those of the QLIM procedure and the .ologit command. Other parts of the output are skipped.

                           Analysis of Maximum Likelihood Estimates
 
                                              Standard          Wald
             Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
 
             Intercept 2     1     10.8324      3.8112        8.0784        0.0045
             Intercept 1     1     12.7444      3.8021       11.2354        0.0008
             income          1     -0.5142      1.2908        0.1587        0.6904
             age             1     -0.7362      0.1900       15.0221        0.0001
             male            1     -1.2271      0.4709        6.7902        0.0092

PROC PROBIT DATA = masil.students;
   CLASS parking;
   MODEL parking = income age male /DIST=LOGISTIC;
RUN;

The PROBIT procedure returns almost the same results as the QLIM procedure except for the signs of the estimates. Other parts of the output are skipped.

                                Analysis of Parameter Estimates
 
                                     Standard   95% Confidence     Chi-
              Parameter  DF Estimate    Error       Limits       Square Pr > ChiSq
 
              Intercept   1 -12.7448   3.7876 -20.1684  -5.3212   11.32     0.0008
              Intercept2  1   1.9118   0.4680   0.9945   2.8292   16.68     <.0001
              income      1   0.5141   1.2832  -2.0009   3.0291    0.16     0.6887
              age         1   0.7363   0.1894   0.3650   1.1075   15.11     0.0001
              male        1   1.2271   0.4706   0.3048   2.1494    6.80     0.0091
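The sign reversal can be checked directly against the QLIM estimates: negating the PROBIT intercept and slopes should reproduce the QLIM values up to rounding, while Intercept2 matches _Limit2 as it stands. A small sketch of that comparison follows; the dictionary names are made up for the example.

```python
# Estimates copied from the PROC PROBIT output above (DIST=LOGISTIC)
probit_est = {"Intercept": -12.7448, "income": 0.5141,
              "age": 0.7363, "male": 1.2271}
probit_intercept2 = 1.9118

# Estimates copied from the PROC QLIM output (DIST=LOGISTIC)
qlim_est = {"Intercept": 12.744794, "income": -0.514071,
            "age": -0.736259, "male": -1.227092}
qlim_limit2 = 1.911842

# Flip the signs of the PROBIT estimates and compare with QLIM
flipped = {name: -value for name, value in probit_est.items()}
max_gap = max(abs(flipped[name] - qlim_est[name]) for name in qlim_est)
threshold_gap = abs(probit_intercept2 - qlim_limit2)

print(max_gap < 1e-4, threshold_gap < 1e-4)
```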


5.4 Ordered Probit in SAS

The QLIM procedure by default estimates a probit model. The DIST=NORMAL, the default option, may be omitted.

PROC QLIM DATA=masil.students;
   MODEL parking = income age male /DISCRETE (DIST=NORMAL);
RUN;

                                       The QLIM Procedure
 
                              Discrete Response Profile of parking
 
                       Index         Value           Frequency    Percent
 
                         1             0                   413      94.51
                         2             1                    20       4.58
                         3             2                     4       0.92
 
 
                                       Model Fit Summary
 
                          Number of Endogenous Variables             1
                          Endogenous Variable                  parking
                          Number of Observations                   437
                          Log Likelihood                     -89.43075
                          Maximum Absolute Gradient         4.69307E-6
                          Number of Iterations                      17
                          AIC                                188.86151
                          Schwarz Criterion                  209.26117
 
 
                                    Goodness-of-Fit Measures
 
           Measure                      Value    Formula
 
           Likelihood Ratio (R)        28.713    2 * (LogL - LogL0)
           Upper Bound of R (U)        207.57    - 2 * LogL0
           Aldrich-Nelson              0.0617    R / (R+N)
           Cragg-Uhler 1               0.0636    1 - exp(-R/N)
           Cragg-Uhler 2               0.1682    (1-exp(-R/N)) / (1-exp(-U/N))
           Estrella                    0.0683    1 - (1-R/U)^(U/N)
           Adjusted Estrella           0.0439    1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
           McFadden's LRI              0.1383    R / U
           Veall-Zimmermann            0.1915    (R * (U+N)) / (U * (R+N))
           McKelvey-Zavoina            0.3011
 
           N = # of observations, K = # of regressors
 
Algorithm converged.
 
 
                                      Parameter Estimates
 
                                                 Standard                 Approx
                Parameter        Estimate           Error    t Value    Pr > |t|
 
                Intercept        6.000986        1.869053       3.21      0.0013
                income          -0.186984        0.611605      -0.31      0.7598
                age             -0.359485        0.092482      -3.89      0.0001
                male            -0.586787        0.220526      -2.66      0.0078
                _Limit2          0.882310        0.196555       4.49      <.0001

The QLIM procedure and .oprobit command produce almost the same result except for the tau2 estimate. The _Limit2 of SAS is the difference of the cut points of STATA, .88231=-5.118676-(-6.000986).
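The two parameterizations describe the same fitted model. As a minimal check, the sketch below computes the cumulative probabilities once from the STATA cut points and once from the SAS intercept and _Limit2. It assumes the standard ordered probit form with a normal CDF, and the covariate profile is made up.

```python
import math

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Slopes shared by .oprobit and PROC QLIM (DIST=NORMAL)
b = {"income": -0.186984, "age": -0.359485, "male": -0.586787}
cut1, cut2 = -6.000986, -5.118676       # STATA: no intercept, two cut points
intercept, limit2 = 6.000986, 0.882310  # SAS: intercept, one free threshold

# Hypothetical covariate profile (an assumption, not from the data set)
x = {"income": 0.6, "age": 20, "male": 1}
xb = sum(b[k] * x[k] for k in b)

# P(y<=0) and P(y<=1) under each parameterization
stata = [normal_cdf(cut1 - xb), normal_cdf(cut2 - xb)]
sas = [normal_cdf(0.0 - (intercept + xb)),
       normal_cdf(limit2 - (intercept + xb))]

print([round(p, 6) for p in stata])
print([round(p, 6) for p in sas])
```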

The PROBIT and LOGISTIC procedures also estimate the ordered probit model. Keep in mind that the signs of the coefficients are reversed in the PROBIT procedure.

PROC LOGISTIC DATA = masil.students DESC;
   MODEL parking = income age male /LINK=PROBIT;
RUN;

                           Analysis of Maximum Likelihood Estimates
 
                                              Standard          Wald
             Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
 
             Intercept 2     1      5.1181      1.8373        7.7601        0.0053
             Intercept 1     1      6.0004      1.8441       10.5872        0.0011
             income          1     -0.1869      0.6160        0.0921        0.7615
             age             1     -0.3595      0.0908       15.6767        <.0001
             male            1     -0.5868      0.2203        7.0941        0.0077

PROC PROBIT DATA = masil.students;
   CLASS parking;
   MODEL parking = income age male /DIST=NORMAL;
RUN;

                                Analysis of Parameter Estimates
 
                                     Standard   95% Confidence     Chi-
              Parameter  DF Estimate    Error       Limits       Square Pr > ChiSq
 
              Intercept   1  -6.0010   1.8691  -9.6643  -2.3377   10.31     0.0013
              Intercept2  1   0.8823   0.1966   0.4971   1.2675   20.15     <.0001
              income      1   0.1870   0.6116  -1.0117   1.3857    0.09     0.7598
              age         1   0.3595   0.0925   0.1782   0.5407   15.11     0.0001
              male        1   0.5868   0.2205   0.1546   1.0190    7.08     0.0078


5.5 Ordered Logit/Probit in LIMDEP (Ordered$)

The LIMDEP Ordered$ command estimates ordered logit and probit models. The Logit$ subcommand runs the ordered logit model.

ORDERED;
   Lhs=parking;
   Rhs=ONE,income,age,male;
   Logit$

Normal exit from iterations. Exit status=0.
 
+---------------------------------------------+
| Ordered Probability Model                   |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 18, 2005 at 05:53:44PM.|
| Dependent variable              PARKING     |
| Weighting variable                 None     |
| Number of observations              437     |
| Iterations completed                 13     |
| Log likelihood function       -89.86011     |
| Restricted log likelihood     -103.7871     |
| Chi squared                    27.85404     |
| Degrees of freedom                    3     |
| Prob[ChiSqd > value] =         .3896741E-05 |
| Underlying probabilities based on Logistic  |
|    Cell frequencies for outcomes            |
|  Y Count Freq  Y Count Freq  Y Count Freq   |
|  0   413 .945  1    20 .045  2     4 .009   |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          Index function for probability
 Constant     12.74479424       3.7876161    3.365   .0008
 INCOME      -.5140708643       1.2831923    -.401   .6887     .61683982
 AGE         -.7362588281       .18943391   -3.887   .0001     20.691076
 MALE        -1.227091964       .47058590   -2.608   .0091     .57208238
          Threshold parameters for index
 Mu(1)        1.911841923       .46804996    4.085   .0000
 
+---------------------------------------------------------------------------+
|   Cross tabulation of predictions. Row is actual, column is predicted.    |
|   Model = Logistic  .  Prediction is number of the most probable cell.    |
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| Actual|Row Sum|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      0|    413|  413|    0|    0|
|      1|     20|   20|    0|    0|
|      2|      4|    4|    0|    0|
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|Col Sum|    437|  437|    0|    0|    0|    0|    0|    0|    0|    0|    0|
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

LIMDEP and SAS QLIM produce the same results for the ordered logit model. Note that _Limit2 in SAS is equivalent to Mu(1), the threshold parameter, in LIMDEP.
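The cross tabulation above places all 437 observations in cell 0 because the prediction is the most probable cell and outcome 0 makes up 94.5 percent of the sample. The rule can be sketched as follows, assuming the usual ordered-model form P(y=0)=F(-index) and P(y<=1)=F(Mu(1)-index). The function name and the two covariate profiles are hypothetical.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates copied from the LIMDEP ordered logit output above
coef = {"ONE": 12.74479424, "income": -0.5140708643,
        "age": -0.7362588281, "male": -1.227091964}
mu1 = 1.911841923

def most_probable_cell(income, age, male):
    """Return the outcome with the largest probability and all three probabilities."""
    index = (coef["ONE"] + coef["income"] * income
             + coef["age"] * age + coef["male"] * male)
    p_le0 = logistic_cdf(0.0 - index)  # first threshold fixed at zero
    p_le1 = logistic_cdf(mu1 - index)
    probs = [p_le0, p_le1 - p_le0, 1.0 - p_le1]
    return probs.index(max(probs)), probs

# Hypothetical profiles (income, age, male); not taken from the data set
profiles = [(0.6, 20, 1), (0.5, 18, 0)]
predictions = [most_probable_cell(*p)[0] for p in profiles]
print(predictions)
```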

The ordered probit model is estimated by the Ordered$ command without the Logit$ subcommand, because the command fits the ordered probit model by default. The output is comparable to that of the QLIM procedure.

ORDERED;
   Lhs=parking;
   Rhs=ONE,income,age,male$

Normal exit from iterations. Exit status=0.
 
+---------------------------------------------+
| Ordered Probability Model                   |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 18, 2005 at 05:55:42PM.|
| Dependent variable              PARKING     |
| Weighting variable                 None     |
| Number of observations              437     |
| Iterations completed                 11     |
| Log likelihood function       -89.43075     |
| Restricted log likelihood     -103.7871     |
| Chi squared                    28.71275     |
| Degrees of freedom                    3     |
| Prob[ChiSqd > value] =         .2572557E-05 |
| Underlying probabilities based on Normal    |
|    Cell frequencies for outcomes            |
|  Y Count Freq  Y Count Freq  Y Count Freq   |
|  0   413 .945  1    20 .045  2     4 .009   |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          Index function for probability
 Constant     6.000985035       1.8690536    3.211   .0013
 INCOME      -.1869836008       .61160494    -.306   .7598     .61683982
 AGE         -.3594852294   .92482090E-01   -3.887   .0001     20.691076
 MALE        -.5867870572       .22052578   -2.661   .0078     .57208238
          Threshold parameters for index
 Mu(1)        .8823095981       .19655461    4.489   .0000
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
 
+---------------------------------------------------------------------------+
|   Cross tabulation of predictions. Row is actual, column is predicted.    |
|   Model = Probit    .  Prediction is number of the most probable cell.    |
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| Actual|Row Sum|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      0|    413|  413|    0|    0|
|      1|     20|   20|    0|    0|
|      2|      4|    4|    0|    0|
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|Col Sum|    874|  437|    0|    0|    0|    0|    0|    0|    0|    0|    0|
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

5.6 Ordered Logit/Probit in SPSS

The PLUM command estimates the ordered logit and probit models in SPSS. The threshold estimates in SPSS are equivalent to the cut points in STATA.

PLUM parking WITH income age male
   /CRITERIA = CIN(95) DELTA(0) LCONVERGE(0) MXITER(100) MXSTEP(5)
      PCONVERGE(1.0E-6) SINGULAR(1.0E-8)
   /LINK = LOGIT /PRINT = FIT PARAMETER SUMMARY .

PLUM parking WITH income age male
   /CRITERIA = CIN(95) DELTA(0) LCONVERGE(0) MXITER(100) MXSTEP(5)
      PCONVERGE(1.0E-6) SINGULAR(1.0E-8)
   /LINK = PROBIT /PRINT = FIT PARAMETER SUMMARY .


