### 2. The Binary Logit Regression Model

The binary logit model is represented as $\Pr(y=1 \mid x) = \Lambda(x\beta) = \dfrac{\exp(x\beta)}{1+\exp(x\beta)}$, where the link function $\Lambda$ is the cumulative standard logistic distribution function. This chapter examines how car ownership (owncar) is affected by monthly income (income), age, and gender (male). See the appendix for details about the data set.

2.1 Binary Logit in Stata (.logit)

Stata provides two equivalent commands for the binary logit model, which present the same result in different ways. The .logit command reports coefficients on the logit (log odds) scale, while the .logistic command reports estimates as odds ratios.

. logistic owncar income age male

Logistic regression                               Number of obs   =        437
LR chi2(3)      =      18.24
Prob > chi2     =     0.0004
Log likelihood = -273.84758                       Pseudo R2       =     0.0322

------------------------------------------------------------------------------
owncar | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
income |   .9898826   .5677504    -0.02   0.986     .3216431    3.046443
age |   1.279626    .088997     3.55   0.000     1.116561    1.466505
male |   1.513669   .3111388     2.02   0.044     1.011729    2.264633
------------------------------------------------------------------------------

. logit

To get the coefficients (log odds), simply run .logit without any arguments right after the .logistic command, as above. Alternatively, run an independent .logit command with all arguments.

. logit owncar income age male

Iteration 0:   log likelihood = -282.96512
Iteration 1:   log likelihood = -273.93537
Iteration 2:   log likelihood = -273.84761
Iteration 3:   log likelihood = -273.84758

Logistic regression                               Number of obs   =        437
LR chi2(3)      =      18.24
Prob > chi2     =     0.0004
Log likelihood = -273.84758                       Pseudo R2       =     0.0322

------------------------------------------------------------------------------
owncar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
income |   -.010169   .5735533    -0.02   0.986    -1.134313    1.113975
age |   .2465678   .0695492     3.55   0.000     .1102539    .3828817
male |   .4145366   .2055527     2.02   0.044     .0116606    .8174126
_cons |  -4.682741   1.474519    -3.18   0.001    -7.572745   -1.792738
------------------------------------------------------------------------------

Note that each .logit coefficient is the natural logarithm of the corresponding .logistic odds ratio. For example, .2465678 = ln(1.279626).
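The link between the two outputs can be checked outside Stata. The following is a minimal Python sketch (Python serves only as a neutral calculator here, and the variable names are illustrative). It exponentiates the .logit coefficients and applies the delta method, se(OR) = OR × se(b), which appears to reproduce the standard errors in the .logistic table.

```python
import math

# Coefficients and standard errors copied from the .logit output above.
logit_coef = {"income": -0.010169, "age": 0.2465678, "male": 0.4145366}
logit_se = {"income": 0.5735533, "age": 0.0695492, "male": 0.2055527}

# Odds ratio = exp(coefficient); delta-method standard error = OR * se(b).
odds_ratio = {name: math.exp(b) for name, b in logit_coef.items()}
odds_ratio_se = {name: odds_ratio[name] * logit_se[name] for name in logit_coef}

for name in logit_coef:
    print(f"{name:>6}  OR = {odds_ratio[name]:.6f}  SE = {odds_ratio_se[name]:.6f}")
```

The printed values can be compared line by line with the Odds Ratio and Std. Err. columns of the .logistic output.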

Stata has post-estimation commands that conduct follow-up analyses. The .predict command computes predictions, residuals, or standard errors of the prediction and stores them into a new variable.

. predict r, residual

The .test and .lrtest commands respectively conduct the Wald test and likelihood ratio test.

. test income age

( 1)  income = 0
( 2)  age = 0

chi2(  2) =   12.57
Prob > chi2 =    0.0019

2.2 Using the SPost Module in Stata

The SPost module provides useful follow-up analysis commands (ado files) for various categorical dependent variable models (Long and Freese 2003). The .fitstat command calculates various goodness-of-fit statistics such as the log likelihood, McFadden’s R2 (or Pseudo R2), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC).

. fitstat

Measures of Fit for logistic of owncar

Log-Lik Intercept Only:     -282.965     Log-Lik Full Model:         -273.848
D(433):                      547.695     LR(3):                        18.235
Prob > LR:                     0.000
Maximum Likelihood R2:         0.041     Cragg & Uhler's R2:            0.056
McKelvey and Zavoina's R2:     0.059     Efron's R2:                    0.040
Variance of y*:                3.495     Variance of error:             3.290
Count R2:                      0.638     Adj Count R2:                 -0.033
AIC:                           1.272     AIC*n:                       555.695
BIC:                       -2084.916     BIC':                          0.005

The likelihood ratio statistic for goodness of fit is twice the difference between the log likelihoods of the full and intercept-only models:

. di 2*(-273.848 - (-282.965))

18.234
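Several of the .fitstat numbers follow directly from the two log likelihoods. The sketch below is a hedged Python illustration rather than SPost's own code: it recomputes the likelihood ratio, its p-value (using a closed form that holds only for 3 degrees of freedom), McFadden's R2, and the AIC. The parameter count of 4 (three regressors plus the intercept) is read off the model above.

```python
import math

ll_null = -282.96512        # log likelihood, intercept-only model (iteration 0)
ll_full = -273.84758        # log likelihood, full model
n_obs, n_params = 437, 4    # 3 regressors plus the intercept

lr = 2 * (ll_full - ll_null)           # likelihood ratio statistic, 3 df
mcfadden_r2 = 1 - ll_full / ll_null    # "Pseudo R2" in the Stata header

# Upper-tail probability of a chi-squared variable with exactly 3 df
# (this closed form is valid for df = 3 only).
p_value = math.erfc(math.sqrt(lr / 2)) + math.sqrt(2 * lr / math.pi) * math.exp(-lr / 2)

aic_times_n = -2 * ll_full + 2 * n_params   # "AIC*n" in the fitstat output
aic = aic_times_n / n_obs                   # per-observation "AIC" in fitstat

print(f"LR = {lr:.3f}, p = {p_value:.4f}, McFadden R2 = {mcfadden_r2:.4f}")
print(f"AIC*n = {aic_times_n:.3f}, AIC = {aic:.3f}")
```

Note that the AIC reported by .fitstat is divided by the number of observations, whereas AIC*n is the unscaled value.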

The .listcoef command lists unstandardized coefficients (parameter estimates), factor and percent changes, and standardized coefficients to help interpret results. The help option tells how to read the outputs.

. listcoef, help

logistic (N=437): Factor Change in Odds

Odds of: 1 vs 0

----------------------------------------------------------------------
owncar |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
income |  -0.01017   -0.018   0.986   0.9899   0.9982     0.1792
age |   0.24657    3.545   0.000   1.2796   1.4876     1.6108
male |   0.41454    2.017   0.044   1.5137   1.2279     0.4953
----------------------------------------------------------------------
b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
e^b = exp(b) = factor change in odds for unit increase in X
e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
SDofX = standard deviation of X
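The two factor-change columns can be recomputed from the b and SDofX columns alone. This is a small Python sketch under that assumption; the dictionary names are illustrative and not part of SPost.

```python
import math

# b and SDofX copied from the listcoef output above.
rows = {
    "income": (-0.01017, 0.1792),
    "age": (0.24657, 1.6108),
    "male": (0.41454, 0.4953),
}

# First element: e^b, factor change in odds per unit increase in X.
# Second element: e^bStdX, factor change in odds per SD increase in X.
factor_change = {
    name: (math.exp(b), math.exp(b * sd)) for name, (b, sd) in rows.items()
}

for name, (per_unit, per_sd) in factor_change.items():
    print(f"{name:>6}  e^b = {per_unit:.4f}  e^bStdX = {per_sd:.4f}")
```

For example, a one standard deviation increase in age (about 1.6 years) multiplies the odds of owning a car by roughly 1.49.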

The .prtab command constructs a table of predicted probabilities for all combinations of the categorical variables listed. The following example shows that the predicted probability of owning a car is about .60 for female students and about .70 for male students, given the mean values of income and age.

. prtab male

logistic: Predicted probabilities of positive outcome for owncar

----------------------
male | Prediction
----------+-----------
0 |     0.6017
1 |     0.6958
----------------------

income        age       male
x=  .61683982  20.691076  .57208238

The .prvalue command lists predicted probabilities of positive and negative outcomes for a given set of values of the independent variables. Note that both the .prtab and .prvalue commands report the identical predicted probability that female students (male=0) own cars, .6017, holding other variables at their means.

. prvalue, x(male=0) rest(mean)

logistic: Predictions for owncar

Pr(y=1|x):          0.6017   95% ci: (0.5286,0.6706)
Pr(y=0|x):          0.3983   95% ci: (0.3294,0.4714)

income        age       male
x=  .61683982  20.691076          0
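These predicted probabilities can be reproduced by plugging the .logit coefficients from section 2.1 into the logistic distribution function. The following Python sketch assumes only those coefficients and the sample means printed beneath the .prtab output; the function names are hypothetical.

```python
import math

def logistic_cdf(z):
    """Cumulative standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

# Coefficients from the .logit output in section 2.1.
b = {"_cons": -4.682741, "income": -0.010169, "age": 0.2465678, "male": 0.4145366}
# Sample means reported beneath the .prtab output.
means = {"income": 0.61683982, "age": 20.691076}

def pr_owncar(male):
    """Predicted Pr(owncar=1) with income and age held at their means."""
    xb = (b["_cons"] + b["income"] * means["income"]
          + b["age"] * means["age"] + b["male"] * male)
    return logistic_cdf(xb)

p_female, p_male = pr_owncar(0), pr_owncar(1)
print(f"female: {p_female:.4f}  male: {p_male:.4f}  difference: {p_male - p_female:.4f}")
```

The difference between the two probabilities is the discrete change for male when it moves from 0 to 1.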

The most useful command is .prchange, which calculates marginal effects (changes) and discrete changes at a given set of values of the independent variables. The help option explains how to read the output. For instance, the predicted probability that a male student owns a car is .094 (0->1) higher than that of a female student, holding other variables at their means.

. prchange, help

logit: Changes in Predicted Probabilities for owncar

min->max      0->1     -+1/2    -+sd/2  MargEfct
income   -0.0019   -0.0023   -0.0023   -0.0004   -0.0023
age    0.4404    0.0032    0.0555    0.0893    0.0556
male    0.0940    0.0940    0.0932    0.0462    0.0934

0       1
Pr(y|x)  0.3430  0.6570

income      age     male
x=   .61684  20.6911  .572082
sd(x)=   .17918  1.61081  .495344

Pr(y|x): probability of observing each y for specified x values
Avg|Chg|: average of absolute value of the change across categories
Min->Max: change in predicted probability as x changes from its minimum to
its maximum
0->1: change in predicted probability as x changes from 0 to 1
-+1/2: change in predicted probability as x changes from 1/2 unit below
base value to 1/2 unit above
-+sd/2: change in predicted probability as x changes from 1/2 standard
dev below base to 1/2 standard dev above
MargEfct: the partial derivative of the predicted probability/rate with
respect to a given independent variable
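For the logit model, the MargEfct column has a simple closed form: the partial derivative of the predicted probability with respect to variable k is p(1 − p)b_k, evaluated here at the means. The Python sketch below illustrates this with the coefficients from section 2.1 and the means shown in the output; it is an independent check, not the SPost implementation.

```python
import math

# Coefficients from the .logit output in section 2.1.
b = {"_cons": -4.682741, "income": -0.010169, "age": 0.2465678, "male": 0.4145366}
# Sample means shown in the x= row of the prchange output.
means = {"income": 0.61683982, "age": 20.691076, "male": 0.57208238}

xb = b["_cons"] + sum(b[k] * means[k] for k in means)
p = 1.0 / (1.0 + math.exp(-xb))   # Pr(y=1|x) with all variables at their means

# Logit marginal effect: dPr/dx_k = p * (1 - p) * b_k.
marginal_effect = {k: p * (1.0 - p) * b[k] for k in means}

print(f"Pr(y=1|x) = {p:.4f}")
for k, value in marginal_effect.items():
    print(f"{k:>6}  MargEfct = {value:.4f}")
```

For a binary variable such as male, the 0->1 column (a discrete change) is usually the more meaningful quantity than this derivative.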

The SPost module also includes the .prgen, which computes a series of predictions by holding all variables but one interval variable constant and allowing that variable to vary (Long and Freese 2003).

. prgen income, from(.1) to(1.5) x(male=1) rest(median) generate(ppcar)

logistic: Predicted values as income varies from .1 to 1.5.

income        age       male
x=  .58200002         21          1

The above command computes predicted probabilities that male students own cars as income changes from \$100 through \$1,500, holding age at its median of 21, and stores them in new variables whose names begin with ppcar.
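The same kind of series can be sketched by hand. The Python example below evaluates the fitted model over a grid of income values with male=1 and age=21; the choice of 11 evenly spaced points is an assumption made for illustration, and the names are hypothetical.

```python
import math

# Coefficients from the .logit output in section 2.1.
b = {"_cons": -4.682741, "income": -0.010169, "age": 0.2465678, "male": 0.4145366}

def pr_owncar(income, age=21, male=1):
    """Predicted Pr(owncar=1) for the given covariate values."""
    xb = b["_cons"] + b["income"] * income + b["age"] * age + b["male"] * male
    return 1.0 / (1.0 + math.exp(-xb))

n_points = 11   # assumption: number of evenly spaced income values
incomes = [0.1 + i * (1.5 - 0.1) / (n_points - 1) for i in range(n_points)]
ppcar = [pr_owncar(x) for x in incomes]

for x, p in zip(incomes, ppcar):
    print(f"income = {x:.2f}  Pr(owncar=1) = {p:.4f}")
```

Because the income coefficient is tiny and negative, the predicted probability falls only slightly across the whole income range.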

2.3 Using the SAS LOGISTIC and PROBIT Procedures

SAS has several procedures for the binary logit model, such as LOGISTIC, PROBIT, GENMOD, and QLIM. The LOGISTIC procedure is commonly used for the binary logit model, but the PROBIT procedure can also estimate it. Let us first consider the LOGISTIC procedure.

PROC LOGISTIC DESCENDING DATA = masil.students;
MODEL owncar = income age male;
RUN;

The LOGISTIC Procedure

Model Information

Data Set                      MASIL.STUDENTS
Response Variable             owncar
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Used         437

Response Profile

Ordered Value    owncar    Total Frequency

1            1           284
2            0           153

Probability modeled is owncar=1.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates

AIC             567.930        555.695
SC              572.010        572.015
-2 Log L        565.930        547.695

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        18.2351        3         0.0004
Score                   17.4697        3         0.0006
Wald                    16.7977        3         0.0008

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq

Intercept     1     -4.6827      1.4745       10.0855        0.0015
income        1     -0.0102      0.5736        0.0003        0.9859
age           1      0.2466      0.0695       12.5686        0.0004
male          1      0.4145      0.2056        4.0670        0.0437

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits

income       0.990       0.322       3.046
age          1.280       1.117       1.467
male         1.514       1.012       2.265

Association of Predicted Probabilities and Observed Responses

Percent Concordant     58.9    Somers' D    0.246
Percent Discordant     34.3    Gamma        0.264
Percent Tied            6.8    Tau-a        0.112
Pairs                 43452    c            0.623
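The Wald chi-squares and the odds ratio confidence limits in this output follow from the estimates and standard errors. As a hedged illustration, the Python sketch below redoes the calculation for age using the more precise values from the Stata .logit output; only standard library calls are used.

```python
import math
from statistics import NormalDist

# Estimate and standard error for age, from the Stata .logit output.
estimate, std_err = 0.2465678, 0.0695492

# SAS "Wald Chi-Square" is the squared z statistic.
wald_chi2 = (estimate / std_err) ** 2

# 95% Wald limits: exponentiate the limits of the coefficient interval.
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96
or_low = math.exp(estimate - z_crit * std_err)
or_high = math.exp(estimate + z_crit * std_err)

print(f"Wald chi-square = {wald_chi2:.4f}")
print(f"95% limits for the odds ratio: ({or_low:.3f}, {or_high:.3f})")
```

The same steps apply to income and male with their own estimates and standard errors.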

The SAS LOGISTIC, PROBIT, and GENMOD procedures by default use the smaller value of the dependent variable as success. Thus, the magnitudes of the coefficients remain the same, but the signs are opposite to those of the QLIM procedure, Stata, and LIMDEP. The DESCENDING option forces SAS to use the larger value as success. Alternatively, you may explicitly specify the category of the successful event using the EVENT option as follows.

PROC LOGISTIC DESCENDING DATA = masil.students;
MODEL owncar(EVENT='1') = income age male;
RUN;

The SAS LOGISTIC procedure computes changes in odds when independent variables change by the amounts specified in the UNITS statement. The SD below requests a one standard deviation increase in income and age; a number may be given instead (e.g., -2 means a two-unit decrease in the independent variable).

PROC LOGISTIC DESCENDING DATA = masil.students;
MODEL owncar = income age male;
UNITS income=SD age=SD;
RUN;

The UNITS statement adds the “Adjusted Odds Ratios” to the end of the outputs above. Note that the odds changes of the two variables are identical to those under the “e^bStdX” of the previous SPost .listcoef output.

Effect         Unit     Estimate

income       0.1792        0.998
age          1.6108        1.488

Now, let us use the PROBIT procedure to estimate the same binary logit model. The PROBIT requires the CLASS statement to list categorical variables. The /DIST=LOGISTIC option indicates the probability distribution to be used in maximum likelihood estimation.

PROC PROBIT DATA = masil.students;
CLASS owncar;
MODEL owncar = income age male /DIST=LOGISTIC;
RUN;

Probit Procedure

Model Information

Data Set                  MASIL.STUDENTS
Dependent Variable                owncar
Number of Observations               437
Name of Distribution            Logistic
Log Likelihood               -273.847577

Number of Observations Used         437

Class Level Information

Name        Levels    Values

owncar           2    0 1

Response Profile

Ordered Value    owncar    Total Frequency

1    0               153
2    1               284

PROC PROBIT is modeling the probabilities of levels of owncar having LOWER Ordered Values in
the response profile table.

Algorithm converged.

Type III Analysis of Effects

Effect       DF    Wald Chi-Square    Pr > ChiSq

income        1        0.0003        0.9859
age           1       12.5686        0.0004
male          1        4.0670        0.0437

Analysis of Parameter Estimates

Parameter  DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq

Intercept   1   4.6827   1.4745   1.7927   7.5727   10.09     0.0015
income      1   0.0102   0.5736  -1.1140   1.1343    0.00     0.9859
age         1  -0.2466   0.0695  -0.3829  -0.1103   12.57     0.0004
male        1  -0.4145   0.2056  -0.8174  -0.0117    4.07     0.0437

Unlike LOGISTIC, PROBIT does not have the DESCENDING option. Thus, you have to switch the signs of coefficients when comparing with those of Stata and LIMDEP. The PROBIT procedure also does not have the UNITS statement to compute changes in odds.
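The sign switch works because the logistic distribution is symmetric: Λ(−xβ) = 1 − Λ(xβ), so negating every coefficient turns a model for owncar=0 into a model for owncar=1. The Python sketch below illustrates this with the rounded PROC PROBIT estimates above, evaluated for a female student at the sample means of income and age taken from section 2.2; it is an illustration only.

```python
import math

def logistic_cdf(z):
    """Cumulative standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

# PROC PROBIT estimates (probability of owncar=0), copied from the output above.
b_probit = {"Intercept": 4.6827, "income": 0.0102, "age": -0.2466, "male": -0.4145}
# Negating every coefficient gives the model for owncar=1.
b_event1 = {name: -value for name, value in b_probit.items()}

# A female student with income and age at the sample means.
x = {"Intercept": 1.0, "income": 0.61683982, "age": 20.691076, "male": 0.0}

pr_y0 = logistic_cdf(sum(b_probit[k] * x[k] for k in x))
pr_y1 = logistic_cdf(sum(b_event1[k] * x[k] for k in x))

print(f"Pr(owncar=0) = {pr_y0:.4f}, Pr(owncar=1) = {pr_y1:.4f}")
```

Up to rounding of the coefficients, the two probabilities match the .prvalue output for male=0 and sum to one.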

2.4 Using the SAS GENMOD and QLIM Procedures

The GENMOD procedure provides flexible methods to estimate generalized linear models. The DIST (DISTRIBUTION) and LINK options respectively specify a probability distribution and a link function.

PROC GENMOD DATA = masil.students DESC;
MODEL owncar = income age male /DIST=BINOMIAL LINK=LOGIT;
RUN;

The GENMOD Procedure

Model Information

Data Set              MASIL.STUDENTS
Distribution                Binomial
Dependent Variable            owncar

Number of Observations Used         437
Number of Events                    284
Number of Trials                    437

Response Profile

Ordered Value    owncar    Total Frequency

1    1               284
2    0               153

PROC GENMOD is modeling the probability that owncar='1'.

Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF

Deviance                 433        547.6952          1.2649
Scaled Deviance          433        547.6952          1.2649
Pearson Chi-Square       433        436.4352          1.0079
Scaled Pearson X2        433        436.4352          1.0079
Log Likelihood                     -273.8476

Algorithm converged.

Analysis Of Parameter Estimates

Parameter    DF    Estimate    Standard Error    Wald 95% Confidence Limits    Chi-Square    Pr > ChiSq

Intercept     1     -4.6827      1.4745     -7.5727     -1.7927      10.09        0.0015
income        1     -0.0102      0.5736     -1.1343      1.1140       0.00        0.9859
age           1      0.2466      0.0695      0.1103      0.3829      12.57        0.0004
male          1      0.4145      0.2056      0.0117      0.8174       4.07        0.0437
Scale         0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

If you have categorical (string) independent variables, list the variables in the CLASS statement without creating dummy variables.

PROC GENMOD DATA = masil.students DESC;
CLASS male;
MODEL owncar = income age male /DIST=BINOMIAL LINK=LOGIT;
RUN;

PROC GENMOD DATA = masil.students DESC;
MODEL owncar = income age male /DIST=BINOMIAL;
RUN;

All three GENMOD examples discussed so far fit the same model; the last one omits LINK=LOGIT because the logit is the default link for the binomial distribution.

The QLIM procedure estimates not only logit and probit models, but also censored, truncated, and sample selection models. You may describe the dependent variable either in the ENDOGENOUS statement or as an option of the MODEL statement.

PROC QLIM DATA=masil.students;
MODEL owncar = income age male;
ENDOGENOUS owncar ~ DISCRETE (DIST=LOGIT);
RUN;

Or,

PROC QLIM DATA=masil.students;
MODEL owncar = income age male /DISCRETE (DIST=LOGIT);
RUN;

The QLIM Procedure

Discrete Response Profile of owncar

Index         Value           Frequency    Percent

1             0                   153      35.01
2             1                   284      64.99

Model Fit Summary

Number of Endogenous Variables             1
Endogenous Variable                   owncar
Number of Observations                   437
Log Likelihood                    -273.84758
Number of Iterations                       8
AIC                                555.69515
Schwarz Criterion                  572.01489

Goodness-of-Fit Measures

Measure                      Value    Formula

Likelihood Ratio (R)        18.235    2 * (LogL - LogL0)
Upper Bound of R (U)        565.93    - 2 * LogL0
Aldrich-Nelson              0.0401    R / (R+N)
Cragg-Uhler 1               0.0409    1 - exp(-R/N)
Cragg-Uhler 2               0.0563    (1-exp(-R/N)) / (1-exp(-U/N))
Estrella                    0.0415    1 - (1-R/U)^(U/N)
Adjusted Estrella           0.0234    1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
McFadden's LRI              0.0322    R / U
Veall-Zimmermann             0.071    (R * (U+N)) / (U * (R+N))
McKelvey-Zavoina            0.1699

N = # of observations, K = # of regressors

Algorithm converged.

Parameter Estimates

Parameter        Estimate    Standard Error    t Value    Approx Pr > |t|

Intercept       -4.682741        1.474519      -3.18      0.0015
income          -0.010169        0.573553      -0.02      0.9859
age              0.246568        0.069549       3.55      0.0004
male             0.414537        0.205553       2.02      0.0437
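Most of the goodness-of-fit measures above can be recomputed from the formulas printed in the Formula column. The following Python sketch does so from the two log likelihoods and the number of observations; it is an independent check, not QLIM's code, and it omits the measures whose formulas are not shown or depend on further inputs.

```python
import math

ll_full, ll_null, n_obs = -273.84758, -282.96512, 437

r = 2 * (ll_full - ll_null)   # Likelihood Ratio (R)
u = -2 * ll_null              # Upper Bound of R (U)

# Each entry follows the formula printed in the QLIM output.
measures = {
    "Aldrich-Nelson": r / (r + n_obs),
    "Cragg-Uhler 1": 1 - math.exp(-r / n_obs),
    "Cragg-Uhler 2": (1 - math.exp(-r / n_obs)) / (1 - math.exp(-u / n_obs)),
    "Estrella": 1 - (1 - r / u) ** (u / n_obs),
    "McFadden's LRI": r / u,
    "Veall-Zimmermann": (r * (u + n_obs)) / (u * (r + n_obs)),
}

for name, value in measures.items():
    print(f"{name:<18} {value:.4f}")
```

McFadden's LRI is the same quantity that Stata reports as Pseudo R2.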

Finally, the CATMOD procedure fits the logit model to functions of categorical response variables. This procedure, however, produces slightly different estimates from those of the other procedures discussed so far and is therefore less recommended for the binary logit model. The DIRECT statement specifies interval or ratio variables used in the MODEL statement. The /NOPROFILE option suppresses the display of the population profiles and the response profiles.

PROC CATMOD DATA = masil.students;
DIRECT income age;
MODEL owncar = income age male /NOPROFILE;
RUN;

2.5 Binary Logit in LIMDEP (Logit\$)

The Logit\$ command in LIMDEP estimates various logit models. The dependent variable is specified in the Lhs\$ (left-hand side) subcommand and a list of independent variables in the Rhs\$ (right-hand side). You have to explicitly specify the ONE for the intercept. The Marginal Effects\$ and the Means\$ subcommands compute marginal effects at the mean values of independent variables.

LOGIT;
Lhs=owncar;
Rhs=ONE,income,age,male;
Marginal Effects; Means\$

Normal exit from iterations. Exit status=0.

+---------------------------------------------+
| Multinomial Logit Model                     |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 17, 2005 at 05:31:28PM.|
| Dependent variable               OWNCAR     |
| Weighting variable                 None     |
| Number of observations              437     |
| Iterations completed                  5     |
| Log likelihood function       -273.8476     |
| Restricted log likelihood     -282.9651     |
| Chi squared                    18.23509     |
| Degrees of freedom                    3     |
| Prob[ChiSqd > value] =         .3933723E-03 |
| Hosmer-Lemeshow chi-squared =   8.44648     |
| P-value=  .39111 with deg.fr. =       8     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Characteristics in numerator of Prob[Y = 1]
Constant    -4.682741385       1.4745190   -3.176   .0015
INCOME   -.1016896029E-01      .57355331    -.018   .9859     .61683982
AGE          .2465677833   .69549211E-01    3.545   .0004     20.691076
MALE         .4145365774       .20555276    2.017   .0437     .57208238
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

+--------------------------------------------------------------------+
| Information Statistics for Discrete Choice Model.                  |
|                            M=Model MC=Constants Only   M0=No Model |
| Criterion F (log L)     -273.84758        -282.96512    -302.90532 |
| LR Statistic vs. MC       18.23509            .00000        .00000 |
| Degrees of Freedom         3.00000            .00000        .00000 |
| Prob. Value for LR          .00039            .00000        .00000 |
| Entropy for probs.       273.84758         282.96512     302.90532 |
| Normalized Entropy          .90407            .93417       1.00000 |
| Entropy Ratio Stat.       58.11548          39.88039        .00000 |
| Bayes Info Criterion     565.93495         584.17004     624.05044 |
| BIC - BIC(no model)       58.11548          39.88039        .00000 |
| Pseudo R-squared            .03222            .00000        .00000 |
| Pct. Correct Prec.        63.84439            .00000      50.00000 |
| Means:       y=0    y=1    y=2    y=3    yu=4   y=5,    y=6   y>=7 |
| Outcome     .3501  .6499  .0000  .0000  .0000  .0000  .0000  .0000 |
| Pred.Pr     .3501  .6499  .0000  .0000  .0000  .0000  .0000  .0000 |
| Notes: Entropy computed as Sum(i)Sum(j)Pfit(i,j)*logPfit(i,j).     |
|        Normalized entropy is computed against M0.                  |
|        Entropy ratio statistic is computed against M0.             |
|        BIC = 2*criterion - log(N)*degrees of freedom.              |
|        If the model has only constants or if it has no constants,  |
|        the statistics reported here are not useable.               |
+--------------------------------------------------------------------+

+-------------------------------------------+
| Partial derivatives of probabilities with |
| respect to the vector of characteristics. |
| They are computed at the means of the Xs. |
+-------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Characteristics in numerator of Prob[Y = 1]
Constant    -1.055282283       .33183024   -3.180   .0015
INCOME   -.2291632775E-02      .12925338    -.018   .9859     .61683982
AGE       .5556544593E-01  .15534022E-01    3.577   .0003     20.691076
Marginal effect for dummy variable is P|1 - P|0.
MALE      .9403411023E-01  .46726710E-01    2.012   .0442     .57208238
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

+----------------------------------------+
| Fit Measures for Binomial Choice Model |
| Logit    model for variable OWNCAR     |
+----------------------------------------+
| Proportions P0= .350114   P1= .649886  |
| N =     437 N0=     153   N1=     284  |
| LogL =  -273.84758 LogL0 =  -282.9651  |
| Estrella = 1-(L/L0)^(-2L0/n) = .04153  |
+----------------------------------------+
|     Efron |  McFadden  |  Ben./Lerman  |
|    .03963 |    .03222  |       .56318  |
|    Cramer | Veall/Zim. |     Rsqrd_ML  |
|    .04010 |    .07099  |       .04087  |
+----------------------------------------+
| Information  Akaike I.C. Schwarz I.C.  |
| Criteria        1.27161     572.01489  |
+----------------------------------------+
Frequencies of actual & predicted outcomes
Predicted outcome has maximum probability.
Threshold value for predicting Y=1 = .5000
Predicted
------  ----------  +  -----
Actual      0    1  |  Total
------  ----------  +  -----
0        21  132  |    153
1        26  258  |    284
------  ----------  +  -----
Total      47  390  |    437

Note that the marginal effects above are identical to those of the SPost .prchange command in section 2.2. For binary variables like male, LIMDEP reports the discrete change (.0940, the 0->1 column of .prchange) instead of the marginal effect.
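The classification table at the end of the LIMDEP output also explains two numbers seen earlier: the Count R2 and Adj Count R2 reported by .fitstat in section 2.2. The Python sketch below recomputes them from the four cell counts, using the usual definitions (share of correct predictions, and that share adjusted for always predicting the most frequent outcome); the variable names are illustrative.

```python
# Keys are (actual, predicted) outcomes at the .5 threshold, from the table above.
table = {(0, 0): 21, (0, 1): 132, (1, 0): 26, (1, 1): 258}

n_obs = sum(table.values())
n_correct = table[(0, 0)] + table[(1, 1)]
# Size of the most frequent actual outcome (here owncar=1).
n_modal = max(table[(0, 0)] + table[(0, 1)], table[(1, 0)] + table[(1, 1)])

count_r2 = n_correct / n_obs
adj_count_r2 = (n_correct - n_modal) / (n_obs - n_modal)

print(f"Count R2 = {count_r2:.3f}, Adj Count R2 = {adj_count_r2:.3f}")
```

The negative adjusted value indicates that the model classifies slightly worse than simply predicting that every student owns a car.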

2.6 Binary Logit in SPSS

SPSS has the LOGISTIC REGRESSION command for the binary logit model.

LOGISTIC REGRESSION VAR=owncar
/METHOD=ENTER income age male
/CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .