3. The Binary Probit Regression Model

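The probit model is represented as $\Phi^{-1}(\pi) = x\beta$, or equivalently $\pi = \Pr(y = 1 \mid x) = \Phi(x\beta)$, where $\Phi(\cdot)$ is the cumulative standard normal distribution function. The link function $\Phi^{-1}(\cdot)$, the probit, is thus the inverse of the cumulative standard normal distribution.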
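As an illustration only (none of the packages below is implemented this way), the following Python sketch evaluates $\Phi(x\beta)$ with the standard library's statistics.NormalDist and computes the log likelihood that maximum likelihood estimation maximizes. The function names are hypothetical, and each row of X is assumed to start with a 1 for the intercept.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution (mean 0, sd 1)

def linear_index(x, beta):
    """x*beta for one observation; x[0] is assumed to be 1 for the intercept."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def probit_prob(x, beta):
    """Pr(y = 1 | x) = Phi(x*beta)."""
    return PHI.cdf(linear_index(x, beta))

def probit_loglik(X, y, beta):
    """Sum of y*ln Phi(xb) + (1 - y)*ln(1 - Phi(xb)) over all observations."""
    total = 0.0
    for x, yi in zip(X, y):
        xb = linear_index(x, beta)
        # 1 - Phi(xb) equals Phi(-xb); using Phi(-xb) avoids cancellation in the tails
        p = PHI.cdf(xb) if yi == 1 else PHI.cdf(-xb)
        total += math.log(p)
    return total

# With all coefficients at zero every probability is 0.5,
# so the log likelihood of two observations is 2*ln(0.5), about -1.386
print(probit_loglik([[1, 2.0], [1, 3.0]], [1, 0], [0.0, 0.0]))
```

Estimation programs search for the beta that maximizes probit_loglik; the sketch only evaluates it at a given beta.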

3.1 Binary Probit in STATA (.probit)

STATA has the .probit command to estimate the binary probit regression model.

. probit owncar income age male

Iteration 0:   log likelihood = -282.96512
Iteration 1:   log likelihood = -273.84832
Iteration 2:   log likelihood = -273.81741
Iteration 3:   log likelihood = -273.81741

Probit regression                                 Number of obs   =        437
                                                  LR chi2(3)      =      18.30
                                                  Prob > chi2     =     0.0004
Log likelihood = -273.81741                       Pseudo R2       =     0.0323

------------------------------------------------------------------------------
      owncar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0005613   .3476842     0.00   0.999    -.6808873    .6820098
         age |   .1487005   .0409837     3.63   0.000      .068374    .2290271
        male |   .2579112   .1256085     2.05   0.040     .0117231    .5040993
       _cons |  -2.823671   .8730955    -3.23   0.001    -4.534907   -1.112435
------------------------------------------------------------------------------
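Stata's Iteration 0 log likelihood (-282.96512) is that of the intercept-only model, and the header statistics can be recovered from it. The Python sketch below is a check, not how Stata computes them: it takes the outcome counts of 284 owners and 153 non-owners shown in the SAS response profiles, and it uses the closed-form chi-square tail for 3 degrees of freedom, $P(\chi^2_3 > x) = 2[1-\Phi(\sqrt{x})] + \sqrt{2x/\pi}\,e^{-x/2}$.

```python
import math
from statistics import NormalDist

n1, n0 = 284, 153          # owncar = 1 and owncar = 0 counts
n = n1 + n0
loglik = -273.81741        # log likelihood of the full model

# Intercept-only model: the MLE sets Pr(y=1) to the sample share of ones
p = n1 / n
loglik0 = n1 * math.log(p) + n0 * math.log(1 - p)

lr_chi2 = 2 * (loglik - loglik0)       # LR chi2(3)
pseudo_r2 = 1 - loglik / loglik0       # McFadden's pseudo R2

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 df (closed form)."""
    tail_1df = 2 * (1 - NormalDist().cdf(math.sqrt(x)))
    return tail_1df + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_3df(lr_chi2)
print(round(loglik0, 5), round(lr_chi2, 2), round(p_value, 6), round(pseudo_r2, 4))
```

The p-value comes out near 0.00038, which STATA rounds to 0.0004.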

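To get standardized estimates, run the SPost .listcoef command. Because probit coefficients have no odds interpretation, listcoef reports standardized coefficients rather than the factor changes in odds it gives after logit.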

. listcoef

probit (N=437): Unstandardized and Standardized Estimates

 Observed SD: .47755228
   Latent SD: 1.0371456

-------------------------------------------------------------------------------
      owncar |      b         z     P>|z|    bStdX    bStdY   bStdXY      SDofX
-------------+-----------------------------------------------------------------
      income |   0.00056    0.002   0.999   0.0001   0.0005   0.0001     0.1792
         age |   0.14870    3.628   0.000   0.2395   0.1434   0.2309     1.6108
        male |   0.25791    2.053   0.040   0.1278   0.2487   0.1232     0.4953
-------------------------------------------------------------------------------
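The standardized columns follow from b, the standard deviation of each x, and the latent SD: bStdX = b*SD(x), bStdY = b/SD(y*), and bStdXY = b*SD(x)/SD(y*). The Python sketch below reproduces them from the printed values (so the last digit may differ slightly from listcoef's); the SDs of x are taken from the sd(x) row that .prchange prints. The latent SD reflects Var(y*) = Var(xb) + 1, since the probit model fixes the error variance at 1.

```python
latent_sd = 1.0371456   # "Latent SD" from listcoef: sqrt(Var(xb) + 1)
b = {"income": 0.0005613, "age": 0.1487005, "male": 0.2579112}
sd_x = {"income": 0.17918, "age": 1.61081, "male": 0.495344}  # sd(x) as printed by prchange

std = {}
for name, coef in b.items():
    b_std_x = coef * sd_x[name]     # change in y* for a 1-SD increase in x
    b_std_y = coef / latent_sd      # change in y*, in SDs of y*, per unit of x
    b_std_xy = b_std_x / latent_sd  # fully standardized coefficient
    std[name] = (b_std_x, b_std_y, b_std_xy)
    print(f"{name:>6} {b_std_x:8.4f} {b_std_y:8.4f} {b_std_xy:8.4f}")

# Implied variance of the linear predictor xb
var_xb = latent_sd ** 2 - 1
print(f"Var(xb) = {var_xb:.4f}")
```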

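The SPost .prchange command computes marginal effects and discrete changes in the predicted probabilities. The x() option sets the values of the independent variables at which they are evaluated; variables not listed are held at their means.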

. prchange, x(income=1 age=21 male=0)

probit: Changes in Predicted Probabilities for owncar

        min->max      0->1     -+1/2    -+sd/2  MargEfct
income    0.0002    0.0002    0.0002    0.0000    0.0002
   age    0.4900    0.0014    0.0567    0.0912    0.0567
  male    0.0937    0.0937    0.0981    0.0487    0.0984

              0       1
Pr(y|x)  0.3822  0.6178

         income      age     male
    x=        1       21        0
sd(x)=   .17918  1.61081  .495344
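At the chosen x, the predicted probability is Phi(xb), the marginal effect of x_k is phi(xb)*b_k where phi is the standard normal density, and the 0->1 change for a dummy is the difference between two predicted probabilities. This Python sketch, using the STATA coefficients from section 3.1, reproduces the Pr(y|x), MargEfct, and male 0->1 entries. The min->max column is omitted because it needs each variable's sample minimum and maximum, which are not shown here.

```python
from statistics import NormalDist

norm = NormalDist()
beta = {"_cons": -2.823671, "income": 0.0005613, "age": 0.1487005, "male": 0.2579112}
x = {"income": 1, "age": 21, "male": 0}   # same values as prchange's x() option

def index(values):
    """Linear predictor xb for the given covariate values."""
    return beta["_cons"] + sum(beta[k] * v for k, v in values.items())

xb = index(x)
p1 = norm.cdf(xb)                               # Pr(owncar=1 | x)
marg = {k: norm.pdf(xb) * beta[k] for k in x}   # dPr/dx_k = phi(xb) * b_k

# Discrete change for the dummy male: Pr(y=1 | male=1) - Pr(y=1 | male=0)
dc_male = norm.cdf(index({**x, "male": 1})) - norm.cdf(index({**x, "male": 0}))

print(f"Pr(y=1|x) = {p1:.4f}")
for k, v in marg.items():
    print(f"{k:>6} marginal effect = {v:.4f}")
print(f"male 0->1 change = {dc_male:.4f}")
```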


3.2 Using the PROBIT and LOGISTIC Procedures

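SAS offers the PROBIT and LOGISTIC procedures for the binary probit model. Keep in mind that the coefficients from PROC PROBIT have the opposite signs, because by default it models the probability of the lower ordered value (owncar=0), as the note in its output states. PROC LOGISTIC with the DESC option and LINK=PROBIT models the probability that owncar=1 and reproduces STATA's estimates. Its standard errors differ slightly, likely because its default Fisher scoring algorithm uses the expected rather than the observed information matrix, and the two are not equal for the probit link.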

PROC PROBIT DATA = masil.students;
CLASS owncar;
MODEL owncar = income age male;
RUN;

Probit Procedure

Model Information

Data Set                  MASIL.STUDENTS
Dependent Variable                owncar
Number of Observations               437
Name of Distribution              Normal
Log Likelihood              -273.8174115

Number of Observations Used         437

Class Level Information

Name        Levels    Values

owncar           2    0 1

Response Profile

 Ordered                  Total
   Value    owncar    Frequency

       1    0               153
       2    1               284

PROC PROBIT is modeling the probabilities of levels of owncar having LOWER Ordered Values in
the response profile table.

Algorithm converged.

Type III Analysis of Effects

                         Wald
Effect       DF    Chi-Square    Pr > ChiSq

income        1        0.0000        0.9987
age           1       13.1644        0.0003
male          1        4.2160        0.0400

Analysis of Parameter Estimates

                       Standard   95% Confidence     Chi-
Parameter  DF Estimate    Error       Limits       Square Pr > ChiSq

Intercept   1   2.8237   0.8731   1.1124   4.5349   10.46     0.0012
income      1  -0.0006   0.3477  -0.6820   0.6809    0.00     0.9987
age         1  -0.1487   0.0410  -0.2290  -0.0684   13.16     0.0003
male        1  -0.2579   0.1256  -0.5041  -0.0117    4.22     0.0400
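The sign reversal follows from the symmetry Phi(-z) = 1 - Phi(z): if Pr(owncar=1) = Phi(xb), then Pr(owncar=0) = Phi(-xb), so modeling the lower ordered value negates every coefficient without changing the fitted probabilities. A quick Python check, evaluated at the x values used with .prchange in section 3.1 (income=1, age=21, male=0):

```python
from statistics import NormalDist

norm = NormalDist()
x = [1, 1, 21, 0]                                        # constant, income, age, male
b_stata = [-2.823671, 0.0005613, 0.1487005, 0.2579112]   # STATA: models Pr(owncar=1)
b_sas_probit = [2.8237, -0.0006, -0.1487, -0.2579]       # PROC PROBIT: models Pr(owncar=0)

xb1 = sum(xi * bi for xi, bi in zip(x, b_stata))
xb0 = sum(xi * bi for xi, bi in zip(x, b_sas_probit))

p1 = norm.cdf(xb1)   # Pr(owncar=1 | x) from STATA's estimates
p0 = norm.cdf(xb0)   # Pr(owncar=0 | x) from PROC PROBIT's estimates
print(round(p0, 4), round(1 - p1, 4))
```

Both numbers agree with the Pr(y|x) value of 0.3822 that .prchange reports for y=0.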

PROC LOGISTIC DATA = masil.students DESC;
MODEL owncar = income age male /LINK=PROBIT;
RUN;

The LOGISTIC Procedure

Model Information

Data Set                      MASIL.STUDENTS
Response Variable             owncar
Number of Response Levels     2
Model                         binary probit
Optimization Technique        Fisher's scoring

Number of Observations Used         437

Response Profile

 Ordered                      Total
   Value       owncar     Frequency

       1            1           284
       2            0           153

Probability modeled is owncar=1.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates

AIC             567.930        555.635
SC              572.010        571.955
-2 Log L        565.930        547.635

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        18.2954        3         0.0004
Score                   17.4697        3         0.0006
Wald                    17.4690        3         0.0006

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -2.8237      0.8796       10.3048        0.0013
income        1    0.000548      0.3496        0.0000        0.9987
age           1      0.1487      0.0413       12.9602        0.0003
male          1      0.2579      0.1257        4.2096        0.0402

Association of Predicted Probabilities and Observed Responses

Percent Concordant     57.8    Somers' D    0.249
Percent Discordant     32.9    Gamma        0.274
Percent Tied            9.3    Tau-a        0.113
Pairs                 43452    c            0.624
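The Model Fit Statistics and association measures follow standard formulas: AIC = -2 log L + 2k and SC = -2 log L + k ln N, where k is the number of estimated parameters (1 for the intercept-only model, 4 with covariates), while the association measures are built from the concordant, discordant, and tied shares of the 284 x 153 = 43452 pairs of observations with different outcomes. The Python sketch below recomputes them from the printed, rounded values, so small rounding differences are expected.

```python
import math

n = 437
neg2ll = {"intercept only": 565.930, "with covariates": 547.635}  # -2 Log L from the output
n_params = {"intercept only": 1, "with covariates": 4}

fit = {}
for label, d in neg2ll.items():
    k = n_params[label]
    aic = d + 2 * k                # AIC = -2 log L + 2k
    sc = d + k * math.log(n)       # SC (Schwarz/BIC) = -2 log L + k ln N
    fit[label] = (aic, sc)
    print(f"{label:15} AIC = {aic:.3f}  SC = {sc:.3f}")

# Association measures from the printed (rounded) percentages
n1, n0 = 284, 153
pairs = n1 * n0                          # pairs with one owner and one non-owner
conc, disc, tied = 0.578, 0.329, 0.093   # concordant, discordant, tied shares
somers_d = conc - disc
gamma = (conc - disc) / (conc + disc)
c_stat = conc + 0.5 * tied
tau_a = (conc - disc) * pairs / (n * (n - 1) / 2)   # denominator counts all pairs
print(pairs, round(somers_d, 3), round(gamma, 3), round(c_stat, 3), round(tau_a, 3))
```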


3.3 Using the GENMOD and QLIM Procedures

The GENMOD procedure also estimates the binary probit model when the DIST=BINOMIAL and LINK=PROBIT options are specified in the MODEL statement. As with LOGISTIC, the DESC option makes GENMOD model the probability that owncar=1.

PROC GENMOD DATA = masil.students DESC;
MODEL owncar = income age male /DIST=BINOMIAL LINK=PROBIT;
RUN;

The GENMOD Procedure

Model Information

Data Set              MASIL.STUDENTS
Distribution                Binomial
Dependent Variable            owncar

Number of Observations Used         437
Number of Events                    284
Number of Trials                    437

Response Profile

 Ordered                  Total
   Value    owncar    Frequency

       1    1               284
       2    0               153

PROC GENMOD is modeling the probability that owncar='1'.

Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF

Deviance                 433        547.6348          1.2647
Scaled Deviance          433        547.6348          1.2647
Pearson Chi-Square       433        437.0270          1.0093
Scaled Pearson X2        433        437.0270          1.0093
Log Likelihood                     -273.8174

Algorithm converged.

Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq

Intercept     1     -2.8237      0.8731     -4.5349     -1.1124      10.46        0.0012
income        1      0.0006      0.3477     -0.6809      0.6820       0.00        0.9987
age           1      0.1487      0.0410      0.0684      0.2290      13.16        0.0003
male          1      0.2579      0.1256      0.0117      0.5041       4.22        0.0400
Scale         0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.
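With ungrouped 0/1 responses the saturated model fits each case exactly, so its log likelihood is 0 and the deviance equals -2 log L; Value/DF divides by N minus the 4 estimated parameters. The Python sketch below checks this against the GENMOD table. The pearson_chi2 helper is a hypothetical illustration of the Pearson formula; it cannot be applied to this example without the case-level fitted probabilities.

```python
n, k = 437, 4
loglik = -273.8174

# Binary, ungrouped data: saturated log likelihood is 0, so deviance = -2 log L
deviance = -2 * loglik
df = n - k
print(round(deviance, 4), df, round(deviance / df, 4))

def pearson_chi2(y, p):
    """Sum over cases of (y - p)^2 / (p * (1 - p)) for fitted probabilities p."""
    return sum((yi - pi) ** 2 / (pi * (1 - pi)) for yi, pi in zip(y, p))
```

Because the responses are binary and ungrouped, neither statistic has an approximate chi-square distribution, so a Value/DF near 1 should not be read as evidence of good fit.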

The QLIM procedure reports a variety of goodness-of-fit statistics. The DISCRETE option in the MODEL statement declares owncar a discrete response, and DIST=NORMAL specifies the normal distribution, which yields the probit model.

PROC QLIM DATA=masil.students;
MODEL owncar = income age male /DISCRETE (DIST=NORMAL);
RUN;

The QLIM Procedure

Discrete Response Profile of owncar

Index         Value           Frequency    Percent

1             0                   153      35.01
2             1                   284      64.99

Model Fit Summary

Number of Endogenous Variables             1
Endogenous Variable                   owncar
Number of Observations                   437
Log Likelihood                    -273.81741
Number of Iterations                      10
AIC                                555.63482
Schwarz Criterion                  571.95456

Goodness-of-Fit Measures

Measure                      Value    Formula

Likelihood Ratio (R)        18.295    2 * (LogL - LogL0)
Upper Bound of R (U)        565.93    - 2 * LogL0
Aldrich-Nelson              0.0402    R / (R+N)
Cragg-Uhler 1                0.041    1 - exp(-R/N)
Cragg-Uhler 2               0.0565    (1-exp(-R/N)) / (1-exp(-U/N))
Estrella                    0.0417    1 - (1-R/U)^(U/N)
Adjusted Estrella           0.0235    1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
McFadden's LRI              0.0323    R / U
Veall-Zimmermann            0.0712    (R * (U+N)) / (U * (R+N))
McKelvey-Zavoina            0.0702

N = # of observations, K = # of regressors

Algorithm converged.

Parameter Estimates

                                 Standard                 Approx
Parameter        Estimate           Error    t Value    Pr > |t|

Intercept       -2.823671        0.873096      -3.23      0.0012
income           0.000561        0.347684       0.00      0.9987
age              0.148701        0.040984       3.63      0.0003
male             0.257911        0.125608       2.05      0.0400
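Every measure in the QLIM table except McKelvey-Zavoina is a function of the two log likelihoods and N, following the Formula column. The Python sketch below reproduces them. It assumes K = 4, that is, counting the intercept among the regressors, because that value reproduces the printed Adjusted Estrella of 0.0235; this is inferred from the numbers, not from SAS documentation.

```python
import math

n, k = 437, 4          # observations; K = 4 (assumed to include the intercept)
ll, ll0 = -273.81741, -282.96512

r = 2 * (ll - ll0)     # likelihood ratio R
u = -2 * ll0           # upper bound of R

measures = {
    "Aldrich-Nelson": r / (r + n),
    "Cragg-Uhler 1": 1 - math.exp(-r / n),
    "Cragg-Uhler 2": (1 - math.exp(-r / n)) / (1 - math.exp(-u / n)),
    "Estrella": 1 - (1 - r / u) ** (u / n),
    "Adjusted Estrella": 1 - ((ll - k) / ll0) ** (-2 / n * ll0),
    "McFadden's LRI": r / u,
    "Veall-Zimmermann": (r * (u + n)) / (u * (r + n)),
}
for name, value in measures.items():
    print(f"{name:18} {value:.4f}")
```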


3.4 Binary Probit in LIMDEP (Probit$)

The LIMDEP Probit$ command estimates various probit models. Remember to include ONE in the Rhs list; otherwise the model is estimated without an intercept.

PROBIT;
Lhs=owncar;
Rhs=ONE,income,age,male$

Normal exit from iterations. Exit status=0.

+---------------------------------------------+
| Binomial Probit Model                       |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 17, 2005 at 10:28:56PM.|
| Dependent variable               OWNCAR     |
| Weighting variable                 None     |
| Number of observations              437     |
| Iterations completed                  4     |
| Log likelihood function       -273.8174     |
| Restricted log likelihood     -282.9651     |
| Chi squared                    18.29542     |
| Degrees of freedom                    3     |
| Prob[ChiSqd > value] =         .3822542E-03 |
| Hosmer-Lemeshow chi-squared =   8.18372     |
| P-value=  .41573 with deg.fr. =       8     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Index function for probability
Constant    -2.823670829       .87309548   -3.234   .0012
INCOME    .5612515407E-03      .34768423     .002   .9987     .61683982
AGE          .1487005234   .40983697E-01    3.628   .0003     20.691076
MALE         .2579111914       .12560848    2.053   .0400     .57208238
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

+----------------------------------------+
| Fit Measures for Binomial Choice Model |
| Probit   model for variable OWNCAR     |
+----------------------------------------+
| Proportions P0= .350114   P1= .649886  |
| N =     437 N0=     153   N1=     284  |
| LogL =  -273.81741 LogL0 =  -282.9651  |
| Estrella = 1-(L/L0)^(-2L0/n) = .04166  |
+----------------------------------------+
|     Efron |  McFadden  |  Ben./Lerman  |
|    .03984 |    .03233  |       .56327  |
|    Cramer | Veall/Zim. |     Rsqrd_ML  |
|    .04016 |    .07121  |       .04100  |
+----------------------------------------+
| Information  Akaike I.C. Schwarz I.C.  |
| Criteria        1.27148     571.95456  |
+----------------------------------------+
Frequencies of actual & predicted outcomes
Predicted outcome has maximum probability.
Threshold value for predicting Y=1 = .5000
        Predicted
------  ----------  +  -----
Actual      0    1  |  Total
------  ----------  +  -----
0           5  148  |    153
1           8  276  |    284
------  ----------  +  -----
Total      13  424  |    437
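The classification table shows that, at the .5000 threshold, the model predicts owncar=1 for 424 of the 437 students. The Python sketch below computes the share correctly classified and compares it with the naive rule of always predicting the more common outcome. It also shows that LIMDEP's Akaike I.C. of 1.27148 appears to be AIC divided by N, using the AIC of 555.63482 reported by QLIM; this is an inference from the numbers rather than from LIMDEP documentation.

```python
# Cell counts from the LIMDEP table: keys are (actual, predicted)
table = {(0, 0): 5, (0, 1): 148, (1, 0): 8, (1, 1): 276}
n = sum(table.values())

correct = table[(0, 0)] + table[(1, 1)]
pct_correct = correct / n                       # share classified correctly
naive = (table[(1, 0)] + table[(1, 1)]) / n     # always predict the modal outcome, 1

aic = 555.63482          # AIC from the QLIM output
aic_per_obs = aic / n    # compare LIMDEP's "Akaike I.C." of 1.27148
print(f"correct: {pct_correct:.4f}, naive: {naive:.4f}, AIC/N: {aic_per_obs:.5f}")
```

The model classifies 64.3% of cases correctly, slightly fewer than the 65.0% obtained by predicting owncar=1 for everyone.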

3.5 Binary Probit in SPSS

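SPSS fits the binary probit model with the PROBIT command. Because PROBIT expects grouped data, that is, a count of events (owncar) out of a number of trials, individual-level data need a trials variable that equals 1 for every case (n in the following example).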

COMPUTE n=1.
PROBIT owncar OF n WITH income age male
  /LOG NONE /MODEL PROBIT
  /PRINT FREQ /CRITERIA ITERATE(20) STEPLIMIT(.1).
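SPSS's PROBIT treats each case as r events out of n trials. With n = 1 the binomial log likelihood reduces to the Bernoulli log likelihood used by the other programs, because C(1, r) = 1, so the estimates should agree. The Python sketch below illustrates this with hypothetical toy probabilities; it also shows that collapsing cases with identical covariates changes the log likelihood only by the constant ln C(n, r), which does not affect the estimates.

```python
import math

def grouped_binomial_loglik(r, n, p):
    """Log likelihood for r events out of n trials with success probability p."""
    return sum(
        math.log(math.comb(ni, ri)) + ri * math.log(pi) + (ni - ri) * math.log(1 - pi)
        for ri, ni, pi in zip(r, n, p)
    )

def bernoulli_loglik(y, p):
    """Log likelihood for individual 0/1 outcomes."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

# Toy values (hypothetical, not from the tutorial's data)
y = [1, 0, 1]
p = [0.7, 0.4, 0.9]
ones = [1] * len(y)     # plays the role of SPSS's n = 1
same = (grouped_binomial_loglik(y, ones, p), bernoulli_loglik(y, p))

# Two cases with the same covariates (p = 0.7), one event: grouped as 1 of 2
collapsed = grouped_binomial_loglik([1], [2], [0.7])
separate = bernoulli_loglik([1, 0], [0.7, 0.7])
print(same, collapsed - separate)   # the difference is ln C(2, 1) = ln 2
```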