3. The Binary Probit Regression Model

The probit model is represented as $\Pr(y = 1 \mid x) = \Phi(x\beta)$, where $\Phi$ is the cumulative standard normal probability distribution; equivalently, the link function is its inverse, $\Phi^{-1}(p) = x\beta$.
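As a minimal sketch of the model, the probit probability is the standard normal CDF evaluated at the linear index x'beta. The code uses only the Python standard library; the names norm_cdf and probit_prob are illustrative and do not come from any of the packages discussed here.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_prob(x, beta):
    """Pr(y = 1 | x) = Phi(x'beta); x and beta are equal-length sequences."""
    index = sum(xi * bi for xi, bi in zip(x, beta))
    return norm_cdf(index)

# A zero index gives probability one half, whatever the covariates are.
p_half = probit_prob([1.0, 2.0], [0.0, 0.0])
```

Because the CDF is bounded between 0 and 1, any value of the linear index maps to a valid probability.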

3.1 Binary Probit in STATA (.probit)

STATA has the .probit command to estimate the binary probit regression model.

. probit owncar income age male

Iteration 0:   log likelihood = -282.96512
Iteration 1:   log likelihood = -273.84832
Iteration 2:   log likelihood = -273.81741
Iteration 3:   log likelihood = -273.81741
 
Probit regression                                 Number of obs   =        437
                                                  LR chi2(3)      =      18.30
                                                  Prob > chi2     =     0.0004
Log likelihood = -273.81741                       Pseudo R2       =     0.0323
 
------------------------------------------------------------------------------
      owncar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0005613   .3476842     0.00   0.999    -.6808873    .6820098
         age |   .1487005   .0409837     3.63   0.000      .068374    .2290271
        male |   .2579112   .1256085     2.05   0.040     .0117231    .5040993
       _cons |  -2.823671   .8730955    -3.23   0.001    -4.534907   -1.112435
------------------------------------------------------------------------------
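The summary statistics in the header can be reproduced from the iteration log. The sketch below assumes that iteration 0 is the intercept-only log likelihood (LIMDEP reports the same value as its restricted log likelihood) and that Pseudo R2 is McFadden's measure; the variable names are illustrative.

```python
# Log likelihoods copied from the Stata output above.
ll_null = -282.96512   # iteration 0 (intercept-only model, assumed)
ll_full = -273.81741   # converged model

# Likelihood ratio chi-squared with 3 degrees of freedom (three slopes).
lr_chi2 = 2.0 * (ll_full - ll_null)

# McFadden's pseudo R-squared.
pseudo_r2 = 1.0 - ll_full / ll_null

# z statistic for age: coefficient divided by its standard error.
z_age = 0.1487005 / 0.0409837

print(round(lr_chi2, 2), round(pseudo_r2, 4), round(z_age, 2))
```

These values should agree with LR chi2(3), Pseudo R2, and the z column for age up to rounding.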

In order to get standardized estimates and factor changes, run the SPost .listcoef command.

. listcoef

probit (N=437): Unstandardized and Standardized Estimates
 
 Observed SD: .47755228
   Latent SD: 1.0371456
 
-------------------------------------------------------------------------------
      owncar |      b         z     P>|z|    bStdX    bStdY   bStdXY      SDofX
-------------+-----------------------------------------------------------------
      income |   0.00056    0.002   0.999   0.0001   0.0005   0.0001     0.1792
         age |   0.14870    3.628   0.000   0.2395   0.1434   0.2309     1.6108
        male |   0.25791    2.053   0.040   0.1278   0.2487   0.1232     0.4953
-------------------------------------------------------------------------------
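The standardized columns follow from the raw coefficient, the standard deviation of the regressor, and the latent standard deviation. The sketch below assumes the usual definitions (bStdX = b * SDofX, bStdY = b / latent SD, bStdXY = b * SDofX / latent SD) and checks them against the age row; the variable names are illustrative.

```python
# Values copied from the listcoef and prchange output.
b_age = 0.1487005      # unstandardized coefficient
sd_age = 1.61081       # standard deviation of age (sd(x) in prchange)
latent_sd = 1.0371456  # standard deviation of the latent variable y*

b_std_x = b_age * sd_age               # x-standardized
b_std_y = b_age / latent_sd            # y*-standardized
b_std_xy = b_age * sd_age / latent_sd  # fully standardized

print(round(b_std_x, 4), round(b_std_y, 4), round(b_std_xy, 4))
```

Read this way, a one standard deviation increase in age raises the latent propensity to own a car by about 0.23 standard deviations.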

You may compute the marginal effects and discrete changes using the SPost .prchange command. The x() option sets the covariate values at which the changes are evaluated.

. prchange, x(income=1 age=21 male=0)

probit: Changes in Predicted Probabilities for owncar
 
        min->max      0->1     -+1/2    -+sd/2  MargEfct
income    0.0002    0.0002    0.0002    0.0000    0.0002
   age    0.4900    0.0014    0.0567    0.0912    0.0567
  male    0.0937    0.0937    0.0981    0.0487    0.0984
 
              0       1
Pr(y|x)  0.3822  0.6178
 
         income      age     male
    x=        1       21        0
sd(x)=   .17918  1.61081  .495344
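The predicted probability and the MargEfct column can be checked by hand from the probit coefficients. The sketch below assumes that the marginal effect of a regressor is the standard normal density at the index times its coefficient, and that the 0->1 change for male is a difference of two predicted probabilities; the helper names are illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Coefficients from the .probit output and the profile given in x().
beta = {"_cons": -2.823671, "income": 0.0005613,
        "age": 0.1487005, "male": 0.2579112}
x = {"income": 1.0, "age": 21.0, "male": 0.0}

index = beta["_cons"] + sum(beta[k] * x[k] for k in x)
p1 = norm_cdf(index)                     # Pr(owncar = 1 | x)

marg_age = norm_pdf(index) * beta["age"]    # marginal effect of age
marg_male = norm_pdf(index) * beta["male"]  # marginal effect of male

# Discrete change in the probability when male goes from 0 to 1.
change_male = norm_cdf(index + beta["male"]) - p1

print(round(p1, 4), round(marg_age, 4), round(change_male, 4))
```

For a dummy variable such as male, the discrete change is usually the more meaningful quantity, since a marginal (infinitesimal) change in a 0/1 variable has no direct interpretation.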


3.2 Using the PROBIT and LOGISTIC Procedures

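The PROBIT and LOGISTIC procedures estimate the binary probit model. Keep in mind that the coefficients from PROC PROBIT have the opposite signs of those reported by STATA, because the procedure by default models the probability of the lower ordered value (owncar=0).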

PROC PROBIT DATA = masil.students;
   CLASS owncar;
   MODEL owncar = income age male;
RUN;

                                        Probit Procedure
 
                                       Model Information
 
                            Data Set                  MASIL.STUDENTS
                            Dependent Variable                owncar
                            Number of Observations               437
                            Name of Distribution              Normal
                            Log Likelihood              -273.8174115
 
 
                            Number of Observations Read         437
                            Number of Observations Used         437
 
 
                                    Class Level Information
 
                                  Name        Levels    Values
 
                                  owncar           2    0 1
 
 
                                        Response Profile
 
                                 Ordered                  Total
                                   Value    owncar    Frequency
 
                                       1    0               153
                                       2    1               284
 
PROC PROBIT is modeling the probabilities of levels of owncar having LOWER Ordered Values in
the response profile table.
 
 
          Algorithm converged.
 
 
                                 Type III Analysis of Effects
 
                                                   Wald
                          Effect       DF    Chi-Square    Pr > ChiSq
 
                          income        1        0.0000        0.9987
                          age           1       13.1644        0.0003
                          male          1        4.2160        0.0400
 
 
                                Analysis of Parameter Estimates
 
                                     Standard   95% Confidence     Chi-
              Parameter  DF Estimate    Error       Limits       Square Pr > ChiSq
 
              Intercept   1   2.8237   0.8731   1.1124   4.5349   10.46     0.0012
              income      1  -0.0006   0.3477  -0.6820   0.6809    0.00     0.9987
              age         1  -0.1487   0.0410  -0.2290  -0.0684   13.16     0.0003
              male        1  -0.2579   0.1256  -0.5041  -0.0117    4.22     0.0400
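The sign reversal is a consequence of the symmetry of the normal distribution: Pr(y = 0 | x) = 1 - Phi(x'beta) = Phi(-x'beta). The sketch below plugs the PROC PROBIT estimates into the profile used earlier with .prchange (income=1, age=21, male=0); the helper name norm_cdf is illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# PROC PROBIT estimates (these model owncar = 0).
sas = {"Intercept": 2.8237, "income": -0.0006, "age": -0.1487, "male": -0.2579}
x = {"income": 1.0, "age": 21.0, "male": 0.0}

index0 = sas["Intercept"] + sum(sas[k] * x[k] for k in x)
p0 = norm_cdf(index0)    # Pr(owncar = 0 | x)

# Negating every coefficient negates the index and gives Pr(owncar = 1 | x).
p1 = norm_cdf(-index0)

print(round(p0, 4), round(p1, 4))
```

The two probabilities sum to one, so both parameterizations describe the same fitted model.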

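The LOGISTIC procedure fits the logit model by default. To estimate the probit model, specify the normal probability distribution as the link function with the LINK=PROBIT (or LINK=NORMIT) option in the MODEL statement. The DESC option makes the procedure model owncar=1, so the signs match those of STATA.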

PROC LOGISTIC DATA = masil.students DESC;
   MODEL owncar = income age male /LINK=PROBIT;
RUN;

                                     The LOGISTIC Procedure
 
                                       Model Information
 
                         Data Set                      MASIL.STUDENTS
                         Response Variable             owncar
                         Number of Response Levels     2
                         Model                         binary probit
                         Optimization Technique        Fisher's scoring
 
 
                            Number of Observations Read         437
                            Number of Observations Used         437
 
 
                                        Response Profile
 
                               Ordered                      Total
                                 Value       owncar     Frequency
 
                                     1            1           284
                                     2            0           153
 
                                Probability modeled is owncar=1.
 
 
                                    Model Convergence Status
 
                         Convergence criterion (GCONV=1E-8) satisfied.
 
 
                                      Model Fit Statistics
 
                                                          Intercept
                                           Intercept            and
                             Criterion          Only     Covariates
 
                             AIC             567.930        555.635
                             SC              572.010        571.955
                             -2 Log L        565.930        547.635
 
 
                            Testing Global Null Hypothesis: BETA=0
 
                    Test                 Chi-Square       DF     Pr > ChiSq
 
                    Likelihood Ratio        18.2954        3         0.0004
                    Score                   17.4697        3         0.0006
                    Wald                    17.4690        3         0.0006
 
 
                           Analysis of Maximum Likelihood Estimates
 
                                             Standard          Wald
              Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
 
              Intercept     1     -2.8237      0.8796       10.3048        0.0013
              income        1    0.000548      0.3496        0.0000        0.9987
              age           1      0.1487      0.0413       12.9602        0.0003
              male          1      0.2579      0.1257        4.2096        0.0402
 
 
                 Association of Predicted Probabilities and Observed Responses
 
                       Percent Concordant     57.8    Somers' D    0.249
                       Percent Discordant     32.9    Gamma        0.274
                       Percent Tied            9.3    Tau-a        0.113
                       Pairs                 43452    c            0.624
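The Model Fit Statistics table can be checked from the -2 Log L row. The sketch below assumes the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k ln(N), where k is the number of estimated parameters and N the number of observations; the function names are illustrative.

```python
import math

n_obs = 437

def aic(neg2_loglik, k):
    """Akaike information criterion."""
    return neg2_loglik + 2.0 * k

def sc(neg2_loglik, k, n):
    """Schwarz criterion."""
    return neg2_loglik + k * math.log(n)

# Intercept-only model: one parameter.
aic_null = aic(565.930, 1)
sc_null = sc(565.930, 1, n_obs)

# Full model: intercept plus three slopes.
aic_full = aic(547.635, 4)
sc_full = sc(547.635, 4, n_obs)

print(round(aic_null, 3), round(sc_null, 3), round(aic_full, 3), round(sc_full, 3))
```

The covariates lower AIC noticeably but leave SC almost unchanged, because SC penalizes each additional parameter by ln(437), roughly 6.08, rather than 2.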



3.3 Using the GENMOD and QLIM Procedures

The GENMOD procedure also estimates the binary probit model using the /DIST=BINOMIAL and /LINK=PROBIT options in the MODEL statement.

PROC GENMOD DATA = masil.students DESC;
   MODEL owncar = income age male /DIST=BINOMIAL LINK=PROBIT;
RUN;

                                      The GENMOD Procedure
 
                                       Model Information
 
                              Data Set              MASIL.STUDENTS
                              Distribution                Binomial
                              Link Function                 Probit
                              Dependent Variable            owncar
 
 
                            Number of Observations Read         437
                            Number of Observations Used         437
                            Number of Events                    284
                            Number of Trials                    437
 
 
                                        Response Profile
 
                                 Ordered                  Total
                                   Value    owncar    Frequency
 
                                       1    1               284
                                       2    0               153
 
PROC GENMOD is modeling the probability that owncar='1'.
 
 
                             Criteria For Assessing Goodness Of Fit
 
                  Criterion                 DF           Value        Value/DF
 
                  Deviance                 433        547.6348          1.2647
                  Scaled Deviance          433        547.6348          1.2647
                  Pearson Chi-Square       433        437.0270          1.0093
                  Scaled Pearson X2        433        437.0270          1.0093
                  Log Likelihood                     -273.8174
 
          Algorithm converged.
 
 
                                Analysis Of Parameter Estimates
 
                                   Standard     Wald 95% Confidence       Chi-
    Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
 
    Intercept     1     -2.8237      0.8731     -4.5349     -1.1124      10.46        0.0012
    income        1      0.0006      0.3477     -0.6809      0.6820       0.00        0.9987
    age           1      0.1487      0.0410      0.0684      0.2290      13.16        0.0003
    male          1      0.2579      0.1256      0.0117      0.5041       4.22        0.0400
    Scale         0      1.0000      0.0000      1.0000      1.0000
 
NOTE: The scale parameter was held fixed.

The QLIM procedure provides various goodness-of-fit statistics. The DIST=NORMAL option indicates the normal probability distribution used in estimation.

PROC QLIM DATA=masil.students;
   MODEL owncar = income age male /DISCRETE (DIST=NORMAL);
RUN;

                                       The QLIM Procedure
 
                              Discrete Response Profile of owncar
 
                       Index         Value           Frequency    Percent
 
                         1             0                   153      35.01
                         2             1                   284      64.99
 
 
                                       Model Fit Summary
 
                          Number of Endogenous Variables             1
                          Endogenous Variable                   owncar
                          Number of Observations                   437
                          Log Likelihood                    -273.81741
                          Maximum Absolute Gradient         3.82848E-8
                          Number of Iterations                      10
                          AIC                                555.63482
                          Schwarz Criterion                  571.95456
 
 
                                    Goodness-of-Fit Measures
 
           Measure                      Value    Formula
 
           Likelihood Ratio (R)        18.295    2 * (LogL - LogL0)
           Upper Bound of R (U)        565.93    - 2 * LogL0
           Aldrich-Nelson              0.0402    R / (R+N)
           Cragg-Uhler 1                0.041    1 - exp(-R/N)
           Cragg-Uhler 2               0.0565    (1-exp(-R/N)) / (1-exp(-U/N))
           Estrella                    0.0417    1 - (1-R/U)^(U/N)
           Adjusted Estrella           0.0235    1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
           McFadden's LRI              0.0323    R / U
           Veall-Zimmermann            0.0712    (R * (U+N)) / (U * (R+N))
           McKelvey-Zavoina            0.0702
 
           N = # of observations, K = # of regressors
 
Algorithm converged.

 
 
                                      Parameter Estimates
 
                                                 Standard                 Approx
                Parameter        Estimate           Error    t Value    Pr > |t|
 
                Intercept       -2.823671        0.873096      -3.23      0.0012
                income           0.000561        0.347684       0.00      0.9987
                age              0.148701        0.040984       3.63      0.0003
                male             0.257911        0.125608       2.05      0.0400
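Several of the QLIM goodness-of-fit measures can be recomputed from the formulas printed in the table. The sketch below uses the two log likelihoods and N = 437; the intercept-only log likelihood LogL0 = -282.96512 is taken from the STATA iteration log, and the variable names are illustrative.

```python
import math

n = 437
logl = -273.81741     # fitted model
logl0 = -282.96512    # intercept-only model

r = 2.0 * (logl - logl0)   # Likelihood Ratio (R)
u = -2.0 * logl0           # Upper Bound of R (U)

aldrich_nelson = r / (r + n)
cragg_uhler_1 = 1.0 - math.exp(-r / n)
cragg_uhler_2 = cragg_uhler_1 / (1.0 - math.exp(-u / n))
estrella = 1.0 - (1.0 - r / u) ** (u / n)
mcfadden = r / u
veall_zimmermann = (r * (u + n)) / (u * (r + n))

for name, value in [("Aldrich-Nelson", aldrich_nelson),
                    ("Cragg-Uhler 1", cragg_uhler_1),
                    ("Cragg-Uhler 2", cragg_uhler_2),
                    ("Estrella", estrella),
                    ("McFadden's LRI", mcfadden),
                    ("Veall-Zimmermann", veall_zimmermann)]:
    print(f"{name:18s}{value:.4f}")
```

All of these are functions of the same likelihood ratio, which is why they move together and are uniformly small for this model.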



3.4 Binary Probit in LIMDEP (Probit$)

The LIMDEP Probit$ command estimates various probit models. Do not forget to include the ONE for the intercept.

PROBIT;
   Lhs=owncar;
   Rhs=ONE,income,age,male$

Normal exit from iterations. Exit status=0.
 
+---------------------------------------------+
| Binomial Probit Model                       |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 17, 2005 at 10:28:56PM.|
| Dependent variable               OWNCAR     |
| Weighting variable                 None     |
| Number of observations              437     |
| Iterations completed                  4     |
| Log likelihood function       -273.8174     |
| Restricted log likelihood     -282.9651     |
| Chi squared                    18.29542     |
| Degrees of freedom                    3     |
| Prob[ChiSqd > value] =         .3822542E-03 |
| Hosmer-Lemeshow chi-squared =   8.18372     |
| P-value=  .41573 with deg.fr. =       8     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          Index function for probability
 Constant    -2.823670829       .87309548   -3.234   .0012
 INCOME    .5612515407E-03      .34768423     .002   .9987     .61683982
 AGE          .1487005234   .40983697E-01    3.628   .0003     20.691076
 MALE         .2579111914       .12560848    2.053   .0400     .57208238
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
 
+----------------------------------------+
| Fit Measures for Binomial Choice Model |
| Probit   model for variable OWNCAR     |
+----------------------------------------+
| Proportions P0= .350114   P1= .649886  |
| N =     437 N0=     153   N1=     284  |
| LogL =  -273.81741 LogL0 =  -282.9651  |
| Estrella = 1-(L/L0)^(-2L0/n) = .04166  |
+----------------------------------------+
|     Efron |  McFadden  |  Ben./Lerman  |
|    .03984 |    .03233  |       .56327  |
|    Cramer | Veall/Zim. |     Rsqrd_ML  |
|    .04016 |    .07121  |       .04100  |
+----------------------------------------+
| Information  Akaike I.C. Schwarz I.C.  |
| Criteria        1.27148     571.95456  |
+----------------------------------------+
Frequencies of actual & predicted outcomes
Predicted outcome has maximum probability.
Threshold value for predicting Y=1 = .5000
            Predicted
------  ----------  +  -----
Actual      0    1  |  Total
------  ----------  +  -----
  0         5  148  |    153
  1         8  276  |    284
------  ----------  +  -----
Total      13  424  |    437

3.5 Binary Probit in SPSS

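SPSS has the PROBIT command to fit the binary probit model. This command takes a response count and a number of observations for each case, so it requires a variable equal to 1 for every case (e.g., n in the following example) to serve as the number of observations.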

COMPUTE n=1.
PROBIT owncar OF n WITH income age male
   /LOG NONE /MODEL PROBIT
   /PRINT FREQ /CRITERIA ITERATE(20) STEPLIMIT(.1).


