7. The Conditional Logit Regression Model

Consider a choice among four travel modes: air, train, bus, and car. The data set and model here are adapted from Greene (2003). The model examines how the generalized cost measure (cost), terminal waiting time (time), and household income (income) affect the choice; income enters the model through its interaction with the air dummy (air_inc).

The cost and time variables are not characteristics of subjects (individuals) but attributes of the alternatives: they take a different value for each travel mode a subject faces. Thus, the data arrangement of the conditional logit model differs from that of the multinomial logit model; each subject contributes one row per alternative (Figure 2).

Figure 2. Data Arrangement for the Conditional Logit Model

  +------------------------------------------------------------------------------+
  | subject   mode   choice   air   train   bus   cost   time   income   air_inc |
  |------------------------------------------------------------------------------|
  |       1      1        0     1       0     0     70     69       35        35 |
  |       1      2        0     0       1     0     71     34       35         0 |
  |       1      3        0     0       0     1     70     35       35         0 |
  |       1      4        1     0       0     0     30      0       35         0 |
  |       2      1        0     1       0     0     68     64       30        30 |
  |------------------------------------------------------------------------------|
  |       2      2        0     0       1     0     84     44       30         0 |
  |       2      3        0     0       0     1     85     53       30         0 |
  |       2      4        1     0       0     0     50      0       30         0 |
  |       3      1        0     1       0     0    129     69       40        40 |
  |       3      2        0     0       1     0    195     34       40         0 |
  …       …      …        …     …       …     …      …      …        …         … …

The example data set has four observations per subject, one for each travel mode (air, train, bus, and car), and each row holds the attributes of that mode. The dependent variable choice is coded 1 if the subject chose that travel mode and 0 otherwise, so exactly one row per subject has choice=1. The four dummy variables, air, train, bus, and car, flag the corresponding modes of transportation; car is left out of the models below as the reference mode. See the appendix for details about the data set.
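The long layout in Figure 2 can be built from a subject's income, chosen mode, and per-mode attributes. The Python sketch below is illustrative only: the helper name long_rows and its input structure are hypothetical. It derives the choice indicator, the mode dummies, and the interaction air_inc = air * income for subject 1 of Figure 2. No car dummy is created because car serves as the reference mode in the models that follow.

```python
# Mode codes as in Figure 2: 1=air, 2=train, 3=bus, 4=car.
MODES = (1, 2, 3, 4)

def long_rows(subject, income, chosen_mode, attrs):
    """Build one row per alternative (hypothetical helper).

    attrs maps mode code -> (cost, time) for this subject.
    """
    rows = []
    for mode in MODES:
        cost, time = attrs[mode]
        air = int(mode == 1)
        rows.append({
            "subject": subject,
            "mode": mode,
            "choice": int(mode == chosen_mode),  # 1 only for the chosen mode
            "air": air,
            "train": int(mode == 2),
            "bus": int(mode == 3),               # no car dummy: car is the reference
            "cost": cost,
            "time": time,
            "income": income,                    # same value on every row of a subject
            "air_inc": air * income,             # income enters only through air
        })
    return rows

# Subject 1 from Figure 2: household income 35, chose car (mode 4).
subject1 = long_rows(1, 35, 4, {1: (70, 69), 2: (71, 34), 3: (70, 35), 4: (30, 0)})
for r in subject1:
    print(r["subject"], r["mode"], r["choice"], r["air"], r["train"], r["bus"],
          r["cost"], r["time"], r["income"], r["air_inc"])
```

The printed rows reproduce the first four lines of Figure 2.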


7.1 Conditional Logit in STATA (.clogit)

STATA has the .clogit command to estimate the conditional logit model. The group() option specifies the variable (e.g., an identification number) that identifies each individual, that is, the set of alternatives from which one is chosen.

. clogit choice air train bus cost time air_inc, group(subject)

Iteration 0:   log likelihood =  -205.8187 
Iteration 1:   log likelihood = -199.23679 
Iteration 2:   log likelihood = -199.12851 
Iteration 3:   log likelihood = -199.12837 
Iteration 4:   log likelihood = -199.12837 
 
Conditional (fixed-effects) logistic regression   Number of obs   =        840
                                                  LR chi2(6)      =     183.99
                                                  Prob > chi2     =     0.0000
Log likelihood = -199.12837                       Pseudo R2       =     0.3160
 
------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         air |   5.207443   .7790551     6.68   0.000     3.680523    6.734363
       train |   3.869043   .4431269     8.73   0.000      3.00053    4.737555
         bus |   3.163194   .4502659     7.03   0.000     2.280689    4.045699
        cost |  -.0155015    .004408    -3.52   0.000     -.024141    -.006862
        time |  -.0961248   .0104398    -9.21   0.000    -.1165865   -.0756631
     air_inc |    .013287   .0102624     1.29   0.195    -.0068269     .033401
--------------------------------------------------------------------------------------
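In the conditional logit model, the probability that subject i chooses alternative j is exp(x_ij'b) divided by the sum of exp(x_ik'b) over all alternatives k in that subject's choice set. As an illustrative check that is not part of the original analysis, the Python sketch below plugs the clogit estimates above into this formula for subject 1 of Figure 2; the function name choice_probabilities is our own.

```python
import math

# Coefficients from the clogit output above.
b = {"air": 5.207443, "train": 3.869043, "bus": 3.163194,
     "cost": -0.0155015, "time": -0.0961248, "air_inc": 0.013287}

# Subject 1 from Figure 2 (income 35); car has no dummy (reference mode).
alternatives = {
    "air":   {"air": 1, "cost": 70, "time": 69, "air_inc": 35},
    "train": {"train": 1, "cost": 71, "time": 34},
    "bus":   {"bus": 1, "cost": 70, "time": 35},
    "car":   {"cost": 30, "time": 0},
}

def choice_probabilities(alts, coef):
    """Conditional logit probabilities within one choice set."""
    utility = {m: sum(coef[k] * v for k, v in x.items()) for m, x in alts.items()}
    denom = sum(math.exp(u) for u in utility.values())
    return {m: math.exp(u) / denom for m, u in utility.items()}

probs = choice_probabilities(alternatives, b)
for mode, p in probs.items():
    print(f"{mode:5s} {p:.3f}")
```

For this subject, car receives the highest predicted probability, which agrees with the observed choice in Figure 2.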

Let us run the SPost .listcoef command to compute factor changes in odds. For example, for a one-unit increase in the waiting time for a given travel mode, the odds of choosing that mode are expected to decrease by about 9 percent (a factor of .9084), holding other variables constant.

. listcoef

clogit (N=840): Factor Change in Odds
 
  Odds of: 1 vs 0
 
--------------------------------------------------
      choice |      b         z     P>|z|    e^b 
-------------+------------------------------------
         air |   5.20744    6.684   0.000 182.6265
       train |   3.86904    8.731   0.000  47.8965
         bus |   3.16319    7.025   0.000  23.6460
        cost |  -0.01550   -3.517   0.000   0.9846
        time |  -0.09612   -9.207   0.000   0.9084
     air_inc |   0.01329    1.295   0.195   1.0134
--------------------------------------------------
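The e^b column is simply the exponentiated coefficient, and 100 * (e^b - 1) gives the percent change in the odds. The Python sketch below recomputes the column from the clogit coefficients; the helper name factor_change and its delta argument are our own illustration, not part of SPost.

```python
import math

# Coefficients from the clogit output (Coef. column).
b = {"air": 5.207443, "train": 3.869043, "bus": 3.163194,
     "cost": -0.0155015, "time": -0.0961248, "air_inc": 0.013287}

def factor_change(coef, delta=1.0):
    """Multiplicative change in the odds for a delta-unit increase."""
    return math.exp(coef * delta)

for name, coef in b.items():
    fc = factor_change(coef)
    pct = 100 * (fc - 1)  # percent change in the odds
    print(f"{name:8s} e^b = {fc:9.4f}  percent change = {pct:8.2f}%")
```

For time, e^b is about .9084, a decrease of roughly 9.2 percent in the odds. Passing delta=10 gives exp(10b), the factor change for a 10-unit increase, which is not the same as 10 times e^b.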


7.2 Conditional Logit in SAS

SAS has the MDC procedure to fit the conditional logit model. The TYPE=CLOGIT option requests the conditional logit model, the ID statement specifies the identification variable, and NCHOICE=4 indicates that there are four travel modes to choose from.

PROC MDC DATA=masil.travel;
   MODEL choice = air train bus cost time air_inc /TYPE=CLOGIT NCHOICE=4;
   ID subject;
RUN;

                                       The MDC Procedure
 
                                   Conditional Logit Estimates
 
Algorithm converged.
 
 
                                        Model Fit Summary
 
                           Dependent Variable                   choice
                           Number of Observations                  210
                           Number of Cases                         840
                           Log Likelihood                   -199.12837
                           Maximum Absolute Gradient        2.73152E-8
                           Number of Iterations                      5
                           Optimization Method          Newton-Raphson
                           AIC                               410.25674
                           Schwarz Criterion                 430.33938
 
 
                                   Discrete Response Profile
 
                            Index    CHOICE     Frequency    Percent
 
                              0           1            58      27.62
                              1           2            63      30.00
                              2           3            30      14.29
                              3           4            59      28.10
 
 
                                    Goodness-of-Fit Measures
 
           Measure                       Value    Formula
 
           Likelihood Ratio (R)         183.99    2 * (LogL - LogL0)
           Upper Bound of R (U)         582.24    - 2 * LogL0
           Aldrich-Nelson                0.467    R / (R+N)
           Cragg-Uhler 1                0.5836    1 - exp(-R/N)
           Cragg-Uhler 2                0.6225    (1-exp(-R/N)) / (1-exp(-U/N))
           Estrella                     0.6511    1 - (1-R/U)^(U/N)
           Adjusted Estrella            0.6212    1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
           McFadden's LRI                0.316    R / U
           Veall-Zimmermann             0.6354    (R * (U+N)) / (U * (R+N))
 
           N = # of observations, K = # of regressors
 
 
                                  Conditional Logit Estimates
 
                                      Parameter Estimates
 
                                                 Standard                 Approx
               Parameter     DF     Estimate        Error    t Value    Pr > |t|
 
               air            1       5.2074       0.7791       6.68     <.0001
               train          1       3.8690       0.4431       8.73     <.0001
               bus            1       3.1632       0.4503       7.03     <.0001
               cost           1      -0.0155     0.004408      -3.52     0.0004
               time           1      -0.0961       0.0104      -9.21     <.0001
               air_inc        1       0.0133       0.0103       1.29     0.1954
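The goodness-of-fit measures in the MDC output follow from the formulas printed next to them, with N = 210 subjects and K = 6 regressors. The Python sketch below recomputes them, assuming LogL0 is the log likelihood with all coefficients set to zero, which gives each of the four modes probability 1/4 (this reproduces -2 * LogL0 = 582.24 in the table). The AIC and Schwarz formulas used here, -2LogL + 2K and -2LogL + K ln N, are standard definitions that reproduce the printed values.

```python
import math

log_l = -199.12837          # log likelihood of the fitted model
n = 210                     # number of subjects (choice sets)
k = 6                       # number of regressors
log_l0 = n * math.log(1 / 4)  # all coefficients zero: each mode has probability 1/4

r = 2 * (log_l - log_l0)    # likelihood ratio statistic (R)
u = -2 * log_l0             # upper bound of R (U)

measures = {
    "Likelihood Ratio (R)": r,
    "Aldrich-Nelson": r / (r + n),
    "Cragg-Uhler 1": 1 - math.exp(-r / n),
    "Cragg-Uhler 2": (1 - math.exp(-r / n)) / (1 - math.exp(-u / n)),
    "Estrella": 1 - (1 - r / u) ** (u / n),
    "Adjusted Estrella": 1 - ((log_l - k) / log_l0) ** (-2 / n * log_l0),
    "McFadden's LRI": r / u,
    "Veall-Zimmermann": (r * (u + n)) / (u * (r + n)),
    "AIC": -2 * log_l + 2 * k,
    "Schwarz Criterion": -2 * log_l + k * math.log(n),
}
for name, value in measures.items():
    print(f"{name:22s} {value:10.4f}")
```

McFadden's LRI computed this way equals the Pseudo R2 of 0.3160 reported by STATA's .clogit.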

Alternatively, you may use the PHREG procedure, which estimates the Cox proportional hazards model for survival data and can also estimate the conditional logit model.
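Why this works: within a stratum, the Cox partial likelihood contribution of an event at time t is exp(x_event'b) divided by the sum of exp(x'b) over every row still at risk at t. If the chosen mode is the only event and occurs at time 0 (failure = 1 - choice), while the other modes are censored at time 1, the risk set is the whole choice set and the contribution equals the conditional logit probability. With one event per stratum there are no tied events, so the ties-handling method that PHREG reports does not affect the result. The minimal Python sketch below checks this for one subject; the function names and the linear-predictor values in eta are illustrative assumptions.

```python
import math

def cox_partial_likelihood(times, events, eta):
    """Cox partial likelihood for one stratum (assumes no tied event times)."""
    lik = 1.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:  # only events contribute a factor
            risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
            lik *= math.exp(eta[i]) / sum(math.exp(eta[j]) for j in risk_set)
    return lik

def conditional_logit_prob(chosen, eta):
    """Probability of the chosen alternative within one choice set."""
    return math.exp(eta[chosen]) / sum(math.exp(e) for e in eta)

# One subject, four alternatives; eta = x'b (illustrative values).
eta = [-2.05, -0.50, -1.29, -0.47]
choice = [0, 0, 0, 1]               # the fourth alternative was chosen
failure = [1 - c for c in choice]   # time 0 for the chosen mode, 1 otherwise
events = choice                     # rows with choice = 0 are censored

print(cox_partial_likelihood(failure, events, eta))
print(conditional_logit_prob(choice.index(1), eta))
```

Both lines print the same value, so maximizing the stratified Cox partial likelihood over all subjects maximizes the conditional logit likelihood.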

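To make the data set consistent with survival analysis data, you need to create a failure time variable, failure=1-choice, so that the chosen mode has time 0 and the other modes have time 1. In the MODEL statement, choice(0) marks rows with choice=0 as censored, leaving the chosen mode as the only event for each subject. The identification variable is specified in the STRATA statement. The NOSUMMARY option suppresses the display of the event and censored observation frequencies.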

PROC PHREG DATA=masil.travel NOSUMMARY;
   STRATA subject;
   MODEL failure*choice(0)=air train bus cost time air_inc;
RUN;

                                      The PHREG Procedure
 
                                       Model Information
 
                             Data Set                 MASIL.TRAVEL
                             Dependent Variable       failure
                             Censoring Variable       choice
                             Censoring Value(s)       0
                             Ties Handling            BRESLOW
 
 
                            Number of Observations Read         840
                            Number of Observations Used         840
 
 
                                       Convergence Status
 
                         Convergence criterion (GCONV=1E-8) satisfied.
 
 
                                     Model Fit Statistics
 
                                             Without           With
                            Criterion     Covariates     Covariates
 
                            -2 LOG L         582.244        398.257
                            AIC              582.244        410.257
                            SBC              582.244        430.339
 
 
                            Testing Global Null Hypothesis: BETA=0
 
                    Test                 Chi-Square       DF     Pr > ChiSq
 
                    Likelihood Ratio       183.9869        6         <.0001
                    Score                  173.4374        6         <.0001
                    Wald                   103.7695        6         <.0001
 
 
                            Analysis of Maximum Likelihood Estimates
 
                          Parameter      Standard                                  Hazard
       Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio
 
       air          1       5.20743       0.77905       44.6799        <.0001     182.625
       train        1       3.86904       0.44313       76.2343        <.0001      47.896
       bus          1       3.16319       0.45027       49.3530        <.0001      23.646
       cost         1      -0.01550       0.00441       12.3671        0.0004       0.985
       time         1      -0.09612       0.01044       84.7778        <.0001       0.908
       air_inc      1       0.01329       0.01026        1.6763        0.1954       1.013

While the MDC procedure reports t statistics, the PHREG procedure reports Wald chi-square statistics, which are the squares of those statistics (e.g., 12.3671 ≈ (-3.517)^2). PHREG also presents the hazard ratio, e^b, in the last column of the output; it is equivalent to the factor change under the e^b column of the SPost .listcoef output.
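The link between the test statistics and the hazard ratios can be checked directly. The Python sketch below uses the coefficients and standard errors from the .clogit output in Section 7.1; the squared z statistics match PHREG's chi-square column and e^b matches its hazard ratio column, up to rounding.

```python
import math

# (coefficient, standard error) from the .clogit output in Section 7.1.
estimates = {
    "air":     (5.207443, 0.7790551),
    "train":   (3.869043, 0.4431269),
    "bus":     (3.163194, 0.4502659),
    "cost":    (-0.0155015, 0.004408),
    "time":    (-0.0961248, 0.0104398),
    "air_inc": (0.013287, 0.0102624),
}

for name, (b, se) in estimates.items():
    z = b / se                   # z (or t) statistic from clogit / MDC
    wald = z ** 2                # PHREG Wald chi-square with 1 df
    hazard_ratio = math.exp(b)   # PHREG hazard ratio = listcoef e^b
    print(f"{name:8s} z = {z:7.3f}  chi2 = {wald:8.4f}  HR = {hazard_ratio:9.3f}")
```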


7.3 Conditional Logit in LIMDEP (Clogit$)

LIMDEP fits the conditional logit model using either the Clogit$ or the Logit$ command. The Clogit$ command has the Choices$ subcommand to list the choices available.

CLOGIT;
   Lhs=choice;
   Rhs=air,train,bus,cost,time,air_inc;
   Choices=air,train,bus,car$

Normal exit from iterations. Exit status=0.
 
+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 19, 2005 at 09:20:39PM.|
| Dependent variable               Choice     |
| Weighting variable                 None     |
| Number of observations              210     |
| Iterations completed                  6     |
| Log likelihood function       -199.1284     |
| Log-L for Choice   model =   -199.12837     |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| Constants only    -283.7588  .29825  .29150 |
| Response data are given as ind. choice.     |
| Number of obs.=   210, skipped   0 bad obs. |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 AIR          5.207443299       .77905514    6.684   .0000
 TRAIN        3.869042702       .44312685    8.731   .0000
 BUS          3.163194212       .45026593    7.025   .0000
 COST     -.1550152532E-01  .44079931E-02   -3.517   .0004
 TIME     -.9612479610E-01  .10439847E-01   -9.207   .0000
 AIR_INC   .1328702625E-01  .10262407E-01    1.295   .1954
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

The Clogit$ command has the Ias$ subcommand to conduct the Hausman test of the independence of irrelevant alternatives (IIA) assumption (e.g., Ias=air,bus$). Unfortunately, the test cannot be computed for this model because the Hessian is not positive definite.

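The Logit$ command takes the panel data analysis approach: each subject is treated as an individual observed over several periods, and the Pds$ subcommand specifies the number of periods (here, the four alternatives per subject). Because each subject's choice values sum to 1, conditioning on that sum in the fixed-effects logit yields the conditional logit likelihood, so the two commands produce the same result.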

LOGIT;
   Lhs=choice;
   Rhs=air,train,bus,cost,time,air_inc;
   Pds=4$

              +--------------------------------------------------+
              | Panel Data Binomial Logit Model                  |
              | Number of individuals          =     210         |
              | Number of periods              =       4         |
              | Conditioning event is the sum of CHOICE          |
              | Distribution of sums over the  4 periods:        |
              | Sum        0     1     2     3     4     5     6 |
              | Number     0   210     0     0     0     5    10 |
              | Pct.     .00100.00   .00   .00   .00   .00   .00 |
              +--------------------------------------------------+
Normal exit from iterations. Exit status=0.
 
+---------------------------------------------+
| Logit Model for Panel Data                  |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 19, 2005 at 09:21:58PM.|
| Dependent variable               CHOICE     |
| Weighting variable                 None     |
| Number of observations              840     |
| Iterations completed                  6     |
| Log likelihood function       -199.1284     |
| Hosmer-Lemeshow chi-squared = 251.24482     |
| P-value=  .00000 with deg.fr. =       8     |
| Fixed Effects Logit Model for Panel Data    |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 AIR          5.207443299       .77905514    6.684   .0000
 TRAIN        3.869042702       .44312685    8.731   .0000
 BUS          3.163194212       .45026593    7.025   .0000
 COST     -.1550152532E-01  .44079931E-02   -3.517   .0004
 TIME     -.9612479610E-01  .10439847E-01   -9.207   .0000
 AIR_INC   .1328702625E-01  .10262407E-01    1.295   .1954
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)


7.4 Conditional Logit in SPSS

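Like the SAS PHREG procedure, the SPSS COXREG command, which was designed for survival analysis data, provides an indirect way of estimating the conditional logit model. As with PHREG, the failure variable is defined as failure=1-choice; the STATUS subcommand treats rows with choice=1 as events, and the STRATA subcommand groups the rows by subject.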

COXREG failure WITH air train bus cost time air_inc
   /STATUS=choice(1)
   /STRATA=subject.
