### 7. The Conditional Logit Regression Model

Consider a choice among four travel modes: air, train, bus, and car. The data set and model here are adapted from Greene (2003). The model examines how the generalized cost measure (cost), terminal waiting time (time), and household income (income) affect the choice.

Cost and waiting time are not characteristics of subjects (individuals), but attributes of the alternatives. Income does not vary across alternatives, so it enters the model only through air_inc, its interaction with the air dummy. Thus, the data arrangement of the conditional logit model is different from that of the multinomial logit model (Figure 2).

Figure 2. Data Arrangement for the Conditional Logit Model

+------------------------------------------------------------------------------+
| subject   mode   choice   air   train   bus   cost   time   income   air_inc |
|------------------------------------------------------------------------------|
|       1      1        0     1       0     0     70     69       35        35 |
|       1      2        0     0       1     0     71     34       35         0 |
|       1      3        0     0       0     1     70     35       35         0 |
|       1      4        1     0       0     0     30      0       35         0 |
|       2      1        0     1       0     0     68     64       30        30 |
|------------------------------------------------------------------------------|
|       2      2        0     0       1     0     84     44       30         0 |
|       2      3        0     0       0     1     85     53       30         0 |
|       2      4        1     0       0     0     50      0       30         0 |
|       3      1        0     1       0     0    129     69       40        40 |
|       3      2        0     0       1     0    195     34       40         0 |

The example data set has four observations per subject, one for each travel mode (air, train, bus, and car), each containing the attributes of that mode. The dependent variable choice is coded 1 only for the mode the subject chose. The dummy variables air, train, and bus flag the corresponding modes of transportation; car (mode 4) is the baseline, with all three dummies equal to 0. See the appendix for details about the data set.
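As a rough illustration of this data arrangement, the following Python sketch expands one subject's record into four rows, one per travel mode. The values are those of subject 1 in Figure 2; the function name `to_long` and the dictionary layout are hypothetical and are not part of any package discussed here.

```python
# Mode codes 1..4 in the order used by Figure 2.
MODES = ["air", "train", "bus", "car"]

def to_long(subject, chosen, income, cost, time):
    """Expand one subject into one row per alternative (hypothetical helper)."""
    rows = []
    for code, mode in enumerate(MODES, start=1):
        rows.append({
            "subject": subject,
            "mode": code,
            "choice": int(mode == chosen),      # 1 only for the chosen mode
            "air": int(mode == "air"),          # alternative-specific dummies;
            "train": int(mode == "train"),      # car is the baseline (all 0)
            "bus": int(mode == "bus"),
            "cost": cost[mode],
            "time": time[mode],
            "income": income,                   # constant within a subject
            "air_inc": income if mode == "air" else 0,  # air x income
        })
    return rows

# Subject 1 from Figure 2: chose car (mode 4), household income 35.
rows = to_long(
    subject=1, chosen="car", income=35,
    cost={"air": 70, "train": 71, "bus": 70, "car": 30},
    time={"air": 69, "train": 34, "bus": 35, "car": 0},
)
for row in rows:
    print(row)
```

Each subject contributes exactly one row with choice equal to 1, which is what the grouping variable (subject) relies on in the estimation commands that follow.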

7.1 Conditional Logit in STATA (.clogit)

STATA has the .clogit command to estimate the conditional logit model. The group() option specifies the variable (e.g., an identification number) that identifies each individual.

. clogit choice air train bus cost time air_inc, group(subject)

Iteration 0:   log likelihood =  -205.8187
Iteration 1:   log likelihood = -199.23679
Iteration 2:   log likelihood = -199.12851
Iteration 3:   log likelihood = -199.12837
Iteration 4:   log likelihood = -199.12837

Conditional (fixed-effects) logistic regression   Number of obs   =        840
                                                  LR chi2(6)      =     183.99
                                                  Prob > chi2     =     0.0000
Log likelihood = -199.12837                       Pseudo R2       =     0.3160

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         air |   5.207443   .7790551     6.68   0.000     3.680523    6.734363
       train |   3.869043   .4431269     8.73   0.000      3.00053    4.737555
         bus |   3.163194   .4502659     7.03   0.000     2.280689    4.045699
        cost |  -.0155015    .004408    -3.52   0.000     -.024141    -.006862
        time |  -.0961248   .0104398    -9.21   0.000    -.1165865   -.0756631
     air_inc |    .013287   .0102624     1.29   0.195    -.0068269     .033401
------------------------------------------------------------------------------
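To see what these coefficients imply for an individual, the sketch below computes predicted choice probabilities for subject 1 using the standard conditional logit form, in which the probability of an alternative is its exponentiated linear index divided by the sum over all four alternatives. The coefficients are copied from the output above and the data from Figure 2; the function name `choice_probabilities` is hypothetical, and this is an illustration rather than a reproduction of how any package computes predictions.

```python
import math

# Coefficients copied from the .clogit output.
BETA = {"air": 5.207443, "train": 3.869043, "bus": 3.163194,
        "cost": -0.0155015, "time": -0.0961248, "air_inc": 0.013287}

# Subject 1's four alternatives, copied from Figure 2.
SUBJECT_1 = {
    "air":   {"air": 1, "train": 0, "bus": 0, "cost": 70, "time": 69, "air_inc": 35},
    "train": {"air": 0, "train": 1, "bus": 0, "cost": 71, "time": 34, "air_inc": 0},
    "bus":   {"air": 0, "train": 0, "bus": 1, "cost": 70, "time": 35, "air_inc": 0},
    "car":   {"air": 0, "train": 0, "bus": 0, "cost": 30, "time": 0,  "air_inc": 0},
}

def choice_probabilities(alternatives, beta):
    """Return P(alternative) = exp(x'b) / sum over alternatives of exp(x'b)."""
    index = {name: sum(beta[k] * x[k] for k in beta)
             for name, x in alternatives.items()}
    top = max(index.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(v - top) for name, v in index.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

probs = choice_probabilities(SUBJECT_1, BETA)
for name, p in probs.items():
    print(f"{name:6s} {p:.4f}")
```

Because only differences in the linear index matter, any variable that is constant across a subject's alternatives (such as income on its own) would cancel out of these probabilities.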

Let us run the SPost .listcoef command to compute factor changes in odds. For a one-unit increase in the waiting time for a given travel mode, for example, we can expect the odds of choosing that mode to decrease by about 9 percent (a factor of .9084), holding other variables constant.

. listcoef

clogit (N=840): Factor Change in Odds

Odds of: 1 vs 0

--------------------------------------------------
      choice |      b         z     P>|z|    e^b
-------------+------------------------------------
         air |   5.20744    6.684   0.000 182.6265
       train |   3.86904    8.731   0.000  47.8965
         bus |   3.16319    7.025   0.000  23.6460
        cost |  -0.01550   -3.517   0.000   0.9846
        time |  -0.09612   -9.207   0.000   0.9084
     air_inc |   0.01329    1.295   0.195   1.0134
--------------------------------------------------
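The e^b column is simply the exponentiated coefficient, and the percent change in the odds follows from it. A minimal Python sketch of that arithmetic, using the coefficients from the .clogit output (the helper names `factor_change` and `percent_change` are hypothetical):

```python
import math

# Coefficients copied from the .clogit output.
COEF = {"air": 5.207443, "train": 3.869043, "bus": 3.163194,
        "cost": -0.0155015, "time": -0.0961248, "air_inc": 0.013287}

def factor_change(b, delta=1.0):
    """Multiplicative change in the odds for a change of delta in x."""
    return math.exp(b * delta)

def percent_change(b, delta=1.0):
    """Percent change in the odds for a change of delta in x."""
    return 100.0 * (factor_change(b, delta) - 1.0)

for name, b in COEF.items():
    print(f"{name:8s} e^b = {factor_change(b):9.4f}   "
          f"change in odds = {percent_change(b):8.1f}%")
```

Factor changes multiply rather than add, so the effect of a larger change in a variable is obtained by passing a different delta, not by scaling the one-unit percentage.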

7.2 Conditional Logit in SAS

SAS has the MDC procedure to fit the conditional logit model. The TYPE=CLOGIT option requests the conditional logit model, the ID statement specifies the identification variable, and NCHOICE=4 indicates that there are four travel modes to choose from.

PROC MDC DATA=masil.travel;
MODEL choice = air train bus cost time air_inc /TYPE=CLOGIT NCHOICE=4;
ID subject;
RUN;

The MDC Procedure

Conditional Logit Estimates

Algorithm converged.

Model Fit Summary

Dependent Variable                   choice
Number of Observations                  210
Number of Cases                         840
Log Likelihood                   -199.12837
Number of Iterations                      5
Optimization Method          Newton-Raphson
AIC                               410.25674
Schwarz Criterion                 430.33938

Discrete Response Profile

Index    CHOICE     Frequency    Percent

0           1            58      27.62
1           2            63      30.00
2           3            30      14.29
3           4            59      28.10

Goodness-of-Fit Measures

Measure                       Value    Formula

Likelihood Ratio (R)         183.99    2 * (LogL - LogL0)
Upper Bound of R (U)         582.24    - 2 * LogL0
Aldrich-Nelson                0.467    R / (R+N)
Cragg-Uhler 1                0.5836    1 - exp(-R/N)
Cragg-Uhler 2                0.6225    (1-exp(-R/N)) / (1-exp(-U/N))
Estrella                     0.6511    1 - (1-R/U)^(U/N)
Adjusted Estrella            0.6212    1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
McFadden's LRI                0.316    R / U
Veall-Zimmermann             0.6354    (R * (U+N)) / (U * (R+N))

N = # of observations, K = # of regressors

Conditional Logit Estimates

Parameter Estimates

                                  Standard                 Approx
Parameter     DF     Estimate        Error    t Value    Pr > |t|

air            1       5.2074       0.7791       6.68     <.0001
train          1       3.8690       0.4431       8.73     <.0001
bus            1       3.1632       0.4503       7.03     <.0001
cost           1      -0.0155     0.004408      -3.52     0.0004
time           1      -0.0961       0.0104      -9.21     <.0001
air_inc        1       0.0133       0.0103       1.29     0.1954
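The goodness-of-fit measures in the MDC output can be checked by hand from the formulas printed next to them. The Python sketch below does so under one assumption that is inferred from the output rather than stated in it: the null log likelihood LogL0 corresponds to giving each of the four modes probability 1/4, which reproduces the reported upper bound U = 582.24. The function name `fit_measures` is hypothetical.

```python
import math

N = 210            # number of subjects ("Number of Observations")
K = 6              # number of regressors
LOGL = -199.12837  # fitted log likelihood from the MDC output

# Assumption: the null model assigns probability 1/4 to each mode.
LOGL0 = N * math.log(1 / 4)

def fit_measures(logl, logl0, n, k):
    """Evaluate the formulas listed in the Goodness-of-Fit Measures table."""
    r = 2 * (logl - logl0)   # likelihood ratio
    u = -2 * logl0           # upper bound of R
    return {
        "Likelihood Ratio (R)": r,
        "Upper Bound of R (U)": u,
        "Aldrich-Nelson": r / (r + n),
        "Cragg-Uhler 1": 1 - math.exp(-r / n),
        "Cragg-Uhler 2": (1 - math.exp(-r / n)) / (1 - math.exp(-u / n)),
        "Estrella": 1 - (1 - r / u) ** (u / n),
        "Adjusted Estrella": 1 - ((logl - k) / logl0) ** (-2 / n * logl0),
        "McFadden's LRI": r / u,
        "Veall-Zimmermann": (r * (u + n)) / (u * (r + n)),
    }

measures = fit_measures(LOGL, LOGL0, N, K)
for name, value in measures.items():
    print(f"{name:22s} {value:10.4f}")
```

McFadden's LRI computed this way matches the Pseudo R2 of 0.3160 reported by the STATA .clogit command.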

Alternatively, you may use the PHREG procedure that estimates the Cox proportional hazards model for survival data and the conditional logit model.

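To make the data set consistent with survival analysis data, you need to create a failure time variable, failure = 1 - choice, so that the chosen alternative has the earliest failure time (0) and the unchosen alternatives are treated as censored observations (choice(0) in the MODEL statement). The identification variable is specified in the STRATA statement. The NOSUMMARY option suppresses the display of the event and censored observation frequencies.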

PROC PHREG DATA=masil.travel NOSUMMARY;
STRATA subject;
MODEL failure*choice(0)=air train bus cost time air_inc;
RUN;

The PHREG Procedure

Model Information

Data Set                 MASIL.TRAVEL
Dependent Variable       failure
Censoring Variable       choice
Censoring Value(s)       0
Ties Handling            BRESLOW

Number of Observations Used         840

Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

                 Without           With
Criterion     Covariates     Covariates

-2 LOG L         582.244        398.257
AIC              582.244        410.257
SBC              582.244        430.339

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio       183.9869        6         <.0001
Score                  173.4374        6         <.0001
Wald                   103.7695        6         <.0001

Analysis of Maximum Likelihood Estimates

                   Parameter      Standard                                  Hazard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio

air          1       5.20743       0.77905       44.6799        <.0001     182.625
train        1       3.86904       0.44313       76.2343        <.0001      47.896
bus          1       3.16319       0.45027       49.3530        <.0001      23.646
cost         1      -0.01550       0.00441       12.3671        0.0004       0.985
time         1      -0.09612       0.01044       84.7778        <.0001       0.908
air_inc      1       0.01329       0.01026        1.6763        0.1954       1.013

While the MDC procedure reports t statistics, the PHREG procedure reports chi-squared statistics, which are the squares of those t values (e.g., 12.3671 ≈ (-3.517)^2). PHREG presents the hazard ratio in the last column of the output, which is equivalent to the factor change in the e^b column of the SPost .listcoef command.
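The relationship between the two sets of test statistics is easy to verify. The following Python sketch squares the ratio of each coefficient to its standard error, using the more precise values from the STATA .clogit output; the function name `wald_chi2` is hypothetical.

```python
# (coefficient, standard error) pairs copied from the .clogit output.
ESTIMATES = {
    "air":     (5.207443, 0.7790551),
    "train":   (3.869043, 0.4431269),
    "bus":     (3.163194, 0.4502659),
    "cost":    (-0.0155015, 0.004408),
    "time":    (-0.0961248, 0.0104398),
    "air_inc": (0.013287, 0.0102624),
}

def wald_chi2(coef, se):
    """Wald chi-squared with 1 df: the squared z (or t) statistic."""
    z = coef / se
    return z * z

for name, (coef, se) in ESTIMATES.items():
    print(f"{name:8s} z = {coef / se:7.3f}   chi2 = {wald_chi2(coef, se):8.4f}")
```

Small differences from the PHREG column in the last digits are to be expected because the inputs here are rounded.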

7.3 Conditional Logit in LIMDEP (Clogit$)

LIMDEP fits the conditional logit model using either the Clogit$ or the Logit$ command. The Clogit$ command has the Choices$ subcommand to list the choices available.

CLOGIT;
Lhs=choice;
Rhs=air,train,bus,cost,time,air_inc;
Choices=air,train,bus,car$

Normal exit from iterations. Exit status=0.

+---------------------------------------------+
| Discrete choice (multinomial logit) model   |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 19, 2005 at 09:20:39PM.|
| Dependent variable               Choice     |
| Weighting variable                 None     |
| Number of observations              210     |
| Iterations completed                  6     |
| Log likelihood function       -199.1284     |
| Log-L for Choice   model =   -199.12837     |
| R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
| Constants only    -283.7588  .29825  .29150 |
| Response data are given as ind. choice.     |
| Number of obs.=   210, skipped   0 bad obs. |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
AIR          5.207443299       .77905514    6.684   .0000
TRAIN        3.869042702       .44312685    8.731   .0000
BUS          3.163194212       .45026593    7.025   .0000
COST     -.1550152532E-01  .44079931E-02   -3.517   .0004
TIME     -.9612479610E-01  .10439847E-01   -9.207   .0000
AIR_INC   .1328702625E-01  .10262407E-01    1.295   .1954
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
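LIMDEP's R-sqrd of .29825 differs from the Pseudo R2 of 0.3160 reported by STATA and SAS even though the fitted log likelihood is identical. The arithmetic suggests that the difference lies in the baseline: LIMDEP compares against a constants-only model, while the other figure is consistent with a baseline that gives every mode the same probability. The Python sketch below reproduces both baselines from the choice frequencies in the SAS Discrete Response Profile (58, 63, 30, 59); the function names are hypothetical, and the interpretation of each package's baseline is inferred from the numbers rather than taken from the package documentation.

```python
import math

LOGL = -199.12837          # fitted log likelihood (all packages agree)
FREQ = [58, 63, 30, 59]    # subjects choosing modes 1..4 (SAS response profile)

def loglik_equal_shares(freq):
    """Null log likelihood when every alternative has probability 1/J."""
    return sum(freq) * math.log(1 / len(freq))

def loglik_constants_only(freq):
    """Null log likelihood when probabilities equal the observed shares."""
    n = sum(freq)
    return sum(f * math.log(f / n) for f in freq)

def pseudo_r2(logl, logl_null):
    """1 - LogL/LogL*, the formula shown in the LIMDEP output."""
    return 1 - logl / logl_null

equal = loglik_equal_shares(FREQ)
const = loglik_constants_only(FREQ)
print(f"equal shares:   LogL* = {equal:10.4f}  R2 = {pseudo_r2(LOGL, equal):.5f}")
print(f"constants only: LogL* = {const:10.4f}  R2 = {pseudo_r2(LOGL, const):.5f}")
```

When comparing fit across packages, it is therefore worth checking which null model a reported pseudo R-squared is measured against.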

The Clogit$ command has the Ias$ subcommand to conduct the Hausman test for the IIA assumption (e.g., Ias=air,bus$). Unfortunately, the subcommand does not work in this model because the Hessian is not positive definite.

The Logit$ command takes the panel data analysis approach. The Pds$ subcommand specifies the number of time periods. The two commands produce the same result.

LOGIT;
Lhs=choice;
Rhs=air,train,bus,cost,time,air_inc;
Pds=4$

+--------------------------------------------------+
| Panel Data Binomial Logit Model                  |
| Number of individuals          =     210         |
| Number of periods              =       4         |
| Conditioning event is the sum of CHOICE          |
| Distribution of sums over the  4 periods:        |
| Sum        0     1     2     3     4     5     6 |
| Number     0   210     0     0     0     5    10 |
| Pct.     .00100.00   .00   .00   .00   .00   .00 |
+--------------------------------------------------+
Normal exit from iterations. Exit status=0.

+---------------------------------------------+
| Logit Model for Panel Data                  |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 19, 2005 at 09:21:58PM.|
| Dependent variable               CHOICE     |
| Weighting variable                 None     |
| Number of observations              840     |
| Iterations completed                  6     |
| Log likelihood function       -199.1284     |
| Hosmer-Lemeshow chi-squared = 251.24482     |
| P-value=  .00000 with deg.fr. =       8     |
| Fixed Effects Logit Model for Panel Data    |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
AIR          5.207443299       .77905514    6.684   .0000
TRAIN        3.869042702       .44312685    8.731   .0000
BUS          3.163194212       .45026593    7.025   .0000
COST     -.1550152532E-01  .44079931E-02   -3.517   .0004
TIME     -.9612479610E-01  .10439847E-01   -9.207   .0000
AIR_INC   .1328702625E-01  .10262407E-01    1.295   .1954
(Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

7.4 Conditional Logit in SPSS

Like the SAS PHREG procedure, the SPSS Coxreg command, which was designed for survival analysis data, provides a backdoor way of estimating the conditional logit model.

COXREG failure WITH air train bus cost time air_inc
/STATUS=choice(1)
/STRATA=subject.