## Logistic Regression with SAS

#### LOGISTIC Procedure

Suppose the response variable Y is binary, coded 0 or 1, and X1 and X2 are two regressors of interest. (The 0/1 coding is not a requirement: the values can be numeric or character as long as they are dichotomous.) To fit a logistic regression, you can use:

proc logistic; model y=x1 x2; run;

By default, SAS PROC LOGISTIC models the probability of Y=0. In other words, SAS models the probability of the smaller of the two response values. To model the probability of Y=1 instead, specify the DESCENDING option in the PROC LOGISTIC statement. That is, use:

proc logistic descending;

Example 1: SAS Logistic Regression in PROC LOGISTIC (individual data)

The following data are from Cox (Cox, D. R., 1970. The Analysis of Binary Data, London, Methuen, p. 86). At the specified time (T) of heating, a number of ingots are tested for some temperature settings and whether an ingot is ready or not (S) for rolling is recorded. S=0 means not ready and S=1 means ready. You want to know if the time of heating affects whether an ingot is ready or not for rolling.

```
         T      S
  1      7      1
  2      7      1
  .
  .
 55      7      1
  1     14      0
  2     14      0
  3     14      1
  4     14      1
  .
  .
157     14      1
  1     27      0
  2     27      0
  .
  .
  7     27      0
  8     27      1
  9     27      1
  .
  .
159     27      1
  1     51      0
  2     51      0
  3     51      0
  4     51      1
  .
  .
 16     51      1
```

With this data set INGOT, you can use:

```
proc logistic data=ingot;
   model s=t;
run;
```

As a result, you will have the following SAS output:

```
Sample Program: Logistic Regression

The LOGISTIC Procedure

Data Set: WORK.INGOT
Response Variable: S
Response Levels: 2
Number of Observations: 387

Response Profile

Ordered
  Value       S     Count

      1       0        12
      2       1       375

Model Fitting Information and Testing Global Null Hypothesis BETA=0

                            Intercept
              Intercept        and
Criterion       Only       Covariates    Chi-Square for Covariates

AIC             108.988        99.375         .
SC              112.947       107.291         .
-2 LOG L        106.988        95.375       11.614 with 1 DF (p=0.0007)
Score              .             .          15.100 with 1 DF (p=0.0001)

Analysis of Maximum Likelihood Estimates

             Parameter Standard    Wald       Pr >    Standardized     Odds
Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate      Ratio

INTERCPT 1    -5.4152   0.7275    55.4005     0.0001            .     .
T        1     0.0807   0.0224    13.0290     0.0003     0.442056    1.084

Association of Predicted Probabilities and Observed Responses

Concordant = 59.2%          Somers' D = 0.499
Discordant =  9.4%          Gamma     = 0.727
Tied       = 31.4%          Tau-a     = 0.030
(4500 pairs)                c         = 0.749
```

The result shows that the estimated logit is

$$\log\left(\frac{p}{1-p}\right) = -5.4152 + 0.0807\,T$$

where p is the probability of having an ingot not ready for rolling. The slope coefficient 0.0807 represents the change in log odds for a one-unit increase in T (time of heating). Its odds ratio, 1.084, is the ratio of the odds after and before a one-unit increase in T. The odds ratio is computed by exponentiating the slope coefficient (the change in log odds), which gives exp(0.0807)=1.084 in this example.

If you had used the DESCENDING option:

```
proc logistic descending;
   model s=t;
run;
```

it would have yielded the following estimated logit:

$$\log\left(\frac{p}{1-p}\right) = 5.4152 - 0.0807\,T$$

where p is the probability of having an ingot ready for rolling.
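
The DESCENDING fit does not have to be rerun to see its coefficients, because modeling the complementary outcome only reverses the signs of the estimates in the output above. As a short derivation, write q = 1 - p for the probability of the complementary outcome (q is notation introduced here for illustration, not part of the SAS output):

$$\log\left(\frac{q}{1-q}\right) = \log\left(\frac{1-p}{p}\right) = -\log\left(\frac{p}{1-p}\right)$$

The intercept therefore changes from -5.4152 to 5.4152 and the slope from 0.0807 to -0.0807, while the standard errors and chi-square statistics stay the same. The odds ratio for T becomes exp(-0.0807), which is approximately 0.922, the reciprocal of 1.084.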

You may have the same data set arranged in the following frequency format:

```
 T     S      F
 7     1     55
14     0      2
14     1    155
27     0      7
27     1    152
51     0      3
51     1     13
```

In this case, to have the same output as above, you can use the syntax:

```
proc logistic;
   freq f;
   model s=t;
run;
```
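
For completeness, the frequency-format data could be read in with a DATA step along the following lines. This is only a sketch: the data set name INGOTF is a hypothetical name chosen here, since the original does not name this data set.

```sas
/* Frequency-format version of the ingot data (illustrative sketch). */
/* INGOTF is a hypothetical data set name, not from the original.    */
data ingotf;
   input t s f;
   cards;
7  1  55
14 0   2
14 1 155
27 0   7
27 1 152
51 0   3
51 1  13
;
run;

proc logistic data=ingotf;
   freq f;       /* each row stands for F identical observations */
   model s=t;
run;
```

The FREQ statement tells the procedure to treat each row as F observations, so the 7 rows represent the same 387 ingots as the individual-level data.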

The LOGISTIC procedure also allows the input of binary response data that are grouped so that you can use:

```
proc logistic;
   model r/n=x1 x2;
run;
```

where N represents the number of trials and R represents the number of events.

Example 2: SAS Logistic Regression in PROC LOGISTIC (grouped data)

The data set described in the previous example can be arranged in a different way. For each heating time (T), you can record the number of ingots tested (N) and the number not ready for rolling (R). Now you have:

```
 T     R      N
 7     0     55
14     2    157
27     7    159
51     3     16
```

With this data set INGOT2, you can use:

```
proc logistic data=ingot2;
   model r/n=t;
run;
```

The SAS output will be:

```
Sample Program: Logistic Regression

The LOGISTIC Procedure

Data Set: WORK.INGOT2
Response Variable (Events): R
Response Variable (Trials): N
Number of Observations: 4

Response Profile

Ordered  Binary
  Value  Outcome      Count

      1  EVENT           12
      2  NO EVENT       375

Model Fitting Information and Testing Global Null Hypothesis BETA=0

                            Intercept
              Intercept        and
Criterion       Only       Covariates    Chi-Square for Covariates

AIC             108.988        99.375         .
SC              112.947       107.291         .
-2 LOG L        106.988        95.375       11.614 with 1 DF (p=0.0007)
Score              .             .          15.100 with 1 DF (p=0.0001)

Analysis of Maximum Likelihood Estimates

             Parameter Standard    Wald       Pr >    Standardized     Odds
Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate      Ratio

INTERCPT 1    -5.4152   0.7275    55.4005     0.0001            .     .
T        1     0.0807   0.0224    13.0290     0.0003     0.442056    1.084

Association of Predicted Probabilities and Observed Responses

Concordant = 59.2%          Somers' D = 0.499
Discordant =  9.4%          Gamma     = 0.727
Tied       = 31.4%          Tau-a     = 0.030
(4500 pairs)                c         = 0.749
```

Sometimes you may be interested in the change in log odds, and thus the corresponding odds ratio, for a change in the explanatory variable other than one unit. In this case, you can customize the odds ratio calculation with the UNITS statement:

```
proc logistic;
   model y=x1 x2;
   units x1=list;
run;
```

where list represents a list of units of change that are of interest for the variable X1. Each unit of change in the list has one of the following forms:

```
number
SD or -SD
number*SD
```

where number is any non-zero number and SD is the sample standard deviation of the corresponding independent variable X1.

Example 3: Customized Odds Computation

Using the same data set in Example 2, if you use:

```
proc logistic data=ingot2;
   model r/n=t;
   units t=10 -10 sd 2*sd;
run;
```

you will have the following result in addition to the output in Example 2:

```
Conditional Odds Ratio

                           Odds
Variable        Unit       Ratio

T            10.0000       2.241
T           -10.0000       0.446
T             9.9361       2.230
T            19.8721       4.971
```

In this example, you calculated four different odds ratios, corresponding to a 10-unit increase, a 10-unit decrease, a one-standard-deviation increase, and a two-standard-deviation increase in T, respectively.
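
Each of these odds ratios is the exponential of the slope estimate multiplied by the unit of change. As a worked check, using the rounded slope 0.0807 from Example 2 and the sample standard deviation 9.9361 shown in the Unit column:

$$
\begin{aligned}
\exp(10 \times 0.0807) &\approx 2.241 \\
\exp(-10 \times 0.0807) &\approx 0.446 \\
\exp(9.9361 \times 0.0807) &\approx 2.230 \\
\exp(19.8721 \times 0.0807) &\approx 4.971
\end{aligned}
$$

Note that the odds ratio for a 10-unit decrease is the reciprocal of the odds ratio for a 10-unit increase (1/2.241 is approximately 0.446).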

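From the SAS PROC LOGISTIC output, you can also obtain predicted probability values. Suppose you want to know the predicted probabilities of having an ingot not ready for rolling (S=0) at each level of time of heating in the data set from Example 2. The predicted probability, p, can be computed from the formula:

$$p = \frac{\exp(-5.4152 + 0.0807\,T)}{1 + \exp(-5.4152 + 0.0807\,T)}$$

Thus, for example, at T=7,

$$p = \frac{\exp(-5.4152 + 0.0807 \times 7)}{1 + \exp(-5.4152 + 0.0807 \times 7)} \approx 0.00777$$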

This computation can be easily obtained as a part of the SAS output by using the OUTPUT statement and PRINT procedure:

```
proc logistic;
   model r/n=x1 x2;
   output out=filename predicted=varname;
run;
proc print data=filename;
run;
```

where filename is the output data set name and varname is the variable name for predicted probabilities. The SAS output will show all the predicted probabilities for all observation points.

However, if you need the predicted probabilities at levels of the explanatory variables that do not appear in the data set, you need to do something different. Create a new SAS data set that contains the levels of interest and has missing values for the response variable. Then append the new data to the original data (with a SET statement) and run the logistic regression on the combined data set. Because the new observations have missing values for the response variable, they do not affect the model fit, but predicted probabilities are still calculated for them.

Example 4: Predicted Probability Computation

Using the data in Example 2, if you use:

```
proc logistic data=ingot2;
   model r/n=t;
   output out=prob predicted=phat;
run;
proc print data=prob;
run;
```

you will have the following additional result to the output in Example 2:

```
Sample Program: Logistic Regression

OBS     T    R     N       PHAT

 1      7    0     55    0.00777
 2     14    2    157    0.01358
 3     27    7    159    0.03782
 4     51    3     16    0.21422
```

Now suppose you want to compute the predicted probabilities at T=10,20,30,40,50, and 60. You can use the following syntax:

```
data ingot2;
   input t r n;
   cards;
7 0  55
14 2 157
27 7 159
51 3  16
;
/* new heating times; the response is left missing so the fit is unaffected */
data new;
   input t @@;
   r=.;
   n=.;
   cards;
10 20 30 40 50 60
;
/* append the new observations to the original data */
data merged;
   set ingot2 new;
run;
proc logistic data=merged;
   model r/n=t;
   output out=prob predicted=phat;
run;
proc print data=prob;
run;
```

You will have the following additional output to show the predicted probability at each level of T of interest:

```
Sample Program: Logistic Regression

OBS     T    R     N       PHAT

  1     7    0     55    0.00777
  2    14    2    157    0.01358
  3    27    7    159    0.03782
  4    51    3     16    0.21422
  5    10    .      .    0.00987
  6    20    .      .    0.02185
  7    30    .      .    0.04768
  8    40    .      .    0.10089
  9    50    .      .    0.20095
 10    60    .      .    0.36045
```
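
The PHAT values for the new observations can be cross-checked by evaluating the logistic formula directly in a DATA step. The following is a minimal sketch, assuming the rounded estimates -5.4152 and 0.0807 from the output in Example 2; the data set name CHECK and the variables B0, B1, ETA, and P are hypothetical names introduced here.

```sas
/* Hand check of predicted probabilities (illustrative sketch).        */
/* B0 and B1 are the rounded estimates copied from the Example 2       */
/* output; CHECK, B0, B1, ETA, and P are hypothetical names.           */
data check;
   b0 = -5.4152;
   b1 =  0.0807;
   do t = 10 to 60 by 10;
      eta = b0 + b1*t;                   /* linear predictor (logit) */
      p   = exp(eta) / (1 + exp(eta));   /* inverse logit            */
      output;
   end;
run;

proc print data=check;
   var t p;
run;
```

Because the coefficients are rounded to four decimals, the printed values of P should agree with PHAT only to about three or four decimal places.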

#### PROBIT Procedure

You can also use PROC PROBIT to fit a logistic regression by specifying LOGISTIC as the cumulative distribution type in the MODEL statement. To fit a logistic regression model, use:

```
proc probit;
   class y;
   model y=x1 x2 / d=logistic;
run;
```

or

```
proc probit;
   model r/n=x1 x2 / d=logistic;
run;
```

depending on your data set. If a single response variable is given in the MODEL statement, it must be listed in a CLASS statement. Unlike PROC LOGISTIC in the SAS release used for these examples, PROC PROBIT can handle categorical variables as regressors, as shown in the following syntax:

```
proc probit;
   class x2;
   model r/n=x1 x2 / d=logistic;
run;
```

where X2 is a categorical regressor.

Example 5: SAS Logistic Regression in PROC PROBIT

Using the data in Example 2, you may use:

```
proc probit data=ingot2;
   model r/n=t / d=logistic;
run;
```

The resulting SAS output will be:

```
Sample Program: Logistic Regression

Probit Procedure

Data Set          =WORK.INGOT2
Dependent Variable=R
Dependent Variable=N
Number of Observations=   4
Number of Events      =      12    Number of Trials =      387

Log Likelihood for LOGISTIC -47.68727905

Probit Procedure

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1 -5.4151721 0.727541  55.40004  0.0001 Intercept
T          1 0.08069587 0.022356  13.02885  0.0003

Probit Model in Terms of Tolerance Distribution

      MU         SIGMA
67.10594      12.39221

Estimated Covariance Matrix for Tolerance Parameters

                  MU             SIGMA

MU        121.813302         35.655509
SIGMA      35.655509         11.786672
```
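
The tolerance-distribution parameters MU and SIGMA are a reparameterization of the intercept and slope. Assuming the usual location-scale form, in which the fitted probability is F((T - MU)/SIGMA) with F the logistic distribution function, they can be recovered by hand from the estimates above (the symbols b0 and b1 are notation introduced here for the intercept and slope):

$$
\begin{aligned}
\sigma &= \frac{1}{b_1} = \frac{1}{0.08069587} \approx 12.392 \\
\mu &= -\frac{b_0}{b_1} = \frac{5.4151721}{0.08069587} \approx 67.106
\end{aligned}
$$

Under this reading, MU is the heating time at which the estimated probability of an ingot not being ready for rolling reaches 0.5.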

#### GENMOD Procedure

The GENMOD procedure fits generalized linear models (Nelder and Wedderburn, 1972, "Generalized Linear Models," Journal of the Royal Statistical Society A, 135, pp. 370-384). Logistic regression is a special case of the generalized linear model in which the response probability distribution is binomial and the link function is the logit. To use PROC GENMOD for a logistic regression, you can use:

```
proc genmod;
   model y=x1 x2 / dist=binomial link=logit;
run;
```

or

```
proc genmod;
   model r/n=x1 x2 / dist=binomial link=logit;
run;
```

Example 6: SAS Logistic Regression in PROC GENMOD

Using the data in Example 2, you may use:

```
proc genmod data=ingot2;
   model r/n=t / dist=binomial link=logit;
run;
```

You will have the following SAS output:

```
Sample Program: Logistic Regression

The GENMOD Procedure

Model Information

Description                     Value

Data Set                        WORK.INGOT2
Distribution                    BINOMIAL
Dependent Variable              R
Dependent Variable              N
Observations Used               4
Number Of Events                12
Number Of Trials                387

Criteria For Assessing Goodness Of Fit

Criterion             DF         Value      Value/DF

Deviance               2        1.0962        0.5481
Scaled Deviance        2        1.0962        0.5481
Pearson Chi-Square     2        0.6749        0.3374
Scaled Pearson X2      2        0.6749        0.3374
Log Likelihood         .      -47.6873             .

Analysis Of Parameter Estimates

Parameter    DF    Estimate     Std Err   ChiSquare  Pr>Chi

INTERCEPT     1     -5.4152      0.7275     55.4000  0.0001
T             1      0.0807      0.0224     13.0289  0.0003
SCALE         0      1.0000      0.0000           .       .

NOTE:  The scale parameter was held fixed.
```
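
The procedures report the same fit in different forms, which can be confirmed with a small piece of arithmetic on the printed numbers (this is a hand check, not additional SAS output):

$$-2 \times (-47.6873) = 95.3746$$

This matches the -2 LOG L value of 95.375 for the model with covariates in the PROC LOGISTIC output, and the log likelihood of -47.68727905 reported by PROC PROBIT. The parameter estimates, standard errors, and chi-square statistics likewise agree across the three procedures up to rounding.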

PROC GENMOD is especially convenient when you need to use categorical or class variables as regressors. In this case, you can use:

```
proc genmod;
   class x2;
   model y=x1 x2 / dist=binomial link=logit;
run;
```

where X2 is a categorical regressor.

Example 7: SAS Logistic Regression in PROC GENMOD (categorical regressors)

This example is excerpted from a SAS manual (SAS, 1996, SAS/STAT Software Changes and Enhancements through Release 6.11, pp. 279-284). In an experiment comparing the effects of five different drugs, each drug was tested on a number of different subjects. The outcome of each experiment was the presence or absence of a positive response in a subject. The following data represent the number of responses R in the N subjects for the five different drugs, labeled A through E. The response is measured for different levels of a continuous covariate X for each drug. The drug type and the covariate X are explanatory variables in this experiment. The number of responses R is modeled as a binomial random variable for each combination of the explanatory variable values, with the binomial number of trials parameter equal to the number of subjects N and the binomial probability equal to the probability of a response. The following DATA step creates the data set DRUG:

```
data drug;
   input drug $ x r n;
   cards;
A  .1   1  10
A  .23  2  12
A  .67  1   9
B  .2   3  13
B  .3   4  15
B  .45  5  16
B  .78  5  13
C  .04  0  10
C  .15  0  11
C  .56  1  12
C  .7   2  12
D  .34  5  10
D  .6   5   9
D  .7   8  10
E  .2  12  20
E  .34 15  20
E  .56 13  15
E  .8  17  20
;
```

A logistic regression for these data is a generalized linear model with response equal to the binomial proportion R/N. PROC GENMOD can be used as follows:

```
proc genmod data=drug;
   class drug;
   model r/n=x drug / dist=binomial link=logit;
run;
```

You will have the SAS output:

```
Sample Program: Logistic Regression

The GENMOD Procedure

Model Information

Description                     Value

Data Set                        WORK.DRUG
Distribution                    BINOMIAL
Dependent Variable              R
Dependent Variable              N
Observations Used               18
Number Of Events                99
Number Of Trials                237

Class Level Information

Class     Levels  Values

DRUG           5  A B C D E

Criteria For Assessing Goodness Of Fit

Criterion             DF         Value      Value/DF

Deviance              12        5.2751        0.4396
Scaled Deviance       12        5.2751        0.4396
Pearson Chi-Square    12        4.5133        0.3761
Scaled Pearson X2     12        4.5133        0.3761
Log Likelihood         .     -114.7732             .

Analysis Of Parameter Estimates

Parameter       DF    Estimate     Std Err   ChiSquare  Pr>Chi

INTERCEPT        1      0.2792      0.4196      0.4430  0.5057
X                1      1.9794      0.7660      6.6770  0.0098
DRUG       A     1     -2.8955      0.6092     22.5894  0.0001
DRUG       B     1     -2.0162      0.4052     24.7628  0.0001
DRUG       C     1     -3.7952      0.6655     32.5258  0.0001
DRUG       D     1     -0.8548      0.4838      3.1218  0.0773
DRUG       E     0      0.0000      0.0000           .       .
SCALE            0      1.0000      0.0000           .       .

NOTE:  The scale parameter was held fixed.
```

In this example, PROC GENMOD automatically generates five dummy variables, one for each value of the class variable DRUG. The same result can therefore be obtained with PROC LOGISTIC instead of PROC GENMOD by creating the dummy variables yourself:

```
data drug2;
   set drug;
   /* one 0/1 indicator variable per drug level */
   if drug='A' then drugdum1=1; else drugdum1=0;
   if drug='B' then drugdum2=1; else drugdum2=0;
   if drug='C' then drugdum3=1; else drugdum3=0;
   if drug='D' then drugdum4=1; else drugdum4=0;
   if drug='E' then drugdum5=1; else drugdum5=0;
run;
proc logistic data=drug2;
   model r/n=x drugdum1 drugdum2 drugdum3 drugdum4 drugdum5;
run;
```

where the five IF statements must be placed in a DATA step that creates the new data set DRUG2. Notice that one of the five dummy variables is redundant, so PROC LOGISTIC sets its parameter to 0, as the note in the output reports.

The resulting output will be:

```
Sample Program: Logistic Regression

The LOGISTIC Procedure

Data Set: WORK.DRUG2
Response Variable (Events): R
Response Variable (Trials): N
Number of Observations: 18

Response Profile

Ordered  Binary
  Value  Outcome      Count

      1  EVENT           99
      2  NO EVENT       138

Model Fitting Information and Testing Global Null Hypothesis BETA=0

                            Intercept
              Intercept        and
Criterion       Only       Covariates    Chi-Square for Covariates

AIC             324.105       241.546         .
SC              327.573       262.355         .
-2 LOG L        322.105       229.546       92.558 with 5 DF (p=0.0001)
Score              .             .          82.029 with 5 DF (p=0.0001)

NOTE: The following parameters have been set to 0, since the variables are a
linear combination of other variables as shown.

DRUGDUM5 = 1 * INTERCPT - 1 * DRUGDUM1 - 1 * DRUGDUM2 - 1 * DRUGDUM3 - 1
* DRUGDUM4

Analysis of Maximum Likelihood Estimates

             Parameter Standard    Wald       Pr >    Standardized     Odds
Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate      Ratio

INTERCPT 1     0.2792   0.4196     0.4430     0.5057            .     .
X        1     1.9794   0.7660     6.6772     0.0098     0.259740    7.238
DRUGDUM1 1    -2.8955   0.6092    22.5895     0.0001    -0.539417    0.055
DRUGDUM2 1    -2.0162   0.4052    24.7628     0.0001    -0.476082    0.133
DRUGDUM3 1    -3.7952   0.6654    32.5336     0.0001    -0.822382    0.022
DRUGDUM4 1    -0.8548   0.4838     3.1218     0.0773    -0.154773    0.425
DRUGDUM5 0          0        .      .          .                .     .

Association of Predicted Probabilities and Observed Responses

Concordant = 82.3%          Somers' D = 0.686
Discordant = 13.7%          Gamma     = 0.714
Tied       =  4.0%          Tau-a     = 0.335
(13662 pairs)               c         = 0.843
```
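
The odds ratios in the last column are the exponentiated estimates, so the ratios for the dummy variables can be read as comparisons with drug E, the level whose parameter is set to 0. As a worked check on two rows of the table:

$$
\begin{aligned}
\exp(1.9794) &\approx 7.238 \\
\exp(-2.8955) &\approx 0.055
\end{aligned}
$$

That is, at the same value of X, the estimated odds of a response under drug A are about 0.055 times the odds under drug E. Because X only ranges from 0.04 to 0.8 in these data, a smaller step may be easier to interpret: an increase of 0.1 in X multiplies the estimated odds by exp(0.19794), which is approximately 1.219.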

#### CATMOD Procedure

The SAS CATMOD (CATegorical data MODeling) procedure fits linear models to functions of response frequencies and can be used for logistic regression. The basic syntax is:

```
proc catmod;
   direct x1;
   response logits;
   model y=x1 x2;
run;
```

where X1 is a continuous quantitative variable and X2 is a categorical variable. You must specify your continuous regressors in the DIRECT statement. Because the CATMOD procedure is mainly designed for the analysis of categorical data, it is not recommended for use with a continuous regressor with a large number of unique values.

Example 8: SAS Logistic Regression in PROC CATMOD

Using the data in Example 1, if you use:

```
proc catmod data=ingot;
   direct t;
   response logits;
   model s=t;
run;
```

you will see the result:

```
Sample Program: Logistic Regression

CATMOD PROCEDURE

Response: S                           Response Levels (R)=     2
Weight Variable: None                 Populations     (S)=     4
Data Set: INGOT                       Total Frequency (N)=   387
Frequency Missing: 0                  Observations  (Obs)=   387

POPULATION PROFILES
              Sample
Sample  T      Size
   1    7        55
   2   14       157
   3   27       159
   4   51        16

RESPONSE PROFILES

Response  S
     1    0
     2    1

MAXIMUM-LIKELIHOOD ANALYSIS

               Sub        -2 Log     Convergence    Parameter Estimates
Iteration   Iteration   Likelihood    Criterion         1           2
0           0       536.49592       1.0000            0           0
1           0       152.59147       0.7156      -2.1503      0.0138
2           0       106.76794       0.3003      -3.5040      0.0361
3           0       96.711696       0.0942      -4.6746      0.0633
4           0       95.411914       0.0134      -5.2884      0.0779
5           0       95.374601     0.000391      -5.4109      0.0806
6           0       95.374558    4.5308E-7      -5.4152      0.0807
7           0       95.374558    6.605E-13      -5.4152      0.0807

MAXIMUM-LIKELIHOOD ANALYSIS-OF-VARIANCE TABLE

Source                   DF   Chi-Square      Prob
--------------------------------------------------
INTERCEPT                 1        55.40    0.0000
T                         1        13.03    0.0003

LIKELIHOOD RATIO          2         1.10    0.5781

ANALYSIS OF MAXIMUM-LIKELIHOOD ESTIMATES

                                       Standard    Chi-
Effect            Parameter  Estimate    Error    Square   Prob
----------------------------------------------------------------
INTERCEPT                 1   -5.4152    0.7275    55.40  0.0000
T                         2    0.0807    0.0224    13.03  0.0003
```
