------------------------------------------------------------------------------------------------------------- log: d:\spost.stata8\do\st8ch4.log log type: text opened on: 26 May 2003, 12:48:29 . . // * . // * RM4STATA Ch 4: Models for Binary Outcomes - 5/26/2003 . // * . . // * Section 4.2: estimation using -logit- and -probit- . . use binlfp2, clear (Data from 1976 PSID-T Mroz) . desc lfp k5 k618 age wc hc lwg inc storage display value variable name type format label variable label ------------------------------------------------------------------------------- lfp byte %9.0g lfplbl Paid Labor Force: 1=yes 0=no k5 byte %9.0g # kids < 6 k618 byte %9.0g # kids 6-18 age byte %9.0g Wife's age in years wc byte %9.0g collbl Wife College: 1=yes 0=no hc byte %9.0g collbl Husband College: 1=yes 0=no lwg float %9.0g Log of wife's estimated wages inc float %9.0g Family income excluding wife's . summarize lfp k5 k618 age wc hc lwg inc Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- lfp | 753 .5683931 .4956295 0 1 k5 | 753 .2377158 .523959 0 3 k618 | 753 1.353254 1.319874 0 8 age | 753 42.53785 8.072574 30 60 wc | 753 .2815405 .4500494 0 1 -------------+-------------------------------------------------------- hc | 753 .3917663 .4884694 0 1 lwg | 753 1.097115 .5875564 -2.054124 3.218876 inc | 753 20.12897 11.6348 -.0290001 96 . logit lfp k5 k618 age wc hc lwg inc, nolog Logit estimates Number of obs = 753 LR chi2(7) = 124.48 Prob > chi2 = 0.0000 Log likelihood = -452.63296 Pseudo R2 = 0.1209 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. 
Interval] -------------+---------------------------------------------------------------- k5 | -1.462913 .1970006 -7.43 0.000 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.342 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.000 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.000 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.588 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.000 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.000 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.000 1.919188 4.445092 ------------------------------------------------------------------------------ . estimates store logit . probit lfp k5 k618 age wc hc lwg inc, nolog Probit estimates Number of obs = 753 LR chi2(7) = 124.36 Prob > chi2 = 0.0000 Log likelihood = -452.69496 Pseudo R2 = 0.1208 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- k5 | -.8747112 .1135583 -7.70 0.000 -1.097281 -.6521411 k618 | -.0385945 .0404893 -0.95 0.340 -.117952 .0407631 age | -.0378235 .0076093 -4.97 0.000 -.0527375 -.0229095 wc | .4883144 .1354873 3.60 0.000 .2227642 .7538645 hc | .0571704 .1240052 0.46 0.645 -.1858754 .3002161 lwg | .3656287 .0877792 4.17 0.000 .1935847 .5376727 inc | -.020525 .0047769 -4.30 0.000 -.0298875 -.0111626 _cons | 1.918422 .3806536 5.04 0.000 1.172355 2.66449 ------------------------------------------------------------------------------ . estimates store probit . 
estimates table logit probit, b(%9.3f) t label varwidth(30)

--------------------------------------------------------
                      Variable |   logit       probit
-------------------------------+------------------------
                    # kids < 6 |     -1.463      -0.875
                               |      -7.43       -7.70
                   # kids 6-18 |     -0.065      -0.039
                               |      -0.95       -0.95
           Wife's age in years |     -0.063      -0.038
                               |      -4.92       -4.97
      Wife College: 1=yes 0=no |      0.807       0.488
                               |       3.51        3.60
   Husband College: 1=yes 0=no |      0.112       0.057
                               |       0.54        0.46
 Log of wife's estimated wages |      0.605       0.366
                               |       4.01        4.17
Family income excluding wife's |     -0.034      -0.021
                               |      -4.20       -4.30
                      Constant |      3.182       1.918
                               |       4.94        5.04
--------------------------------------------------------
                                             legend: b/t

.
. // * Section 4.2.1: predicting perfectly (using artificial data)
.
. use science2, clear
(Note that some of the variables have been artificially constructed.)

. gen college = 1-mmale
(5 missing values generated)

. gen vote = pub1>10

. tab vote college

           |        college
      vote |         0          1 |     Total
-----------+----------------------+----------
         0 |       293          4 |       297
         1 |         6          0 |         6
-----------+----------------------+----------
     Total |       299          4 |       303

. logit vote college phd, nolog

note: college != 0 predicts failure perfectly
      college dropped and 4 obs not used

Logit estimates                                   Number of obs   =        299
                                                  LR chi2(1)      =       0.23
                                                  Prob > chi2     =     0.6320
Log likelihood = -29.276794                       Pseudo R2       =     0.0039

------------------------------------------------------------------------------
        vote |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         phd |  -.1927085   .4023944    -0.48   0.632    -.9813871    .5959701
       _cons |  -3.293021   1.272882    -2.59   0.010    -5.787824   -.7982179
------------------------------------------------------------------------------

.
. // * Section 4.3.1: testing individual coefficients
.
. use binlfp2, clear
(Data from 1976 PSID-T Mroz)

.
logit lfp k5 k618 age wc hc lwg inc, nolog Logit estimates Number of obs = 753 LR chi2(7) = 124.48 Prob > chi2 = 0.0000 Log likelihood = -452.63296 Pseudo R2 = 0.1209 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- k5 | -1.462913 .1970006 -7.43 0.000 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.342 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.000 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.000 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.588 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.000 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.000 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.000 1.919188 4.445092 ------------------------------------------------------------------------------ . . * Wald test . test k5 ( 1) k5 = 0 chi2( 1) = 55.14 Prob > chi2 = 0.0000 . display sqrt(55.14) 7.4256313 . . * LR test . logit lfp k5 k618 age wc hc lwg inc, nolog Logit estimates Number of obs = 753 LR chi2(7) = 124.48 Prob > chi2 = 0.0000 Log likelihood = -452.63296 Pseudo R2 = 0.1209 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- k5 | -1.462913 .1970006 -7.43 0.000 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.342 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.000 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.000 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.588 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.000 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.000 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.000 1.919188 4.445092 ------------------------------------------------------------------------------ . estimates store fmodel . 
logit lfp k618 age wc hc lwg inc, nolog

Logit estimates                                   Number of obs   =        753
                                                  LR chi2(6)      =      58.00
                                                  Prob > chi2     =     0.0000
Log likelihood = -485.87503                       Pseudo R2       =     0.0563

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        k618 |  -.0019764   .0636558    -0.03   0.975    -.1267396    .1227867
         age |  -.0171119   .0105955    -1.62   0.106    -.0378788    .0036549
          wc |   .6521281   .2157808     3.02   0.003     .2292056    1.075051
          hc |   .0285563   .1954934     0.15   0.884    -.3546038    .4117165
         lwg |   .6153404   .1457869     4.22   0.000     .3296033    .9010774
         inc |  -.0327874   .0076542    -4.28   0.000    -.0477893   -.0177854
       _cons |   .8184524   .5287458     1.55   0.122    -.2178702    1.854775
------------------------------------------------------------------------------

. estimates store nmodel

. lrtest fmodel nmodel

likelihood-ratio test                                  LR chi2(1)  =     66.48
(Assumption: nmodel nested in fmodel)                  Prob > chi2 =    0.0000

.
. // * Section 4.3.2: testing multiple coefficients
.
. logit lfp k5 k618 age wc hc lwg inc, nolog

Logit estimates                                   Number of obs   =        753
                                                  LR chi2(7)      =     124.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -452.63296                       Pseudo R2       =     0.1209

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          k5 |  -1.462913   .1970006    -7.43   0.000    -1.849027   -1.076799
        k618 |  -.0645707   .0680008    -0.95   0.342    -.1978499    .0687085
         age |  -.0628706   .0127831    -4.92   0.000    -.0879249   -.0378162
          wc |   .8072738   .2299799     3.51   0.000     .3565215    1.258026
          hc |   .1117336   .2060397     0.54   0.588    -.2920969     .515564
         lwg |   .6046931   .1508176     4.01   0.000     .3090961    .9002901
         inc |  -.0344464   .0082084    -4.20   0.000    -.0505346   -.0183583
       _cons |    3.18214   .6443751     4.94   0.000     1.919188    4.445092
------------------------------------------------------------------------------

.
. * Wald tests
.
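Aside (not part of the original session): the single-coefficient tests for k5 in Section 4.3.1 can be checked by hand from numbers already printed in this log. This is a minimal sketch that types the reported estimates in as literals, so small rounding differences are possible. The Wald chi2 is the squared z statistic, and the LR chi2 is twice the gap between the full and restricted log likelihoods; the results should come out near the 55.14 and 66.48 reported by -test- and -lrtest-.

```stata
* Wald chi2 for k5: (coefficient / standard error)^2, from the full-model table
display (-1.462913/.1970006)^2

* LR chi2: 2*(lnL full - lnL restricted), from the two logit headers
display 2*(-452.63296 - (-485.87503))

* p-value for the LR statistic, chi-squared with 1 degree of freedom
display chi2tail(1, 2*(-452.63296 - (-485.87503)))
```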
test hc wc ( 1) hc = 0 ( 2) wc = 0 chi2( 2) = 17.66 Prob > chi2 = 0.0001 . test hc=wc ( 1) - wc + hc = 0 chi2( 1) = 3.54 Prob > chi2 = 0.0600 . . * LR tests . logit lfp k5 k618 age wc hc lwg inc, nolog Logit estimates Number of obs = 753 LR chi2(7) = 124.48 Prob > chi2 = 0.0000 Log likelihood = -452.63296 Pseudo R2 = 0.1209 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- k5 | -1.462913 .1970006 -7.43 0.000 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.342 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.000 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.000 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.588 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.000 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.000 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.000 1.919188 4.445092 ------------------------------------------------------------------------------ . estimates store fmodel . logit lfp k5 k618 age lwg inc, nolog Logit estimates Number of obs = 753 LR chi2(5) = 105.98 Prob > chi2 = 0.0000 Log likelihood = -461.88084 Pseudo R2 = 0.1029 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- k5 | -1.373557 .1935464 -7.10 0.000 -1.752901 -.9942135 k618 | -.0793719 .0670363 -1.18 0.236 -.2107607 .0520169 age | -.065317 .012541 -5.21 0.000 -.0898969 -.0407372 lwg | .7709097 .1488494 5.18 0.000 .4791702 1.062649 inc | -.0240763 .0072281 -3.33 0.001 -.0382431 -.0099095 _cons | 3.149502 .6333594 4.97 0.000 1.90814 4.390864 ------------------------------------------------------------------------------ . estimates store nmodel . 
lrtest fmodel nmodel likelihood-ratio test LR chi2(2) = 18.50 (Assumption: nmodel nested in fmodel) Prob > chi2 = 0.0001 . . logit lfp, nolog Logit estimates Number of obs = 753 LR chi2(0) = 0.00 Prob > chi2 = . Log likelihood = -514.8732 Pseudo R2 = 0.0000 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _cons | .275298 .0735756 3.74 0.000 .1310925 .4195036 ------------------------------------------------------------------------------ . estimates store intercept_only . lrtest fmodel intercept_only likelihood-ratio test LR chi2(7) = 124.48 (Assumption: intercept_only nested in fmodel) Prob > chi2 = 0.0000 . . // * Section 4.4: residuals and influence using -predict- . . * create artificial data . clear . set seed 315 . set obs 34 obs was 0, now 34 . gen y = (uniform()*4) in 1/30 (4 missing values generated) . gen x = (4 - (y)) + (uniform()) in 1/30 (4 missing values generated) . * change one point . replace y = 9 in 31 (1 real change made) . sum x Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- x | 30 2.452685 1.093432 .3357002 4.646409 . sca meanx = r(mean) . replace x = meanx in 31 (1 real change made) . * compute the regression . reg y x Source | SS df MS Number of obs = 31 -------------+------------------------------ F( 1, 29) = 15.27 Model | 25.9683803 1 25.9683803 Prob > F = 0.0005 Residual | 49.3186356 29 1.70064261 R-squared = 0.3449 -------------+------------------------------ Adj R-squared = 0.3223 Total | 75.2870159 30 2.5095672 Root MSE = 1.3041 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. 
Interval] -------------+---------------------------------------------------------------- x | -.8654293 .2214704 -3.91 0.001 -1.318387 -.4124715 _cons | 4.327982 .5915426 7.32 0.000 3.118141 5.537822 ------------------------------------------------------------------------------ . * change two points . replace x = 0 in 32 (1 real change made) . replace x = 5 in 33 (1 real change made) . * compute predictions . predict yhat in 32/33 (option xb assumed; fitted values) (32 missing values generated) . predict res in 32/33, res (34 missing values generated) . gen y3 = 9 in 31 (33 missing values generated) . set textsize 150 . drop yhat res . predict yhat (option xb assumed; fitted values) (1 missing value generated) . predict res , res (3 missing values generated) . local yout = y[31] . local xout = x[31] . * graph outlier that is not influential . graph twoway (scatter y x) (line yhat x) /// > (scatteri `yout' `xout' (3) "outlier"), /// > title("Large outlier that is not influential") /// > legend( order( 2 ) label(2 "Regression line") ) /// > ytitle("y") xlabel( 0 5 10) name(reg1, replace) . * change the data . replace y = 6 in 31 (1 real change made) . replace x = 29.5 in 31 (1 real change made) . replace x = 30 in 34 (1 real change made) . local yout = y[31] . local xout = x[31] . * compute the regression . reg y x Source | SS df MS Number of obs = 31 -------------+------------------------------ F( 1, 29) = 6.21 Model | 7.6242044 1 7.6242044 Prob > F = 0.0187 Residual | 35.6046246 29 1.22774568 R-squared = 0.1764 -------------+------------------------------ Adj R-squared = 0.1480 Total | 43.228829 30 1.44096097 Root MSE = 1.108 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. 
Interval] -------------+---------------------------------------------------------------- x | .1013236 .04066 2.49 0.019 .0181645 .1844827 _cons | 1.771662 .2405915 7.36 0.000 1.279598 2.263727 ------------------------------------------------------------------------------ . predict yhat2 (option xb assumed; fitted values) . replace yhat2 = . in 2/31 (30 real changes made, 30 to missing) . * graph the results . graph twoway (scatter y x) (line yhat2 x, sort clpattern(dash)) /// > (line yhat x if x<25, sort) /// > (scatteri `yout' `xout' (9) "influential observation ", /// > msymbol(S) mlabgap(*3)), /// > title("Smaller outlier that is influential") /// > legend( order( 2 3) label(2 "New Regression") /// > label(3 "Old regression") ) ytitle("y") name(reg2,replace) . graph combine reg1 reg2, col(1) ysize(4.31) xsize(3.287) iscale(*.9) . graph export 04residstata.eps, replace (file 04residstata.eps written in .eps format) . . // * Section 4.4.1: residuals . . use binlfp2, clear (Data from 1976 PSID-T Mroz) . logit lfp k5 k618 age wc hc lwg inc, nolog Logit estimates Number of obs = 753 LR chi2(7) = 124.48 Prob > chi2 = 0.0000 Log likelihood = -452.63296 Pseudo R2 = 0.1209 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- k5 | -1.462913 .1970006 -7.43 0.000 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.342 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.000 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.000 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.588 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.000 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.000 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.000 1.919188 4.445092 ------------------------------------------------------------------------------ . predict rstd, rs . label var rstd "Standardized Residual" . sort inc, stable . 
generate index = _n

. label var index "Observation Number"

.
. * index plot of std pearson residuals
. graph twoway scatter rstd index, xlabel(0(200)800) ylabel(-4(2)4) ///
>     xtitle("Observation Number") yline(0) msymbol(Oh) ///
>     ysize(2.7051) xsize(4.0413)

. graph export 04rstd.eps , replace
(file 04rstd.eps written in .eps format)

.
. * index plot of std pearson residuals labeled with the index
. graph twoway scatter rstd index, xlabel(0(200)800) ylabel(-4(2)4) ///
>     xtitle("Observation Number") yline(0) ///
>     msymbol(none) mlabel(index) mlabposition(0) ///
>     ysize(2.7051) xsize(4.0413)

. graph export 04rstdcase.eps , replace
(file 04rstdcase.eps written in .eps format)

.
. * list a single point
. list in 142, noobs

  +------------------------------------------------------------------------------+
  |  lfp   k5   k618   age      wc      hc         lwg    inc       rstd   index |
  |------------------------------------------------------------------------------|
  | inLF    1      2    36   NoCol   NoCol   -2.054124   11.2   3.191524     142 |
  +------------------------------------------------------------------------------+

.
. * list large outliers
. list rstd index if rstd>2.5 | rstd<-2.5

     +-------------------+
     |      rstd   index |
     |-------------------|
142. |  3.191524     142 |
345. |  2.873378     345 |
511. | -2.677243     511 |
555. | -2.871972     555 |
752. |  3.192648     752 |
     +-------------------+

.
. // * Section 4.4.2: influential cases
.
. predict cook,dbeta

. label var cook "Cook's Statistic"

. graph twoway scatter cook index, xlabel(0(200)800) ylabel(0(.1).3) ///
>     xtitle("Observation Number") yline(.1 .2) ///
>     msymbol(none) mlabel(index) mlabposition(0) ///
>     ysize(2.7051) xsize(4.0413)

. graph export 04cookcase.eps , replace
(file 04cookcase.eps written in .eps format)

.
. // * Section 4.5: scalar measures of fit
.
. quietly logit lfp k5 k618 age wc hc lwg inc, nolog

. estimates store model1

. quietly fitstat, save

. gen agesq = age*age

.
logit lfp k5 age agesq wc inc, nolog

Logit estimates                                   Number of obs   =        753
                                                  LR chi2(5)      =     106.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -461.65276                       Pseudo R2       =     0.1034

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          k5 |  -1.379839   .1954677    -7.06   0.000    -1.762949   -.9967297
         age |   .0568824     .11411     0.50   0.618     -.166769    .2805339
       agesq |  -.0012928    .001294    -1.00   0.318    -.0038291    .0012434
          wc |   1.093673   .1987386     5.50   0.000     .7041522    1.483193
         inc |  -.0323176   .0077281    -4.18   0.000    -.0474645   -.0171707
       _cons |   .9791676   2.458098     0.40   0.690    -3.838616    5.796951
------------------------------------------------------------------------------

. estimates store model2

. estimates table model1 model2, b(%9.3f) t

--------------------------------------
    Variable |   model1      model2
-------------+------------------------
          k5 |     -1.463      -1.380
             |      -7.43       -7.06
        k618 |     -0.065
             |      -0.95
         age |     -0.063       0.057
             |      -4.92        0.50
          wc |      0.807       1.094
             |       3.51        5.50
          hc |      0.112
             |       0.54
         lwg |      0.605
             |       4.01
         inc |     -0.034      -0.032
             |      -4.20       -4.18
       agesq |                 -0.001
             |                  -1.00
       _cons |      3.182       0.979
             |       4.94        0.40
--------------------------------------
                           legend: b/t

.
fitstat, dif

Measures of Fit for logit of lfp

                             Current            Saved       Difference
Model:                         logit            logit
N:                               753              753                0
Log-Lik Intercept Only:     -514.873         -514.873            0.000
Log-Lik Full Model:         -461.653         -452.633           -9.020
D:                           923.306(747)     905.266(745)      18.040(2)
LR:                          106.441(5)       124.480(7)        18.040(2)
Prob > LR:                     0.000            0.000            0.000
McFadden's R2:                 0.103            0.121           -0.018
McFadden's Adj R2:             0.092            0.105           -0.014
Maximum Likelihood R2:         0.132            0.152           -0.021
Cragg & Uhler's R2:            0.177            0.204           -0.028
McKelvey and Zavoina's R2:     0.182            0.217           -0.035
Efron's R2:                    0.135            0.155           -0.020
Variance of y*:                4.023            4.203           -0.180
Variance of error:             3.290            3.290            0.000
Count R2:                      0.677            0.693           -0.016
Adj Count R2:                  0.252            0.289           -0.037
AIC:                           1.242            1.223            0.019
AIC*n:                       935.306          921.266           14.040
BIC:                       -4024.871        -4029.663            4.791
BIC':                        -73.321          -78.112            4.791

Difference of 4.791 in BIC' provides positive support for saved model.

Note: p-value for difference in LR is only valid if models are nested.

.
. // * Section 4.6.1: predicted probabilities with predict
.
. * predictions from -logit-
. use binlfp2, clear
(Data from 1976 PSID-T Mroz)

. logit lfp k5 k618 age wc hc lwg inc, nolog

Logit estimates                                   Number of obs   =        753
                                                  LR chi2(7)      =     124.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -452.63296                       Pseudo R2       =     0.1209

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          k5 |  -1.462913   .1970006    -7.43   0.000    -1.849027   -1.076799
        k618 |  -.0645707   .0680008    -0.95   0.342    -.1978499    .0687085
         age |  -.0628706   .0127831    -4.92   0.000    -.0879249   -.0378162
          wc |   .8072738   .2299799     3.51   0.000     .3565215    1.258026
          hc |   .1117336   .2060397     0.54   0.588    -.2920969     .515564
         lwg |   .6046931   .1508176     4.01   0.000     .3090961    .9002901
         inc |  -.0344464   .0082084    -4.20   0.000    -.0505346   -.0183583
       _cons |    3.18214   .6443751     4.94   0.000     1.919188    4.445092
------------------------------------------------------------------------------

.
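Aside (not part of the original session): the -predict- call that follows defaults to Pr(lfp), which for a logit model is the logistic transformation of the linear prediction. A minimal sketch of doing the same thing in two steps, assuming the logit above is still the active estimation result; xbhat and prhand are hypothetical variable names introduced only for this illustration.

```stata
* Sketch only: xbhat and prhand are hypothetical names, not from the session
predict xbhat, xb                    // linear prediction x*b from the logit above
generate prhand = invlogit(xbhat)    // exp(xb)/(1 + exp(xb))
summarize prhand                     // compare with the summary of prlogit
```

The summary of prhand should agree with that of prlogit up to floating-point precision.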
predict prlogit (option p assumed; Pr(lfp)) . summarize prlogit Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- prlogit | 753 .5683931 .1944213 .0139875 .9621198 . label var prlogit "Logit: Pr(lfp)" . . dotplot prlogit, ylabel(0(.2)1) /// > ysize(2.7051) xsize(4.0413) . graph export 04dotpredict.eps, replace (file 04dotpredict.eps written in .eps format) . . * comparing -logit- and -probit- predictions . use binlfp2, clear (Data from 1976 PSID-T Mroz) . logit lfp k5 k618 age wc hc lwg inc, nolog Logit estimates Number of obs = 753 LR chi2(7) = 124.48 Prob > chi2 = 0.0000 Log likelihood = -452.63296 Pseudo R2 = 0.1209 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- k5 | -1.462913 .1970006 -7.43 0.000 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.342 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.000 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.000 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.588 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.000 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.000 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.000 1.919188 4.445092 ------------------------------------------------------------------------------ . predict prlogit (option p assumed; Pr(lfp)) . label var prlogit "Logit: Pr(lfp)" . probit lfp k5 k618 age wc hc lwg inc, nolog Probit estimates Number of obs = 753 LR chi2(7) = 124.36 Prob > chi2 = 0.0000 Log likelihood = -452.69496 Pseudo R2 = 0.1208 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. 
Interval] -------------+---------------------------------------------------------------- k5 | -.8747112 .1135583 -7.70 0.000 -1.097281 -.6521411 k618 | -.0385945 .0404893 -0.95 0.340 -.117952 .0407631 age | -.0378235 .0076093 -4.97 0.000 -.0527375 -.0229095 wc | .4883144 .1354873 3.60 0.000 .2227642 .7538645 hc | .0571704 .1240052 0.46 0.645 -.1858754 .3002161 lwg | .3656287 .0877792 4.17 0.000 .1935847 .5376727 inc | -.020525 .0047769 -4.30 0.000 -.0298875 -.0111626 _cons | 1.918422 .3806536 5.04 0.000 1.172355 2.66449 ------------------------------------------------------------------------------ . predict prprobit (option p assumed; Pr(lfp)) . label var prprobit "Probit: Pr(lfp)" . pwcorr prlogit prprobit | prlogit prprobit -------------+------------------ prlogit | 1.0000 prprobit | 0.9998 1.0000 . . * graphing predicted probabilities from -logit- and -probit- . graph twoway scatter prlogit prprobit, /// > xlabel(0(.25)1) ylabel(0(.25)1) /// > xline(.25(.25)1) yline(.25(.25)1) /// > plotregion(margin(zero)) msymbol(Oh) /// > ysize(4.0413) xsize(4.0413) . graph export 04logitprobit.eps, replace (file 04logitprobit.eps written in .eps format) . . // * Section 4.6.2: individual predicted probabilities with -prvalue- . . logit lfp k5 k618 age wc hc lwg inc, nolog Logit estimates Number of obs = 753 LR chi2(7) = 124.48 Prob > chi2 = 0.0000 Log likelihood = -452.63296 Pseudo R2 = 0.1209 ------------------------------------------------------------------------------ lfp | Coef. Std. Err. z P>|z| [95% Conf. 
Interval]
-------------+----------------------------------------------------------------
          k5 |  -1.462913   .1970006    -7.43   0.000    -1.849027   -1.076799
        k618 |  -.0645707   .0680008    -0.95   0.342    -.1978499    .0687085
         age |  -.0628706   .0127831    -4.92   0.000    -.0879249   -.0378162
          wc |   .8072738   .2299799     3.51   0.000     .3565215    1.258026
          hc |   .1117336   .2060397     0.54   0.588    -.2920969     .515564
         lwg |   .6046931   .1508176     4.01   0.000     .3090961    .9002901
         inc |  -.0344464   .0082084    -4.20   0.000    -.0505346   -.0183583
       _cons |    3.18214   .6443751     4.94   0.000     1.919188    4.445092
------------------------------------------------------------------------------

.
. * young, low income, low education families with young children.
. prvalue, x(age=35 k5=2 wc=0 hc=0 inc=15) rest(mean)

logit: Predictions for lfp

  Pr(y=inLF|x):          0.1318   95% ci: (0.0723,0.2282)
  Pr(y=NotInLF|x):       0.8682   95% ci: (0.7718,0.9277)

           k5       k618        age         wc         hc        lwg        inc
x=          2  1.3532537         35          0          0  1.0971148         15

.
. * highly educated families with no children at home.
. prvalue, x(age=50 k5=0 k618=0 wc=1 hc=1) rest(mean)

logit: Predictions for lfp

  Pr(y=inLF|x):          0.7166   95% ci: (0.6266,0.7921)
  Pr(y=NotInLF|x):       0.2834   95% ci: (0.2079,0.3734)

           k5       k618        age         wc         hc        lwg        inc
x=          0          0         50          1          1  1.0971148  20.128965

.
. * an average person
. prvalue, rest(mean)

logit: Predictions for lfp

  Pr(y=inLF|x):          0.5778   95% ci: (0.5388,0.6159)
  Pr(y=NotInLF|x):       0.4222   95% ci: (0.3841,0.4612)

           k5       k618        age         wc         hc        lwg        inc
x=   .2377158  1.3532537  42.537849   .2815405  .39176627  1.0971148  20.128965

.
. // * Section 4.6.3: tables of predicted probabilities with -prtab-
.
. logit lfp k5 k618 age wc hc lwg inc, nolog

Logit estimates                                   Number of obs   =        753
                                                  LR chi2(7)      =     124.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -452.63296                       Pseudo R2       =     0.1209

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf.
Interval] -------------+---------------------------------------------------------------- k5 | -1.462913 .1970006 -7.43 0.000 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.342 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.000 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.000 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.588 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.000 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.000 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.000 1.919188 4.445092 ------------------------------------------------------------------------------ . . * using -prvalue- . prvalue, x(k5=0 wc=0) rest(mean) brief Pr(y=inLF|x): 0.6069 95% ci: (0.5558,0.6558) Pr(y=NotInLF|x): 0.3931 95% ci: (0.3442,0.4442) . prvalue, x(k5=1 wc=0) rest(mean) brief Pr(y=inLF|x): 0.2633 95% ci: (0.1994,0.3391) Pr(y=NotInLF|x): 0.7367 95% ci: (0.6609,0.8006) . prvalue, x(k5=2 wc=0) rest(mean) brief Pr(y=inLF|x): 0.0764 95% ci: (0.0388,0.1451) Pr(y=NotInLF|x): 0.9236 95% ci: (0.8549,0.9612) . prvalue, x(k5=3 wc=0) rest(mean) brief Pr(y=inLF|x): 0.0188 95% ci: (0.0064,0.0542) Pr(y=NotInLF|x): 0.9812 95% ci: (0.9458,0.9936) . . * or use -prtab- . prtab k5 wc, rest(mean) logit: Predicted probabilities of positive outcome for lfp ---------------------------- | Wife College: # kids < | 1=yes 0=no 6 | NoCol College ----------+----------------- 0 | 0.6069 0.7758 1 | 0.2633 0.4449 2 | 0.0764 0.1565 3 | 0.0188 0.0412 ---------------------------- k5 k618 age wc hc lwg inc x= .2377158 1.3532537 42.537849 .2815405 .39176627 1.0971148 20.128965 . . * compute the differences: . di 0.6069-0.7758 -.1689 . di 0.2633-0.4449 -.1816 . di 0.0764-0.1565 -.0801 . di 0.0188-0.0412 -.0224 . . // * Section 4.6.4: graphing predicted probabilities with -prgen- . . * compute predictions at different ages . * age 30 . prgen inc, from(0) to(100) generate(p30) x(age=30) rest(mean) n(11) logit: Predicted values as inc varies from 0 to 100. 
           k5       k618        age         wc         hc        lwg        inc
x=   .2377158  1.3532537         30   .2815405  .39176627  1.0971148  20.128965

. label var p30p1 "Age 30"
. * age 40
. prgen inc, from(0) to(100) generate(p40) x(age=40) rest(mean) n(11)

logit: Predicted values as inc varies from 0 to 100.

           k5       k618        age         wc         hc        lwg        inc
x=   .2377158  1.3532537         40   .2815405  .39176627  1.0971148  20.128965

. label var p40p1 "Age 40"
. * age 50
. prgen inc, from(0) to(100) generate(p50) x(age=50) rest(mean) n(11)

logit: Predicted values as inc varies from 0 to 100.

           k5       k618        age         wc         hc        lwg        inc
x=   .2377158  1.3532537         50   .2815405  .39176627  1.0971148  20.128965

. label var p50p1 "Age 50"
. * age 60
. prgen inc, from(0) to(100) generate(p60) x(age=60) rest(mean) n(11)

logit: Predicted values as inc varies from 0 to 100.

           k5       k618        age         wc         hc        lwg        inc
x=   .2377158  1.3532537         60   .2815405  .39176627  1.0971148  20.128965

. label var p60p1 "Age 60"
.
. * -list- and -graph- predictions
. list p30p1 p40p1 p50p1 p60p1 p60x in 1/11

     +--------------------------------------------------+
     |    p30p1      p40p1      p50p1      p60p1   p60x |
     |--------------------------------------------------|
  1. | .8575829   .7625393   .6313345   .4773258      0 |
  2. | .8101358   .6947005   .5482202   .3928797     10 |
  3. | .7514627   .6172101    .462326   .3143872     20 |
  4. | .6817801   .5332655   .3786113   .2452419     30 |
  5. | .6028849   .4473941   .3015535    .187153     40 |
     |--------------------------------------------------|
  6. | .5182508     .36455   .2342664   .1402662     50 |
  7. | .4325564    .289023   .1781635   .1036283     60 |
  8. | .3507161   .2236366   .1331599   .0757174     70 |
  9. | .2768067   .1695158   .0981662   .0548639     80 |
 10. | .2133547   .1263607    .071609   .0395082     90 |
     |--------------------------------------------------|
 11. | .1612055   .0929622   .0518235   .0283215    100 |
     +--------------------------------------------------+

. graph twoway connected p30p1 p40p1 p50p1 p60p1 p60x, ///
>     ytitle("Pr(In Labor Force)") ylabel(0(.25)1) ///
>     xtitle("Income") ///
>     ysize(2.7051) xsize(4.0413)
. graph export 04ageincome.eps, replace
(file 04ageincome.eps written in .eps format)
.
. * another example of -prgen- comparing those who do and do not attend college
. // This example is not in the book
. prgen age, from(30) to(60) generate(wc1) x(wc=1) rest(mean) n(13)

logit: Predicted values as age varies from 30 to 60.

           k5       k618        age         wc         hc        lwg        inc
x=   .2377158  1.3532537  42.537849          1  .39176627  1.0971148  20.128965

. label var wc1p1 "Attended College"
. prgen age, from(30) to(60) generate(wc0) x(wc=0) rest(mean) n(13)

logit: Predicted values as age varies from 30 to 60.

           k5       k618        age         wc         hc        lwg        inc
x=   .2377158  1.3532537  42.537849          0  .39176627  1.0971148  20.128965

. label var wc0p1 "Did Not Attend College"
. graph twoway connected wc1p1 wc0p1 wc1x, ///
>     xtitle("Age") ytitle("Pr(In Labor Force)") ///
>     ylabel(0(.25)1) xlabel(30(10)60)
.
. // * Section 4.6.5: changes in predicted probabilities
.
. * discrete change with -prchange-
. prchange age, x(wc=1 age=40)

logit: Changes in Predicted Probabilities for lfp

      min->max      0->1     -+1/2    -+sd/2  MargEfct
 age   -0.3940   -0.0017   -0.0121   -0.0971   -0.0121

         NotInLF     inLF
Pr(y|x)   0.2586   0.7414

             k5     k618      age       wc       hc      lwg      inc
    x=  .237716  1.35325       40        1  .391766  1.09711   20.129
sd(x)=  .523959  1.31987  8.07257  .450049  .488469  .587556  11.6348

. mfx compute, at(wc=1 age=40)

Marginal effects after logit
      y  = Pr(lfp) (predict)
         =  .74140317
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
      k5 |  -.2804763      .04221   -6.64   0.000  -.363212 -.197741   .237716
    k618 |  -.0123798      .01305   -0.95   0.343  -.037959  .013199   1.35325
     age |  -.0120538      .00245   -4.92   0.000  -.016855 -.007252        40
      wc*|   .1802113      .04742    3.80   0.000   .087269  .273154         1
      hc*|   .0212952      .03988    0.53   0.593  -.056866  .099456   .391766
     lwg |   .1159345      .03229    3.59   0.000   .052643  .179226   1.09711
     inc |  -.0066042      .00163   -4.05   0.000  -.009802 -.003406    20.129
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. prchange, help

logit: Changes in Predicted Probabilities for lfp

      min->max      0->1     -+1/2    -+sd/2  MargEfct
  k5   -0.6361   -0.3499   -0.3428   -0.1849   -0.3569
k618   -0.1278   -0.0156   -0.0158   -0.0208   -0.0158
 age   -0.4372   -0.0030   -0.0153   -0.1232   -0.0153
  wc    0.1881    0.1881    0.1945    0.0884    0.1969
  hc    0.0272    0.0272    0.0273    0.0133    0.0273
 lwg    0.6624    0.1499    0.1465    0.0865    0.1475
 inc   -0.6415   -0.0068   -0.0084   -0.0975   -0.0084

         NotInLF     inLF
Pr(y|x)   0.4222   0.5778

             k5     k618      age       wc       hc      lwg      inc
    x=  .237716  1.35325  42.5378  .281541  .391766  1.09711   20.129
sd(x)=  .523959  1.31987  8.07257  .450049  .488469  .587556  11.6348

 Pr(y|x): probability of observing each y for specified x values
Avg|Chg|: average of absolute value of the change across categories
Min->Max: change in predicted probability as x changes from its minimum to
          its maximum
    0->1: change in predicted probability as x changes from 0 to 1
   -+1/2: change in predicted probability as x changes from 1/2 unit below
          base value to 1/2 unit above
  -+sd/2: change in predicted probability as x changes from 1/2 standard
          dev below base to 1/2 standard dev above
MargEfct: the partial derivative of the predicted probability/rate with
          respect to a given independent variable

. prchange k5 age wc lwg inc, fromto

logit: Changes in Predicted Probabilities for lfp

         from:       to:      dif:     from:       to:      dif:     from:       to:      dif:     from:
         x=min     x=max  min->max       x=0       x=1      0->1     x-1/2     x+1/2     -+1/2   x-1/2sd
  k5    0.6596    0.0235   -0.6361    0.6596    0.3097   -0.3499    0.7398    0.3971   -0.3428    0.6675
 age    0.7506    0.3134   -0.4372    0.9520    0.9491   -0.0030    0.5854    0.5701   -0.0153    0.6382
  wc    0.5216    0.7097    0.1881    0.5216    0.7097    0.1881    0.4775    0.6720    0.1945    0.5330
 lwg    0.1691    0.8316    0.6624    0.4135    0.5634    0.1499    0.5028    0.6493    0.1465    0.5340
 inc    0.7326    0.0911   -0.6415    0.7325    0.7256   -0.0068    0.5820    0.5736   -0.0084    0.6258

           to:      dif:
       x+1/2sd    -+sd/2  MargEfct
  k5    0.4826   -0.1849   -0.3569
 age    0.5150   -0.1232   -0.0153
  wc    0.6214    0.0884    0.1969
 lwg    0.6204    0.0865    0.1475
 inc    0.5283   -0.0975   -0.0084

         NotInLF     inLF
Pr(y|x)   0.4222   0.5778

             k5     k618      age       wc       hc      lwg      inc
    x=  .237716  1.35325  42.5378  .281541  .391766  1.09711   20.129
sd(x)=  .523959  1.31987  8.07257  .450049  .488469  .587556  11.6348

.
. * discrete change using -prvalue-
. prvalue, x(age=30) save brief

  Pr(y=inLF|x):          0.7506   95% ci: (0.6771,0.8121)
  Pr(y=NotInLF|x):       0.2494   95% ci: (0.1879,0.3229)

. prvalue, x(age=40) dif brief

                        Current     Saved  Difference
  Pr(y=inLF|x):          0.6162    0.7506     -0.1345
  Pr(y=NotInLF|x):       0.3838    0.2494      0.1345

.
. * discrete change using -prchange- with -delta- and -uncentered- options
. prchange age, x(age=30) uncentered delta(10) rest(mean) brief

      min->max      0->1    +delta       +sd  MargEfct
 age   -0.4372   -0.0030   -0.1345   -0.1062   -0.0118

.
. // * Section 4.7: odds ratios using -listcoef-
.
. listcoef, help

logit (N=753): Factor Change in Odds

  Odds of: inLF vs NotInLF

----------------------------------------------------------------------
         lfp |         b        z   P>|z|      e^b  e^bStdX      SDofX
-------------+--------------------------------------------------------
          k5 |  -1.46291   -7.426   0.000   0.2316   0.4646     0.5240
        k618 |  -0.06457   -0.950   0.342   0.9375   0.9183     1.3199
         age |  -0.06287   -4.918   0.000   0.9391   0.6020     8.0726
          wc |   0.80727    3.510   0.000   2.2418   1.4381     0.4500
          hc |   0.11173    0.542   0.588   1.1182   1.0561     0.4885
         lwg |   0.60469    4.009   0.000   1.8307   1.4266     0.5876
         inc |  -0.03445   -4.196   0.000   0.9661   0.6698    11.6348
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X

. listcoef, reverse

logit (N=753): Factor Change in Odds

  Odds of: NotInLF vs inLF

----------------------------------------------------------------------
         lfp |         b        z   P>|z|      e^b  e^bStdX      SDofX
-------------+--------------------------------------------------------
          k5 |  -1.46291   -7.426   0.000   4.3185   2.1522     0.5240
        k618 |  -0.06457   -0.950   0.342   1.0667   1.0890     1.3199
         age |  -0.06287   -4.918   0.000   1.0649   1.6612     8.0726
          wc |   0.80727    3.510   0.000   0.4461   0.6954     0.4500
          hc |   0.11173    0.542   0.588   0.8943   0.9469     0.4885
         lwg |   0.60469    4.009   0.000   0.5462   0.7010     0.5876
         inc |  -0.03445   -4.196   0.000   1.0350   1.4930    11.6348
----------------------------------------------------------------------

. listcoef, percent

logit (N=753): Percentage Change in Odds

  Odds of: inLF vs NotInLF

----------------------------------------------------------------------
         lfp |         b        z   P>|z|        %    %StdX      SDofX
-------------+--------------------------------------------------------
          k5 |  -1.46291   -7.426   0.000    -76.8    -53.5     0.5240
        k618 |  -0.06457   -0.950   0.342     -6.3     -8.2     1.3199
         age |  -0.06287   -4.918   0.000     -6.1    -39.8     8.0726
          wc |   0.80727    3.510   0.000    124.2     43.8     0.4500
          hc |   0.11173    0.542   0.588     11.8      5.6     0.4885
         lwg |   0.60469    4.009   0.000     83.1     42.7     0.5876
         inc |  -0.03445   -4.196   0.000     -3.4    -33.0    11.6348
----------------------------------------------------------------------

.
. capture log close
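
// Hand check of the -listcoef- columns. This block is NOT part of the
// original log; it is a minimal do-file sketch, assuming the k5 coefficient
// reported by -logit- and the k5 standard deviation reported by -summarize-
// in this log. The scalar names b_k5 and sd_k5 are illustrative only.
scalar b_k5  = -1.462913        // logit coefficient on k5
scalar sd_k5 = .523959          // standard deviation of k5

display exp(b_k5)               // factor change in odds; compare with the e^b column
display exp(b_k5*sd_k5)         // factor change per SD; compare with the e^bStdX column
display 100*(exp(b_k5) - 1)     // percentage change in odds; compare with the % column
display 1/exp(b_k5)             // odds of NotInLF vs inLF; compare with -listcoef, reverse-

// The Difference column of -prvalue, dif- is the current prediction minus the
// saved one. With the rounded probabilities printed in the log, the result
// may differ from the reported -0.1345 in the last digit.
display 0.6162 - 0.7506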