-------------------------------------------------------------------------------------------------------------
       log:  d:\spost.stata8\do\st8ch2.log
  log type:  text
 opened on:  26 May 2003, 12:47:43

.
. // *
. // * RM4STATA Ch 2: Introduction to Stata - 5/26/2003
. // *
.
. // * Section 2.7: using and saving datasets
.
. use nomocc2.dta, clear
(1982 General Social Survey)

. save nomocc3.dta, replace
file nomocc3.dta saved

.
. // * Section 2.9: command or do files
.
. * example.do: short do file
. /*
> log using example, replace
> use binlfp2, clear
> tabulate hc wc, row nolabel
> log close
> */
.
. * example2.do: short do file using comments
. /*
>
> ==> short simple do file
> ==> for didactic purposes
>
> log using example, replace // this text is ignored
> * next we load the data
> use binlfp2, clear
> * tabulate husband's and wife's education
> tabulate hc wc, /// the next line is the continuation of this one
> row nolabel
> * close up
> log close
> * make sure there is a cr at the end!
> */
.
. * long lines using #delimit
. use gsskidvalue2.dta
(1993 and 1994 General Social Survey)

. #delimit ;
delimiter now ;
. recode income91 1=500 2=1500 3=3500 4=4500 5=5500 6=6500 7=7500 8=9000
>     9=11250 10=13750 11=16250 12=18750 13=21250 14=23750 15=27500 16=32500
>     17=37500 18=45000 19=55000 20=67500 21=75000 *=. ;
(income91: 4598 changes made)

. #delimit cr
delimiter now cr
.
. * Tip: long lines
. recode income91 1=500 2=1500 3=3500 4=4500 5=5500 6=6500 7=7500 8=9000 ///
>     9=11250 10=13750 11=16250 12=18750 13=21250 14=23750 15=27500 16=32500 ///
>     17=37500 18=45000 19=55000 20=67500 21=75000 *=.
(income91: 4103 changes made)

.
. // * Section 2.9.5: recommended structure of do files
.
. /*
> * Note: version number ensures compatibility with later Stata releases
> version 8
> * Note: if a log file is open, close it
> capture log close
> * Note: don't pause when output scrolls off the page
> set more off
> * Note: log results to file myfile.log
> log using myfile, replace text
> * myfile.do - written 29 jan 2003 to illustrate do files
>
> * Note: your commands go here
>
> * Note: close the log file.
> log close
> */
.
. // * Section 2.11: syntax of stata commands
.
. use binlfp2, clear
(Data from 1976 PSID-T Mroz)

. tabulate hc wc if age>40, row nokey

   Husband | Wife College: 1=yes
  College: |         0=no
1=yes 0=no |     NoCol    College |     Total
-----------+----------------------+----------
     NoCol |       263         23 |       286
           |     91.96       8.04 |    100.00
-----------+----------------------+----------
   College |        58         91 |       149
           |     38.93      61.07 |    100.00
-----------+----------------------+----------
     Total |       321        114 |       435
           |     73.79      26.21 |    100.00

.
. // * Section 2.11.2: variable lists
.
. sum age inc k5

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |       753    42.53785     8.072574        30         60
         inc |       753    20.12897      11.6348 -.0290001         96
          k5 |       753    .2377158      .523959         0          3

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         lfp |       753    .5683931     .4956295         0          1
          k5 |       753    .2377158      .523959         0          3
        k618 |       753    1.353254     1.319874         0          8
         age |       753    42.53785     8.072574        30         60
          wc |       753    .2815405     .4500494         0          1
-------------+--------------------------------------------------------
          hc |       753    .3917663     .4884694         0          1
         lwg |       753    1.097115     .5875564 -2.054124   3.218876
         inc |       753    20.12897      11.6348 -.0290001         96

. sum k*

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          k5 |       753    .2377158      .523959         0          3
        k618 |       753    1.353254     1.319874         0          8

.
. // * Section 2.11.3: if and in
.
. use gsskidvalue2, clear
(1993 and 1994 General Social Survey)

. sum income if age>=25 & age<=65

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      income |      3148    38060.99     22186.26      1000      75000

. sum income if age>=25 & age<=65 & female==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      income |      1747    36415.71     22371.64      1000      75000

. sum income if (age<25 | age>65) & age~=. & female==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      income |       539    21203.15      18255.8      1000      75000

.
. // * Section 2.12.2: getting information
.
. use binlfp2, clear
(Data from 1976 PSID-T Mroz)

. describe

Contains data from binlfp2.dta
  obs:           753                          Data from 1976 PSID-T Mroz
 vars:             8                          30 Apr 2001 16:17
 size:        13,554 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
lfp             byte   %9.0g       lfplbl     Paid Labor Force: 1=yes 0=no
k5              byte   %9.0g                  # kids < 6
k618            byte   %9.0g                  # kids 6-18
age             byte   %9.0g                  Wife's age in years
wc              byte   %9.0g       collbl     Wife College: 1=yes 0=no
hc              byte   %9.0g       collbl     Husband College: 1=yes 0=no
lwg             float  %9.0g                  Log of wife's estimated wages
inc             float  %9.0g                  Family income excluding wife's
-------------------------------------------------------------------------------
Sorted by:  lfp

. sum age, detail

                     Wife's age in years
-------------------------------------------------------------
      Percentiles      Smallest
 1%           30             30
 5%           30             30
10%           32             30       Obs                 753
25%           36             30       Sum of Wgt.         753

50%           43                      Mean           42.53785
                        Largest       Std. Dev.      8.072574
75%           49             60
90%           54             60       Variance       65.16645
95%           56             60       Skewness        .150879
99%           59             60       Kurtosis       1.981077

. tab hc

    Husband |
   College: |
 1=yes 0=no |      Freq.     Percent        Cum.
------------+-----------------------------------
      NoCol |        458       60.82       60.82
    College |        295       39.18      100.00
------------+-----------------------------------
      Total |        753      100.00

. tab hc, nolabel

    Husband |
   College: |
 1=yes 0=no |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        458       60.82       60.82
          1 |        295       39.18      100.00
------------+-----------------------------------
      Total |        753      100.00

. tab hc wc

   Husband | Wife College: 1=yes
  College: |         0=no
1=yes 0=no |     NoCol    College |     Total
-----------+----------------------+----------
     NoCol |       417         41 |       458
   College |       124        171 |       295
-----------+----------------------+----------
     Total |       541        212 |       753

. tab1 hc wc

-> tabulation of hc

    Husband |
   College: |
 1=yes 0=no |      Freq.     Percent        Cum.
------------+-----------------------------------
      NoCol |        458       60.82       60.82
    College |        295       39.18      100.00
------------+-----------------------------------
      Total |        753      100.00

-> tabulation of wc

       Wife |
   College: |
 1=yes 0=no |      Freq.     Percent        Cum.
------------+-----------------------------------
      NoCol |        541       71.85       71.85
    College |        212       28.15      100.00
------------+-----------------------------------
      Total |        753      100.00

. dotplot age

. graph export 02dotplot.eps, replace
(file 02dotplot.eps written in .eps format)

. codebook age

-------------------------------------------------------------------------------------------------------------
age                                                                                       Wife's age in years
-------------------------------------------------------------------------------------------------------------

                  type:  numeric (byte)

                 range:  [30,60]                      units:  1
         unique values:  31                       missing .:  0/753

                  mean:   42.5378
              std. dev:   8.07257

           percentiles:        10%       25%       50%       75%       90%
                                32        36        43        49        54

.
. // * Section 2.13.1: generate
.
. use binlfp2, clear
(Data from 1976 PSID-T Mroz)

. generate age2 = age

. summarize age2 age

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        age2 |       753    42.53785     8.072574        30         60
         age |       753    42.53785     8.072574        30         60

. gen age3 = age if age>40
(318 missing values generated)

. sum age3 age

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        age3 |       435     48.3977     4.936509        41         60
         age |       753    42.53785     8.072574        30         60

. gen agesq = age^2

. gen lnage = ln(age)

.
. // * Section 2.13.2: replace
.
. gen age4 = age

. replace age4 = 40 if age<40
(298 real changes made)

. sum age4 age

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        age4 |       753    44.85126     5.593896        40         60
         age |       753    42.53785     8.072574        30         60

.
. // * Section 2.13.3: recode
.
. use recodedata2.dta, clear

. recode origvar (1=2) (3=4), generate(myvar1)
(23 differences between origvar and myvar1)

. recode origvar (2=1) (*=0), gen(myvar2)
(100 differences between origvar and myvar2)

. recode origvar (2=1) (nonmissing=0), gen(myvar3)
(89 differences between origvar and myvar3)

. recode origvar (1/4=2), gen(myvar4)
(40 differences between origvar and myvar4)

. recode origvar (1 3 4 5=7), gen(myvar5)
(55 differences between origvar and myvar5)

. recode origvar (min/5=min), gen(myvar6)
(56 differences between origvar and myvar6)

. recode origvar (missing=9), gen(myvar7)
(11 differences between origvar and myvar7)

. recode origvar (.=-999) (1/3=-999) (7=-999)
(origvar: 56 changes made)

. recode origvar (-999=.), gen(myvar8)
(56 differences between origvar and myvar8)

.
. // * Section 2.13.4: common transformations of rhs variables
.
. * breaking a categorical variable into a set of dummy variables
. use gsskidvalue2, clear
(1993 and 1994 General Social Survey)

. * example 1 - tab, gen()
. tab degree, gen(edlevel)

    rs highest |
        degree |      Freq.     Percent        Cum.
---------------+-----------------------------------
lt high school |        801       17.47       17.47
   high school |      2,426       52.92       70.40
junior college |        273        5.96       76.35
      bachelor |        750       16.36       92.71
      graduate |        334        7.29      100.00
---------------+-----------------------------------
         Total |      4,584      100.00

. sum edlevel*

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    edlevel1 |      4584    .1747382     .3797845         0          1
    edlevel2 |      4584    .5292321     .4991992         0          1
    edlevel3 |      4584     .059555     .2366863         0          1
    edlevel4 |      4584    .1636126      .369964         0          1
    edlevel5 |      4584    .0728621     .2599384         0          1

. tab degree edlevel1, missing

    rs highest |      degree==lt high school
        degree |         0          1          . |     Total
---------------+---------------------------------+----------
lt high school |         0        801          0 |       801
   high school |     2,426          0          0 |     2,426
junior college |       273          0          0 |       273
      bachelor |       750          0          0 |       750
      graduate |       334          0          0 |       334
             . |         0          0         14 |        14
---------------+---------------------------------+----------
         Total |     3,783        801         14 |     4,598

. * example 2 - gen if
. gen hsdeg = (degree==1 | degree==2) if degree<.
(14 missing values generated)

. gen coldeg = (degree==3) if degree<.
(14 missing values generated)

. gen graddeg = (degree==4) if degree<.
(14 missing values generated)

. tab degree coldeg, missing

    rs highest |              coldeg
        degree |         0          1          . |     Total
---------------+---------------------------------+----------
lt high school |       801          0          0 |       801
   high school |     2,426          0          0 |     2,426
junior college |       273          0          0 |       273
      bachelor |         0        750          0 |       750
      graduate |       334          0          0 |       334
             . |         0          0         14 |        14
---------------+---------------------------------+----------
         Total |     3,834        750         14 |     4,598

. * more examples of creating binary variables
. use ordwarm2, clear
(77 & 89 General Social Survey)

. * example 1 - a single binary variable
. gen ed12plus = (ed>=12) if ed<.

. * example 2 - three indicator variables
. gen edlt13 = (ed<=12) if ed<.

. gen ed1316 = (ed>=13 & ed<=16) if ed<.

. gen ed17plus = (ed>17) if ed<.

. * example 3 - recode to a binary outcome
. gen wrmagree = warm

. recode wrmagree 1=0 2=0 3=1 4=1
(wrmagree: 2293 changes made)

. tab wrmagree warm

           |   Mom can have warm relations with child
  wrmagree |        SD          D          A         SA |     Total
-----------+--------------------------------------------+----------
         0 |       297        723          0          0 |     1,020
         1 |         0          0        856        417 |     1,273
-----------+--------------------------------------------+----------
     Total |       297        723        856        417 |     2,293

.
. * nonlinear transformations
. use gsskidvalue2, clear
(1993 and 1994 General Social Survey)

. gen agesq = age*age

. gen lnincome = ln(income)
(495 missing values generated)

. sum age agesq income lnincome

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |      4598    46.12375     17.33162        18         99
       agesq |      4598     2427.72     1798.477       324       9801
      income |      4103     34790.7     22387.45      1000      75000
    lnincome |      4103    10.16331     .8852605  6.907755   11.22524

.
. * interaction terms
. gen feminc = female * income
(495 missing values generated)

.
. // * Section 2.14.1: variable label
.
. * adding a label
. label variable agesq "Age squared"

. describe agesq

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
agesq           float  %9.0g                  Age squared

.
. * dropping a label
. label variable agesq

. describe agesq

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
agesq           float  %9.0g

.
. // * Section 2.14.2: value labels
.
. * defining labels
. label define yesno 1 yes 0 no

. label define posneg4 1 veryN 2 negative 3 positive 4 veryP

. label define agree4 1 StrongA 2 Agree 3 Disagree 4 StrongD

. label define agree5 1 StrongA 2 Agree 3 Neutral 4 Disagree 5 StrongD

. * assigning labels
. label values female yesno

. label values black yesno

. label values anykids yesno

. describe female black anykids

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
female          byte   %9.0g       yesno      Female
black           byte   %9.0g       yesno      Black
anykids         byte   %9.0g       yesno      R have any children?

. tab anykids

 R have any |
  children? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |      1,267       27.64       27.64
        yes |      3,317       72.36      100.00
------------+-----------------------------------
      Total |      4,584      100.00

. * defining and assigning labels
. label define degree 0 "no_hs" 1 "hs" 2 "jun_col" 3 "bachelor" 4 "graduate"

. label values degree degree

. tab degree

 rs highest |
     degree |      Freq.     Percent        Cum.
------------+-----------------------------------
      no_hs |        801       17.47       17.47
         hs |      2,426       52.92       70.40
    jun_col |        273        5.96       76.35
   bachelor |        750       16.36       92.71
   graduate |        334        7.29      100.00
------------+-----------------------------------
      Total |      4,584      100.00

.
. // * Section 2.14.3: notes
.
. * assign notes
. notes: small General Social Survey extract for Stata book

. notes income: self-reported family income, measured in dollars

. notes income: refusals coded as missing

. * list notes
. notes

_dta:
  1.  small General Social Survey extract for Stata book

income:
  1.  self-reported family income, measured in dollars
  2.  refusals coded as missing

.
. // * Section 2.15: macros
.
. * global macros
. use binlfp2, clear
(Data from 1976 PSID-T Mroz)

. global myopt = ", ce m nol ch nokey"

. tab lfp wc $myopt

Paid Labor | Wife College: 1=yes
    Force: |         0=no
1=yes 0=no |         0          1 |     Total
-----------+----------------------+----------
         0 |       257         68 |       325
           |     34.13       9.03 |     43.16
-----------+----------------------+----------
         1 |       284        144 |       428
           |     37.72      19.12 |     56.84
-----------+----------------------+----------
     Total |       541        212 |       753
           |     71.85      28.15 |    100.00

          Pearson chi2(1) =  14.7804   Pr = 0.000

. tab lfp hc $myopt

Paid Labor |   Husband College:
    Force: |      1=yes 0=no
1=yes 0=no |         0          1 |     Total
-----------+----------------------+----------
         0 |       207        118 |       325
           |     27.49      15.67 |     43.16
-----------+----------------------+----------
         1 |       251        177 |       428
           |     33.33      23.51 |     56.84
-----------+----------------------+----------
     Total |       458        295 |       753
           |     60.82      39.18 |    100.00

          Pearson chi2(1) =   1.9751   Pr = 0.160

. tab wc hc $myopt

      Wife |   Husband College:
  College: |      1=yes 0=no
1=yes 0=no |         0          1 |     Total
-----------+----------------------+----------
         0 |       417        124 |       541
           |     55.38      16.47 |     71.85
-----------+----------------------+----------
         1 |        41        171 |       212
           |      5.44      22.71 |     28.15
-----------+----------------------+----------
     Total |       458        295 |       753
           |     60.82      39.18 |    100.00

          Pearson chi2(1) = 213.1042   Pr = 0.000

. tab lfp wc, ce m nol ch nokey

Paid Labor | Wife College: 1=yes
    Force: |         0=no
1=yes 0=no |         0          1 |     Total
-----------+----------------------+----------
         0 |       257         68 |       325
           |     34.13       9.03 |     43.16
-----------+----------------------+----------
         1 |       284        144 |       428
           |     37.72      19.12 |     56.84
-----------+----------------------+----------
     Total |       541        212 |       753
           |     71.85      28.15 |    100.00

          Pearson chi2(1) =  14.7804   Pr = 0.000

. tab lfp hc, ce m nol ch nokey

Paid Labor |   Husband College:
    Force: |      1=yes 0=no
1=yes 0=no |         0          1 |     Total
-----------+----------------------+----------
         0 |       207        118 |       325
           |     27.49      15.67 |     43.16
-----------+----------------------+----------
         1 |       251        177 |       428
           |     33.33      23.51 |     56.84
-----------+----------------------+----------
     Total |       458        295 |       753
           |     60.82      39.18 |    100.00

          Pearson chi2(1) =   1.9751   Pr = 0.160

. tab wc hc, ce m nol ch nokey

      Wife |   Husband College:
  College: |      1=yes 0=no
1=yes 0=no |         0          1 |     Total
-----------+----------------------+----------
         0 |       417        124 |       541
           |     55.38      16.47 |     71.85
-----------+----------------------+----------
         1 |        41        171 |       212
           |      5.44      22.71 |     28.15
-----------+----------------------+----------
     Total |       458        295 |       753
           |     60.82      39.18 |    100.00

          Pearson chi2(1) = 213.1042   Pr = 0.000

. * local macros
. local myopt = ", ce m nol ch nokey"

. tab lfp wc `myopt'

Paid Labor | Wife College: 1=yes
    Force: |         0=no
1=yes 0=no |         0          1 |     Total
-----------+----------------------+----------
         0 |       257         68 |       325
           |     34.13       9.03 |     43.16
-----------+----------------------+----------
         1 |       284        144 |       428
           |     37.72      19.12 |     56.84
-----------+----------------------+----------
     Total |       541        212 |       753
           |     71.85      28.15 |    100.00

          Pearson chi2(1) =  14.7804   Pr = 0.000

. tab lfp hc `myopt'

Paid Labor |   Husband College:
    Force: |      1=yes 0=no
1=yes 0=no |         0          1 |     Total
-----------+----------------------+----------
         0 |       207        118 |       325
           |     27.49      15.67 |     43.16
-----------+----------------------+----------
         1 |       251        177 |       428
           |     33.33      23.51 |     56.84
-----------+----------------------+----------
     Total |       458        295 |       753
           |     60.82      39.18 |    100.00

          Pearson chi2(1) =   1.9751   Pr = 0.160

. tab wc hc `myopt'

      Wife |   Husband College:
  College: |      1=yes 0=no
1=yes 0=no |         0          1 |     Total
-----------+----------------------+----------
         0 |       417        124 |       541
           |     55.38      16.47 |     71.85
-----------+----------------------+----------
         1 |        41        171 |       212
           |      5.44      22.71 |     28.15
-----------+----------------------+----------
     Total |       458        295 |       753
           |     60.82      39.18 |    100.00

          Pearson chi2(1) = 213.1042   Pr = 0.000

. * macro functions
. global wclabel : variable label wc

. display "$wclabel"
Wife College: 1=yes 0=no

.
. // * Section 2.16.1: the graph command
.
. use lfpgraph2, clear
(Sample predictions to plot.)

. list income kid0p1 kid1p1

     +------------------------------+
     | income     kid0p1     kid1p1 |
     |------------------------------|
  1. |     10   .7330963   .3887608 |
  2. |     18   .6758616   .3256128 |
  3. |     26   .6128353   .2682211 |
  4. |     34     .54579   .2176799 |
  5. |     42    .477042   .1743927 |
     |------------------------------|
  6. |     50    .409153   .1381929 |
  7. |     58   .3445598   .1085196 |
  8. |     66    .285241   .0845925 |
  9. |     74   .2325117    .065553 |
 10. |     82     .18698   .0505621 |
     |------------------------------|
 11. |     90   .1486378   .0388569 |
     +------------------------------+

.
. * a simple graph
. graph twoway scatter kid0p1 kid1p1 kid2p1 income

.
. graph export 02graphsimple.eps, replace
(file 02graphsimple.eps written in .eps format)

.
. graph twoway (connected kid0p1 income) ///
>     (scatter kid1p1 kid2p1 income)

.
. graph export 02graphsimplec.eps, replace
(file 02graphsimplec.eps written in .eps format)

.
. graph twoway (connected kid0p1 kid1p1 kid2p1 income), ///
>     ytitle("Probability") ///
>     title("Predicted Probability of Female LFP") ///
>     subtitle("(as predicted by logit model)") ///
>     xtitle("Family income, excluding wife's") ///
>     caption("Data from 1976 PSID-T Mroz")

. graph export 02graphsimpled.eps, replace
(file 02graphsimpled.eps written in .eps format)

.
. graph twoway (connected kid0p1 kid1p1 kid2p1 income), ///
>     xlabel(10 "minimum" 50 "median" 90 "maximum")

. graph export 02graphsimpleg.eps, replace
(file 02graphsimpleg.eps written in .eps format)

.
. graph twoway (connected kid0p1 kid1p1 kid2p1 income), ///
>     ytitle("Probability") ///
>     title("Predicted Probability of Female LFP") ///
>     subtitle("(as predicted by logit model)") ///
>     xtitle("Family income, excluding wife's") ///
>     caption("Data from 1976 PSID-T Mroz") ///
>     xlabel(10 20 30 40 50 60 70 80 90) ///
>     legend(symxsize(9)) name(graph1, replace)

. graph export 02graphsimpleh.eps, replace
(file 02graphsimpleh.eps written in .eps format)

.
. * make graphs for graph combine example
. use ordwarm2, clear
(77 & 89 General Social Survey)

. ologit warm yr89 male white age ed prst, nolog

Ordered logit estimates                           Number of obs   =       2293
                                                  LR chi2(6)      =     301.72
                                                  Prob > chi2     =     0.0000
Log likelihood = -2844.9123                       Pseudo R2       =     0.0504

------------------------------------------------------------------------------
        warm |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        yr89 |   .5239025   .0798988     6.56   0.000     .3673037    .6805013
        male |  -.7332997   .0784827    -9.34   0.000    -.8871229   -.5794766
       white |  -.3911595   .1183808    -3.30   0.001    -.6231815   -.1591374
         age |  -.0216655   .0024683    -8.78   0.000    -.0265032   -.0168278
          ed |   .0671728    .015975     4.20   0.000     .0358624    .0984831
        prst |   .0060727   .0032929     1.84   0.065    -.0003813    .0125267
-------------+----------------------------------------------------------------
       _cut1 |  -2.465362   .2389126          (Ancillary parameters)
       _cut2 |   -.630904   .2333155
       _cut3 |   1.261854   .2340179
------------------------------------------------------------------------------

. prgen age, from(20) to(80) generate(w89) x(male=0 yr89=1) ncases(13)

ologit: Predicted values as age varies from 20 to 80.

         yr89       male      white        age         ed       prst
x=          1          0   .8765809  44.935456  12.218055  39.585259

. label var w89p1 "SD"

. label var w89p2 "D"

. label var w89p3 "A"

. label var w89p4 "SA"

. label var w89s1 "SD"

. label var w89s2 "SD or D"

. label var w89s3 "SD, D or A"

. * step 1: graph predicted probabilities
. graph twoway connected w89p1 w89p2 w89p3 w89p4 w89x, ///
>     title("Panel A: Predicted Probabilities") ///
>     xtitle("Age") xlabel(20(10)80) ylabel(0(.25).50) ///
>     yscale(noline) ylabel("") xline(44.93) ///
>     ytitle("") name(graph1, replace)

. * step 2: graph cumulative probabilities
. graph twoway connected w89s1 w89s2 w89s3 w89x, ///
>     title("Panel B: Cumulative Probabilities") ///
>     xtitle("Age") xlabel(20(10)80) ylabel(0(.25)1) ///
>     yscale(noline) ylabel("") xline(44.93) name(graph2, replace) ///
>     ytitle("")

. * step 3: combine graphs
. graph combine graph1 graph2, iscale(*.9) imargin(small)

. graph display, xsize(8) ysize(4)

. graph export 02graphssimplee.eps, replace
(file 02graphssimplee.eps written in .eps format)

.
. graph combine graph1 graph2, iscale(*.9) imargin(small) ///
>     ysize(3.9) xsize(3.5405) col(1)

. graph display, xsize(6) ysize(8)

. graph export 02graphssimplef.eps, replace
(file 02graphssimplef.eps written in .eps format)

.
. // * Section 2.17: tutorial - see st8ch2tutorial.do
.
. log close
       log:  d:\spost.stata8\do\st8ch2.log
  log type:  text
 closed on:  26 May 2003, 12:48:27
-------------------------------------------------------------------------------------------------------------