// Task 1: Routine setup

capture log close
set more off
estimates clear
log using cda04c-ex4done.log, replace text
version 9.2
set scheme s2mono

// pgm:     cda04c-ex4done.do
// task:    4 - Binary Regression - Exercise
// project: CDA Lab Guide
// author:  your name
// date:    today's date

use science3, clear

// Task 2: Keep jobclass, female, enrol and phd; examine the variables
//         and drop missing cases.

// Step 2.1 Keep the four variables and explore them.
keep jobclass female enrol phd
desc
sum
tab1 jobclass female enrol phd, missing
// or: tab1 _all, missing

// Step 2.2 Drop cases with a missing value on any of the four variables.
drop if jobclass>=. | female>=. | enrol>=. | phd>=.

// Step 2.3 Verify the last step.
// Different ways to verify:
tab1 _all, missing
sum
vardesc, style(missing)

// Task 3: Create a binary dependent variable hijob from jobclass.
//
//   hijob=1 if jobclass>2, else 0. Missing values remain missing.
//
//   Verify your transformation. Add variable and value labels.

// Step 3.1 Create the binary dependent variable hijob from jobclass.
gen hijob = (jobclass>2) if jobclass<.

// Step 3.2 Add variable and value labels.
label var hijob "Is job high prestige?"
label define hijobfmt 0 "0_low prestige job" 1 "1_high prestige job"
label values hijob hijobfmt

// Step 3.3 Verify the variable creation.
tab hijob jobclass, m

// Task 4: Estimate a probit of hijob on female, enrol and phd.
//         List coefficients and compute predicted probabilities for each
//         observation. Label the variables created by predict.

// Step 4.1 Estimate the probit model.
probit hijob female enrol phd

// Step 4.2 List coefficients.
listcoef, help

// Step 4.3 Predict probabilities.
predict prprobit

// Step 4.4 Label the variable created by predict.
label var prprobit "Probit: Predicted Probability"

// Task 5: Do the same with a logit model.

// Step 5.1 Estimate the logit model.
logit hijob female enrol phd

// Step 5.2 List coefficients.
listcoef, help

// Step 5.3 Predict probabilities.
predict prlogit

// Step 5.4 Label the variable created by predict.
label var prlogit "Logit: Predicted Probability"

// Task 6: Compare logit and probit predictions with a graph.
twoway scatter prprobit prlogit, msymbol(O) xlabel(0(.1)1, grid) ///
    ylabel(0(.1)1, grid) ysize(1) xsize(1)
graph export cda04c-ex4-fig1.emf, replace

// Task 7: Interpret the logit regression.
logit // replays the stored results; the model is not re-estimated

// Task 7a: Compute the predicted probability with all variables at their means.
prvalue, rest(mean)

// Task 7b: Predicted probability for males, other variables at their means.
prvalue, x(female=0) rest(mean)

// Task 7c: The same for females.
prvalue, x(female=1) rest(mean)

// Task 7d: Difference between males and females.
prvalue, x(female=0) rest(mean) save label(male)
prvalue, x(female=1) rest(mean) diff label(female)

// Task 7e: Discrete and marginal change.
prchange, rest(mean)

// Task 7f: Discrete and marginal change for females, with the other
//          variables at their means.
prchange, x(female=1) rest(mean)

// Task 7g: Compute predictions as phd varies, first for females (female=1)
//          with the other variables held at their means, then for males.
prgen phd, from(1) to(5) generate(women) x(female=1) rest(mean)
label var womenp1 "Pr(HiJob=1|Women)"
prgen phd, from(1) to(5) generate(men) x(female=0) rest(mean)
label var menp1 "Pr(HiJob=1|Men)"
scatter womenp1 womenx, c(l) ytitle("Pr(HiJob=1)") ///
    || scatter menp1 womenx, c(l)
graph export cda04c-ex4-fig2.emf, replace

// Or, using a single scatter command:
scatter womenp1 menp1 womenx, connect(l l) yt("Pr(HiJob=1)")
graph export cda04c-ex4-fig3.emf, replace

// Task 8: [Optional] Compare the logit and probit coefficients.
//   a) Why are the unstandardized coefficients so different?
//   b) How different are they?
//   c) Why are the standardized coefficients similar?
//   d) Why aren't they exactly the same?

/* Type your answer here

a) Because the logit and probit models assume different error variances,
   Var(e), and the coefficients are only identified relative to that scale.
b) By a factor of approximately 1.6 (the logit coefficients are the larger
   ones in absolute value).
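   Worked check for b), a sketch assuming the standard normalizations
   Var(e) = 1 for probit and Var(e) = pi^2/3 (about 3.29) for logit:
       sd(e_logit) / sd(e_probit) = sqrt(pi^2/3) / 1 = pi/sqrt(3), about 1.81
   Matching the slopes of the two CDFs at their center instead gives
       normal density at 0 / logistic density at 0 = 0.3989 / 0.25, about 1.60
   so the observed coefficient ratio typically falls between these values.
   In Stata, display _pi/sqrt(3) returns the first ratio.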
c) Because standardization removes the difference in the assumed error
   variances.
d) Because the logistic and normal distributions differ slightly in shape
   (the logistic has heavier tails), even after rescaling.
*/

// Task 9: [Optional] Use foreach to make transformations.
//
//   Example: There are 3 variables (jobclass, felclass, phdclass)
//   with similar coding. Suppose that we want to turn all of them
//   into dummy variables, recoding 3 and 4 to 1 and 1 and 2 to 0,
//   and then run a logit regression for each one.

use science3, clear

// Method 1: Without looping

* Generate
label define hilofmt 1 "1_hi prestige" 0 "0_lo prestige"
gen hijob = (jobclass>2) if jobclass<.
label variable hijob "jobclass > 2"
label values hijob hilofmt
gen hifel = (felclass>2) if felclass<.
label variable hifel "felclass > 2"
label values hifel hilofmt
gen hiphd = (phdclass>2) if phdclass<.
label variable hiphd "phdclass > 2"
label values hiphd hilofmt

* Verify
tab jobclass hijob, m
tab felclass hifel, m
tab phdclass hiphd, m

* Logit
logit hijob female
predict prhijob, p
logit hifel female
predict prhifel, p
logit hiphd female
predict prhiphd, p

// Method 2: With looping, using the foreach command

use science3, clear
label define hilofmt 1 "1_hi prestige" 0 "0_lo prestige"
foreach name in job fel phd {
    gen hi`name' = (`name'class > 2) if `name'class<.
    label variable hi`name' "`name'class > 2"
    label values hi`name' hilofmt
    tab `name'class hi`name', m
    logit hi`name' female
    predict prhi`name', p
}

log close
exit