1. Here's a project in which I conducted an econometric analysis to identify demographic, political, or economic factors affecting respondents’ ratings of assessments on the 0-5 agenda-setting influence scale. The project involved substantial manipulation of the raw data before the empirical analysis. The policy document and code are provided below.
Policy document: click here
Preliminary hypothesis testing (manual construction):
Stata code
*Load the main data file (imported from CSV into Stata and saved as .dta)
use "C:\Users\Sankalp\Dropbox\Job Applications\March 2016\William and Mary\data\data1.dta", clear
set more off
*Drop observations with an empty country ID
drop if countryid == ""
*Method 1: test the hypothesis using the respondents only (no weights).
*Loop over all 103 q31_* questions, computing a one-sample z-statistic for H0: mean = 2.5 (the midpoint of the 0-5 scale).
foreach x of var q31_* {
egen mean_`x'=mean(`x')
egen sd_`x'=sd(`x')
egen count_`x'=count(`x')
gen zstat_`x'=(mean_`x'-2.5)/(sd_`x'/sqrt(count_`x'))
}
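*Flag the questions that fail to reject the null (one-sided test at 5%: reject if z < -1.645)
*!missing() is needed because Stata treats a missing z as larger than any number
foreach x of var zstat_q31_* {
gen sig_`x'="fail-to-reject" if `x' > -1.645 & !missing(`x')
}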
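As a cross-check outside Stata, the Method 1 test for a single question can be reproduced in a few lines of Python. This is a minimal sketch that assumes one question's ratings are held in a list, with `None` marking a missing answer. The ratings in the usage lines are made up. Like `egen sd()`, it uses the sample standard deviation (n - 1 denominator).

```python
import math
import statistics

NULL_MEAN = 2.5      # H0: mean rating equals the midpoint of the 0-5 scale
CRITICAL_Z = -1.645  # one-sided 5% critical value (reject H0 if z < -1.645)


def one_sample_z(ratings):
    """z-statistic for H0: mean == NULL_MEAN, ignoring missing (None) ratings."""
    values = [r for r in ratings if r is not None]
    n = len(values)                 # count_`x'
    mean = statistics.mean(values)  # mean_`x'
    sd = statistics.stdev(values)   # sd_`x' (sample SD, n - 1 denominator)
    return (mean - NULL_MEAN) / (sd / math.sqrt(n))


def flag(z):
    """'fail-to-reject' when z > CRITICAL_Z; 'reject' where Stata leaves the string empty."""
    return "fail-to-reject" if z > CRITICAL_Z else "reject"


if __name__ == "__main__":
    z = one_sample_z([1, 2, 2, 3, 1, 2, None, 1])  # made-up ratings for one question
    print(round(z, 2), flag(z))
```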
*Method 2: incorporate the non-response weights (non_res_w_2)
*Loop computing weighted means, standard deviations and z-statistics for each question:
foreach x of var q31_* {
*Multiply each response by its weight
gen w_`x'= `x'*(non_res_w_2)
*Sum of weighted responses (numerator of the weighted mean)
egen w_sum_`x'= total(w_`x')
*Keep the weight only for respondents who answered this question
gen nrs_`x'=.
replace nrs_`x' = (non_res_w_2) if w_`x'!=.
*Sum of those weights (denominator of the weighted mean)
egen s_nrs_`x' = total(nrs_`x')
*Weighted mean
gen w_mean_`x'=w_sum_`x'/ s_nrs_`x'
*Kish effective sample size: (sum of weights)^2 / (sum of squared weights)
gen b_n_`x'= (s_nrs_`x')^2
egen b_d_`x'=total((nrs_`x')^2)
gen wn_`x'=b_n_`x'/b_d_`x'
*Weighted standard deviation
gen w_num_var_`x'=(nrs_`x'*(`x'-w_mean_`x')^2)
egen w_n2_var_`x'= total(w_num_var_`x')
gen w_sde_`x'= sqrt(w_n2_var_`x'/(s_nrs_`x'-1))
*z-statistic (the standard error divides by the square root of the sum of weights)
gen w_zstat_`x'=(w_mean_`x'-2.5)/(w_sde_`x'/sqrt(s_nrs_`x'))
}
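*Flag the questions that fail to reject the null (one-sided test at 5%: reject if z < -1.645)
*!missing() is needed because Stata treats a missing z as larger than any number
foreach x of var w_zstat_q31_* {
gen w_sig_`x'="fail-to-reject" if `x' > -1.645 & !missing(`x')
}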
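The weighted Method 2 statistics for one question can be sketched the same way. This assumes parallel lists of ratings and non-response weights, with `None` marking a missing value. The comments map each quantity to the Stata variable it mirrors. As in the Stata loop, the z-statistic divides by the square root of the sum of weights, and the Kish effective sample size is returned alongside for comparison.

```python
import math

NULL_MEAN = 2.5  # H0: mean rating equals the midpoint of the 0-5 scale


def weighted_item_test(ratings, weights):
    """Weighted mean, weighted SD, Kish effective n and z-statistic for one question.

    Pairs where the rating or the weight is missing (None) are skipped,
    matching the nrs_`x' construction in the Stata loop.
    """
    pairs = [(x, w) for x, w in zip(ratings, weights) if x is not None and w is not None]
    sum_w = sum(w for _, w in pairs)                      # s_nrs_`x'
    w_mean = sum(w * x for x, w in pairs) / sum_w         # w_mean_`x'
    n_eff = sum_w ** 2 / sum(w ** 2 for _, w in pairs)    # wn_`x' (Kish)
    ss = sum(w * (x - w_mean) ** 2 for x, w in pairs)     # w_n2_var_`x'
    w_sd = math.sqrt(ss / (sum_w - 1))                    # w_sde_`x'
    z = (w_mean - NULL_MEAN) / (w_sd / math.sqrt(sum_w))  # w_zstat_`x'
    return w_mean, w_sd, n_eff, z
```

With all weights equal to 1, the function reduces to the unweighted Method 1 test, and the effective sample size equals the number of answers.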
*******************************************************************************
*Question 3:
*Encode the World Bank region string as a numeric variable
encode wbg_region, gen(wbgr)
*Each respondent's mean rating across all Q31 assessments (without item non-response weights)
egen q31mean = rowmean(q31_*)
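*Weighted mean of q31mean within each region
*Keep the weight only where q31mean is non-missing, so the denominator covers the same respondents as the numerator
gen wght = (non_res_w_2) if !missing(q31mean)
gen w_t_i = wght*q31mean
bysort wbgr: egen w_sum = total(w_t_i)
bysort wbgr: egen w_sum_nrs= total(wght)
gen w_mean_r = w_sum/w_sum_nrs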
*Weighted standard deviation within each region
gen w_r_num_var=(wght*(q31mean-w_mean_r)^2)
bysort wbgr: egen w_r_n2_var=total(w_r_num_var)
gen w_r_sde= sqrt(w_r_n2_var/(w_sum_nrs-1))
*Separate out the variables to perform individual tests
*Means
gen r_mean_ssa = .
replace r_mean_ssa = w_mean_r if wbgr == 6
egen r_mean_ssa1 = mean(r_mean_ssa) /*Taking the mean copies the region's value into every observation.*/
gen r_mean_eap=.
replace r_mean_eap = w_mean_r if wbgr == 1
egen r_mean_eap1 = mean(r_mean_eap)
gen r_mean_sa=.
replace r_mean_sa = w_mean_r if wbgr == 5
egen r_mean_sa1 = mean(r_mean_sa)
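The regional weighted mean is a ratio of two within-group sums. Below is a Python sketch of that calculation. It assumes parallel lists of region labels, respondent-level `q31mean` values and weights. It skips a respondent in both sums when the value or weight is missing (`None`), which keeps the numerator and denominator on the same set of respondents.

```python
from collections import defaultdict


def weighted_mean_by_group(groups, values, weights):
    """Weighted mean of `values` within each group (bysort + egen total() in Stata).

    Rows whose value or weight is missing (None) are skipped in both sums.
    """
    num = defaultdict(float)  # sum of weight * value per group (w_sum)
    den = defaultdict(float)  # sum of weights per group (w_sum_nrs)
    for g, v, w in zip(groups, values, weights):
        if v is None or w is None:
            continue
        num[g] += w * v
        den[g] += w
    return {g: num[g] / den[g] for g in num}
```

With toy data, `weighted_mean_by_group(["EAP", "EAP", "SSA", "SSA", "SSA"], [2.0, 4.0, 1.0, None, 3.0], [1, 3, 2, 5, 2])` gives 3.5 for EAP and 2.0 for SSA. The respondent with the missing value and weight 5 does not dilute the SSA mean.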
Econometric analysis
*Collapse to one row per country code (ccode)
encode cname, gen(cname1)
*Average population in levels, then take the log again
gen pop = exp(ln_pop)
collapse (mean) polity2 gdp_pc pop cname1, by(ccode)
gen ln_pop = log(pop)
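This block exponentiates `ln_pop`, averages population in levels with `collapse`, and then logs the result again. That gives the log of the mean population. This is not the same as averaging `ln_pop` directly whenever a country has several rows with different populations. By Jensen's inequality, the log of the mean is never smaller. A minimal Python illustration with made-up numbers:

```python
import math


def log_of_mean(pops):
    """Average population in levels, then take the log (what the Stata block does)."""
    return math.log(sum(pops) / len(pops))


def mean_of_logs(pops):
    """Average the logs directly (what collapsing ln_pop itself would give)."""
    return sum(math.log(p) for p in pops) / len(pops)
```

For two rows with populations of 1 million and 100 million, `mean_of_logs` returns log(10 million), whereas `log_of_mean` returns log(50.5 million). When every row has the same population, the two coincide.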
*Encode country ID, stakeholder group, Q1, Q9_v1 and World Bank region as numeric variables
encode countryid, gen(countryid2)
encode stakeholdergroup, gen(sh1)
encode q1, gen(q1e)
encode q9_v1, gen(q9v1)
encode wbg_region, gen(wbgr)
*Stakeholder group levels:
*CSO/NGO = 1
*Country-Expert = 2
*Development Partners = 3
*Host government = 4
*Private sector = 5
*Q1 levels (encode assigns codes alphabetically, so they are not in chronological order):
*0-4 years = 1
*10-14 years = 2
*15-20 years = 3
*20 or more years = 4
*5-9 years = 5
*Q9_v1 levels:
*Agriculture and rural development = 1
*Anti-corruption and transparency = 2
*Business regulatory environment = 3
*Customs = 4
*Decentralization = 5
*Democracy = 6
*Education = 7
*Energy and mining = 8
*Environmental protection = 9
*Family and gender = 10
*Finance, credit, and banking = 11
*Foreign policy = 12
*Health = 13
*I did not have a particular area of foc = 14
*Infrastructure = 15
*Investment = 16
*Justice and security = 17
*Labor = 18
*Land = 19
*Macroeconomic management = 20
*Public expenditure management = 21
*Social protection and welfare = 22
*Tax = 23
*Trade = 24
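When no label is supplied, `encode` numbers the distinct strings in sorted order. That is why the code lists above follow the alphabet rather than, for Q1, the length of experience. The Python function below is a sketch of that behaviour for plain ASCII labels, not Stata's implementation. Empty strings become missing (`None`).

```python
def encode_like_stata(values):
    """Assign codes 1, 2, ... to the distinct non-empty strings in sorted order.

    Returns the coded list (None for empty/missing strings) and the label mapping.
    """
    levels = sorted(set(v for v in values if v))
    mapping = {label: code for code, label in enumerate(levels, start=1)}
    return [mapping.get(v) for v in values], mapping
```

Applied to the five Q1 answers, the function reproduces the mapping listed above: "0-4 years" becomes 1 and "5-9 years" becomes 5, because the character "5" sorts after "0", "1" and "2".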
***********************************************************
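*Create the dependent variable: weighted mean of q31mean within each country
*Keep the weight only where q31mean is non-missing, so the denominator covers the same respondents as the numerator
gen wght = (non_res_w_2) if !missing(q31mean)
gen w_t_i = wght*q31mean
bysort countryid2: egen w_sum_country = total(w_t_i)
bysort countryid2: egen w_sum_nrs_country= total(wght)
gen w_mean_r_country = w_sum_country/w_sum_nrs_country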
*Collapse survey data by country in order to merge with covariates data
collapse (mean) sh1 (mean) q1e (mean) q9v1 (mean) wbgr (mean) y_work (mean) w_mean_r_country, by(ccode)
merge 1:1 ccode using "C:\Users\Sankalp\Dropbox\Job Applications\March 2016\William and Mary\data\data2m.dta"
drop if sh1 == .
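In a `merge 1:1`, every key appears in the result, and `_merge` records where it came from: 1 for master only, 2 for using only, 3 for matched. Because `sh1` comes from the survey (master) data, `drop if sh1 == .` removes the countries that appear only in the covariates file. It also removes any master row whose `sh1` happens to be missing. Below is a dictionary-based Python sketch of that logic, using hypothetical toy data keyed by country code.

```python
def merge_1to1(master, using):
    """Sketch of a 1:1 merge on a key, returning each row with a Stata-style _merge code.

    master, using: dicts mapping the key (e.g. ccode) to a dict of variables.
    _merge = 1 (master only), 2 (using only), 3 (matched).
    """
    merged = {}
    for key in sorted(master.keys() | using.keys()):
        row = dict(using.get(key, {}))
        row.update(master.get(key, {}))  # on shared names, keep the master value
        if key in master and key in using:
            row["_merge"] = 3
        elif key in master:
            row["_merge"] = 1
        else:
            row["_merge"] = 2
        merged[key] = row
    return merged


def drop_if_missing(rows, var):
    """Analogue of `drop if var == .`: keep rows where var is present and not None."""
    return {k: r for k, r in rows.items() if r.get(var) is not None}
```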
*Round the collapsed category means back to integer codes
replace sh1 = round(sh1, 1)
replace q1e = round(q1e, 1)
replace q9v1 = round(q9v1, 1)
replace y_work = round(y_work, 1)
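After `collapse (mean)`, each country's categorical codes are averages, and `round(x, 1)` maps them back to a single code. The short Python sketch below uses toy codes only. When respondents mostly share one category, this recovers the majority code. When they are split between distant codes, the result lands in between. For example, stakeholder codes 1 (CSO/NGO) and 5 (Private sector) average to 3 (Development Partners). Python's built-in `round()` sends halves to the nearest even integer. The sketch therefore uses `floor(x + 0.5)`, which matches Stata's halves-up rounding for positive values.

```python
import math
import statistics


def collapsed_code(codes):
    """Mean of one country's category codes, rounded halves-up like Stata's round(x, 1)."""
    return math.floor(statistics.mean(codes) + 0.5)
```

For example, `collapsed_code([1, 1, 2])` returns 1 and `collapsed_code([1, 2])` returns 2. `collapsed_code([1, 5])` returns 3, a code that none of the respondents reported.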
*Regression Model
gen ln_w_mean = log(w_mean_r_country)
gen ln_gdp = log(gdp_pc)
drop if !inlist(wbgr, 1, 2, 3, 4, 5, 6)
*Linear Model with log outcome variable
regress ln_w_mean polity2 ln_gdp ln_pop i.wbgr i.sh1 i.q1e i.q9v1 i.y_work, vce(cluster wbgr)
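Because the outcome is logged, the coefficients are read as proportional effects. The helpers below sketch the standard conversions; any beta values passed to them are placeholders, not estimates from this project. For a level regressor such as `polity2`, or a dummy created by `i.wbgr`, `i.sh1` or the other factor terms, a coefficient β implies a 100·(exp(β) - 1) percent change in the weighted mean rating. For the logged regressors `ln_gdp` and `ln_pop`, β is an elasticity.

```python
import math


def pct_change_unit(beta):
    """Percent change in the outcome for a one-unit rise in a level regressor or dummy,
    in a model with a logged outcome: 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


def pct_change_logged(beta, pct_increase=1.0):
    """Percent change in the outcome when a logged regressor with elasticity beta
    rises by pct_increase percent: 100 * ((1 + pct_increase / 100) ** beta - 1)."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** beta - 1.0)
```

For small coefficients, `pct_change_unit(beta)` is close to 100·β. The exact form matters for larger dummy effects: a coefficient of log(2) ≈ 0.693 means a 100 percent increase, not 69.3 percent.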