1. In this project, I conducted an econometric analysis to identify the demographic, political, or economic factors that affect how respondents rate assessments on a 0-5 agenda-setting influence scale. The project required substantial manipulation of the raw data before the empirical analysis. The policy document and code are provided below.

Policy document: click here 

Preliminary hypothesis testing - manual construction:

STATA code

*Load the main data file (previously imported from CSV into Stata)

use "C:\Users\Sankalp\Dropbox\Job Applications\March 2016\William and Mary\data\data1.dta", clear

set more off


*Dropping empty ID numbers

drop if countryid == ""


*Method 1: Test the hypothesis using respondents only (no weights).


*Create a foreach loop to test all 103 questions at once.
*For each question, compute the z-statistic for H0: mean = 2.5, the midpoint of the 0-5 scale.


foreach x of var q31_* {

 egen mean_`x'=mean(`x')

 egen sd_`x'=sd(`x')

 egen count_`x'=count(`x')

 gen zstat_`x'=(mean_`x'-2.5)/(sd_`x'/sqrt(count_`x'))

 }
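
*Optional cross-check, not part of the original analysis: a minimal sketch that
*runs Stata's built-in one-sample t-test of H0: mean = 2.5 for each item.
*The t-statistic uses the same formula as zstat_ above; only the reference
*distribution (t rather than normal) differs.
*r(p_l) is the one-sided p-value for the alternative mean < 2.5.
foreach x of varlist q31_* {
    capture quietly ttest `x' == 2.5
    if _rc == 0 {
        display "`x': t = " %7.3f r(t) "   p(mean < 2.5) = " %6.4f r(p_l)
    }
}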

 

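*Loop to flag the questions that fail to reject the null hypothesis (one-sided test at the 5% level, critical value -1.645).
*Stata treats missing values as larger than any number, so they are excluded explicitly.


foreach x of var zstat_q31_* {

 gen sig_`x'="fail-to-reject" if `x' > -1.645 & !missing(`x')

 }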



*Method 2: Incorporating non-response weights

*For Loop for weighted sum:


foreach x of var q31_* {


 *Multiply each response by its non-response weight

 gen w_`x'= `x'*(non_res_w_2)

 

 *Sum the weighted responses (numerator of the weighted mean)

 egen w_sum_`x'= sum(w_`x')

 

 *Keep each respondent's weight only where question j was answered

 gen nrs_`x'=.

 replace nrs_`x' = (non_res_w_2) if w_`x'!=.

 

 *Find the observations in each assessment

 

 *Sum the weights, which is the denominator

 egen s_nrs_`x' = sum(nrs_`x')

 

 *Find weighted mean 

 gen w_mean_`x'=w_sum_`x'/ s_nrs_`x'

 

 *Find b (stored in wn_), the effective sample size: (sum of weights)^2 / (sum of squared weights)

 gen b_n_`x'= (s_nrs_`x')^2

 egen b_d_`x'=sum((nrs_`x')^2)

 gen wn_`x'=b_n_`x'/b_d_`x'

 

 *Find standard error

 gen w_num_var_`x'=(nrs_`x'*(`x'-w_mean_`x')^2)

 egen w_n2_var_`x'= sum(w_num_var_`x')

 gen w_sde_`x'= sqrt(w_n2_var_`x'/(s_nrs_`x'-1))

 gen w_zstat_`x'=(w_mean_`x'-2.5)/(w_sde_`x'/sqrt(s_nrs_`x'))

 }
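
*Optional cross-check, not part of the original analysis: a minimal sketch using
*built-in weighted estimators. summarize with aweights returns the same weighted
*mean as w_mean_ above. mean with pweights adds a linearized (robust) standard
*error, which is a different estimator from the manual w_sde_ calculation.
foreach x of varlist q31_* {
    capture quietly summarize `x' [aweight = non_res_w_2]
    if _rc == 0 & r(N) > 0 {
        display "`x': weighted mean = " %6.3f r(mean)
    }
    capture quietly mean `x' [pweight = non_res_w_2]
    if _rc == 0 {
        local z = (_b[`x'] - 2.5)/_se[`x']
        display "`x': z vs 2.5 (linearized SE) = " %7.3f `z'
    }
}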

 

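*Loop to flag the questions that fail to reject the null hypothesis (one-sided test at the 5% level, critical value -1.645).
*Stata treats missing values as larger than any number, so they are excluded explicitly.



foreach x of var w_zstat_q31_* {

 gen w_sig_`x'="fail-to-reject" if `x' > -1.645 & !missing(`x')

 }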

 

*******************************************************************************

*Question 3:

*Encode the World Bank region (a string) as a numeric variable

encode wbg_region, gen(wbgr)


*Combined mean of Q31 assessments for all IDs: (without item non-response weights)

egen q31mean = rowmean(q31_*)


*Generate the weighted mean for each region

gen wght = (non_res_w_2)

gen w_t_i = wght*q31mean

bysort wbgr: egen w_sum = sum(w_t_i)

bysort wbgr: egen w_sum_nrs= sum(wght)

gen w_mean_r = w_sum/w_sum_nrs


*Find standard error

gen w_r_num_var=(wght*(q31mean-w_mean_r)^2)

bysort wbgr: egen w_r_n2_var=sum(w_r_num_var)

gen w_r_sde= sqrt(w_r_n2_var/(w_sum_nrs-1))
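
*Optional cross-check, not part of the original analysis: a minimal sketch of the
*same regional weighted means using the built-in mean command. The standard errors
*it reports are linearized (robust), so they will differ from w_r_sde above.
mean q31mean [pweight = wght], over(wbgr)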


*Separate out the variables to perform individual tests

*Means

gen r_mean_ssa = .

replace r_mean_ssa = w_mean_r if wbgr == 6

egen r_mean_ssa1 = mean(r_mean_ssa) /*The point of taking mean here is that it populates all the cells.*/


gen r_mean_eap=.

replace r_mean_eap = w_mean_r if wbgr == 1

egen r_mean_eap1 = mean(r_mean_eap)


gen r_mean_sa=.

replace r_mean_sa = w_mean_r if wbgr == 5

egen r_mean_sa1 = mean(r_mean_sa)
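
*The individual tests are not shown here. One possible sketch (an assumption,
*not the original method): regress the score on region indicators with the same
*weights, then compare regional means with Wald tests.
*Factor-variable notation (i.wbgr) requires Stata 11 or later.
*Region codes follow the ones used above: EAP = 1, SA = 5, SSA = 6.
regress q31mean i.wbgr [pweight = wght]
test 6.wbgr             /*SSA vs. EAP (EAP, the lowest code, is the base category)*/
test 5.wbgr = 6.wbgr    /*SA vs. SSA*/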

Econometric analysis

encode cname, gen(cname1)

gen pop = exp(ln_pop)

collapse (mean) polity2 gdp_pc pop cname1, by(ccode)

gen ln_pop = log(pop)


*Encode countryID, Stakeholder group, q1


encode countryid, gen(countryid2)

encode stakeholdergroup, gen(sh1)

encode q1, gen(q1e)

encode q9_v1, gen(q9v1)

encode wbg_region, gen(wbgr)


*Stakeholdergroup levels:

*CSO/NGO = 1

*Country-Expert = 2

*Development Partners = 3

*Host government = 4

*Private sector = 5


*Q1 levels (encode assigns codes in alphabetical order, so 5-9 years comes last):

*0-4 years = 1

*10-14 years = 2

*15-20 years = 3

*20 or more years = 4

*5-9 years = 5


*Q9_v1:

*Agriculture and rural development = 1

*Anti-corruption and transparency = 2

*Business regulatory environment = 3

*Customs = 4

*Decentralization = 5

*Democracy = 6

*Education = 7

*Energy and mining = 8

*Environmental protection = 9

*Family and gender = 10

*Finance, credit, and banking = 11

*Foreign policy = 12

*Health = 13

*I did not have a particular area of foc = 14

*Infrastructure = 15

*Investment = 16

*Justice and security = 17

*Labor = 18

*Land = 19

*Macroeconomic management = 20

*Public expenditure management = 21

*Social protection and welfare = 22

*Tax = 23

*Trade = 24

*********************************************************** 


*Create the dependent variable: the weighted mean of respondents' q31 means within each country.

gen wght = (non_res_w_2)

gen w_t_i = wght*q31mean

bysort countryid2: egen w_sum_country = sum(w_t_i)

bysort countryid2: egen w_sum_nrs_country= sum(wght)

gen w_mean_r_country = w_sum_country/w_sum_nrs_country


*Collapse survey data by country in order to merge with covariates data


collapse (mean) sh1 q1e q9v1 wbgr y_work w_mean_r_country, by(ccode)


merge 1:1 ccode using "C:\Users\Sankalp\Dropbox\Job Applications\March 2016\William and Mary\data\data2m.dta"
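
*Sketch, not in the original code: inspect the match results before dropping unmatched rows.
*merge creates _merge: 1 = survey data only, 2 = covariates data only, 3 = matched.
tabulate _merge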

drop if sh1 == .

*Round the categorical variables back to whole numbers (collapse averaged them)


replace sh1 = round(sh1, 1)

replace q1e = round(q1e, 1)

replace q9v1 = round(q9v1, 1)

replace y_work = round(y_work, 1)


*Regression Model

gen ln_w_mean = log(w_mean_r_country)

gen ln_gdp = log(gdp_pc)

*Keep only observations in World Bank regions 1-6

keep if inlist(wbgr, 1, 2, 3, 4, 5, 6)


*Linear Model with log outcome variable

xi: regress ln_w_mean polity2 ln_gdp ln_pop i.wbgr i.sh1 i.q1e i.q9v1 i.y_work, robust cluster(wbgr)
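
*Sketch of the same specification in factor-variable notation (Stata 11 or later),
*where the xi: prefix is unnecessary and vce(cluster ...) replaces robust cluster().
*This is not the original code. Note that wbgr has at most six clusters, and
*cluster-robust standard errors can be unreliable with so few clusters.
regress ln_w_mean polity2 ln_gdp ln_pop i.wbgr i.sh1 i.q1e i.q9v1 i.y_work, vce(cluster wbgr)

*Joint significance of the stakeholder-group indicators
testparm i.sh1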