Elijah Kipchumba - Coding Nerd

Once in a while, I put on my statistics hat - to collect and manage data.

Programming mobile data collection platforms with ODK and wrangling survey data with STATA

STATA: Regression Output

Coefficient plots (multiple models and multiple coefficients)

Often you find yourself working with a family of outcomes, multiple survey waves and multiple treatment arms. More importantly, you may wish to extract (and present!) sub-group coefficient estimates for each of your outcomes.

Solution: xlincom + coefplot. Both are user-written packages.

a) Store your models, naming each one after its outcome variable with a prefix for the survey wave.

b) Use xlincom to estimate the linear combinations of coefficients of interest, and store them with the same naming scheme as in (a).

c) Plot with coefplot, using the swapnames and eqrename options to strip the prefixes added in (b).

Complete code available here
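
To make step (b) concrete, here is a minimal sketch of what a linear combination of coefficients computes: the point estimate is the weighted sum of the coefficients, and the standard error comes from the corresponding entries of the variance-covariance matrix. This is a Python illustration, not the Stata code linked above; the coefficient names, estimates and covariances are all hypothetical.

```python
import math


def lincom(coefs, vcov, weights):
    """Return (estimate, standard error) of sum_k weights[k] * coefs[k]."""
    names = list(weights)
    estimate = sum(weights[k] * coefs[k] for k in names)
    # Var(w'b) = sum over all pairs (a, b) of w_a * w_b * Cov(b_a, b_b)
    variance = sum(
        weights[a] * weights[b] * vcov[a][b] for a in names for b in names
    )
    return estimate, math.sqrt(variance)


# Hypothetical estimates from one stored model (illustrative numbers only).
coefs = {"treat": 0.30, "treat_x_female": 0.20}
vcov = {
    "treat": {"treat": 0.010, "treat_x_female": -0.004},
    "treat_x_female": {"treat": -0.004, "treat_x_female": 0.020},
}

# Sub-group effect = main effect + interaction term.
est, se = lincom(coefs, vcov, {"treat": 1, "treat_x_female": 1})
print(f"sub-group effect: {est:.3f} (SE {se:.3f})")
```

Repeating this for every outcome and wave yields the set of sub-group estimates that the plot then displays side by side.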

STATA: Data Management

Task: Extract a rectangular dataset from over 40,000 .txt files

Sometimes lab-in-the-field experiments (behavioural games) rely on custom-built applications that capture data in .txt format.

Each participant plays multiple rounds (say 3 rounds). Suppose that for every round, the app records several distinct files (say 3 .txt files), and that you have a large number of participants (say 4,800). Additionally, suppose these files have to be extracted daily because of limited device memory, among other concerns. Finally, you have multiple labs running concurrently.

Solution: We have 3*3*4800 = 43,200 files to work with!

a) Be creative with folder structure - Nest the daily lab session files.

Folder structure: Lab `i' (subfolder) within Day `j'.

Extract .txt files to Lab `i' subfolder

b) Create a rectangular dataset for each Day `j'.

Start with an empty dataset for Day `j'

Loop over the files and iteratively append the .txt extracts. The user-written command fs is handy for listing the file names within a subfolder.

c) Append all the datasets in (b)

I have a more detailed explanation here
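
The three steps above can be sketched as follows. This is a Python illustration of the loop-and-append logic, not the Stata implementation linked above. It assumes, hypothetically, that each .txt file is comma-delimited with a header row and that folders are laid out as day/lab; the folder names, file names and columns in the demo are made up.

```python
import csv
import tempfile
from pathlib import Path


def read_extract(path, day, lab):
    """Read one delimited .txt extract and tag each row with its origin."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    for row in rows:
        row.update({"day": day, "lab": lab, "source_file": path.name})
    return rows


def build_day(day_dir):
    """Step (b): start empty, then append every extract in each lab subfolder."""
    day_rows = []
    for lab_dir in sorted(p for p in day_dir.iterdir() if p.is_dir()):
        for txt in sorted(lab_dir.glob("*.txt")):
            day_rows.extend(read_extract(txt, day_dir.name, lab_dir.name))
    return day_rows


def build_all(root):
    """Step (c): append the per-day datasets."""
    all_rows = []
    for day_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        all_rows.extend(build_day(day_dir))
    return all_rows


# Demo on a throwaway folder tree with three hypothetical extracts.
with tempfile.TemporaryDirectory() as root:
    demo = [("day1", "lab1", "p001"), ("day1", "lab2", "p002"), ("day2", "lab1", "p003")]
    for day, lab, pid in demo:
        folder = Path(root) / day / lab
        folder.mkdir(parents=True)
        (folder / f"{pid}_round1.txt").write_text(
            f"participant,round,choice\n{pid},1,5\n"
        )
    rows = build_all(root)

print(len(rows), "rows appended")
```

Tagging every row with its day, lab and source file keeps the final dataset traceable back to the device extract it came from.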

ODK: Lessons from the Field

Task: In a follow-up survey, construct a plot-level panel nested within a household panel

Land rental markets can be fluid, with cultivation rights varying between cropping seasons. A non-trivial survey challenge: the plots a farmer cultivates vary between survey waves. Additionally, we needed to understand the cultivation rights over our pre-existing plots while updating the farmer's plots to capture productivity at the time of the survey.

Solution

a) Create and pre-load two csv files - one to identify the farmer and the other for pre-existing plots.

b) Use the pulldata function to loop over a farmer's pre-existing plots and determine whether the farmer still has cultivation rights to each of them.

c) Establish the number of new plots over which the farmer has cultivation rights.

d) Loop over the union of (b) and (c) to capture productivity, filtering out plots over which the farmer no longer has rights.

Truncated solution here
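
The roster logic in steps (b) to (d) can be sketched outside the form as below. This is a Python illustration of the filtering and union only, not XLSForm syntax and not the solution linked above; the farmer ID, plot IDs, answers and the naming of new plots are all hypothetical.

```python
def plot_roster(preexisting, still_has_rights, n_new_plots):
    """Return the plots to loop over in the productivity section."""
    # Step (b): keep only pre-existing plots the farmer still has rights over.
    retained = [p for p in preexisting if still_has_rights.get(p, False)]
    # Step (c): label the newly acquired plots (hypothetical naming).
    new = [f"new_{i}" for i in range(1, n_new_plots + 1)]
    # Step (d): the union of retained and new plots.
    return retained + new


# Hypothetical pre-loaded data: farmer ID -> pre-existing plot IDs.
preloaded_plots = {"F001": ["P1", "P2", "P3"]}
# Hypothetical answers to "does the farmer still have cultivation rights?"
answers = {"P1": True, "P2": False, "P3": True}

roster = plot_roster(preloaded_plots["F001"], answers, n_new_plots=2)
print(roster)
```

Keeping the pre-existing plot IDs unchanged in the roster is what lets the plot-level records link back to earlier waves.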

Caveat: I assume intermediate knowledge of ODK (including the pulldata function). Introductory guide here.
