Some code snippets that I developed as part of my research, and that other people may find useful.

Fast nearest neighbours in Stata (github, do)

This is an implementation of a kd-tree in Mata, Stata's matrix programming language. It allows fast computation of the k nearest neighbours of a set of points under the Euclidean distance metric. The algorithm is orders of magnitude faster than a brute-force computation when the data are sufficiently large (i.e., thousands of data points). The function can be used, e.g., to compute the average of some variable among an observation's nearest neighbours. It has only been thoroughly tested for the two-dimensional case.

Syntax:

void knn(real matrix query_coords, real matrix data_coords, real scalar k,
         real matrix kni, real matrix knd)

where

query_coords is an M x n matrix of M n-dimensional query points,

data_coords is an N x n matrix of N n-dimensional points to query against (could be the same as query_coords),

k is the number of nearest neighbours (including self) to be computed,

kni is the M x k matrix that the function fills with the nearest neighbour indices of each of the M query points, and

knd is the M x k matrix that the function fills with the corresponding nearest neighbour distances.
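To make the shape and content of the two output matrices concrete, here is a minimal brute-force sketch in Mata that builds matrices of the same dimensions without a kd-tree. It is not the code used in knn(). The names bf_kni and bf_knd are made up for this illustration, and the ordering of the columns (ascending by distance) is a choice of this sketch, not something the description above guarantees for knn().

```stata
mata:
    N = 500
    k = 5
    query_coords = runiform(N, 2)
    data_coords  = runiform(N, 2)

    // hypothetical output matrices, one row per query point
    bf_kni = J(N, k, .)
    bf_knd = J(N, k, .)

    for (i = 1; i <= N; i++) {
        // Euclidean distance from query point i to every data point
        d = sqrt(rowsum((data_coords :- query_coords[i, .]) :^ 2))
        // permutation that sorts the distances in ascending order
        idx = order(d, 1)
        // keep the row indices and distances of the k closest points
        bf_kni[i, .] = idx[|1 \ k|]'
        bf_knd[i, .] = d[idx[|1 \ k|], 1]'
    }
end
```

Each query point requires a full pass over all N data points here, so the cost grows with M x N. This is the computation that the kd-tree is meant to speed up.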

Example usage:

version 15.1
mata: mata clear
mata: mata set matastrict on
run "https://raw.githubusercontent.com/robertaue/knearest/master/mata_knn.do"
mata:
    N = 10000
    k = 5
    query_coords = runiform(N,2)
    data_coords = runiform(N,2)
    knn(query_coords, data_coords, k, kni=., knd=.)
end
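The use case mentioned above, averaging a variable over each observation's nearest neighbours, could look roughly like the following sketch. It assumes that the entries of kni are row indices into data_coords. The variable y and the result vector ybar are hypothetical names introduced for this illustration. Because the points are queried against themselves and k counts the point itself, each average includes the observation's own value.

```stata
version 15.1
mata: mata clear
mata: mata set matastrict on
run "https://raw.githubusercontent.com/robertaue/knearest/master/mata_knn.do"
mata:
    N = 1000
    k = 5
    coords = runiform(N, 2)     // hypothetical point coordinates
    y = rnormal(N, 1, 0, 1)     // hypothetical variable to be averaged

    // query the points against themselves
    knn(coords, coords, k, kni=., knd=.)

    // assumption: kni[i, .] holds row indices into coords (and hence into y)
    ybar = J(N, 1, .)
    for (i = 1; i <= N; i++) {
        ybar[i] = mean(y[kni[i, .]', 1])
    }
end
```

The vector ybar can then be written back to the Stata data set, for example with st_store(), if the coordinates were read from variables in memory.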

stabest: Two-sided estimation of preferences in school markets (code)

This is an R package developed with Thilo Klein, and described in our joint paper with Josue Ortega. Its aim is to estimate students' preferences over schools, and schools' preferences over students, in a school choice market when rank order lists are submitted strategically. The method provides a way to estimate preferences under different identifying assumptions. It can be installed from GitHub (requires R build tools):

library(devtools)
devtools::install_github("robertaue/stabest", build_vignettes = TRUE, upgrade=FALSE)

Then, you can use it to estimate students' and schools' preferences jointly, under the identifying assumptions of stability and undominated strategies:

library(stabest)
fit <- stabest(choice_rk ~ -1 + distance + stu_score:sch_mscore + sch_FE,
               sch_rk_observed ~ stu_score - 1,
               data = schoolmarket200,
               nSeats = schoolmarket200$sch_capacity[schoolmarket200$stu_id == 1],
               student.id = 'stu_id', college.id = 'sch_id', match.id = 'sch_assignment',
               niter = 20000, burnin = 1000, thin = 10
)
summary(fit)

For more details, read our joint paper, or see the package vignette: vignette("stabest-vignette")