Projects

Finding Patterns in Protein Chip Data to Probe the Immunological Mechanisms of Nut Allergies (July 2016)

Peptide micro-arrays are an important tool for investigating biological reactions such as the interactions between the immune system and food allergens. However, peptide micro-array data is very noisy, and human labeling of bad spots is time-consuming and costly. In this paper, we use an approach from machine learning, Positive Unlabeled (PU) learning, to predict bad spots when only some bad spots, and no good spots, have been labeled. Our contribution is two-fold. First, we demonstrate the use of a distance metric induced by a random forest to identify points dissimilar to known bad spots, and then perform supervised learning to discriminate between known bad spots and predicted good spots. Second, we perform extensive feature engineering to create novel features that fully utilize the design of the micro-array. Our results with leave-one-out cross-validation show that PU learning is able to predict bad spots given only labeled bad spots with 96% accuracy. The addition of engineered features further improves prediction accuracy by 2%.
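The two-step procedure described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the paper's pipeline: the two-dimensional features, the cluster positions, the 40% cutoff for "predicted good" spots, and the choice to train the first forest on labeled-bad versus unlabeled points are all assumptions made for the example. It uses scikit-learn's `RandomForestClassifier` and its `apply` method, which returns the leaf index each sample reaches in each tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for spot features (hypothetical, not the paper's data).
bad = rng.normal(loc=4.0, scale=0.5, size=(100, 2))
good = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
X = np.vstack([bad, good])
is_bad = np.r_[np.ones(100, dtype=bool), np.zeros(100, dtype=bool)]  # hidden truth

# Only 30 bad spots carry a label; every other spot is unlabeled.
labeled = np.zeros(200, dtype=bool)
labeled[:30] = True
unlabeled_idx = np.flatnonzero(~labeled)

# Step 1: forest-induced distance to the known bad spots.
# Assumption: the forest separates labeled-bad from unlabeled points, with
# shallow trees so that similar spots end up sharing leaves.
forest = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)
forest.fit(X, labeled.astype(int))
leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree

# Proximity = fraction of (tree, labeled bad spot) pairs sharing a leaf.
same_leaf = leaves[unlabeled_idx][:, None, :] == leaves[labeled][None, :, :]
distance = 1.0 - same_leaf.mean(axis=(1, 2))

# Treat the unlabeled spots farthest from the known bad spots as predicted good.
n_reliable = int(0.4 * len(unlabeled_idx))  # cutoff is an arbitrary assumption
reliable_good_idx = unlabeled_idx[np.argsort(distance)[-n_reliable:]]

# Step 2: supervised learning on known bad vs predicted good spots.
X_train = np.vstack([X[labeled], X[reliable_good_idx]])
y_train = np.r_[np.ones(labeled.sum(), dtype=int), np.zeros(n_reliable, dtype=int)]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Score the unlabeled spots against the hidden truth (only possible on toy data).
pred_bad = clf.predict(X[unlabeled_idx]).astype(bool)
accuracy = float((pred_bad == is_bad[unlabeled_idx]).mean())
```

On real micro-array data the hidden truth is unavailable, which is why the abstract reports leave-one-out cross-validation instead of a direct comparison like the one in the last two lines.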

Economic Globalization - A Cause or Solution to Africa's Poverty (May 2016)

This paper contributes to the ongoing debate on the effect of economic globalization on developing countries. In particular, it addresses the effect of trade liberalization on the education and poverty levels of African countries. Confirmatory factor analysis is used to select indicators for latent variables such as poverty, education, and health. The relationships among these latent variables and economic globalization are then modeled with a structural equation model. Results show that economic globalization has both a direct effect and an indirect effect, through literacy level, on the poverty level of an African country.
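The distinction between a direct and an indirect effect can be illustrated with a simple path analysis on observed variables. This is a deliberately simplified stand-in for the structural equation model described above, which also involves latent variables measured through indicators. The data are simulated, and the coefficients (0.5, -0.3, -0.8) are invented for the example; they say nothing about the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated, standardized scores (hypothetical coefficients, not the paper's).
globalization = rng.normal(size=n)
literacy = 0.5 * globalization + rng.normal(size=n)
poverty = -0.3 * globalization - 0.8 * literacy + rng.normal(size=n)


def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


# Path a: globalization -> literacy.
(a,) = ols_slopes(literacy, globalization)
# Path c' (direct) and path b: globalization and literacy -> poverty.
direct, b = ols_slopes(poverty, globalization, literacy)

indirect = a * b           # effect transmitted through literacy
total = direct + indirect  # total effect of globalization on poverty
```

The indirect effect is the product of the two paths that run through literacy, and the total effect is the sum of the direct and indirect parts.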

Geospatial Analysis of Population of Ghana (May 2015)

Ghana covers a total land mass of 238,535 sq km (92,099 sq mi) and had a population of about 24 million spread across 170 administrative districts. The analysis focuses on the population density (population divided by area) of the districts. The goal of this project is to investigate the spatial distribution of Ghanaians by district, given the geographical contiguity of the districts. Spatial autoregression and eigenvector spatial filtering are employed to explain the spatial autocorrelation that exists among the population densities. Results show that there are spatial clusters of high population density in the southern part of Ghana, specifically in the areas surrounding the capital city (Accra), and in the central part of the country around Kumasi. There is also a large cluster of low population density in the Northern and Upper West Regions of Ghana.
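Spatial autocorrelation of this kind is commonly summarized with Moran's I, computed from the district values and a contiguity weight matrix. The sketch below is illustrative only: the six districts, their populations and areas, and the chain-shaped contiguity are made up, and Moran's I is used here as a standard diagnostic rather than as the project's stated method.

```python
import numpy as np


def morans_i(values, weights):
    """Moran's I for values observed on units linked by a spatial weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()  # deviations from the mean
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


# Hypothetical districts: population density = population / area.
population = np.array([100, 100, 100, 500, 500, 500], dtype=float)
area_sq_km = np.full(6, 100.0)
density = population / area_sq_km

# Binary contiguity for six districts arranged in a row: each district
# borders only its immediate neighbours (an assumption for this toy example).
n = len(density)
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

clustered_i = morans_i(density, W)               # similar values are adjacent
alternating_i = morans_i([1, 5, 1, 5, 1, 5], W)  # dissimilar values are adjacent
```

A positive value indicates that neighbouring districts tend to have similar densities, as in the clusters around Accra and Kumasi, while a negative value indicates that neighbours tend to differ.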