Project Archive

Rapid statistical classification on the Medline database of biomedical literature

My Masters thesis, completed in July 2008, was titled "Rapid statistical classification on the Medline database of biomedical literature".

I studied at the University of Cape Town under Prof. Cathal Seoighe (now at NUI Galway), with some Stanford coursework under the Stanford-South Africa Biomedical Informatics (SSABMI) partnership.

I developed a method for classifying Medline records as relevant or irrelevant to a subject defined by a sample of records, for example the citations in a bibliography. It trained a Naive Bayes classifier using the provided records as the positive class and the remainder of Medline's then-16 million records as the background negative class. By using compressed representations, a mathematical rearrangement of Bayes' theorem and the precomputed distribution of terms in Medline, the method could classify all 16 million Medline records in under a minute on a single thread. Distributing the process would have accelerated it further.
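The rearrangement can be illustrated with a small Python sketch. It is not the MScanner code: it assumes a Bernoulli Naive Bayes model over binary term features, plain Laplace smoothing with a hypothetical `pseudocount` parameter, and illustrative function names (`train_weights`, `score`). Negative-class counts are obtained by subtracting the positive counts from precomputed background counts, and the log-odds score is rearranged into a constant plus a sum over only the terms present in a record.

```python
import math

def train_weights(positive_docs, background_counts, n_background, pseudocount=1.0):
    """Return (base, weights) for a rearranged Bernoulli Naive Bayes score.

    positive_docs: list of term lists (assumed to be included in the background).
    background_counts: precomputed {term: number of records containing it}.
    """
    n_pos = len(positive_docs)
    n_neg = n_background - n_pos

    # Count how many positive records contain each term.
    pos_counts = {}
    for doc in positive_docs:
        for term in set(doc):
            pos_counts[term] = pos_counts.get(term, 0) + 1

    base = 0.0
    weights = {}
    for term, total in background_counts.items():
        pos = pos_counts.get(term, 0)
        neg = total - pos  # negative class = background minus positives
        p = (pos + pseudocount) / (n_pos + 2 * pseudocount)
        q = (neg + pseudocount) / (n_neg + 2 * pseudocount)
        absent = math.log((1 - p) / (1 - q))
        # Pretend every term is absent, then correct for the present ones.
        base += absent
        weights[term] = math.log(p / q) - absent
    return base, weights

def score(doc, base, weights):
    """Log-odds score: touches only the terms that occur in the record."""
    return base + sum(weights.get(term, 0.0) for term in set(doc))

# Toy data (made up for illustration): 10 records in the background.
background = {"gene": 4, "cell": 5, "car": 3}
positives = [["gene", "cell"], ["gene"]]
base, weights = train_weights(positives, background, n_background=10)
print(score(["gene"], base, weights), score(["car"], base, weights))
```

Because `base` and `weights` are computed once, scoring a record costs one lookup per term it contains rather than one per term in the vocabulary.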

I also developed a novel improvement to the Laplace smoothing method of estimating feature frequencies for training a Naive Bayes classifier, which I call Split-Laplace Smoothing. The method yields unbiased estimates even from extremely biased training data, such as a handful of positive examples and millions of negative examples.

The MScanner web application was hosted on a now-decommissioned web server at Stanford, but the code is still available at https://github.com/gpoulter/mscanner, albeit in need of some work to run with modern versions of Python and its dependencies.

Publications

G.Poulter - 2008 - M.Sc. Thesis - Rapid Statistical Classification on Medline.pdf

Impact Forces in Injection Molding

My 2005 Applied Mathematics Honours research project was "Impact Forces in Injection Molding", supervised by Prof. Tim Myers.

I used analytic and numerical methods to investigate the forces in an injection molding system when there is insufficient material in the mold to slow the piston before it impacts the housing. Finite element methods fail to account for the behaviour of the squeeze film of lubricant that forms between the piston and flange, which tends to reduce the maximum force exerted. I started with a basic model of the lubricant as a Newtonian fluid, which I expanded to include a non-Newtonian component in which the fluid viscosity increases exponentially with pressure. Finally, I included a first-order model of the deformation of the piston surface due to elastohydrodynamic lubrication and found that it reduces the maximum load during impact.
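To give a feel for the basic Newtonian case, here is a rough Python sketch using the textbook Stefan squeeze-film result for two parallel circular plates, F = 3πμR⁴v / (2h³), where h is the gap and v the approach speed. Everything about the setup is an assumption for illustration and not taken from the project: the circular-plate geometry, a free piston with no driving force, the parameter values, and the function names.

```python
import math

def squeeze_film_coefficient(mu, radius):
    """c in F = c * v / h**3 for parallel circular plates (Stefan's equation)."""
    return 1.5 * math.pi * mu * radius ** 4

def stopping_gap(mass, v0, h0, c):
    """Gap at which a free piston comes to rest (where approach speed is zero)."""
    return 1.0 / math.sqrt(1.0 / h0 ** 2 + 2.0 * mass * v0 / c)

def approach_speed(h, mass, v0, h0, c):
    """Speed as a function of gap, from integrating m dv/dh = c / h**3."""
    return v0 - (c / (2.0 * mass)) * (1.0 / h ** 2 - 1.0 / h0 ** 2)

def film_force(h, mass, v0, h0, c):
    """Force exerted by the lubricant film at gap h."""
    return c * approach_speed(h, mass, v0, h0, c) / h ** 3

def peak_force(mass, v0, h0, c, samples=10000):
    """Largest film force on an even grid of gaps between stopping gap and h0."""
    h_min = stopping_gap(mass, v0, h0, c)
    gaps = [h_min + (h0 - h_min) * k / samples for k in range(samples + 1)]
    return max(film_force(h, mass, v0, h0, c) for h in gaps)

# Illustrative values only: 0.1 Pa.s oil, 5 cm plate radius,
# 10 kg piston arriving at 1 m/s with a 1 mm initial gap.
c = squeeze_film_coefficient(mu=0.1, radius=0.05)
print("stopping gap (m):", stopping_gap(10.0, 1.0, 1e-3, c))
print("peak film force (N):", peak_force(10.0, 1.0, 1e-3, c))
```

In this idealised model the film always brings the piston to rest before contact, because the resisting force grows like 1/h³ as the gap closes.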

The most difficult part was the first-order deformation model, which suffers from numerical instability. The system rides close to an asymptote at which pressure goes to infinity, and the gradient is so steep that even the tiniest step by the numerical integration can send the system over the asymptote (and into the complex plane).

Looking back from 2017, I wish I'd done some kind of experimental verification: although applied mathematics deals with the real world, hence "applied", it remains a theoretical exercise. If any experiment is to happen, it is left as an exercise for the physicists and engineers.

G.Poulter - 2005 - Applied Maths Honours Project - Impact Forces in Injection Molding.pdf

Modelling Infectious Diseases on Networks

My final-year Applied Mathematics research project in 2004 was "Modelling Infectious Diseases on Networks", supervised by Dr Gareth Witten. Two years later my supervisor published the same text as "Simulations of infectious diseases on networks" in the journal Computers in Biology and Medicine.

I reviewed developments in modelling epidemics using networks of infectious contacts, then implemented the fully-mixed Susceptible-Infective-Removed (SIR) model in continuous and stochastic differential equation forms, followed by simulated SIR epidemics on contact networks of various structures. The nodes of the networks represent individuals, and edges between individuals represent regular, potentially infective contacts. Percolation models test the parameter regimes in which an outbreak would turn into an epidemic.
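The network simulations can be sketched in a few lines of Python. This is an illustration under assumptions, not the project code: it uses a discrete-time SIR process, an Erdős-Rényi random graph as one example of a contact structure, and hypothetical names and parameters (`beta` as per-contact transmission probability per step, `gamma` as per-step removal probability).

```python
import random

def erdos_renyi(n, p, rng):
    """Random contact network: each pair of individuals is linked with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def simulate_sir(adj, beta, gamma, seed_node, rng):
    """Run one discrete-time SIR epidemic and return the final number removed."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must be in (0, 1] so the epidemic terminates")
    state = {node: "S" for node in adj}
    state[seed_node] = "I"
    infected = {seed_node}
    while infected:
        newly_infected = set()
        removed = set()
        for node in infected:
            # Each infective may transmit along each edge to a susceptible neighbour.
            for neighbour in adj[node]:
                if state[neighbour] == "S" and rng.random() < beta:
                    newly_infected.add(neighbour)
            # Each infective is removed with probability gamma per step.
            if rng.random() < gamma:
                removed.add(node)
        for node in newly_infected:
            state[node] = "I"
        for node in removed:
            state[node] = "R"
        infected = (infected | newly_infected) - removed
    return sum(1 for s in state.values() if s == "R")

rng = random.Random(0)
network = erdos_renyi(200, 0.03, rng)
sizes = [simulate_sir(network, beta=0.2, gamma=0.5, seed_node=0, rng=rng) for _ in range(20)]
print("final outbreak sizes:", sizes)
```

Repeating the runs over a range of `beta` values and plotting the fraction of runs that reach a large share of the network gives a simple picture of the threshold between small outbreaks and epidemics.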

Witten G, Poulter GL: Simulations of infectious diseases on networks. Computers in Biology and Medicine 2006, 37:195-205 - publication of my 2004 project using Python to model epidemics over networks of infectious contacts.

G.Poulter - 2006 - Third-Year Project Published - Simulations of Infectious Diseases on Networks.pdf
G.Poulter - 2004 - Applied Maths Final Year Project - Modelling Infectious Diseases on Networks.pdf