Course listing
ST 498-001, ST 489-002 (Summer 2011): Statistical Genetics Practicum

CUSP: Computation for Undergraduates in Statistics Program

Alison Motsinger-Reif, PhD

Computing Instructor

David M. Reif, PhD

Teaching Assistants

Gina Pomann

Hao Wu


Meeting Times
(See Calendar page for exact hours, field trips, special events, etc.)

May 23 - June 3, 2011: Lecture M,T,W,Th,F @ 9am-12pm (20 Winston Hall), excluding Memorial Day
June 10 - July 29, 2011: Lab F @ 9am-11am (20 Winston Hall)


Detecting association between genetic variants and common, complex diseases is a central goal of the
field of human genetics. Finding “the genes” associated with conditions such as heart disease and
diabetes, or that predict important outcomes such as adverse reactions to medications or required
doses, promises to move medical care toward truly “personalized medicine”. Discovering and
understanding such genetic associations will allow more informed decisions about preventive and
interventional medicine.

However, the explosion of genetic information over the last decade, driven by rapid advances in
genotyping technologies, has created an analytical bottleneck for genetic association studies. As the
number of genetic variables examined per individual increases, both variable selection and statistical
modeling must be performed during this “gene mapping” analysis to identify “the genes” associated with
human traits. Although these tasks could be performed separately, coupling them is necessary to select
meaningful variables that effectively model the data. The challenge is heightened by the complex
nature of the traits under study and of the underlying genetic models that cause them. For example,
it is intuitive that a disease such as hypertension arises from a myriad of factors, with a number of
known genetic, environmental, and lifestyle factors contributing to an individual’s risk of the disease.
Additionally, the genetic risk factors do not act through simple mechanisms: complex interactions and
connections among biological pathways contribute to the trait.

The challenges presented by such data (very large numbers of potential predictor variables, limited
sample sizes in clinical studies, and very complex underlying models) quickly diminish the usefulness
of traditional parametric statistical methods. To address these challenges, a number of novel
data-mining methods have been developed to detect and model such complex genetic associations.
Since no single analytical method is likely to be ideal in all situations, it is important to empirically evaluate the strengths and weaknesses of a variety of computational approaches in order to apply them properly to real data.

During the CUSP program, students will learn about computer-intensive data-mining tools for gene mapping in human genetics and explore the relative performance of these methods in research projects based on simulated data. Student projects will focus on specific methodological questions and use comparative analyses to offer guidance and insight into how the methods should be applied. Students will also learn about real data studies in human genetics and interact with groups that collect the kind of data the methods are designed to analyze.


The program runs for ten weeks, from May 23 through July 29, and will include introductions to key concepts in genetics, statistics, and bioinformatics, as well as introductory lectures and labs in R programming and Unix/Linux computing. Students will be introduced to computational tools on local machines as well as to high-performance computing on NCSU’s supercomputing cluster. Students will also take field trips to see how genetic data are collected and to visit supercomputing facilities to see how high-performance computing is made possible. Students will work in teams of two on the research projects.