The data-intensive nature of 21st-century biology makes it very important for scientists to have a basic proficiency in statistics. Whether it is thousands of gene expression levels measured by a microarray, millions of polymorphisms genotyped in a case-control study, or more general questions of how to properly design an experiment, you will constantly be confronted with how to collect, analyze and interpret data. This course provides the key statistical concepts and methods necessary for extracting biological insights from data.
A common misconception about "doing" statistics is that it is useful only for analyzing data after an experiment has been performed. In fact, statistical methods are an integral part of designing experiments as well. How small an effect size do you want to be able to detect? What sample size will you need? What is the power of your experiment?
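As a rough illustration of how effect size, sample size and power are linked, the sketch below uses the standard normal approximation for a two-sided comparison of two group means. The function name `sample_size_per_group` and the default values (a significance level of 0.05 and power of 0.80) are assumptions made for this example, not course requirements; R's built-in `power.t.test` performs a comparable t-based calculation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means.

    effect_size is the standardized difference (difference in means / SD).
    This uses the normal approximation, so it tends to give a slightly
    smaller n than an exact t-based calculation.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile corresponding to the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Smaller effects require many more samples to detect with the same power.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {sample_size_per_group(d)} samples per group")
```

Under these assumptions, halving the effect size you want to detect roughly quadruples the required sample size, which is why these questions are best settled before the experiment is run.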
In this ten-week course, we will not be able to cover every specific topic that might arise in the course of your research. Instead, we will focus on a rigorous understanding of fundamental concepts, which will give you the tools to carry out routine statistical analyses and the foundation to learn about more specialized topics.
Throughout this course, we will often make use of the freely available statistical software R (http://www.r-project.org). R has become one of the most widely used platforms for statistical analysis in genomics because it is powerful, free and capable of making publication-quality graphics. Problem sets will be much easier to complete using R, though you are free to use other tools if you prefer. We strongly recommend you take the time to learn R.
The primary objective of this course is to provide a strong foundation in fundamental statistical concepts, particularly as they relate to genomics, and thus to better prepare you for research. The key here is not "getting the right answer," but rather learning how to get the right answer.
Lectures and R tutorials are available here. There is no required textbook for this course.
A note about grading: the point of the assignments is not just to get the right answer. Many statistics problems come with choices: Which test should I choose? How should I best visualize these data? What distribution fits my sample? Many of these questions have no single "correct" answer. The point of the homework is to gain the skills to answer these questions on your own. Grading will reflect the learning process, not just the end result.
Feel free to work in groups; in fact, discussion of the homework assignments is encouraged. However, each student must turn in his or her own assignment. Please don't copy homework; you won't learn anything.
The class meets twice a week on Mondays and Wednesdays from 9:00-10:20 AM in Foege S040.
Each class will last 80 minutes and be primarily lecture-based. We will interrupt lectures to work on problems in small groups, and will also work through analyses using R. Some classes will include approximately 20 minutes of interactive work in R, applying key concepts learned in class to real data. Students are therefore urged to bring a laptop with R installed.