MRC DTP iCASE/ SCM DTP-MR

Core computational skills programme

2022 - 2023

Bioinformatics Training Facility

Contact:

Martin van Rongen

(Biostatistics Initiative Lead / Bioinformatics Training Facility, University of Cambridge)

mv372@cam.ac.uk

The Bioinformatics Training Facility offers a broad range of undergraduate and postgraduate hands-on training courses focused on bioinformatics, biostatistics and computational biology. These training activities aim to enable life scientists to handle and interpret biological data effectively.

Overview

The MRC DTP iCASE/ SCM DTP-MR Core Computational Skills Programme is intended to provide a strong foundation in data manipulation and visualisation and in practical statistics using the R software environment, and to introduce techniques for undertaking reproducible research.

After the programme you should feel confident:

using R/RStudio to manipulate and visualise data,
selecting and implementing common statistical techniques using R, and knowing when, and when not, to apply them,
understanding the issues around reproducibility and replicability, designing reproducible experiments, and undertaking replicable analyses using R Markdown and GitHub.

To view the Programme Timetable, please visit the Delivery and Timetable page.

Description

The programme consists of four separate modules (detailed below). In every module the underlying philosophy is to treat these concepts as practical skills that you should be able to use in your lab, rather than as theoretical subjects. The courses therefore focus on methods for addressing real-life problems in the life sciences.

Module 1: Introduction to R

This module consists of 2 full-day sessions.

This module is an introduction to R designed for participants with no programming experience. We will start from scratch by introducing how to begin programming in R, then progress to reading from and writing to files, manipulating data and visualising it by creating different plots. These are the fundamental tasks you need to get started analysing your data. We will be working with tidyverse, one of the most popular collections of packages in R, which will allow you to manipulate your data effectively and visualise it to a publication-level standard.

Module 2: Core Statistics

This module consists of 3 full-day sessions.

This module covers classical statistical analysis techniques, starting with simple hypothesis testing and building up to linear models and power analysis. The focus of the module is on practical implementation of these techniques and developing robust statistical analysis skills rather than on the underlying statistical theory.

After the module you should feel confident selecting and implementing common statistical techniques using R, and know when, and when not, to apply them.

Module 3: Advanced Statistics

This module consists of 3 half-day sessions.

This module builds upon the Core Statistics module and introduces generalised linear models, resampling and permutation techniques, and clustering and classification methods.

Module 4: Reproducible Research Skills

This module consists of 2 half-day sessions.

This module introduces reproducibility concepts that you can apply when programming in R. We will discuss issues around reproducibility and provide practical advice on data organisation and management. We will explore how to create notebooks, a way to integrate your R analyses into reports using R Markdown. The module also introduces the concept of version control. We will learn how to create a repository on GitHub and how to collaborate on the same project without creating conflicting versions of files.

After the module you should be aware of the issues around, and causes of, irreproducibility, and of the methodological approaches for ensuring that your work is reproducible. You should also know how to improve the replicability of your analyses.
