Course Schedule

Schedule.pdf

*   Day 1: Introduction to general computing tools 

Day 1 of the course introduces general computing tools, such as the Unix command line environment. Through a series of examples, we will go through bash commands (less, nano, ls, ll, wc, tail, head, mkdir, cat, grep) and the pipe operator (|), regular expressions, basic scripting, and running Python scripts from the Unix shell. Exercises and assignments will be based on the “Learning Unix” GitHub repository. There will also be a presentation of useful bioinformatics software.

Part of Day 1 will also cover working on a remote server, using the UPPMAX cluster of the Swedish National Infrastructure for Computing (SNIC) as a training tool. Students will be provided with guest accounts. This portion of the course refreshes the students’ knowledge of the command line environment and the shell, a tool for interacting with the computer through typed instructions. Exercise sessions will be carried out in pairs to encourage collaborative problem solving. All lectures will include live demonstrations of the command line. Detailed course material, including commands and scripts, will be available through the course web page. This part of the course corresponds to learning outcome 2a: "Ability to use basic commands in the Unix command line environment" (reformatting data with regular expressions, basic scripting, running Python scripts from the Unix shell).
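To give a flavour of the kind of task covered on Day 1, here is a minimal sketch of a Python script that reformats data with a regular expression and is run from the Unix shell. It is an illustration only, not taken from the course material: the script name `simplify_headers.py`, the input file `sequences.fasta`, and the choice of FASTA headers as the data to reformat are all assumptions.

```python
import re
import sys


def simplify_header(line: str) -> str:
    """Keep only the first whitespace-delimited token of a FASTA header line."""
    return re.sub(r"^>(\S+).*$", r">\1", line)


def reformat(path: str) -> None:
    """Print the file with simplified headers; sequence lines are unchanged."""
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            print(simplify_header(line) if line.startswith(">") else line)


# Only runs when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    reformat(sys.argv[1])
```

From the shell, the output can be piped into the commands listed above, for example `python simplify_headers.py sequences.fasta | grep ">" | head`.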


*   First half of Day 2: Quality control of short-read data

This section starts with a short lecture on the typical analysis steps for population genomic data sets. We will then begin the analysis with a hands-on exercise on quality control of short-read Illumina data.
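As a rough idea of what quality control of short reads involves, the sketch below computes the mean base quality of each read in a FASTQ file and keeps the reads above a threshold. It assumes well-formed four-line FASTQ records and Phred+33 quality encoding; the threshold of 20 and the function names are illustrative choices, and the course exercise may well rely on dedicated software instead.

```python
from typing import Iterable, Iterator, Tuple


def mean_phred(quality_string: str, offset: int = 33) -> float:
    """Mean Phred score of a read, assuming Phred+33 encoded qualities."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores) if scores else 0.0


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # ignore blank lines between or after records
        sequence = next(it).strip()
        next(it)  # the '+' separator line
        quality = next(it).strip()
        yield header.strip()[1:], sequence, quality


# Two made-up reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGT", "+", "!!!!"]
passing = [name for name, _, qual in parse_fastq(example) if mean_phred(qual) >= 20]
print(passing)
```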


*   Second half of Day 2 to first half of Day 3: Introduction to genome assembly

We will discuss the differences between short- and long-read DNA/RNA sequencing approaches, and how they are applied to solve different assembly problems. In the Day 2 afternoon practical you will work on assembling the herring genome (the assembly will run overnight) and explore the effect of different settings on the assembly result. The assembly will then be evaluated on the morning of Day 3.
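One statistic commonly used when evaluating an assembly is N50: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The schedule does not say which metrics the practical uses, so the sketch below is only an illustration, and the contig lengths in it are made up.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty assembly)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum covers half the assembly.
        if running * 2 >= total:
            return length
    return 0


# Made-up contig lengths: total 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Comparing N50 across assemblies built with different settings is only meaningful when the total assembly sizes are similar.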


*   Second half of Day 3: Mapping and variant calling

This part of the course focuses on mapping short-read data to an assembled genome (the herring genome) and examining the results in order to investigate structural variants. There will also be a focus on learning to interpret and extract data from the SAM/BAM and VCF file formats.
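Both SAM and VCF are tab-delimited text formats, so individual records can be taken apart with a few lines of code. The sketch below, which is an illustration rather than course material, reads the leading mandatory SAM columns and the eight fixed VCF columns. The example records are made up, and header lines (starting with `@` in SAM and `#` in VCF) are assumed to have been skipped by the caller.

```python
def parse_sam_line(line: str) -> dict:
    """Extract the leading mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return {
        "qname": fields[0],
        "flag": flag,
        "rname": fields[2],
        "pos": int(fields[3]),  # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "is_mapped": not (flag & 0x4),  # bit 0x4 marks an unmapped read
    }


def parse_vcf_line(line: str) -> dict:
    """Extract the eight fixed columns of one VCF data line."""
    chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value if value else True  # flags carry no value
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
            "alt": alt.split(","), "qual": qual, "filter": filt, "info": info_dict}


# Made-up example records.
sam = parse_sam_line("read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII")
vcf = parse_vcf_line("chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=20;DB")
print(sam["rname"], sam["pos"], sam["is_mapped"])
print(vcf["alt"], vcf["info"])
```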

 

*   Day 4: Seascape analysis of genotyping data using R

Format: Lecture and live demonstrations of software. Students will follow along, analyzing data in R on their own computers.

Topics covered:

- exploring and displaying population structure on a map

- finding genotype-environment associations

- mapping divergent adaptation


*   Day 5: Class project

Students will analyse a new genotyping dataset and a new gene expression dataset on their own, using skills learned in the class.

This part of the course corresponds to learning outcome 2b: "Ability to use different software tools to analyse sequence data from restriction-site digested DNA (data cleaning steps, clustering of reads, mapping to reference genomes, extracting and filtering genotype data, population genomic analysis)".


---------------------------------------------


Before the course: Students will need to bring their own computers, with R, a text editor, and software for connecting to a remote server already installed.


Recommended software:


For PC:

- MobaXterm: http://mobaxterm.mobatek.net/

- Notepad++: http://notepad-plus-plus.org/

- R: http://www.r-project.org/


For Mac:

- Cyberduck: http://cyberduck.en.softonic.com/mac

- TextWrangler: http://www.barebones.com/products/textwrangler/

- R: http://www.r-project.org/