Next-generation sequencing (NGS) is an emerging technology for determining DNA/RNA sequences, for whole genomes or for specific regions of interest, at a much lower cost than traditional Sanger sequencing. Combined with other techniques such as RNA extraction (RNA-Seq), enrichment for the exome (Exome-Seq) or other genomic regions of interest, chromatin immunoprecipitation (ChIP-Seq), and bisulfite conversion (BS-Seq), NGS can provide rich information about genetic variants, transcriptome dynamics, transcription factor binding profiles, epigenetic modifications, and more. The applications of NGS are expanding rapidly, which calls for efficient and creative methods of data storage, analysis, and visualization.

We are actively involved in data analysis for a broad range of NGS applications and have mature analysis pipelines for RNA-Seq data, detection of rare variants, and ChIP-Seq data. We routinely use in-house programs, as well as multiple commercial and open-source tools, for the different steps of NGS data analysis, from base calling and sequence alignment to downstream statistical analysis, to suit various experimental designs. Moreover, we are devoted to developing novel and useful statistical tools for NGS data analysis. We carefully examine possible sources of abnormalities in data processing and search for ways to overcome inherent bias in NGS data analysis.

This course is aimed at experimental or bench-based researchers working in the molecular life sciences who have little or no previous experience with NGS analysis. An undergraduate-level knowledge of a subject related to the life sciences would be an advantage.
1. Basics of scripting languages
2. Unix & High-Performance Computing
3. Basic Linux command-line usage and its advantages
4. Basics of R and Bioconductor Programming
5. Data Manipulation in R
6. Usage of important bioinformatics toolkits
7. Programming for data visualization and interpretation
8. Case studies: Example data
1. Introduction to sequencing technologies from a data analyst's view
2. Comparison of sequencing platforms
3. Sequencing library preparation
4. Sequence file formats
5. Sequencing data analysis pipeline development
6. Evaluation of sequencing platforms and report generation
7. Case studies: Example Data
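As an illustration of the sequence file formats listed above, the following minimal sketch reads FASTQ records in Python. It assumes the common four-lines-per-record layout and Phred+33 quality encoding; `FastqRecord` and `parse_fastq` are hypothetical helper names written for this example and are not part of any course toolkit.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream.

    Assumptions: exactly four lines per record, Phred+33 encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed record near {header!r}")
        # Phred+33: quality score = ASCII code of the character minus 33
        scores = [ord(ch) - 33 for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


# A made-up single-read example, not real sequencing data.
example = "@read1\nACGT\n+\nII5!\n"
records = list(parse_fastq(io.StringIO(example)))
for rec in records:
    print(rec.name, rec.sequence, rec.qualities)
```

In practice a dedicated library or tool would normally be used for this; the sketch only shows how the four lines of a record relate to each other.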
Module 3: RNA-Seq/Transcriptome-Seq Data Analysis (12 hours)
1. Introduction to RNA/Transcriptome Sequencing
2. Sequence data resources and Raw sequence files (FASTQ format)
3. Preprocessing of raw reads: quality control (FastQC), adapter clipping, quality trimming
4. Introduction to read mapping (Alignment methods, Mapping heuristics)
5. Understand split-read mapping (TopHat, STAR)
6. Mapping output (SAM/BAM format)
7. Mapping statistics, Visualization of mapped reads (IGV, UCSC)
8. Understand the Tuxedo Suite (Cufflinks, Cuffcompare, Cuffmerge, Cuffdiff, etc.)
9. Understand the statistics behind DESeq2 and DIEGO
10. Quantify exons/genes/transcripts
11. Predict differential splicing (DIEGO), differential gene expression (DESeq2), and differential isoform expression (Cuffdiff)
12. Create extensive diagnostic graphics with R
13. Apply your new skills by working on challenging exercises
14. Case Studies: Real Data
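To give a flavour of the quantification and differential-expression topics above, the sketch below normalizes raw read counts to counts per million (CPM) and computes a log2 fold change between two samples. This is a deliberately simplified illustration and not the statistical model used by DESeq2, Cuffdiff, or DIEGO; the function names, the pseudocount, and the gene counts are all assumptions made up for the example.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(control: Dict[str, int],
                     treated: Dict[str, int],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(treated / control) on CPM values.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads.
    """
    cpm_c = counts_per_million(control)
    cpm_t = counts_per_million(treated)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in control
    }


# Made-up counts: 'deeper' is the control sequenced at twice the depth,
# 'shifted' has the same depth as 'deeper' but a different composition.
control = {"geneA": 500, "geneB": 300, "geneC": 200}
deeper = {"geneA": 1000, "geneB": 600, "geneC": 400}
shifted = {"geneA": 1500, "geneB": 300, "geneC": 200}

lfc_depth = log2_fold_change(control, deeper)
lfc_shift = log2_fold_change(control, shifted)
print(lfc_depth)
print(lfc_shift)
```

The first comparison shows why normalization matters: doubling the sequencing depth alone produces no apparent change once counts are scaled. Real tools add replicate-aware dispersion estimates and significance testing on top of this idea.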
Module 4: Whole Exome Sequencing Data Analysis (12 hours)
1. Understanding Exome Sequencing and Data generation
2. Sequence Alignment
3. Alignment of reads to a reference genome (BWA/Bowtie)
4. Understanding Mapping Output (SAM/BAM, SAMtools & Bedtools)
5. Variant detection using GATK & SAMtools
6. Visualization of variation with IGV
7. Complete annotation and variant effect prediction (SnpEff, dbSNP, etc.)
8. Predict the effects of coding non-synonymous variants on protein function using the SIFT algorithm
9. Case studies: Real data
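As a small example of reading mapping output, the sketch below decodes the bitwise FLAG field found in the second column of a SAM record. The bit values follow the SAM format specification; the short label strings and the `decode_sam_flag` helper are hypothetical names chosen for this example.

```python
from typing import List

# Bit values as defined in the SAM specification; labels are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_sam_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAGS.items() if flag & bit]


print(decode_sam_flag(99))
print(decode_sam_flag(4))
```

For example, a FLAG of 99 (1 + 2 + 32 + 64) describes the first read of a properly paired fragment whose mate maps to the reverse strand. Tools such as SAMtools apply the same bit tests when filtering alignments.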
UG & PG Students: Rs: 6000/-
PhD Scholars: Rs: 8000/-
Professors & Teaching Professionals: Rs: 10000/-
Industry Professionals: Rs: 12000/-
Registration form: https://forms.gle/gkb4KKY8WVMj42BE8