Bioinformatics is an interdisciplinary field that combines biology, computer science, mathematics, and statistics to analyze and interpret biological data. Biology is becoming increasingly data intensive, so bioinformatics literacy is a key skill to acquire during your PhD.
Here are some workshops and resources on bioinformatics training around campus:
UCD Bioinformatics Core: The UC Davis Bioinformatics Core offers a wide range of high-quality bioinformatics workshops, covering topics such as RNA-seq analysis, scRNA-seq analysis, and genome assembly. There is usually a fee associated with these courses, so talk with your PI if you want to attend one. The complete list of workshops open for registration can be found here. If you cannot attend the workshops but are still interested in the topics covered, you can access all the materials via their GitHub webpage https://ucdavis-bioinformatics-training.github.io/.
DataLab: Formerly the Data Science Initiative. The DataLab provides training, advice, and collaboration to researchers across UC Davis to help them acquire technical skills in the management and analysis of data. They offer workshops covering topics such as programming in R, Python, and Amazon Web Services. Additionally, they host seminars where invited speakers share their research, and they provide research assistance in data analysis to all researchers on campus. You can sign up for their mailing list to get news about upcoming activities.
DIBSI: The Data Intensive Biology Summer Institute is a two-week training opportunity organized by the Data Intensive Biology Lab at UC Davis, usually taking place in July. It covers R and Linux, as well as specific bioinformatics analyses such as RNA-seq, scRNA-seq, and genome assembly. If you are interested in attending, you can join their mailing list.
Meet and Analyze Data: MAD is a weekly meetup of data scientists in the field of biology, traditionally held in the Bennett Conference Room on the Veterinary Medicine campus from 3-5 pm on Wednesdays. People who need help with their biological data can get answers from a group of volunteers. The meetups usually begin with a short presentation on a relevant topic in the field.
Davis R Users Group: D-RUG is a community of R users at UC Davis who support each other in using R for science and research. They usually meet weekly in the DataLab classroom (Shields Library 360) during the academic year. In each session, a different topic related to R is presented.
Data Carpentry: Data Carpentry is an external organization (with headquarters in Davis) that develops and teaches workshops on the fundamental data skills needed to conduct research. If you cannot attend in-person workshops, all their lessons are available online. You can check out their genomics lessons, which are very complete and informative.
BIS 180L: This is an undergraduate-level lab course at UC Davis that introduces Genetics and Genomics majors to bioinformatics. It covers the major tools used in computational biology today. Even if you don't take the class, the website includes all the laboratories and plenty of resources.
Aggie Farm tutorials: A website with interdisciplinary tutorial resources curated by the UC Davis College of Biological Sciences. It includes RStudio, Python, statistics, machine learning, and genomics resources.
IGG courses: Our very own IGG includes training in bioinformatics. GG201B covers many topics in genomics computation, from theory to practice.
Working with genomics data on a cluster usually requires knowledge of Linux/Unix systems and shell scripting. It is also handy to have some knowledge of R or Python (or both!). If you are completely new to this field, The Biologist’s Guide to Computing is a great resource to study from A to Z. To keep sharpening your coding skills, here are some selected online tutorials tailored to the language of your choice.
Shell scripting means writing commands that a Linux/Unix shell can run, either interactively or saved in a script file. This is extremely helpful since most genomics analyses are performed on Linux computing clusters. Here are two courses that will help you get into the fundamentals of shell scripting and the Linux console:
R is a language and environment for statistical computing and graphics. All IGG students will encounter R during their educational career, and nearly all students will use it at some point during research.
R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques. One of R’s strengths is the ease with which well-designed, publication-quality plots can be produced. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. RStudio is a user-friendly integrated development environment for R, which can be downloaded here.
There are a number of ways students can gain proficiency in R. Here are some recommended resources:
Codecademy: https://www.codecademy.com/learn/learn-r
Software Carpentry: Software Carpentry, Data Carpentry's sister organization, offers this online tutorial covering the fundamentals of R. It is based on the analysis of a clinical dataset. If you learn by example, this can be a good course for you.
Hands-On Programming with R: If you prefer to learn from a book, this one will help you in your R journey. It covers R from the very basics to speeding up your code. Here is a pdf version.
R for Data Science: This online book (also available in print) is an essential introduction to some of the most powerful tools in R for data science, the tidyverse and ggplot2. If you want to level up your R skills, this book is for you.
If you have not had an undergraduate statistics course, consider taking STA 100. If you want to go to the next level, STA 106, 108, and 141 are recommended for intensive R instruction.
Python is a general-purpose, high-level programming language. It is widely used in many areas, from scientific computing and machine learning to software engineering. Its main strengths are its flexibility and wide range of applications. You don't need to know both R and Python, but learning Python can be really helpful for your career, especially if your work will be computationally intensive.
Here are some resources for learning Python:
Software Carpentry: Software Carpentry offers this online class covering the fundamentals of Python syntax and how to use it in a cluster. If you are starting your Python journey, you can start here.
Google Python Class: If you prefer video tutorials, Google has you covered. This is a free class for people with a little bit of programming experience who want to learn Python.
Analysis pipelines with Python: One of the most exciting applications of Python in bioinformatics is to use it to automate bioinformatics pipelines with Snakemake. This tutorial will cover the fundamentals of Python and how to use it with Snakemake to streamline workflows for data analysis.
Real Python: This website has plenty of tutorials on all things Python, ranging from introductory topics to fairly advanced ones. If you want to learn specific elements of the language, you can search this website.
Markdown language: Markdown is a lightweight markup language: plain text with simple formatting marks that certain programs can render as formatted documents. It is relevant because tools like GitHub use it to format their READMEs, and tools like Jupyter notebooks and R Markdown use similar syntax. To learn more about Markdown, you can take this online tutorial: https://www.markdowntutorial.com/
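To give a feel for what a workflow tool such as Snakemake automates, here is a minimal plain-Python sketch of the underlying idea: a step declares its input and output files, and it only runs when the output is missing or older than an input. This is not Snakemake's syntax or API. The function names (`needs_update`, `run_rule`, `count_reads`, `demo`) and the file names are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def needs_update(inputs, output):
    """Return True if the output is missing or older than any input."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


def run_rule(inputs, output, action):
    """Run `action` only when the output is out of date; report whether it ran."""
    if needs_update(inputs, output):
        action(inputs, output)
        return True
    return False


def count_reads(inputs, output):
    """Toy analysis step: count FASTQ records (4 lines per record) per input."""
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                n_lines = sum(1 for _ in handle)
            out.write(f"{os.path.basename(path)}\t{n_lines // 4}\n")


def demo():
    """Build a tiny FASTQ file, then run the same rule twice."""
    with tempfile.TemporaryDirectory() as workdir:
        fastq = os.path.join(workdir, "sample.fastq")
        counts = os.path.join(workdir, "counts.tsv")
        with open(fastq, "w") as handle:
            handle.write("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")
        first = run_rule([fastq], counts, count_reads)   # output missing, so it runs
        second = run_rule([fastq], counts, count_reads)  # output up to date, so it is skipped
        with open(counts) as handle:
            content = handle.read()
    return first, second, content


if __name__ == "__main__":
    print(demo())
```

Real workflow managers build on the same principle by chaining many such steps through their input and output files, which is why rerunning a pipeline after a small change does not repeat every step.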
Version control with Git/GitHub: A good practice in bioinformatics and computational genomics is to version control your notebooks, scripts and workflows. This tutorial will help you learn more about this topic.
Machine learning: Machine learning (ML) is finding a broad set of applications in genomics. If your goal is to dive deeper into bioinformatics and ML, then Google offers a great Crash Course to learn ML. If you are specifically interested in deep learning, then there is this free Udacity course to learn TensorFlow.