This module was a student-led initiative created by Samira Adus (MD student, University of Toronto), Tara Upshaw (MD student, University of Calgary), and Jillian Macklin (MD/PhD student, University of Toronto).
Git is a popular version control system: it lets you create, store, and track changes to a set of files on your computer. GitHub is an online hosting service for Git repositories that lets you store and manage your work remotely. GitHub is useful when conducting analysis (documenting your code and any changes as you go) and for collaborative data projects (multiple users can work on the same code and merge their changes). A few terms to know:
Repository (repo for short) - essentially a project directory, containing all files, folders and content, including previous versions, modifications to code, deletions, etc. Repos can be shared and used by others.
Branch - a parallel version of the code within a repository. Branches let you develop or test changes without affecting the main branch.
Fork - this action creates your own copy of someone else's repository, which you can change without affecting the original.
Pull Request - a request to the repository owner to review changes you have made in a separate branch or fork. The owner reviews these changes and can accept (merge) them into the main repo.
Commit - this saves a snapshot of the project or code at a specific point in time.
Merge - this action takes changes from one branch and adds them to another branch (usually the main branch).
To set up Git on a Mac:
Open Terminal.
For OS X 10.9 or higher, type the following command: git --version
If Git is not installed, follow the prompts to install it.
Alternatively, you can install Git with the Homebrew package manager (see https://git-scm.com/download/mac).
To set up Git on Windows:
Download here: https://git-scm.com/download/win
After you install Git on your computer, create a free account on the GitHub website.
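Once Git is installed, the terms defined above (repository, commit, branch, merge) can be tried out safely in a throwaway folder. The following is a minimal sketch in Python that calls the standard Git commands through the subprocess module. It assumes Git is on your PATH; the file name, branch name "experiment", commit messages, and demo identity are all made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside `cwd` and return what it printed."""
    # The -c flags set a placeholder identity for this demo only.
    cmd = ["git", "-c", "user.name=Demo User",
           "-c", "user.email=demo@example.com", *args]
    result = subprocess.run(cmd, cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def demo_workflow():
    """Create a temporary repo, commit, branch, and merge.

    Returns the list of commit messages (newest first),
    or None if Git is not installed.
    """
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as folder:
        git("init", cwd=folder)  # create a new repository
        # Name the first branch "main" regardless of local defaults
        git("symbolic-ref", "HEAD", "refs/heads/main", cwd=folder)

        notes = Path(folder, "analysis.txt")
        notes.write_text("first draft\n")
        git("add", "analysis.txt", cwd=folder)
        git("commit", "-m", "Add first draft", cwd=folder)  # save a snapshot

        git("checkout", "-b", "experiment", cwd=folder)  # branch to test a change
        notes.write_text("first draft\nnew idea\n")
        git("commit", "-am", "Try a new idea", cwd=folder)

        git("checkout", "main", cwd=folder)
        git("merge", "experiment", cwd=folder)  # bring the change into main
        return git("log", "--format=%s", cwd=folder).splitlines()


messages = demo_workflow()
print(messages)
```

In everyday use you would type the same commands (git init, git add, git commit, git checkout, git merge) directly in a terminal; wrapping them in Python here only keeps the example self-contained and cleans up the temporary folder afterwards.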
Beginner Resources
Python is one of the most popular programming languages. It is general purpose, meaning it can be used for many different tasks (e.g. analysis, data science, website building, machine learning, software development, data visualization, etc.). Python is also relatively beginner friendly and open source (free), with a variety of user-developed packages to help with all types of tasks.
Python can be downloaded to your computer here: https://www.python.org/downloads/
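To give a sense of what a first Python script might look like, here is a minimal sketch that summarizes a short list of numbers using only the built-in statistics module. The readings are made-up values for illustration, and the function name summarize is our own choice, not part of any library.

```python
from statistics import mean, stdev

# Made-up measurements, for illustration only
readings = [118, 125, 132, 121, 140]


def summarize(values):
    """Return the count, mean, and sample standard deviation of a list."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": round(stdev(values), 2),
    }


summary = summarize(readings)
print(summary)
```

Save the code in a file (for example first_script.py, a hypothetical name) and run it from a terminal with the python command followed by the file name.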
Beginner courses & resources
R is another free and commonly used programming language for statistical computing and graphics/visualizations. RStudio is a user-friendly integrated development environment (IDE) for writing and running R code. R is open source, with a huge library of user-developed packages available for almost every task; however, it can be slightly more difficult to learn at first.
To install R on your computer (Mac or Windows), click on the relevant link at the top of the page: https://cran.r-project.org
Beginner courses & resources
Additional Resources
Stack Overflow is a community of developers learning and sharing knowledge. If you're starting a new programming language or stuck on an analytic problem, the answer will likely be here! Sign up for a free account to post questions, comment and save your favourite tags and filters.
One of the largest open online learning platforms, Coursera partners with universities around the world to offer short courses, certificates, and degrees. It has an extensive data science catalog, including a variety of bachelor's and master's degrees in data and computer science. Many individual courses are free, but certificates and degrees require a monthly fee (financial aid is available for current students).
An easy, online way to learn Python, R, SQL, and AI methods at all skill levels. Free access is limited and a monthly fee is required for more course content (approximately $30/month for the Standard subscription).
There's no better way to learn coding skills than by working on real problems. Kaggle offers short tutorials with practice exercises, open datasets, and competitions for all skill levels - some even offer cash prizes! Both Python and R can be used.
This is a 7-week summer training program for a small cohort of individuals identifying as women across Canada who are interested in developing skills in AI. Currently, the Lab is held in Edmonton and Montreal (potentially with a virtual option), with expansion continuing across Canada.
Applications for the next cohort will open in January 2022, with the Lab running in the spring of 2022.
A basic knowledge of linear algebra is necessary for data science, especially in machine learning and deep learning. While you don't need to be a mathematician to learn data science, linear algebra can deepen your understanding of how machine learning algorithms work.
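As a small illustration of where linear algebra appears in machine learning, the sketch below computes the predictions of a simple linear model as a matrix-vector product, written in plain Python with no extra packages. The data, the weights, and the helper names dot and mat_vec are all made up for this example; in practice a library such as NumPy would usually handle these operations.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


# Hypothetical data: 3 observations (rows), 2 features each (columns)
X = [
    [1.0, 2.0],
    [0.0, 1.0],
    [3.0, 1.0],
]
weights = [0.5, 2.0]  # made-up model weights, one per feature

# Each prediction is the dot product of one row of X with the weights
predictions = mat_vec(X, weights)
print(predictions)  # [4.5, 2.0, 3.5]
```

Each prediction is a weighted sum of one observation's features, which is why vectors, matrices, and their products come up so often when reading about how these algorithms work.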
Beginner courses & resources
Every Data Scientist Should Know the Basics of Linear Algebra, written by Maurizio Sluijmers
Analytics Vidhya: A Comprehensive Beginners Guide to Linear Algebra for Data Scientists
YouTube video series by 3Blue1Brown