R is a well-established, open-source language and environment for statistical computing and graphics. R is widely used by statisticians and data scientists in both academia and industry.
R itself should be obtained from CRAN, the Comprehensive R Archive Network. Just select your platform (Windows, Linux, or macOS), then download and install.
RStudio is a highly recommended, integrated development environment that can be installed after installing R itself.
tidyverse is "an opinionated collection of R packages designed for data science, sharing an underlying design philosophy, grammar, and data structures". It includes the excellent graphics library ggplot2.
caret (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. caret loads packages as needed (including ggplot2) and assumes that they are installed. If a package is missing, there is a prompt to install it.
Learning R: introduction to R (from the official R site), machine learning in R (excellent pedagogical site of J. Brownlee), and R for epidemiology and public health (this handbook can be downloaded).
Books: Kuhn & Johnson, Applied Predictive Modeling, Springer, 2013. (all data and figures available)
Octave is an open-source numerical computing language that is largely compatible with Matlab.
Download it from the official site. Octave can also be run from within a Jupyter notebook (see below), using
⇒ pip install octave_kernel
⇒ jupyter -> New -> Octave
The official documentation is available online and can also be downloaded, but be warned: it is over 1000 pages long.
Python is a general-purpose programming language with a large collection of libraries and modules for scientific computing and machine learning. In this course we will be using a number of these libraries. There is no need to learn them beforehand, since we will "learn by doing", i.e. by coding (many) simple (and not so simple) examples.
Install Miniconda, a mini version of Anaconda that includes only conda and its dependencies.
If you prefer to have conda plus over 7,500 open-source packages, install Anaconda.
Follow the instructions for Windows, macOS, Linux.
We will install additional packages as needed:
scipy
numpy
matplotlib, seaborn
scikit-learn
pytorch (see below)
...
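To see which of the packages listed above are already present in your environment, a minimal sketch such as the following may help. It uses only the standard library. The dictionary `COURSE_PACKAGES` and the function `missing_packages` are hypothetical names introduced here for illustration. Note that two import names differ from the names in the list: scikit-learn is imported as `sklearn`, and pytorch as `torch`.

```python
from importlib.util import find_spec

# Name in the list above -> name used in an import statement.
# (This mapping is an illustrative assumption, not part of the course material.)
COURSE_PACKAGES = {
    "scipy": "scipy",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "pytorch": "torch",
}


def missing_packages(packages=COURSE_PACKAGES):
    """Return the listed names whose import name cannot be found."""
    return [name for name, module in packages.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Not yet installed:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

Anything reported as missing can then be installed with conda or pip when it is first needed.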
Learning Python: Python Numerical Methods (from UC Berkeley), SciPy lecture notes (learn numerics, science, and data with Python), machine learning in Python (excellent pedagogical site of J. Brownlee), learn PyTorch.
Books: Linge & Langtangen, Programming for Computations, Springer, 2020. (free download)
PyTorch is an end-to-end machine learning framework that enables fast, flexible experimentation and efficient production through a user-friendly front-end, distributed training, and an ecosystem of tools and libraries.
Get started: https://pytorch.org/get-started/locally/
Run some basic tutorials (these will be further detailed in the framework of the Advanced Lectures series).
Just follow the instructions for your platform.
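Once the installation has finished, a quick check along the following lines can confirm that PyTorch imports and runs. This is only a sketch: the function name `torch_summary` is hypothetical, and the code falls back to a plain message if PyTorch is not installed, so it is safe to run either way.

```python
def torch_summary():
    """Return a short description of the local PyTorch install, if any."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."

    # Use the GPU only if a CUDA device is visible, otherwise stay on the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # A tiny tensor operation confirms that the library actually runs.
    x = torch.ones(2, 3, device=device)
    total = x.sum().item()
    return f"PyTorch {torch.__version__} on {device}; sum of ones(2, 3) = {total}"


print(torch_summary())
```

If the reported device is `cpu` on a machine with an NVIDIA GPU, the installed build may be a CPU-only one; the selector on the get-started page lets you choose a CUDA build instead.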
Learning PyTorch: learn PyTorch.
Books: