Resources

Code & Data

Code - Gitlab Team

Data - Box Folder

Flux

MDST has shared computing resources available from the University's HPC cluster, called Flux. All MDST members have access to these resources by request. To gain access, follow the steps in the "Getting Access to Flux" document.

Getting Access to Flux

Flux Policy Notebook

Data Science Resource List

All resources here were recommended by MDST members and include testimonials! To add your own, fill out this Google Form.


Webpages

How to Learn Machine Learning

Covers: Learning how to learn ML!

Source: http://karlrosaen.com/ml/

Endorsed by: @thealex -- “This is from a longtime developer who read "The Master Algorithm" and caught the ML bug. He put his VP of Tech / Product Dev / Software Engineering life on hold and took the summer off to study machine learning. He maintained a learning log during his study and ultimately got a job as a Research Engineer in the Ford Autonomous Vehicles Lab. Former MDST regular.”

Distill

Covers: Theory presented in a very approachable way.

Source: http://distill.pub/

Endorsed by: @thealex -- “Incredibly high standards for clarity. Once you know enough about ML to learn fringe concepts, working through these pages can be both enjoyable and enlightening.”

@stroud -- “Interactive figures are super useful and intuitive.”

Understanding LSTM Networks

Covers: LSTMs and RNNs

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Endorsed by: stroud -- “Best tutorial on LSTMs I've ever seen. Really cool figures that get the point across well.”

Neural Network Playground

Covers: Neural Networks

Source: http://playground.tensorflow.org/

Endorsed by: stroud -- “So much fun”

An Overview of Gradient Descent Optimization Algorithms

Covers: SGD et al.

Source: http://ruder.io/optimizing-gradient-descent/

Endorsed by: samtenka -- “Nesterov, Adam, RMSProp... what a mess! This unsystematic but insightful comparison helps us master the menagerie of gradient-based optimizers. It's often clearer than the corresponding papers, too.”

MSAIL

Covers: MSAIL, the Michigan ML club

Source: http://msail.github.io

Endorsed by: samtenka -- “Website is terrible. Ugly. Outdated. But the club's pretty fun.”


MOOCs

Applied Data Science with Python Specialization

Covers: Data science, Python, pandas, machine learning, social network analysis, natural language processing

Source: https://www.coursera.org/specializations/data-science-python

Endorsed by: jpgard -- “This is a great intermediate introduction to both Python and its use for solving applied data science problems. Taught by UM professors, this specialization has a fairly high bar but features high-quality video and engaging interactive programming and end-of-course assignments that will push you to fully develop your data science skills.”

Machine Learning

Covers: Basic machine learning topics.

Source: https://www.coursera.org/learn/machine-learning

Endorsed by: pgad -- “Andrew Ng teaches it very well and the MOOC comes with programming exercises. A great starter course.”


Tutorial/Coding Demos

CUDA C/C++ Basics

Covers: CUDA

Source: Link

Endorsed by: samtenka -- “Get your hands dirty with low-level GPU computing as quickly as possible by following these slides! I thought they were fun, and so can you.”

The Unreasonable Effectiveness of Recurrent Neural Networks

Covers: RNNs

Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Endorsed by: samtenka -- “Turned me on to Recurrent Neural Nets.”

jaredaw -- “Karpathy is an expert on recurrent neural networks and put a lot of time into this explanation of them. The visualizations and examples are simple and effective. This post really helped me understand RNNs.”

Computational Statistics in Python

Covers: Computational statistics

Source: https://people.duke.edu/~ccc14/sta-663/index.html

Endorsed by: xinyutan -- “A very nice introduction to some "advanced" topics which are rarely seen introduced at such an introductory level.”

Probabilistic Programming and Bayesian Methods for Hackers

Covers: Bayesian techniques explored through iPython Notebooks

Source: Github

Endorsed by: thealex -- “It's a set of iPython Notebooks! Download them before a long flight and browse at your leisure. I found the section of picking good priors to be especially helpful because I do research on bandit problems.”

Python Data Science Handbook

Covers: Core Python data science packages for beginners

Source: Github

Endorsed by: pktan -- “Covers most of the python packages that beginners need to get started. It's actually a book, but the author decided to open-source it, so do ask the members to buy the book and support the author if they like it.”

MNIST for ML Beginners

Covers: Tensorflow for beginners, MNIST, Neural Networks

Source: https://www.tensorflow.org/get_started/mnist/beginners

Endorsed by: stroud -- “Eases you into the basics of Tensorflow with good figures and examples. Suitable for total beginners.”

Deep MNIST for Experts

Covers: Tensorflow Basics

Source: https://www.tensorflow.org/get_started/mnist/pros

Endorsed by: samtenka -- “If you understand CNNs in theory, this rapid yet clear tutorial will get you started with their practical implementation. Pairs nicely with LeCun's paper introducing CNNs.”

stroud -- “Gentle introduction to CNNs in Tensorflow for people with machine learning experience. Maintained by the Tensorflow team, so it will change as the tools change.”

Neural Networks

Covers: PyTorch, CNNs

Source: Link

Endorsed by: stroud -- “Covers all the basics of CNNs in PyTorch.”

Unsupervised Feature Learning and Deep Learning

Covers: Unsupervised Learning and Deep Learning

Source: http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial

Endorsed by: pgad -- “A nice tutorial with coding exercises. Covers Autoencoders.”

Variable Sharing in Tensorflow

Covers: Variable Sharing in Tensorflow

Source: https://jasdeep06.github.io/posts/variable-sharing-in-tensorflow/

Endorsed by: samtenka -- “Finally! A clear and correct explanation!”


Textbooks

Machine Learning: A Probabilistic Perspective

Covers: General topics in Machine Learning

Source: Amazon

Endorsed by: pgad -- “Murphy explains all the topics he covers really well, and gives great statistical perspective.”

Learning from Data

Covers: Statistical learning theory

Source: https://work.caltech.edu/textbook.html

Endorsed by: samtenka -- “Grounds the theoretically oriented beginner in the philosophies and tools of machine learning. I would highly recommend this book to physicists, cows, and those who ask "why?" more than "how?". The book might or might not be available for free online.”

An Introduction to Statistical Learning (with Applications in R)

Covers: Supervised and unsupervised learning, data mining, R

Source: http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf

Endorsed by: jpgard -- “This is a great overview of many of the core machine learning techniques, and doubles as a user-friendly introduction to R. The book is well-written and filled with insights. The same authors also have a great series of videos aligned with the book chapters (available on YouTube) and, for more advanced reading or more in-depth coverage, a similar book entitled Elements of Statistical Learning.”

R Graphics Cookbook: Practical Recipes for Visualizing Data

Covers: R, ggplot2

Source: Amazon

Endorsed by: jpgard -- “As a hardcore R user, I often find myself looking for a reference to adjust the same things on my visualizations -- how do I remove axis ticks, add text labels to graphical elements, or adjust legends? This is my go-to resource. A "free" PDF is floating around the internet.”

Python for Data Analysis

Covers: pandas, numpy

Source: http://www3.canisius.edu/~yany/python/Python4DataAnalysis.pdf

Endorsed by: xinyutan -- “Pandas was very confusing to me at first. This book is written by Wes McKinney, the main author of the pandas library. He introduces some of the logic and reasoning behind the design of pandas, making it easier to remember (at least some core functions) and use pandas.”

Convex Optimization

Covers: Convex Optimization

Source: https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf

Endorsed by: pgad -- “Advanced. The reader can learn about stuff like gradient descent, Newton's algorithm, Lagrange Duals and other stuff used in Machine Learning in great detail.”

Statlect

Covers: Probability and Measure Theory

Source: statlect.com

Endorsed by: samtenka -- “It is formal and rigorous.”


Papers

Wasserstein GAN

Covers: Generative Adversarial Deep Learning without tears

Source: https://arxiv.org/abs/1701.07875

Endorsed by: samtenka -- “WGANs are the future. This seminal paper both explains the technique lucidly and inspires a more general understanding of neural net training. I'd recommend anyone who is excited about deep unsupervised learning to peruse this paper.”


Lecture Videos

Learning: Support Vector Machines

Covers: SVM

Source: YouTube

Endorsed by: samtenka -- “Avuncular and expert, Patrick Winston takes us on a leisurely stroll through the intuition and implementation of SVMs. Let this video (played at 1.5 speed) be your guide to this important class of models.”


Lecture Slides

Deep Learning Software

Covers: Deep Learning Software/Hardware

Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf

Endorsed by: stroud -- “Very in-depth, covers advantages and disadvantages of deep learning frameworks.”

Theoretical Foundations of Machine Learning

Covers: Statistical Learning Theory

Source: http://web.eecs.umich.edu/~jabernet/eecs598course/fall2015/web/

Endorsed by: pgad -- “Advanced. A nice introduction to Statistical Learning Theory. Homeworks available.”

Data Visualisation

Covers: Data Visualisation

Source: http://courses.cs.washington.edu/courses/cse512/14wi/

Endorsed by: acell -- “The website is quite comprehensive, complete with assignments and readings by topic and a resource list that has tools, tutorials, data sets, and links to blogs / other courses.”