ML and Data Science

Online Courses

AI and Machine Learning

  • Machine Learning - Stanford University via Coursera - completed

    • Machine learning is the science of getting computers to act without being explicitly programmed. This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition.

  • Introduction to Artificial Intelligence - Stanford University via Udacity - in progress

    • This class introduces students to the basics of Artificial Intelligence, which includes machine learning, probabilistic reasoning, robotics, and natural language processing.

      • Overview of AI

      • Statistics, Uncertainty, and Bayes networks

      • Machine Learning

      • Logic and Planning

      • Markov Decision Processes and Reinforcement Learning

      • Hidden Markov Models and Filters

      • Adversarial and Advanced Planning

      • Image Processing and Computer Vision

      • Robotics and robot motion planning

      • Natural Language Processing and Information Retrieval

  • Machine Learning Video Library

  • Artificial Intelligence for Robotics - Stanford University via Udacity

Data Science

Source for picture: InsideBigData

Documentation and Wiki Sites

Foundations of Data Science by Microsoft Research - John Hopcroft and Ravindran Kannan

You can download the book here.

List of chapters:

  1. Introduction

  2. High-Dimensional Space

  3. Best-Fit Subspaces and Singular Value Decomposition (SVD)

  4. Random Graphs

  5. Random Walks and Markov Chains

  6. Learning and VC-dimension

  7. Algorithms for Massive Data Problems

  8. Clustering

  9. Topic Models, Hidden Markov Process, Graphical Models, and Belief Propagation

  10. Other Topics

    1. Rankings

    2. Hare System for Voting

    3. Compressed Sensing and Sparse Vectors

    4. Applications

    5. Gradient

    6. Linear Programming

    7. Integer Optimization

    8. Semi-Definite Programming

An Introduction to Statistical Learning with Applications in R (4th printing) - Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani

This book provides an introduction to statistical learning methods.

It is aimed at upper-level undergraduate students, master's students, and Ph.D. students in the non-mathematical sciences.

The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.

You can download the book here

For more information and resources: http://www-bcf.usc.edu/~gareth/ISL/

Tidy Data - Hadley Wickham

A huge amount of effort is spent cleaning data to get it ready for analysis [...]

This paper tackles a small, but important, component of data cleaning: data tidying. [...]

You can download the research paper here
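As a rough illustration of what tidying means in practice, the sketch below reshapes a small "wide" table into one row per observation. It assumes the usual tidy-data convention that each variable is a column and each observation is a row. The data, the column names, and the `melt` helper are all made up for this example and are not taken from the paper.

```python
# Hypothetical "wide" table: one row per city, one column per year.
wide_rows = [
    {"city": "Oslo", "2013": 5.1, "2014": 6.0},
    {"city": "Rome", "2013": 15.3, "2014": 15.9},
]


def melt(rows, id_key, value_keys, var_name, value_name):
    """Turn every (row, value column) pair into its own observation row."""
    tidy = []
    for row in rows:
        for key in value_keys:
            tidy.append({
                id_key: row[id_key],
                var_name: key,
                value_name: row[key],
            })
    return tidy


# Illustrative call: "year" and "mean_temp" are names chosen for this example.
tidy_rows = melt(wide_rows, "city", ["2013", "2014"], "year", "mean_temp")
for observation in tidy_rows:
    print(observation)
```

After the reshape, the year is an ordinary column instead of being hidden in the column headers, so filtering or grouping by year no longer needs special handling.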

The Field Guide to Data Science - Booz Allen Hamilton

This field guide came from the passion our team feels for its work.

It is not a textbook nor is it a superficial treatment. Senior leaders will walk away with a deeper understanding of the concepts at the heart of Data Science.

Practitioners will add to their toolbox. We hope everyone will enjoy the journey.

You can download the book here: http://www.boozallen.com/media/file/The-Field-Guide-to-Data-Science.pdf

Mining of Massive Datasets (2nd edition) - Jure Leskovec, Anand Rajaraman, Jeff Ullman - NEW

Big data is transforming the world. Here you will learn data mining and machine learning techniques to process large datasets and extract valuable knowledge from them.

You can download the book here

Unsupervised Feature Learning and Deep Learning - Wiki (Stanford University)

This tutorial will teach you the main ideas of Unsupervised Feature Learning and Deep Learning. This tutorial assumes a basic knowledge of machine learning (specifically, familiarity with the ideas of supervised learning, logistic regression, gradient descent).

Software

Project Name: Pylearn2

Language: Python

Project URL: http://deeplearning.net/software/pylearn2/

Description:

Pylearn2 is a machine learning library. Most of its functionality is built on top of Theano. This means you can write Pylearn2 plugins (new models, algorithms, etc.) using mathematical expressions; Theano will optimize and stabilize those expressions for you and compile them to a backend of your choice (CPU or GPU).

Project Name: DeepLearnToolbox

Language: Matlab/Octave

Project URL: https://github.com/rasmusbergpalm/DeepLearnToolbox/

Description:

Matlab/Octave toolbox for deep learning.

Includes Deep Belief Nets, Stacked Autoencoders, Convolutional Neural Nets, Convolutional Autoencoders and vanilla Neural Nets.

Each method has examples to get you started.

Project Name: DeepDive

Language: PostgreSQL, Scala, Python2

Project URL: http://deepdive.stanford.edu/

Description:

DeepDive is a new type of system that enables developers to analyze data on a deeper level than ever before. DeepDive is a trained system: it uses machine learning techniques to leverage domain-specific knowledge and incorporates user feedback to improve the quality of its analysis.

Project Name: MADlib

Language: C++, Python

Project URL: http://madlib.net/ + https://github.com/madlib/madlib/

Description:

MADlib is an open-source library for scalable in-database analytics.

It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.

Apache UIMA.

Unstructured Information Management Architecture (UIMA) is a standard for performing analysis on unstructured content such as text.

OpenCog ...

OAQA ...

Web Sites

A.I. and Machine Learning

URL: http://deeplearning.net/

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. This website is intended to host a variety of resources and pointers to information about Deep Learning.

URL: http://startup.ml/ + http://startup.ml/blog/

Machine Learning Accelerator.

Data Science

Data Science Central is the industry's online resource for big data practitioners.

From Analytics to Data Integration to Visualization, Data Science Central provides a community experience that includes a robust editorial platform, social interaction, forum-based technical support, the latest in technology, tools, and trends, and industry job opportunities.

OpenCPU

OpenCPU is a system for embedded scientific computing and reproducible research.

The OpenCPU server provides a reliable and interoperable HTTP API for data analysis based on R. You can either use the public servers or host your own.

Demo site: https://public.opencpu.org/ocpu/test/
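To give a feel for the HTTP API, here is a minimal sketch that builds a request calling an R function through an OpenCPU server. It assumes the `/library/{package}/R/{function}/json` endpoint pattern, which is how the OpenCPU API is understood to expose functions, and it takes the host from the demo URL above, which may no longer be available. The helper name `build_ocpu_call` is made up for this example. The request is only constructed, not sent.

```python
import urllib.parse
import urllib.request


def build_ocpu_call(base_url, package, function, args):
    """Build (but do not send) a POST request that calls an R function.

    Assumption: the server exposes functions at
    {base_url}/library/{package}/R/{function}/json and accepts
    form-encoded arguments.
    """
    url = f"{base_url.rstrip('/')}/library/{package}/R/{function}/json"
    body = urllib.parse.urlencode(args).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


# Illustrative call: ask R's stats::rnorm for three random numbers.
request = build_ocpu_call(
    "https://public.opencpu.org/ocpu", "stats", "rnorm", {"n": 3}
)
print(request.get_method(), request.full_url)

# To actually run the call (needs network access and a live server):
# import json
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.loads(response.read()))
```

If you host your own server, only the base URL passed to the helper should need to change.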

Data Visualization

Data visualization with the HTML5 <canvas>