Set Up Your Environment

A Word About "Requirements"

There are two types of requirements when it comes to configuring your environment for this class. The first type is a "hard" requirement. These are configuration options that you must adhere to in order to pass the course (there will not be many of these). The second type is a "soft" requirement or "suggestion". Following the soft requirements will give you an environment that is maximally similar to my own and, because everyone is working from the same standard, maximally similar to your classmates'. These two attributes will allow the course staff to better support you and will let you collaborate more smoothly with other students.

Operating System

I suggest that you use Ubuntu (specifically 14.04). Using Ubuntu will allow you to install relevant libraries and packages with minimal friction. Other Linux distributions will probably work, but only go that route if you are very familiar with that specific distro. If you decide not to use Linux, you will probably have luck with Windows or Mac OS X, but for either of these two alternatives, you will certainly want to use Anaconda (see Packages and Setup for details on Anaconda).

Programming Environment

Language

This course will be taught in Python. You are required to program in Python except when explicitly indicated.

Packages and Setup

The following is an (almost assuredly incomplete) list of packages that we will be using throughout the class:

- scikit-learn
- pandas
- numpy
- seaborn
- matplotlib
- scipy
- ipython
- jupyter (formerly ipython) notebook

One way to get all this and more is to use the very cool Anaconda Python distribution from Continuum Analytics. To install this, click on the link and follow the directions. You will want to then use the package manager conda to install any missing packages. When I tried this recently, I only had to run the following command to get all of the packages listed above:

$ conda install seaborn
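
To confirm that the packages listed above are importable, a short script along these lines may help. This is only a sketch: the function name check_packages is made up for illustration, and the import names are assumptions based on how these packages are usually imported (scikit-learn as sklearn, ipython as IPython, the notebook as notebook).

```python
import importlib

# Import names assumed for the packages listed above; note that some differ
# from the install name (scikit-learn -> sklearn, ipython -> IPython).
PACKAGES = ["sklearn", "pandas", "numpy", "seaborn",
            "matplotlib", "scipy", "IPython", "notebook"]


def check_packages(names):
    """Map each import name to its version string, or None if it cannot be imported."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    for name, version in sorted(check_packages(PACKAGES).items()):
        print("{}: {}".format(name, version if version else "MISSING"))
```

Anything reported as MISSING can then be installed with conda in the same way as seaborn above.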

A word of caution: if you install Anaconda, you will be asked if you want to put the Anaconda Python interpreter in your path. If you do this, you will use Anaconda by default instead of other Python versions that have been installed on your computer. This can interfere with existing software that expects the system Python. If you want to get around this, simply say "no" to adding the Anaconda interpreter to your path. You can then switch to the Anaconda interpreter whenever you want by creating the following shell script (let's assume we put it at ~/use_anaconda.sh; of course, you will also want to make sure the path points to your installed location of Anaconda).

#!/bin/bash

export PATH=/home/pruvolo/anaconda2/bin:$PATH

To use Anaconda, you would simply type:

$ source ~/use_anaconda.sh

Then you can start up a Python interpreter as you would like.
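
If you are unsure which interpreter you ended up with, one rough check is to look at the path of the running interpreter from inside Python. The helper name looks_like_anaconda is hypothetical, and the test is only a heuristic that assumes Anaconda is installed in a directory with "conda" in its name (such as anaconda2).

```python
import sys


def looks_like_anaconda(executable):
    """Heuristic: True if the interpreter path contains "conda" (e.g. .../anaconda2/bin/python)."""
    # sys.executable can be empty or None in unusual setups, so guard against that.
    return "conda" in (executable or "").lower()


if __name__ == "__main__":
    print("Interpreter: {}".format(sys.executable))
    print("Looks like Anaconda: {}".format(looks_like_anaconda(sys.executable)))
```

If this reports False after sourcing the script, double-check the path in ~/use_anaconda.sh.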

IDE

We will be using a mixture of a conventional Python IDE and Jupyter notebook. For the conventional IDE, I recommend Sublime Text (I use version 2).

Version Control

We will be using Git and GitHub for version control. To get set up, please fork the base class repo. Once you have successfully forked the repo, you should clone your fork to your computer and then add a new remote that points back to the base repo. This remote will allow you to pull changes from the base repo as I add them.

$ cd ~/your_git_repo_location

$ git remote add upstream https://github.com/paulruvolo/DataScience16

I will be using GitHub pull requests to provide feedback on your code, so it is important that you do your work in a forked version of the base repo. I will be adding additional repos (that you will then also fork) for each of the four class projects.
