
CSC392-2(CSC310) - Programming for Data Science - Spring 2017

Instructor

Prof. Lutz Hamel

Email: lutzhamel@uri.edu

Office: Tyler Hall Rm 251

Office hours: TuTh 11-noon, or by appointment


Announcements


[3/10/17] Important: The Midterm is due Monday 3/27 @ 11:55pm in Sakai.


Individual Analysis Projects: Please submit a report in PDF that has at least 3 pages (single spaced and excluding cover page and references) and follows the format for reports covered in class. You should also upload any scripts you have developed as part of the project.


Team App Projects: Please submit a report in PDF that has at least 3 pages (single spaced and excluding cover page and references) that covers the functionality of your app and the data you are using. Your report should also discuss one complete use case, from accessing the data through processing it to interacting with the user. You also need to upload your source code as a zip file as part of your project report. Each team member has to submit their own report.

[3/7/17] Just posted a link to the UCI data repo in 'Documents of Interest'

[3/7/17] The midterm proposal is due Thursday 3/9 @ 1:45pm in Sakai.

[3/2/17] Here is a link to the interactive tree visualizer.

[3/2/17] Here is a really nice pyplot tutorial.

[2/23/17] Here is a script that reads the iris dataset, builds a model, and then evaluates it:

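import pandas as pd
from sklearn import tree
from csc310viz import tree_print      # tree visualizer (csc310viz.py)
from model_error import model_error   # model error function posted on 2/23/17

# read the data and separate the features from the target column
df = pd.read_csv("iris.csv")
features_df = df.drop(['id','Species'],axis=1)
target_df = df['Species']

# build a decision tree of depth 1 using entropy as the split criterion
dtree = tree.DecisionTreeClassifier(criterion='entropy',max_depth=1)
dtree.fit(features_df,target_df)
tree_print(dtree,features_df)

# predict the labels of the training data and keep them as a single column
predict_df = pd.DataFrame(dtree.predict(features_df)).iloc[:,0]

# check whether the tree reproduces the training labels exactly
print("Tree predicts training data: {}".format(predict_df.equals(target_df)))

# accuracy is the fraction of labels that were predicted correctly
accuracy = 1 - model_error(target_df,predict_df)
print("Model Accuracy: {}".format(accuracy))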

[2/23/17] Here is a function that computes the model error given the target and predicted values:

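def model_error(target,predict):
    "return the fraction of predictions that differ from the target values"
    target = list(target)
    predict = list(predict)

    if len(target) != len(predict):
        raise ValueError("target and predict must have the same length")

    # count the mismatches between target and predicted values
    error = 0
    for t,p in zip(target,predict):
        if t != p:
            error += 1

    return error/len(target)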
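
As a quick sanity check, the function can be tried on two short lists where the mismatches are easy to count by hand. The sketch below repeats the function in compact form so that it runs on its own; the label values are made up for illustration.

```python
def model_error(target, predict):
    "return the fraction of predictions that differ from the target values"
    target = list(target)
    predict = list(predict)
    if len(target) != len(predict):
        raise ValueError("target and predict must have the same length")
    # count the positions where target and prediction disagree
    mismatches = sum(1 for t, p in zip(target, predict) if t != p)
    return mismatches / len(target)

# made-up labels: exactly one of the four predictions is wrong
target = ['yes', 'no', 'yes', 'no']
predict = ['yes', 'no', 'no', 'no']

error = model_error(target, predict)
accuracy = 1 - error
print("Model Error: {}".format(error))        # 1 mismatch out of 4
print("Model Accuracy: {}".format(accuracy))
```

Since the function converts both arguments to lists, it should work the same way on pandas series as on plain Python lists.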

[2/15/17] The numeric play tennis file and the tree visualizer are at the bottom of the page for you to download.

[2/8/17] Here is some code that selects the columns of a data frame by their data type:


df.select_dtypes(include=['float64']).describe() 
df.select_dtypes(include=['int64']).describe()
df.select_dtypes(include=['object']).describe()
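
To see what each of these calls selects, here is a small sketch on a made-up data frame with one column of each type. The column names and values are assumptions chosen only for illustration, and the dtypes are set explicitly so that the result does not depend on how pandas infers them.

```python
import pandas as pd

# made-up data frame with an int64, a float64, and an object (string) column
df = pd.DataFrame({
    'id': pd.Series([1, 2, 3], dtype='int64'),
    'width': pd.Series([0.2, 0.4, 0.1], dtype='float64'),
    'species': pd.Series(['setosa', 'virginica', 'setosa'], dtype='object'),
})

# each call keeps only the columns of the requested type
float_cols = list(df.select_dtypes(include=['float64']).columns)
int_cols = list(df.select_dtypes(include=['int64']).columns)
object_cols = list(df.select_dtypes(include=['object']).columns)

print(float_cols)
print(int_cols)
print(object_cols)

# describe() summarizes numeric columns with count, mean, std, ... and
# object columns with count, unique, top, freq
print(df.select_dtypes(include=['object']).describe())
```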

[2/6/17] There seem to be a lot of problems with the AWS VMs in terms of disk space and VNC server access. Until I figure these problems out you can download anaconda3 (it has to be 3) to your laptop and work locally. You can download anaconda from here:

https://www.continuum.io/downloads

The graphical installer on the Mac, when installing just for me, sets up the environment correctly, so I think it will probably do the same on Windows and Linux. Once you have installed it, open a command line window and type python. You should then see something like this:

Python 3.6.0 |Anaconda 4.3.0 (x86_64)| (default, Dec 23 2016, 13:19:00)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.


The important part is that you are using the python that comes with anaconda3, otherwise none of the packages will work.
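
The same check can also be made from inside Python. The sketch below prints the path and version string of the running interpreter; the helper looks_like_anaconda is a hypothetical name, and its test for the word 'Anaconda' in the version string is an assumption based on the banner shown above, which other Anaconda releases may not include.

```python
import sys

def looks_like_anaconda(version_string):
    "guess whether a Python version string comes from an Anaconda build"
    # assumption: the build identifies itself in the version string, as in
    # the banner above; this is a heuristic, not an official check
    return 'anaconda' in version_string.lower()

# path of the interpreter that is actually running this script
print("Interpreter: {}".format(sys.executable))
print("Version: {}".format(sys.version))
print("Major version is 3: {}".format(sys.version_info[0] == 3))
print("Looks like Anaconda: {}".format(looks_like_anaconda(sys.version)))
```

If the interpreter path does not point into the anaconda3 installation directory, a different python is probably being picked up first on the command line.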

Hope this helps.

[2/1/17] Posted assignment 3.

[2/1/17] Hint for assignment 3 - data frame display function:

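import os
import time
import pandas

def display_df(df):
    "clear the screen, display the contents of a dataframe, wait for 1sec"
    # 'clear' works on Mac and Linux, Windows uses 'cls' instead
    os.system('cls' if os.name == 'nt' else 'clear')
    rows = df.shape[0]
    cols = df.shape[1]
    # print the data frame row by row, values separated by a space
    for i in range(rows):
        for j in range(cols):
            print(df.iloc[i,j],end=' ')
        print()
    time.sleep(1)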

[1/26/17] A couple of additional notes on the AWS instances:

  • If something goes seriously wrong you can use the Actions->Instance State->Reboot menu item.
  • If you want to shut the instance down: Actions->Instance State->Stop
  • In order to delete an instance: Actions->Instance State->Terminate
  • To shut down the vncserver, type 'vncserver -kill :1' at the command prompt of the instance terminal; to restart it, just type 'vncserver'.

[1/25/17] Added a lot of info to 'Documents of Interest' - take a look!

[1/23/17] Please sign up for an Amazon Web Services (AWS) account. It is free. You will need it to run our virtual development machines.

[1/23/17] Welcome!




  • csc310viz.py (2k), Lutz Hamel, Feb 15, 2017, 7:28 PM
  • tennis_numeric.csv (0k), Lutz Hamel, Feb 15, 2017, 7:28 PM