CMPS290T- Spring 2019
Assignment 1
Due: April 26, 2019 11:59pm PDT
Overview
Colab Notebook describing entire ML pipeline on a dataset
Use course notebooks as reference
Dataset: Rotten Tomatoes: 480,000 Labeled Critic Reviews [Link]
Starter Colab notebook [Link]
Submit using Crowdgrader [Link]
Task
The dataset associated with this task consists of critic reviews for a large number of movies.
Each review is associated with a 'freshness' label which is either 'rotten' or 'fresh'.
This can, therefore, be represented as a binary classification task on text reviews.
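As a minimal sketch of the binary framing, the two freshness labels can be mapped to integer targets. The mapping `rotten -> 0`, `fresh -> 1` and the helper name `encode_labels` are assumptions made for illustration; the starter notebook may already encode the labels differently.

```python
# Assumed mapping for illustration: the label names come from the task
# description, the 0/1 assignment is a choice, not a requirement.
LABEL_TO_ID = {"rotten": 0, "fresh": 1}


def encode_labels(labels):
    """Map 'rotten'/'fresh' strings to 0/1, ignoring case and surrounding spaces."""
    encoded = []
    for label in labels:
        key = label.strip().lower()
        if key not in LABEL_TO_ID:
            # Fail loudly on anything that is not one of the two known labels.
            raise ValueError(f"unexpected freshness label: {label!r}")
        encoded.append(LABEL_TO_ID[key])
    return encoded


print(encode_labels(["fresh", "rotten", "Fresh "]))  # prints [1, 0, 1]
```

Raising on unknown labels, rather than silently dropping them, makes data problems visible before training starts.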
Using the methods followed in the notebooks discussed during the lecture, develop a notebook for the complete pipeline.
The required architectures include:
Count vectorization of the reviews followed by two classification models:
Logistic Regression model with L1 and L2 regularization (see resources for help)
Three-layer neural network (Input -> Dense -> Dense -> Dense -> Output)
Sequential text-based learning that retains word-order information, followed by two classification models:
LSTM-based model similar to the one discussed in class
GRU-based model in which the LSTM is replaced by a GRU
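The two input representations listed above differ in whether word order survives. The following is a small pure-Python sketch of both, not the course's implementation: the whitespace tokenizer, the helper names, and right-padding with id 0 are all assumptions made for illustration. In the notebook itself a library vectorizer or tokenizer would normally do this work.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def build_vocab(texts):
    """Give each distinct token an integer id starting at 1 (0 is kept for padding)."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def count_vector(text, vocab):
    """Bag-of-words counts: position i-1 holds the count of the token with id i."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token] - 1] = n
    return vector


def to_sequence(text, vocab, max_len):
    """Token ids in reading order, truncated or right-padded with 0 to max_len."""
    ids = [vocab[t] for t in tokenize(text) if t in vocab]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


reviews = ["good not bad", "bad not good"]
vocab = build_vocab(reviews)  # {'good': 1, 'not': 2, 'bad': 3}

# Count vectors discard order, so both reviews look identical.
print(count_vector(reviews[0], vocab), count_vector(reviews[1], vocab))
# prints [1, 1, 1] [1, 1, 1]

# Id sequences keep order, so the two reviews differ.
print(to_sequence(reviews[0], vocab, 4), to_sequence(reviews[1], vocab, 4))
# prints [1, 2, 3, 0] [3, 2, 1, 0]
```

Count vectors suit the logistic regression and dense models, which take a fixed-size feature vector. The padded id sequences are the kind of input a recurrent model such as an LSTM or GRU reads one step at a time.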
Start by creating a copy of the starter notebook and keep it in your Google Drive.
The notebook already contains the code to load the dataset.
Use a small sample of the data during development, but the final version should use the complete file (480,000 samples).
The train, validation, and test split must be 70%, 10%, and 20%, respectively.
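One way to produce the required 70/10/20 split is to shuffle the row indices once and cut the shuffled list at the two boundaries. This is a sketch under assumptions: the function name `split_indices`, the fixed seed, and the use of integer percentages are illustrative choices, not part of the starter notebook.

```python
import random


def split_indices(n, percents=(70, 10, 20), seed=0):
    """Shuffle range(n) and cut it into train/val/test index lists."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_indices(480_000)
print(len(train), len(val), len(test))  # prints 336000 48000 96000
```

Splitting indices rather than the data itself keeps the reviews and their labels aligned: the same index lists can be used to select rows from both.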
Submission
Please use Crowdgrader to submit the assignment.
In the assignment notebook, make sure that all the required outputs are visible.
Go to 'File', and click 'Download .ipynb'
Submit the downloaded file on Crowdgrader
For enrolled students, the assignment should be visible on Crowdgrader when signed in with a ucsc.edu email address.
You can also use this link to access the assignment on Crowdgrader:
https://www.crowdgrader.org/crowdgrader/venues/join/4311/cibavy_vybywu_bomibo_vadivy
Resources
Starter code [Link]
ML Pipeline 1 [Link]
ML Pipeline 2 [Link]
Machine Learning Crash Course by Google [Link]
Keras [Link]
Regularizers [Link]
Blog on LSTM [Link]