Assignment 1

Due: April 26, 2019 11:59pm PDT

Overview

The dataset associated with this task consists of critic reviews for a large number of movies.
Each review is associated with a 'freshness' label which is either 'rotten' or 'fresh'.
The task can therefore be framed as binary classification of text reviews.
Using the methods followed in the notebooks discussed during the lecture, develop a notebook for the complete pipeline.
The required architectures include:
- Count vectorization of the reviews followed by two classification models:
  1. Logistic Regression model with L1 and L2 regularization (see resources for help)
  2. Three-layer neural network (Input -> Dense -> Dense -> Dense -> Output)
- Sequential text-based learning that retains word-order information, followed by two classification models:
  1. LSTM-based model similar to the one discussed in class
  2. GRU-based model where the LSTM is replaced by a GRU
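
As a rough illustration of the first branch in the list above (count vectorization followed by logistic regression), here is a minimal sketch using scikit-learn. It is not the starter notebook's code: the toy reviews, the `build_model` helper, the choice of the `liblinear` solver, and the value of `C` are all assumptions made for the example, and the real data should come from the loader already provided in the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data (assumption); the real reviews come from the starter notebook.
reviews = [
    "a brilliant and moving film",
    "brilliant acting and a moving story",
    "a dull and boring mess",
    "boring plot and dull acting",
]
labels = ["fresh", "fresh", "rotten", "rotten"]


def build_model(penalty):
    """Hypothetical helper: bag-of-words counts fed into logistic regression."""
    return make_pipeline(
        CountVectorizer(),
        # The liblinear solver accepts both "l1" and "l2" penalties.
        # C is the inverse regularization strength; 10.0 is an arbitrary choice.
        LogisticRegression(penalty=penalty, C=10.0, solver="liblinear"),
    )


# Fit one model per regularization type and classify an unseen review.
models = {p: build_model(p).fit(reviews, labels) for p in ("l1", "l2")}
predictions = {
    p: m.predict(["a moving and brilliant story"])[0] for p, m in models.items()
}
```

Keeping the vectorizer and the classifier in one pipeline means the vocabulary is learned only from the data passed to `fit`, which helps avoid leaking validation or test text into training.
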
Start by creating a copy of the starter notebook and keep it in your Google Drive.
The notebook already contains the code to load the dataset.
Use a smaller sample of the data during development, but the final version should use the complete file (480,000 samples).
The train, validation, and test split must be 70%, 10%, and 20%, respectively.
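
One possible way to produce the 70/10/20 split, shown here as a sketch that uses only the Python standard library. The `train_val_test_split` helper, the fixed seed, and the `range(1000)` stand-in for a development sample are assumptions for illustration; the same function would apply unchanged to the full set of samples.

```python
import random


def train_val_test_split(samples, seed=0):
    """Shuffle a copy of `samples` and cut it into 70% / 10% / 20% parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_train = n * 7 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, about 20%
    return train, val, test


# Stand-in for a small development sample (assumption): 1,000 row indices.
dev_rows = range(1000)
train, val, test = train_val_test_split(dev_rows)
```

Splitting row indices rather than the rows themselves keeps the sketch independent of how the dataset is stored; the indices can then be used to select reviews and labels together.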

Please use Crowdgrader to submit the assignment.
In the assignment notebook, make sure that all the required outputs are visible.
Go to 'File' and click 'Download .ipynb'.
Submit the downloaded file on Crowdgrader.
For enrolled students, Crowdgrader access should be available using your ucsc.edu email address.
You can also use this link to access the assignment on Crowdgrader: https://www.crowdgrader.org/crowdgrader/venues/join/4311/cibavy_vybywu_bomibo_vadivy
