Due date: 11:00AM, 12/20/2020
Late Policy: 5% of the total points will be deducted on the first day past the due date, and an additional 10% of the total points will be deducted for each extra day afterwards.
Report format: Write a report of >1,500 words with the main sections: a) abstract, b) introduction, c) method, d) experiment, e) conclusion, and f) references. You can follow the paper format of, e.g., leading machine learning journals such as the Journal of Machine Learning Research (http://www.jmlr.org/) or IEEE Trans. on Pattern Analysis and Machine Intelligence (http://www.computer.org/web/tpami), or leading conferences such as NeurIPS (https://papers.nips.cc/) and ICML (http://icml.cc/2016/?page_id=151). Please submit your report to ted and attach your code as supplementary materials (alternatively, you can provide a link to your GitHub page in your report).
Bonus points: If you feel that your work deserves bonus points due to reasons such as: a) novel ideas and applications, b) large efforts in your own data collection/preparation, c) state-of-the-art results on your applications, or d) new algorithms or neural network architectures, please create a "Bonus Points" section that specifically describes why you deserve bonus points. In general, we will evaluate your justifications based on the review guidelines of, e.g., CVPR/NeurIPS/ICCV/ICLR.
In addition, there will be an optional presentation for the final project to receive bonus points. You can either submit a 3-5 minute short video clip to ted as supplementary material to your report, or give an in-person presentation (time and location to be determined).
Note that the word-count requirement (>1,500) only applies to single-student projects. For team-based projects, each team only needs to write one final report, but the role of each team member needs to be clearly defined and specified. The final report of a team project is also supposed to be much longer than 1,500 words, depending upon how many members (maximum 2) there are in your team.
Word count:
One-person team: >1,500
Two-person team: >2,200
See the link below about writing a scientific paper: http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtoc.html The format of your references can be of any kind that is adopted in the above journals or conferences.
Grading: The merit and grading of your project can be judged from aspects described below that are common when reviewing a paper:
1. Interestingness of the problem you are studying. (10 points).
2. How challenging and large is the dataset you are studying? (10 points)
3. Any aspects that are new in terms of algorithm development, uniqueness of the data, or new applications? (20 points)
Note that we encourage you to go beyond just downloading existing code and training it on a standard benchmark. Basically, you are expected to complete a report that is worth reading (even if not publishable). If you have done a relatively thorough investigation of, e.g., different architectures and different parameters, it is considered to be somewhat "new": the experiences you have had are worth reading for others who have not tried them before. Someone reading your report should feel that something worthwhile is there, such as your own attempts at algorithms, a non-standard dataset or application, general conclusions about your parameter tuning, or which neural network structure might be a better choice.
In a nutshell, this definition of "new" is somewhat different from the notion of "being novel" when reviewing a paper submitted to, e.g., NeurIPS.
4. Is your experimental design comprehensive? Have you done thorough experiments in tuning hyper-parameters? (30 points)
Tuning hyper-parameters in your final project will need to be more comprehensive than what was done in HW4.
For example, if you are performing CNN classification on the Tiny ImageNet dataset, some options to consider include
a. Comparing two different architectures chosen from, e.g., LeNet, AlexNet, VGG, GoogLeNet, or ResNet.
b. Varying the number of layers.
c. Adopting different optimization methods, for example Adam vs. stochastic gradient descent.
d. Trying different pooling functions, such as average pooling, max pooling, and stochastic pooling.
e. Using different activation functions, such as ReLU, Sigmoid, etc.
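One simple way to keep such a comparison organized is to enumerate the configurations explicitly before training. Below is a minimal sketch; the architecture, optimizer, pooling, and activation names are placeholders standing in for whatever your framework (e.g., PyTorch) uses to build each model, and the lists should be pruned to what your compute budget allows.

```python
# Hypothetical hyper-parameter grid covering the comparisons in (a)-(e).
from itertools import product

architectures = ["VGG", "ResNet"]     # (a) two architectures
num_layers    = [18, 34]              # (b) vary the depth
optimizers    = ["Adam", "SGD"]       # (c) optimization methods
poolings      = ["avg", "max"]        # (d) pooling functions
activations   = ["ReLU", "Sigmoid"]   # (e) activation functions

configs = [
    {"arch": a, "layers": d, "opt": o, "pool": p, "act": f}
    for a, d, o, p, f in product(architectures, num_layers,
                                 optimizers, poolings, activations)
]
print(len(configs))  # 2*2*2*2*2 = 32 runs in the full grid
```

Iterating over `configs` (training one model per entry and logging its validation accuracy) gives exactly the kind of systematic comparison asked for above, rather than ad hoc one-off runs.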
See e.g. how the significance was justified in the ResNet paper (you don't have to follow this paper though): https://arxiv.org/pdf/1512.03385.pdf
5. Is your report written in a professional way, with sections including abstract, introduction, method/architecture description, experiments (data and problem description, hyper-parameters, training process, etc.), conclusion, and references? (30 points)
6. Bonus points will be assigned to projects that have adopted new methods, worked on novel applications, and/or have done a thorough comparison against the existing methods and possible choices.
There will be five options for the final project (if you have your own idea, please come talk to me):
Option (1): (Individual only, no team work) You can write your own code to solve the structured prediction problem. Implement at least TWO methods out of the following choices for the OCR task: structural SVM, conditional random fields, auto-context, and fixed point. You may consider this an extended version of homework assignment 2. In your final project, you are supposed to vary the window size and perform the recognition on the full word level (using the sliding-window strategy as discussed in class). Give a thorough comparison of the methods you have implemented in terms of training and testing errors w.r.t. different window sizes, against the training time and the test time. Try to vary the number of training and testing samples using different splits, including 1,000/4,000, 2,500/2,500, and 4,000/1,000. You can additionally choose the POS dataset to make your experiments more comprehensive.
Option (2): (Individual only, no team work) You can use an existing framework to solve the structured prediction problem. Adopt at least THREE methods out of the following choices for the OCR task: structural SVM, maximum margin Markov networks, conditional random fields, and auto-context/fixed point. You may consider this an extended version of homework assignment 2. In your final project, you are supposed to vary the window size and perform the recognition on the full word level (using the sliding-window strategy as discussed in class). Give a thorough comparison of the methods you adopted in terms of training and testing errors w.r.t. different window sizes, against the training time and the test time. Try to vary the number of training and testing samples using different splits, including 1,000/4,000, 2,500/2,500, and 4,000/1,000. You can additionally choose the POS dataset to make your experiments more comprehensive.
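For both options above, the window-extraction step of the sliding-window strategy can be sketched as follows. This is only an illustrative fragment under assumed image dimensions: `window_width` and `stride` are exactly the quantities you are asked to vary, and the per-window classifier plus the sequence-level decoder (structural SVM, CRF, etc.) are not shown.

```python
import numpy as np

def sliding_windows(word_image, window_width, stride=1):
    """Cut a (H, W) word image into overlapping (H, window_width) patches.

    Each patch would be fed to a per-character classifier; the per-window
    predictions are then combined by a sequence model (e.g. a CRF or a
    structural SVM decoder) into a word-level label sequence.
    """
    H, W = word_image.shape
    starts = range(0, W - window_width + 1, stride)
    return np.stack([word_image[:, s:s + window_width] for s in starts])

# Toy example: a 16x64 "word" image, windows of width 8, stride 4
word = np.zeros((16, 64))
patches = sliding_windows(word, window_width=8, stride=4)
print(patches.shape)  # (15, 16, 8): 15 windows of height 16 and width 8
```

Reporting accuracy and training/test time as functions of `window_width` then gives the comparison tables requested above.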
Option (3): (Individual only, no team work) Char RNN. You can read more about “The Unreasonable Effectiveness of Recurrent Neural Networks” at the link http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Code is available at https://github.com/karpathy/char-rnn or https://github.com/jcjohnson/torch-rnn or https://github.com/sherjilozair/char-rnn-tensorflow (you may also use any other char-RNN implementation). The Tiny Shakespeare dataset is available at https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt and the complete Sherlock Holmes is available at https://sherlock-holm.es/stories/plain-text/cnus.txt You can also try other interesting applications described in the post, such as Wikipedia, Algebraic Geometry (LaTeX), or Linux source code, but you need to collect the dataset yourself. You need to work on at least one dataset/application and try to produce meaningful results using the char-RNN model.
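To make the model concrete: the core of a vanilla char-RNN is a single recurrent update that maps a one-hot character to a distribution over the next character. The numpy fragment below is a minimal forward-pass sketch in the spirit of Karpathy's min-char-rnn, with illustrative sizes; real use requires training (backpropagation through time), which the linked implementations provide.

```python
import numpy as np

# Minimal forward pass of a vanilla character-level RNN (sketch only).
np.random.seed(0)
vocab_size, hidden_size = 5, 8                  # e.g. 5 distinct characters
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input-to-hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden-to-hidden
Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden-to-output
bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

def step(char_index, h):
    x = np.zeros((vocab_size, 1)); x[char_index] = 1.0  # one-hot input char
    h = np.tanh(Wxh @ x + Whh @ h + bh)                 # recurrent update
    y = Why @ h + by
    p = np.exp(y) / np.sum(np.exp(y))                   # next-char distribution
    return h, p

h = np.zeros((hidden_size, 1))
for c in [0, 3, 1]:                                     # feed a toy sequence
    h, p = step(c, h)
print(p.shape)  # (5, 1); p sums to 1 over the vocabulary
```

Sampling text is just repeatedly drawing a character from `p` and feeding it back in as the next input.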
Option (4): Train generative models chosen from the following: DC-GAN, WGAN-GP, WINN, VAE, and WAE. Widely used datasets in generative modeling include the CelebA face dataset (http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html), the CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html), and the LSUN dataset (http://lsun.cs.princeton.edu/2017/). Compare the models of your choice on at least TWO datasets, both qualitatively and quantitatively (e.g., using the Inception Score: https://arxiv.org/pdf/1606.03498.pdf). Another good candidate is the new model for generating high-quality faces: https://github.com/tkarras/progressive_growing_of_gans.
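For the quantitative comparison, the Inception Score is IS = exp(E_x[KL(p(y|x) || p(y))]): it is high when individual samples yield confident class predictions while the marginal over classes stays diverse. The sketch below computes it from an (N, C) array of class probabilities; in practice those probabilities come from a pretrained Inception network applied to your generated images, which is not shown here.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score from an (N, C) array of class probabilities p(y|x).

    IS = exp( mean_x KL( p(y|x) || p(y) ) ), where p(y) is the marginal
    over the N samples. Upper bound is C (the number of classes).
    """
    p_y = probs.mean(axis=0, keepdims=True)            # marginal p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Confident, diverse predictions -> high score; uniform ones -> score of 1
confident = np.eye(4)              # 4 samples, each a different class
uniform   = np.full((4, 4), 0.25)  # uninformative predictions
print(inception_score(confident), inception_score(uniform))  # ~4.0 and 1.0
```

The reference implementation in the linked paper also splits the samples into groups and reports the mean and standard deviation of the score across groups.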
Option (5): (Please discuss it with the instructor when planning.) A topic of your own about visual, acoustic, language, or other data modeling using modern deep learning techniques, including convolutional neural networks, recurrent neural networks, auto-encoders, etc. You can look for interesting topics in recent NeurIPS, CVPR, ICLR, ICCV, ACL, AAAI, etc.
Some interesting projects you can consider working on:
Language modeling:
https://github.com/andrewt3000/DL4NLP/blob/master/README.md
BERT/ALBERT: https://github.com/google-research/albert
Transformers: https://github.com/huggingface/transformers
Recurrent Neural Networks:
char-rnn (http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
Audio:
Google's recent release of a massive audio dataset: https://research.google.com/audioset/
Image Captioning:
https://github.com/karpathy/neuraltalk2
Show-and-tell (https://github.com/tensorflow/models/tree/master/research/im2txt)
Show-attend-and-tell (https://github.com/yunjey/show-attend-and-tell)
Graph Neural Networks:
https://github.com/deepmind/graph_nets
https://github.com/williamleif/GraphSAGE
Project reports from the Stanford cs231n class: