For your final project, you will be writing an ACL-style "short paper" and giving a short presentation on a project of your choice. Your grade will be split across a short proposal "pitch" (5%), a literature review (10%), participation in peer review (10%), a presentation (25%), and a final paper (50%). Projects may be done in groups of up to three students; however, the expectations for the results you produce scale up as your group grows.
Due Wednesday, March 24 on Gradescope (no group submissions). For Thursday, March 4, bring some starting ideas!
Your assignment is to write two one-paragraph project pitches. Each pitch should clearly state what research question you're attempting to answer, what data you'll use, what experiments you want to run, and what metrics you'll use to evaluate your experiments' output.
You'll write this up using LaTeX; please submit both the PDF and the LaTeX file on Gradescope so I can conveniently pull out the text of your proposals for the shared projects list. The file names don't matter; even though this is a "coding assignment," there's no autograder. Since you'll use it later for your project, I recommend starting from this template. You're welcome to get rid of all of the formatting except the title and authors for now; if you want some general tips for LaTeX formatting, you can find them in the template document or this sample document for CS 140.
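If it helps to see the shape of that stripped-down version, here's a minimal sketch. I'm using the plain article class as a stand-in, so keep whatever document class and style files the real template loads, and swap in your own name and email:

% A minimal sketch of a stripped-down pitch write-up. The document class
% is a stand-in for whatever the course template uses; the name and
% email are placeholders.
\documentclass[11pt]{article}

\title{Project Pitches}
\author{Your Name \\ \texttt{yourname@hmc.edu}}

\begin{document}
\maketitle

\textbf{Pitch 1.} One paragraph covering your research question, the data
you'll use, the experiments you want to run, and your evaluation metrics.

\textbf{Pitch 2.} Same structure, second idea.

\end{document}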
It's a tricky thing to come up with a project from scratch, particularly when you don't have a lot of time. Here are some tips for the task:
Make your research question clear and concrete. Getting a good research question is a subtle thing: borrowing terms from this helpful advice post, you want it to be (a) clear to an audience of your classmates, (b) focused enough that you will be able to address it with one or two narrow experiments, (c) concise enough to state within the first few sentences of a paragraph, (d) complex enough that the answer isn't immediately evident, and (e) arguable, in the sense that it should be possible to provide evidence that supports or rejects an answer to your research question.
Pick a clean, ready-to-use dataset. Dataset processing takes a long time. You'll probably need to do some of it no matter what (and I encourage you to borrow code from the labs to process XML or to use spaCy for this), but you should plan to use a dataset that is already prepared for processing. Good sources for these include existing shared tasks (for instance, SemEval and CoNLL tasks and the GLUE benchmark) and datasets that have been used for lots of NLP processing in the past (e.g. this list of ten corpora). There are also some sites that are "easier" to get data from, like anything related to Wikimedia (e.g. Wikipedia), StackExchange (e.g. StackOverflow), or Reddit (which is archived broadly but has also been curated for some specific projects). You can also check sites like Kaggle to see if they have anything available. If acquiring the dataset for your project would take more than 24 hours or cost money, it's not an option for this class.
Keep things narrow. It's okay if your project doesn't create a new dataset, model, and evaluation all in one swoop! The smaller the scope is for what you're doing, the easier it will be to provide evidence that you did it well. For instance, you could
(a) perform a replication study (e.g., take existing code for an experiment and check that it's doing what it says, plus break down their results a bit more),
(b) take a large unstructured dataset and curate it into one that helps answer a more particular question + show a simple model works on it,
(c) try to get a good result on an existing shared task/analyze the contents of the text of a shared task to see what parts are "easy" or "hard", or
(d) create a new metric/evaluation and show it does something interesting.
For anything you're planning to do, pay attention to file sizes: processing a dataset bigger than a gigabyte or two is probably not going to happen this semester. If you'll need more space than your computer can handle, or you need to run something that'll take a while, reach out to Prof. Xanda to get set up with some disk space.
If you're not sure about starting ideas, I'd recommend looking at the shared SemEval tasks, browsing through the textbook, and maybe using the search box in the ACL Anthology. You're also welcome to reach out to me with the start of an idea (e.g. a domain or dataset you're interested in) and we can work on refining it.
Draft of related work due April 15 *in class* for peer review
Related work + methods due April 21 by 10 PM on Gradescope
Part of your final paper will include a discussion of work related to your project. For this deadline, I'd like you to compile a discussion of the papers related to the project you're undertaking, describing what those projects do. I'd like you to do this using the ACL 2021 format (here's the template). You will submit one literature review per project (as a team).
The literature review writeup will include two parts: a related work section and a description of methods. You only need to draft the first part for 4/15's in-class exercise.
Related Work. You've probably seen (or written) a section in papers called "Related Work" which, when executed best, describes scholarly works that are closely related to your project and how your project differs from them. These include projects solving the same problem in a different way, projects solving slightly different problems with the same or a similar strategy to yours, projects your project builds on, etc.
The amount of detail you want to put into this part may depend on how comparative your work is; for instance, while the Centroid-Based Text Summarization paper has a fairly succinct Section 2 that quickly addresses the context of their model, the NarrativeQA paper spends a lot longer discussing similar works, because illustrating that distinction is part of the core argument of their paper. I'd guess most projects will fall somewhere in between these two, but make sure this isn't just a grocery list of papers with similar keywords. You don't have to thoroughly read every paper you cite, but you should know enough about each one to succinctly say how what you're doing is similar to and different from what they did.
To compile these papers, the resources above and Google Scholar are helpful starting points. If you find important papers, textbook chapters, or pages referencing this topic, I'd encourage you to look at the bibliographies of those papers to find the core papers that people cite in this subfield. Google Scholar usually has a "Cited by ##" link that you can use to find more recent papers that cite some fundamental paper; for instance, following that link for the NarrativeQA paper finds a bunch of interesting papers about subsequent corpora for question answering. If you're not sure where to start, the textbook or a Wikipedia page may give you some starting paper links, but don't let the textbook be your main resource; go find primary sources!
Methods. In a standard NLP experimental research paper, citations are not only found in the Related Work section; they're scattered throughout the paper, as they help motivate the introduction, describe the evaluations, and contextualize the results of an experiment. For instance, the previously mentioned centroid-based summarization paper uses citations for what word embedding learning algorithms and weighting they used, the basis for their centroid selection algorithm, what metrics they used for evaluation, and where they got their datasets.
I'd like you to write out a citation-enriched plan of what you're going to use to assemble your project, including models, evaluations, libraries, datasets, and published strategies. Note that your job here isn't to justify why these choices are the right ones; it's to document where those choices came from in the existing literature (which may actually turn out to be all the justification you need). This isn't a contract with me for what your project will look like, but it's likely to give you a good checklist of what you'll need to get running. Again, try to find primary sources if you can; while Jurafsky & Martin describes how to use tf-idf, for instance, it's appropriate to cite the original 1986 Salton & McGill reference as your source if you use tf-idf (as the centroid paper does). Similarly, many datasets have an associated paper that introduces them, and it's appropriate to cite that paper when you first refer to the dataset in your paper; if no such paper exists, however, it's okay to use a \footnote{\url{https://path/to/dataset}} to indicate where it came from.
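To make that concrete, here's a sketch of what a citation-enriched methods sentence might look like. The BibTeX keys and the dataset name are made up for illustration; the ACL template supports natbib-style \citep and \citet commands:

% Hypothetical keys and dataset name, for illustration only.
We weight terms using tf-idf \citep{salton1986retrieval} and select
sentences following the centroid method of \citet{rossiello2017centroid},
evaluating on FooCorpus\footnote{\url{https://path/to/dataset}}, which
has no accompanying dataset paper.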
On length and number of citations. There is no maximum or minimum length for this, because depending on the project, there may be more or less to focus on. I'd guess most of these will be 1.5-2 pages, which is about how much space the literature review and methods sections take up in an average ACL short (4-page) paper. I'd also expect roughly 10-15 citations, maybe 3 of which you engage with more deeply to compare with your own work. Feel free to use the ACL Anthology's BibTeX entries, or in a pinch Google Scholar's, to help populate your bibliography (and check with me or the grutors if you're not sure how to use BibTeX). However, if you use Google Scholar, you'll need to make sure the citations are actually complete, as Google Scholar often messes these up.
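For reference, the entries you paste into your .bib file look roughly like this. This is a made-up entry in the Anthology's usual shape; double-check that fields like booktitle, year, and pages actually match the published paper, especially if the entry came from Google Scholar:

% A hypothetical entry; real Anthology entries come with these fields
% filled in for you.
@inproceedings{doe-2021-example,
    title = "An Example Short Paper About Text",
    author = "Doe, Jane and Smith, Alex",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics",
    year = "2021",
    pages = "1--7",
    url = "https://aclanthology.org/...",
}

You'd then cite it in your text with \citep{doe-2021-example}, or with \citet{doe-2021-example} when the authors' names should read as part of the sentence.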
I'm not counting pages or citations: I'm interested in whether it's clear what the context of your work is and how your approach will work. While I expect many of the citations will come from NLP papers, I also expect some may come from other domains, like gender studies, linguistics, or political science.
For paywalled citations. While the ACL makes all of their papers free online, not all venues for machine learning and NLP research do. If you discover a relevant-looking paper is behind a paywall, you may be able to use your Claremont Colleges library access to help you reach it. To do this, the easiest thing is to add a bookmarklet to your Bookmarks bar that will redirect to a special library-portal version of the article access page. More information on how to do that is here: https://paperpile.com/p/proxy-claremont-colleges/.
Due Wednesday, April 28 at 10 PM PST.
To present the core problem you're working on, your approach, and a little of what you've done so far, you'll be recording a 5-6 minute video describing your project. Five minutes is not a lot of time, so you'll want to get to your core research question quickly and give a brief sense of what results you have so far. (Teams of 3 can have an extra minute, for 7 minutes total.)
Your assignment will be graded on the following:
Problem Statement: (20 pts) Does your presentation clearly establish what the problem or research question is that your project addresses? This should be concrete and something for which you can provide evidence: a guiding question like "how does gender affect translation" isn't a concrete research question, but "how does signaling speaker gender to a machine translation system affect the quality of translations" is.
Background: (10 pts) Is it clear what existing work you're building off of/comparing to? You don't have to mention all related work, but it'd be good to mention a couple of close neighbors to your project/historical context for your project so someone can understand your specific contribution.
Progress: (10 pts) Is it clear what you've done so far, and what's left to do? It's okay if you don't have detailed plots or results, but a description of what the steps of your project are and where things are would be good.
Slides: (10 pts) Do the slides help communicate your ideas in a clear and effective way? Good slides give enough information to visualize or support the argument you're making, but won't necessarily have text for everything you say (in fact, many of my slides for technical presentations have no text at all). A good rule for slide decks is that most people average about a minute of speech per slide.
To record these videos, you can use your favorite screen capture tool, or just start a 1-person recorded Zoom call. Please submit the video by uploading it to the Lightning Talks folder, and submit your slides/visuals as a PDF to Gradescope.
*A nice but not required addition for these videos is captions. If you write a script for your video and would like to add subtitles, a quick way to do this is to upload the video to YouTube first and use their automatic captioning tools to upload your script and sync it up, then download the caption file afterwards. Google Drive will then let you add captions if you right-click your uploaded video, click "Manage Caption Tracks", and select your caption file.
Due May 14th at 5 PM PST.
Your final project will be a 3-5 page paper (not including references) in the ACL 2021 format (here's the template). If you are having trouble using the TeX template, especially for citation management, please reach out to the grutors or Prof. Xanda right away for help!
Your report should clearly express your approach to addressing a clear research question, how your approach connects with and differs from previous approaches, and both quantitative and qualitative analysis of your results. To fully match the ACL style, it should also include a (short!) abstract that states in 3-5 sentences what problem you addressed and what a key finding was.
The paper will be graded on the following:
Problem Statement: (10 pts) Does your paper clearly establish what the problem or research question is that your project addresses?
Background: (20 pts) Is it clear what existing work you're building off of/comparing to? It's okay if this isn't a full history of the area, but it should be enough that someone who sees your paper or video will be able to distinguish what the new contributions of your work are with respect to existing work. It's not expected that your video will have the same amount of detail as your paper.
Methods: (30 pts) Do you clearly describe the data, processing, models, and evaluations you are using, and how those help you address your problem statement?
Results: (40 pts) What do your evaluations say about your models, data, etc.? Note that this isn't asking "is your model good," but rather what kinds of information you get out of your results. Effective results sections answer "why did/didn't this work" instead of just "did it work," and should include some ideas about which pieces of a model or which specific text examples might help explain the quantitative summary statistics you have.
Please submit your final PDF by May 14th at 5 PM on Gradescope. (As per final exam rules, there cannot be extensions on this deadline).