Statistical predictive models are used to address an increasing variety of problems in natural language processing (NLP), computer vision, and bioinformatics, to name a few. However, most of these methods rely on annotated data. Obtaining such labeled data for every task and every domain (e.g., a language or even a genre: consider language use in newswire vs. user reviews) is very expensive or even impossible. These observations suggest that we should look into methods that exploit unlabeled data (e.g., texts available on the Internet), either to improve a model learnt from a small amount of labeled data or to induce one from scratch. The topic of this seminar is exactly this: exploiting unlabeled data to induce or improve statistical models.
The class will cover machine learning methods for semi-supervised and unsupervised learning. The main focus will be on problems from natural language processing, but most of the methods considered also have applications in other domains (e.g., bioinformatics, vision, information retrieval, etc.). We will try to focus mostly on structured prediction problems (i.e., predicting graphs, such as syntactic parse trees), as they are widespread and challenging (and arguably more interesting).
Though most of the applications will come from NLP, we do not require any prior exposure to NLP (though it would be a plus). Ideally, you have some prior experience with machine learning, statistical NLP, or IR. If you are unsure, feel free to contact us and ask.
The class will cover both classic methods for semi-supervised and unsupervised learning and some recent, often influential techniques. We will also consider some interesting applications, such as semantic parsing of natural language and unsupervised grammar induction.
Time and Location
Fridays at 14:15 in building C 7.2, room 2.11.
Please send us an e-mail to arrange a meeting.
Requirements for the course are:
- Present a paper to the class (30 - 45 minute presentation)
- Write 2 critical reviews (surveys) on two selected topics (1 - 2 pages each)
- Write a term paper (12 - 15 pages); you do not need to write the term paper if you are registered for 4 points
- Read papers before the talks and participate in discussion
- Present the chosen paper in an accessible way
- Present sufficient background; do not expect the audience to know much about machine learning or natural language processing beyond the material already covered in the class (according to the surveys, a good number of people have no ML background)
- Take a critical view of the paper: discuss its shortcomings, possible future work, etc.
- To give a good presentation, in most cases you will need to read one or two additional papers (e.g., those referenced in the paper)
- Have a look at the material on how to give a good presentation compiled by Alexander Koller
- The language for talks and discussions will be English
- Given the current number of students, we are planning on 35-minute presentations; on some days we may decide to have 2 presentations
- Send both of us your slides (preferably as PDF) by 6 pm, 4 days before the talk (the first 2 presenters can send me their slides 2 days before the talk)
- If we keep the class on Friday, the deadline will be Monday at 6 pm
- You will get feedback from us 2 days before the seminar (on Wednesday)
- A short critical (!) essay reviewing one of the papers in the list
- One or two paragraphs presenting the essence of the paper
- The remaining parts should highlight both the positive sides of the paper (what you like) and its shortcomings
- You need to submit 2 reviews. There will be up to 3 reviewers for each presentation.
- The review should be submitted (by email, as PDF) before the paper is presented in class. The exception is additional reviews for classes you missed: submit such a review within 2 weeks of the corresponding class and before the end of the term.
- Do not copy and paste from the paper. The review should be entirely in your own words.
- Length: 1 - 1.5 pages each
- Describe the paper you presented in class.
- It should be written in the style of a research paper; the only difference is that most of the work you present is not your own
- Your ideas, analysis, comparison
- It should be written in English
- Comparison of the methods used in the paper with other material presented in the class or any other related work
- Any ideas on improvement of the approach
- Any alternative interpretation or analysis
- Paper organization
- Technical correctness
- Style (written in research style, without inappropriate speculation, with correct citations, etc.)
- Your ideas are meaningful and interesting
Length: 12 - 15 pages
Deadline: Sep 24 (however, we recommend submitting it soon after your presentation).
Format: Submitted in PDF over email to both of us
Marks will be assigned as follows:
- Class participation: 60%
- Your talk and discussion after the talk
- Participation and discussion of other papers
- 2 reviews (5% each)
- Term paper: 40% (only if registered for 7 points; otherwise, class participation constitutes 100% of the grade)
You can skip ONE class without giving any explanation (if you are not presenting). If you need to skip more, you will need to write an additional critical review for every paper presented while you were absent.