CSE 538 Fall 2022

Natural Language Processing

The main objective of this course is to cover four fundamental aspects of natural language processing. The instructor for this course is Niranjan Balasubramanian.

1. Why is NLP difficult?

2. How do we build representations of natural language texts? Transformers, RNNs, CNNs.

3. How can we use these representations to process and generate language? Learning strategies.

4. How do we evaluate NLP systems?

As an instructor, my primary goal is to get you excited about natural language processing and its applications. A secondary goal is to give you enough of an introduction to, and familiarity with, the concepts and themes in NLP that you are able to learn more on your own.

You will find most of the administrative information you need on this website.

Syllabus

A detailed syllabus with lecture topics and dates will be made available in the first week of class. The topics covered will include:


  1. Word Representations (Word2Vec, GloVe)

  2. Sentence Representations (DAN, RNN, CNN, Transformers)

  3. Language Modeling (BERT, GPT, T5, etc.)

  4. Language Generation

  5. Prompting, In-Context and Instruction-based Learning

  6. Efficiency Considerations and Long Context Modeling

  7. Syntactic and Semantic Parsing

  8. Applications: Question Answering and Machine Translation


Schedule


F22 CSE538 Schedule

Requirements

Here is a list of things that would be useful for this class. I won't be able to respond to individual requests on whether your background is suitable. Please use the following to make your own determination.

The following are critical. If you are completely unfamiliar with them, you will likely have difficulty following the material in class.

Must haves:

  • Basic probability and statistics (joint and conditional probabilities, Bayes' rule, etc.)

  • Basic linear algebra (vector and matrix operations)

  • Basic calculus (differential calculus)

  • Machine learning basics (classification, basic ML recipe)

The following are useful but some of these will be covered in class and with some effort you can pick these up as we go along.

Can be picked up as we go along:

  • Deep learning basics (neural networks, feed forward, sequence models, etc.)

Books and Reading Material

I am not going to follow any book. Material presented in class will be in slides. Your best bet for learning is to attend class and follow up on pointers I give out to reading materials in class.

That said, there are many books and reading material out there on the web. Here are a few you may find helpful:

Course Related Links

  • [Piazza] For most course communications.

  • [Blackboard] For submissions of assignments, projects, etc.

  • [Google cloud credits] Coming soon.

Course Structure

NOTE 1: Dates for these will be posted in the first week of class.

NOTE 2: Assignments will be in Python and will require you to learn PyTorch, a deep learning framework. We will give you enough material to learn this as we go along.

NOTE 3: Projects can be implemented in any programming language.

  • Programming Assignments (30%)

    • Word representations

    • Sentence Representations

    • Dependency Parsing

  • Project (30%)

    • Baseline submission (5%)

    • Final Submission (25%)

  • Midterms (40%)

    • Midterm I (25%) -- In class.

    • Midterm II (15%) -- In class.
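
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes every component is scored out of 100 and that the three programming assignments are summarized by a single score; the function name, dictionary keys, and example scores are hypothetical, and the final scheme may differ.

```python
# Weights (in percent) taken from the course structure above.
WEIGHTS = {
    "assignments": 30,
    "project_baseline": 5,
    "project_final": 25,
    "midterm_1": 25,
    "midterm_2": 15,
}


def weighted_total(scores):
    """Combine component scores (each out of 100) into a total out of 100.

    Assumption: `scores` has one entry for every key in WEIGHTS.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "assignments": 90,
    "project_baseline": 80,
    "project_final": 70,
    "midterm_1": 60,
    "midterm_2": 50,
}
print(weighted_total(example))  # 71.0
```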


Accessibility

Stony Brook DSS provides some excellent services. We will make every effort to support accessibility needs for all parts of the course. Please contact me via email to make specific arrangements. I am still learning about needs and ways to provide support. I am happy to hear suggestions and inputs on making the class accessible for everyone.

Here is the official statement that SBU requires which I endorse and will follow as policy for the class.

If you have a physical, psychological, medical, or learning disability that may impact your course work, please contact the Student Accessibility Support Center, Stony Brook Union Suite 107, (631) 632-6748, or at sasc@stonybrook.edu. They will determine with you what accommodations are necessary and appropriate. All information and documentation is confidential.

Students who require assistance during emergency evacuation are encouraged to discuss their needs with their professors and the Student Accessibility Support Center. For procedures and information go to the following website: https://ehs.stonybrook.edu//programs/fire-safety/emergency-evacuation/evacuation-guide-disabilities and search Fire Safety and Evacuation and Disabilities.

Critical Incident Management


Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Student Conduct and Community Standards any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students' ability to learn.

Flexible Submissions Policy

NOTE: Policy will be finalized by end of the first week of classes.

  • Assignments and Project

    • You have a total of six extra days to use as you see fit. You can use all six days for a single assignment, or one day for each assignment -- whatever works best for you.

    • Submissions will be docked 20% for every late day after the sixth day.

  • Midterms

    • Midterms will be held in class.

    • No rescheduling except for medical emergencies with appropriate documentation.
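
To make the arithmetic of the tentative late policy above concrete, here is a small Python sketch. It is one possible reading, not the official rule: it assumes the six extra days form a single budget shared across all submissions, that the budget is used up in submission order, and that each late day beyond the budget costs 20 percentage points of that submission's score, capped at 100. The function and parameter names are hypothetical.

```python
def late_penalties(late_days, free_days=6, penalty_per_day=20):
    """Return the percentage docked from each submission, in submission order.

    Assumptions (not stated in the policy): the free days are a shared
    budget consumed in order, and the penalty is capped at 100 percent.
    """
    remaining = free_days
    penalties = []
    for days in late_days:
        covered = min(days, remaining)  # late days absorbed by the budget
        remaining -= covered
        uncovered = days - covered      # late days past the budget
        penalties.append(min(100, uncovered * penalty_per_day))
    return penalties


# Three submissions that are 2, 3, and 3 days late: the first two use five
# of the six free days, so the third has two uncovered late days.
print(late_penalties([2, 3, 3]))  # [0, 0, 40]
```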

Grading

I will likely make adjustments to the grading scheme based on the overall performance of the class. Here is a tentative grading rubric:

A: 90 and above

A-: 80 or more but less than 90

B+: 75 or more but less than 80

B: 70 or more but less than 75

B-: 65 or more but less than 70

Five point intervals for lower letter grades.
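
The tentative cutoffs above can be read as a simple lookup. The Python sketch below encodes only the cutoffs listed explicitly; letters below B- are not named in the rubric, so the sketch returns a placeholder rather than guessing them. The names are hypothetical, and the cutoffs may be adjusted as noted above.

```python
# (minimum score, letter) pairs from the tentative rubric, highest first.
CUTOFFS = [(90, "A"), (80, "A-"), (75, "B+"), (70, "B"), (65, "B-")]


def letter_grade(total):
    """Map a total score out of 100 to a letter using the tentative cutoffs."""
    for minimum, letter in CUTOFFS:
        if total >= minimum:
            return letter
    # Lower letters follow five-point intervals but are not listed above.
    return "below B-"


print(letter_grade(82))  # A-
```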

Academic Integrity

In this class, we encourage collaboration. Whenever possible, we will clearly state what forms of collaboration are allowed and what aren't. Of course, it is nearly impossible to list all forms of unethical or dishonest behavior. You can consult the SBU website on Academic Integrity for more information.

While a master's program can be an excellent opportunity, it can also be difficult and exhausting. Here are some of my thoughts on how you can structure your expectations for your MS program; they might be of help to you.

Grades are hardly the point of grad classes. Please don't cheat.

  • It is hardly worth the risk.

  • It is often very easily detected.

  • Part of your training is to learn how to make ethical decisions.

  • If you are under difficult circumstances of any kind, come talk to me about it.

  • When in doubt, cite the sources from which you got content/code/ideas and give credit to people who you worked with.

  • When in doubt, ask the instructor or the TAs before engaging in any specific forms of collaboration or use of outside material.

Here is the official statement from SBU on academic integrity, which I endorse and will follow for this class:

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Faculty is required to report any suspected instances of academic dishonesty to the Academic Judiciary. Faculty in the Health Sciences Center (School of Health Technology & Management, Nursing, Social Welfare, Dental Medicine) and School of Medicine are required to follow their school-specific procedures. For more comprehensive information on academic integrity, including categories of academic dishonesty please refer to the academic judiciary website at http://www.stonybrook.edu/commcms/academic_integrity/index.html