Instructor: Prof. Pablo Robles-Granda

Contact Information: Email: pdr AT illinois DOT edu; Phone: (217) 244-3416

Teaching Assistants (alphabetical order):

Time & Place: 100 Gregory Hall, TR 9:30-10:45, for on-campus students (i.e., TUG/TGR), or every other week for CSP/MC3/MC4.

Note: This is the on-campus version of CS410. For a Coursera version (DSO Section) visit https://courses.engr.illinois.edu/cs410/fa2024/

This page provides basic information to help students decide whether they would be interested in taking the course. More up-to-date information about the course is available on Canvas and Campuswire (students will be added to Campuswire by Wednesday evening or Thursday morning, and daily afterward).

About the Course

The growth of "big data" created unprecedented opportunities to leverage computational and statistical approaches, which turn raw data into actionable knowledge that can support various application tasks. This is especially true for the optimization of decision-making in virtually all application domains, such as health and medicine, security and safety, learning and education, scientific discovery, and business intelligence. This course covers general computational techniques for building intelligent text information systems to help users manage and use large amounts of text data in all kinds of applications.

Text data includes all data in the form of natural language text (e.g., English text or Chinese text), including web pages, social media data such as tweets, news, scientific literature, emails, government documents, and many other kinds of enterprise data. Text data plays an essential role in our lives. Since we communicate using natural languages, we produce and consume a large amount of text data every day, covering all kinds of topics. The explosive growth of text data makes it impossible for people to consume all the relevant text data promptly.

The two main techniques to assist people in consuming, digesting, and making use of text data are text retrieval and text mining.

This course covers both text retrieval and text mining, to provide you with the opportunity to see the complete spectrum of techniques used in building an intelligent text information system. Building on top of two MOOCs on Coursera covering the same topics and including a course project, this course enables you to learn the basic concepts, principles, and general techniques in text retrieval and mining, as well as gain hands-on experience with using software tools to develop interesting text data applications.

Textbook

ChengXiang Zhai, Sean Massung, Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining, ACM and Morgan & Claypool Publishers, 2016.

Prerequisites

Students should come with good programming skills. CS225 or CS400 or an equivalent course is required. Knowledge of basic probability and statistics is a plus. If you are not sure whether you have the right background, please get in touch with the instructor.

Format

Onsite Lectures: All the lectures are in-person at the Siebel Center for on-campus students.

Quizzes and Two Exams: Students are expected to complete a quiz at the end of every few sections (roughly once every other week). There are 12 modules: 6 on Text Retrieval and 6 on Text Mining. There will be two 1-hour exams, covering Text Retrieval and Text Mining, respectively. The first exam, which covers the first 6 modules on Text Retrieval, will be given in the middle of the semester; the second, which covers the 6 modules on Text Mining, will be given during final exam week (as defined by the Office of the Registrar).

Programming Assignments: There will be a few programming assignments spread over the semester to enable students to gain practical skills by working with software toolkits and experimenting with data sets and ideas for improving algorithms.

Course Project: Students are also expected to complete a course project. Group projects are highly encouraged. While project activities may be spread over the entire semester, students are expected to work most intensively on the course project during the last couple of weeks of the semester.

Office Hours

The instructor and TA(s) will hold weekly office hours. A Zoom option may be available upon request, for office hours only.

Grading

Grading will be based on the following weighting scheme:

The letter grades are determined based on the following mapping:

A+: [95, 100]

A: [90, 94]

A-: [85, 89]

B+: [80, 84]

B: [75, 79]

B-: [70, 74]

C: [60, 69]

D: [55, 59]

F: <55


Students are strongly encouraged to help each other by actively answering one another's questions on Campuswire. The most active contributors on Campuswire will receive up to 5 points of extra credit, which can help move a grade up by one bracket.
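
As an illustration only, the letter-grade mapping and the effect of extra credit can be sketched in Python. The function name letter_grade and the table GRADE_FLOORS are hypothetical, and two details are assumptions not stated on this page: the total is capped at 100, and a score is compared against each bracket's lower bound without any rounding rule.

```python
# Lower bound of each letter-grade bracket, taken from the mapping above.
GRADE_FLOORS = [
    (95, "A+"), (90, "A"), (85, "A-"),
    (80, "B+"), (75, "B"), (70, "B-"),
    (60, "C"), (55, "D"),
]


def letter_grade(score: float, extra_credit: float = 0.0) -> str:
    """Map a numeric score (plus optional extra credit) to a letter grade.

    Assumptions (not stated in the syllabus): the total is capped at 100,
    and no rounding is applied before comparing against bracket floors.
    """
    total = min(score + extra_credit, 100.0)
    for floor, letter in GRADE_FLOORS:
        if total >= floor:
            return letter
    return "F"


print(letter_grade(82))     # score of 82 with no extra credit
print(letter_grade(82, 5))  # same score with the full 5 points of extra credit
print(letter_grade(62, 5))  # the C bracket spans 10 points, so 5 may not be enough
```

Under these assumptions, 82 falls in the B+ bracket and 82 + 5 = 87 falls in the A- bracket, while 62 + 5 = 67 stays in the C bracket because that bracket is 10 points wide.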