In this unusual summer semester during the pandemic, I will teach CMPT 456 (Information Retrieval and Web Search) and CMPT 459 (Data Mining). I see this as a unique opportunity to offer a suite of courses covering data science from data extraction to data processing and intelligent analytics. Moreover, it is a unique moment to foster a group of young data scientists engaging with the current crisis and the community. Therefore, I will run an experiment to make these two courses twins, in the sense that they will echo each other and will be made relevant to the current pandemic crisis.
Specifically, the two courses will share the same project data sets. Students from the two classes can team up to conduct projects together. Some components of one course will refer to the other course. That said, the two are different courses covering different materials.
The course materials will be made available to the public. A student can take both courses at the same time, or take only one of the twins. The course assignments and projects will use the latest COVID-19 data sets as much as possible. Students are encouraged to communicate their research outcomes from the course projects to the public through social media and other appropriate channels. The enrollment limits will be raised as much as possible to accommodate more students who are interested in the subjects.
CMPT 456 provides an introduction to modern information retrieval techniques with a focus on fundamental principles and techniques, information infrastructure, and user/flow operation and management. We will start with the essentials of information retrieval, including the fundamental ideas and approaches. Then, we will discuss the basics of web and enterprise search. Finally, we will explore some important and timely topics such as web analytics, search engine optimization, query suggestion, sponsored search, search in social networks/media, commercialization, and fairness.
What should you do when you are facing a huge amount of complicated data from real-life applications? CMPT 459 introduces the core techniques in big data analytics, namely knowledge discovery in databases (KDD), also known as data mining (DM). It focuses on the principles, fundamental algorithms, implementations, and applications.
Lectures (about 120-150 minutes per week for 13 weeks) will be available for public access
One hour per week of online meetings and office hours will be reserved for enrolled students
We will use Piazza and Zoom