This course provides an introduction to modern information retrieval techniques, with a focus on fundamental principles and techniques, information infrastructure, and user/flow operation and management. We will start with the essentials of information retrieval, including the fundamental ideas and approaches. Then, we will discuss the basics of web and enterprise search. Finally, we will explore selected important and current topics such as web analytics, search engine optimization, query suggestion, sponsored search, and search in social networks/media.
Comprehensive understanding and skills in data structures, such as linked data structures, B-trees, and hash functions.
Analysis of algorithms and time complexity.
Operating systems, main memory and disk management, file systems.
Elementary probability theory and statistics, such as random variables, distributions, probability mass functions, sampling, and statistical tests.
The textbook and references are out of print. Please use the e-book.
(Official textbook) W. B. Croft, D. Metzler, and T. Strohman: Search Engines: Information Retrieval in Practice, Addison Wesley, 2010.
(Advanced reading) S. Büttcher, C. L. A. Clarke, and G. V. Cormack: Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
(Advanced reading) Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
(Advanced reading) Ricardo Baeza-Yates and Berthier Ribeiro-Neto: Modern Information Retrieval: the Concepts and Technology behind Search (2nd edition), Addison Wesley, 2011.
The video lectures will be pre-recorded and posted on YouTube (in unlisted mode), with links provided on this webpage. You should watch each video on or before the specified date.
The class will meet online on Zoom every Thursday morning, 11:30 am - 12:20 pm. We will use that time to discuss assignments and projects, run quizzes, and hold office hours.
We will use Piazza.
Search engine architecture [slides, video, Chapter 2, May 21]
Text processing [slides, video: part 1, part 2, Chapter 4, June 4]
Indexing and ranking [slides, video: part 1, part 2, part 3, part 4, June 18]
Queries and interfaces [slides, July 2]
Information retrieval models [slides, July 14]
Evaluation [slides, July 28]
Link analysis and web search [slides, August 6]
Assignment 1, due at 11:59 pm May 31, 2020 (Sunday), covering introduction, search engine architecture, and web crawling. To access this assignment, you need the password that was distributed through email to all enrolled students.
Assignment 2, due at 11:59 pm June 19, 2020 (Friday), covering text processing, indexing, and ranking. To access this assignment, you need the password that was distributed through email to all enrolled students.
Assignment 3, due at 11:59 pm July 17, 2020 (Friday), covering queries and interfaces, and information retrieval models. To access this assignment, you need the password that was distributed through email to all enrolled students.
Assignment 4, due at 11:59 pm July 31, 2020 (Friday), covering evaluation. To access this assignment, you need the password that was distributed through email to all enrolled students.
Project 1, due at 11:59 pm August 1, 2020 (Sunday). This project uses the Lucene library.
Video tutorial: how to run the demo
Video tutorial: how to compile the program files
Assignment 5, due at 11:59 pm August 10, 2020 (Monday), covering link analysis and web search. To access this assignment, you need the password that was distributed through email to all enrolled students.
All email addresses are SFU addresses (@sfu.ca) unless otherwise specified.
Instructor: Dr. Jian Pei, email: jpei
TAs: