Course Description


The goal of this course is to study the theory and algorithms that support the organization and search of large collections of unstructured data, including text, images, sound, and video. The first part of the course covers the foundations of text-based information retrieval, the second is devoted to the particularities and challenges of web search, the third applies machine learning to documents, and the final part reviews current information retrieval research topics.

Methodology

  • Professor's lectures on fundamental topics
  • Practical assignments and exercises to be solved by students
  • Review and presentation of technical papers by students
  • Final project
  • Written and practical exams, in which students must demonstrate a good grasp of the concepts and skills covered in the course


Schedule

Part I: Information Retrieval Basics

IR0: Introduction
  • Presentations: Navigating Knowledge: Hypertext Pioneers; Desk Set (1957); Search, Google, and Life: Sergey Brin; Web 3.0
IR1: Boolean retrieval
  • Reading: [IIR08] Chap 1
  • Presentations: [IIR08] Chap 1 slides; video: Introduction to Information Retrieval; video: Term-Document Incidence Matrices; video: The Inverted Index; video: Query Processing with the Inverted Index; Google Shortcuts
IR2: Processing text
  • Reading: [IIR08] Chap 2
  • Presentations: [IIR08] Chap 2 slides; video: Word Tokenization (Stanford NLP); video: Word Normalization and Stemming (Stanford NLP)
IR3: Tolerant retrieval
  • Reading: [IIR08] Chap 3
  • Presentations: [IIR08] Chap 3 slides
IR4: Index construction
  • Reading: [IIR08] Chap 4
  • Presentations: [IIR08] Chap 4 slides
IR5: Index compression
  • Reading: [IIR08] Chap 5
  • Assignments: Assignment 1
  • Presentations: [IIR08] Chap 5 slides
IR6: Vector space model
  • Reading: [IIR08] Chap 6
  • Presentations: [IIR08] Chap 6 slides; video: Vector space model (Spanish)
IR7: Evaluation
  • Reading: [IIR08] Chap 8
  • Presentations: [IIR08] Chap 8 slides
IR8: Probabilistic IR
  • Reading: [IIR08] Chap 11
  • Assignments: Assignment 2
  • Presentations: [IIR08] Chap 11 slides; video: Probabilistic Approach to IR (Spanish); video: Probability Ranking Principle

Part II: Web Search

WS1: Web search
  • Presentations: [IIR08] Chap 19 slides; How search works
WS2: Crawling and web indexes
  • Presentations: [IIR08] Chap 20 slides
WS3: Link analysis
  • Assignments: Assignment 3
  • Presentations: [IIR08] Chap 21 slides

Part III: Learning

ML1: Text classification
  • Presentations: [IIR08] Chap 13 slides; Brief Introduction to Machine Learning
ML2: Vector space classification
  • Presentations: [IIR08] Chap 14 slides
ML3: Machine learning on documents
  • Presentations: [IIR08] Chap 15 slides; Introduction to kernel methods (part 2)

Part IV: Advanced Topics

AT1: Relevance feedback and query expansion
  • Presentations (May 14): David Caballero [Xu96]; Oscar Eduardo Cala [Zhou03]
AT2: Question answering
  • Presentations (May 17): José Luis González [Kwok01]; David Uchuvo [Riezler07]
AT3: Multimedia retrieval
  • Presentations (May 21): Fabian Paez [Ren09]; Camilo Salomón [Caicedo11]
AT4: Latent topic modeling
  • Presentations (May 31): Felipe Hernández [Steyvers06]; Juan Gabriel Romero [Andrzejewski11]
AT5: Multimodal retrieval
  • Presentations (June 4): Esteban Paez [Atrey10]; Roger Guzmán [Escalante11]
AT6: MapReduce
  • Presentations (June 4): Luis Argüelles [Gates09]
AT7: Search engine advertising
  • Presentations (June 7): Juan Liberato [Edelman07]; Ivan Mauricio Suárez [Graepel10]
AT8: Web mining
  • Presentations (June 14): Carlos Bernal [Wang12]; Sebastián Sierra [Kato13] (video)
AT9: Learning to rank
  • Presentations (June 18): David Bermeo [Yue07]; Andrea Cruz [Agrawal13]

Grading

  • Assignments 40%
  • Exams 30%
  • Presentation 15%
  • Final project 15%

References

  • [IIR08] Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. (main textbook)
  • [MIR99] Baeza-Yates, R. A., & Ribeiro-Neto, B. (1999). Modern information retrieval. ACM Press / Addison-Wesley.
  • [MAN99] Witten, I. H., Moffat, A., & Bell, T. C. (1999). Managing gigabytes: Compressing and indexing documents and images (2nd ed.). Morgan Kaufmann.
  • [TREC05] Voorhees, E. M., & Harman, D. K. (Eds.). (2005). TREC: Experiment and evaluation in information retrieval. MIT Press.
  • [RR05] Moffat, A., Zobel, J., & Hawking, D. (2005). Recommended reading for IR research students. ACM SIGIR Forum, 39(2), 3-14.