Course Description

Objective

The goal of this course is to study the theory and algorithms that support the organization and search of large collections of unstructured data, including text, images, sound, and video. The first part of the course covers the basics of text-based information retrieval, the second part is devoted to the particularities and challenges of web search, and the final part reviews current information retrieval research topics.

Methodology

  • Lectures by the professor on fundamental topics
  • Practical assignments and exercises to be solved by students
  • Review and presentation of technical papers by students
  • Final project
  • Written and practical tests. Students must show a good grasp of concepts and skills covered in the course.

Contents


Topic Reading Assignments Presentations
Part I: Information Retrieval Basics
IR0: Introduction     Navigating Knowledge: Hypertext Pioneers
Desk Set (1957)
Search, Google, and Life: Sergey Brin
Web 3.0
IR1: Boolean retrieval [IIR08] Chap 1   [IIR08] Chap 1 slides
video: Introduction to Information Retrieval
video: Term-Document Incidence Matrices
video: The Inverted Index
video: Query Processing with the Inverted Index
Google Shortcuts
IR2: Processing text [IIR08] Chap 2   [IIR08] Chap 2 slides
video: Word Tokenization - Stanford NLP
video: Word Normalization and Stemming - Stanford NLP
IR3: Tolerant retrieval [IIR08] Chap 3   [IIR08] Chap 3 slides
IR4: Index construction [IIR08] Chap 4 Assignment 1
NGram viewer example
[IIR08] Chap 4 slides
Map-reduce exercises (code)
Map-reduce with mrjob notebook
IR5: Vector space model [IIR08] Chap 6   [IIR08] Chap 6 slides
video: Vector space model (Spanish)  
IR6: Evaluation [IIR08] Chap 8   [IIR08] Chap 8 slides
IR7: Probabilistic IR [IIR08] Chap 11
[IIR08] Chap 11 slides
video: Probabilistic Approach to IR (Spanish)
video: Probability Ranking Principle (Spanish)
Part II: Web Search
WS1: Web Search     [IIR08] Chap 19 slides
How search works
WS2: Crawling and web indexes     [IIR08] Chap 20 slides 
WS3: Link analysis   Assignment 2 [IIR08] Chap 21 slides
Part III: Learning
ML1: Text classification     [IIR08] Chap 13 slides
Brief Introduction to Machine Learning
ML2: Vector space classification     [IIR08] Chap 14 slides
ML3: Machine learning on documents     [IIR08] Chap 15 slides
video: Introduction to kernel methods (part 2)
Part IV: Advanced Topics
AT1: Relevance feedback and query expansion     Jorge Mario Carrasco
Jaime Humberto Niño
AT2: IR and software evolution     Oscar Paruma
Hugo Castellanos
AT3: Multimedia retrieval     Andrés Marquez
AT4: Latent topic modeling     Fredy Díaz
AT5: Multimodal retrieval     Lina Rosales
AT6: Map/Reduce     Juan Manuel Flórez
AT7: Search engine advertising     Felipe Baquero
AT8: Web mining     Manuel Forero
AT9: Learning to rank     Iván Duque

Grading

  • Assignments 40%
  • Exams 20%
  • Presentation 20%
  • Final project 20%

References

  • [IIR08] Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press. (main textbook)
  • [MIR99] Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. ACM Press / Addison-Wesley.
  • [MAN99] Witten, I. H., Moffat, A., & Bell, T. C. (1999). Managing gigabytes: compressing and indexing documents and images. Morgan Kaufmann.
  • [TREC05] Voorhees, E., & Harman, D. K. (Eds.). (2005). TREC: Experiment and evaluation in information retrieval. MIT Press.
  • [RR05] Moffat, A., Zobel, J., & Hawking, D. (2005, December). Recommended reading for IR research students. In ACM SIGIR Forum (Vol. 39, No. 2, pp. 3-14). ACM.

Resources