Nanyang Technological University
SINGAPORE
Tsinghua University
CHINA
University of Illinois Urbana-Champaign
USA
National University of Singapore
SINGAPORE
University of Arizona
USA
Fudan University
CHINA
ABOUT OUR TUTORIAL
Large Language Models (LLMs) excel at zero- and few-shot learning but are constrained by the length of their context windows when processing long documents. Two strategies have emerged to overcome this limitation: (1) Long Context (LC) methods, which extend or compress transformer architectures to accommodate more input text; and (2) Retrieval-Augmented Generation (RAG), which integrates external knowledge sources via embedding- or index-based retrieval. This half-day tutorial offers a unified, beginner-friendly introduction to both approaches. We first review transformer fundamentals (positional encoding and attention complexity) and common LC techniques. Next, we explain the classic RAG pipeline and recent RAG strategies, alongside evaluation metrics and benchmarks. We also analyze recent empirical studies to highlight the strengths, limitations, and trade-offs of LC versus RAG in terms of scalability, computational cost, and retrieval effectiveness. We conclude with best practices for real-world deployments, emerging hybrid architectures, and open research directions, equipping IR researchers and practitioners with actionable guidelines for processing long documents with LLMs.
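To make the "classic RAG pipeline" mentioned above concrete, the sketch below shows the retrieve-then-generate loop in miniature: embed a query, rank a small corpus by similarity, and prepend the top passages to the prompt. The toy corpus, the bag-of-words "embedding", and the stubbed generate() call are illustrative assumptions for this page, not the tutorial's reference implementation; a real deployment would use a neural encoder, a vector index, and an actual LLM.

```python
# Minimal sketch of the classic RAG pipeline: embed -> retrieve top-k -> augment prompt.
# The corpus, the toy embedding, and generate() are placeholders (assumptions).
from collections import Counter
import math

CORPUS = [
    "Positional encodings let transformers represent token order.",
    "Self-attention cost grows quadratically with sequence length.",
    "Retrieval-augmented generation grounds answers in external documents.",
]

def embed(text: str) -> Counter:
    """Toy embedding: lower-cased bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank corpus passages by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(CORPUS, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stand-in for an LLM call; here it simply echoes the augmented prompt."""
    return f"[LLM would answer based on]\n{prompt}"

query = "Why is long-context attention expensive?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```

The LC alternative covered in the tutorial would instead feed the full document directly into a model whose attention or positional scheme has been adapted for longer inputs, trading retrieval machinery for higher inference cost.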