This LLM Cohort kicked off on January 10, 2025, and is expected to run until March 14, 2025 (approximate end date).
We are offering two tracks that run simultaneously, each with its own weekly session.
Here are the details of the tracks:
TRACK 1
Multilingual Long Context - Enhancing Processing with Advanced Techniques
Weekly Sessions: Saturday 8 am PT
How can advanced positional encoding methods (RoPE, NoPE, LongRoPE) and hybrid models that combine Transformers with State Space Models (SSMs) improve the processing and understanding of long-context sequences in natural language processing tasks?
Processing long-context sequences efficiently and effectively is a critical challenge in natural language processing (NLP). Many real-world applications, such as document summarization, long-form question answering, dialogue systems, and genomic sequence analysis, require models that can understand and reason over extended contexts. Traditional Transformers face limitations due to their quadratic computational complexity with respect to sequence length and their diminishing ability to capture long-range dependencies. By enhancing long-context processing, we can develop models that are more scalable, efficient, and capable of handling a broader range of tasks involving long sequences, thereby pushing the boundaries of what current NLP models can achieve.
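To make the positional-encoding part of this question concrete, below is a minimal NumPy sketch of Rotary Position Embedding (RoPE) as introduced in the RoFormer paper. The base of 10000 follows that paper; the function name and shapes are illustrative and not tied to any particular library.

```python
# Minimal sketch of Rotary Position Embedding (RoPE), assuming an
# interleaved even/odd pairing of dimensions and base 10000 (RoFormer).
import numpy as np

def rope_rotate(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, head_dim)."""
    seq_len, dim = x.shape
    assert dim % 2 == 0, "head_dim must be even"
    # Per-pair rotation frequencies: theta_i = base^(-2i/dim)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)           # (dim/2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                            # even/odd pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Because each position gets its own rotation, the dot product q·k after RoPE
# depends only on the relative offset between positions; length-extension
# schemes such as LongRoPE build on this by rescaling the frequencies.
q = rope_rotate(np.random.randn(8, 64))
k = rope_rotate(np.random.randn(8, 64))
scores = q @ k.T  # attention logits with relative-position information baked in
```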
Resources:
Rotary Position Embedding (RoPE):
Transformers without positional encoding (NoPE):
LongRoPE:
SSMs:
S4: [2111.00396] Efficiently Modeling Long Sequences with Structured State Spaces
S5: [2208.04933] Simplified State Space Layers for Sequence Modeling
[2404.16112] Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges || GitHub
TRACK 2
Evaluating Multilingual Long-Context Generation and Reasoning
Weekly Sessions: Friday 10 am PT
What is the question we want to answer?
How can multilingual large language models (LLMs) be optimized for advanced comprehension and insight derivation in complex, long-context tasks across languages, particularly in domains like healthcare, finance, and legal systems, where accurate, contextually relevant responses are critical? Additionally, what are the current capabilities of existing LLMs in handling these tasks, and can we develop a data-creation pipeline to build a new long-context benchmark?
Multilingual systems are crucial to enabling equitable access and broader adoption of AI across diverse languages, supporting users in contexts where technical, financial, medical, and legal information must be presented and understood in native languages. Many sectors, such as healthcare and finance, often require complex analyses or detailed insights that are deeply embedded within extensive and contextually rich documents. These LLMs must be capable of understanding long-context information to meet user needs accurately, especially when handling nuanced linguistic and contextual requirements across multiple languages.
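As a starting point for the data-creation pipeline mentioned above, here is a small, illustrative Python sketch of how a single long-context benchmark item could be constructed and scored. The item format, the `generate` placeholder (standing in for whichever LLM API the cohort uses), and the exact-match scoring are assumptions for discussion, not a prescribed design.

```python
# Illustrative sketch of building and scoring one long-context QA item.
# The dataclass fields, `build_item`, and `score_item` are hypothetical names.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class LongContextItem:
    language: str        # e.g. "hi", "sw", "es"
    context: str         # long document(s), potentially very long
    question: str
    reference_answer: str

def build_item(language: str, filler_docs: List[str], fact: str,
               question: str, answer: str) -> LongContextItem:
    """Embed a target fact inside a long stretch of distractor documents."""
    midpoint = len(filler_docs) // 2
    docs = filler_docs[:midpoint] + [fact] + filler_docs[midpoint:]
    return LongContextItem(language, "\n\n".join(docs), question, answer)

def score_item(item: LongContextItem, generate: Callable[[str], str]) -> bool:
    """Query the model and check whether the reference answer appears in its response."""
    prompt = f"{item.context}\n\nQuestion ({item.language}): {item.question}"
    response = generate(prompt)
    return item.reference_answer.lower() in response.lower()
```

A real pipeline would likely draw distractor and target documents from domain corpora (healthcare, finance, legal), add human verification per language, and replace exact match with graded metrics such as F1 or LLM-as-judge scoring for generation-style tasks.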
Resources:
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
∞BENCH: Extending Long Context Evaluation Beyond 100K Tokens
XL2Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
SESSION RECORDINGS:
10 January 2025
17 January 2025
18 January 2025
25 January 2025
25 January 2025
31 January 2025
7 February 2025
15 February 2025
22 February 2025
15 March 2025