NLP
Channel: #nlp
Co-leads:
Herumb - @krypticmouse#5719 on Discord, @krypticmouse on Twitter
Vin - @va4az on Discord, @va4az on Twitter
Goal:
Focused on NLP, with the goal of encouraging and enabling research idea sharing and collaboration between members.
Each session someone will volunteer to present on a specific paper in the field of NLP or discuss ongoing research projects in the community
Attendees are not expected to have read the paper in advance
Open discussion follows the presentation
Logistics:
Organizational Spreadsheet: all are welcome to add papers to the paper bank, and volunteer to present on a paper listed within. C4AI NLP Reading Group!
Occurrences: 1-hour weekly, Saturdays at 1pm ET: https://meet.google.com/mse-htip-kxw
Recent Recordings
Arnav Singhvi presents their work on "DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines"
Mamba: Zero to Hero
Anshuman Suri - Do Membership Inference Attacks Work on Large Language Models?
February 17, 2024
January 5, 2024
Session led by @ashkey1900! Topic: Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships (https://arxiv.org/abs/2307.02763)
@hails leads a semi-social session discussing publicly available LLM training libraries
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Arun presents on Evaluating Verifiability in Generative Search Engines
Hailey presents on The Quantization Model of Neural Scaling
Large Language Models as Knowledge Bases
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers, with Irem
Training Trajectories of Language Models Across Scales
Large Language Model Paper List
(the following list was shared by Arun on Discord, 29/08/2022)
### Attention
#### Original
- Original Attention paper - https://arxiv.org/abs/1409.0473
- Transformers paper - https://arxiv.org/abs/1706.03762
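For readers skimming the two attention papers above, a minimal NumPy sketch of the scaled dot-product attention at the heart of the Transformers paper, softmax(QK^T / sqrt(d_k))V (the function name and toy shapes are illustrative, not from either paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

# Toy example: 2 queries, 3 keys/values, d_k = 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one d_k-dimensional output per query
```

Real implementations add masking, multiple heads, and learned projections on top of this core operation.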
### Variants
#### Decoder-only
- GPT-3 - https://arxiv.org/abs/2005.14165
- PaLM - https://arxiv.org/abs/2204.02311
#### Encoder Decoder
- T5 - https://arxiv.org/abs/1910.10683
- T0 - https://arxiv.org/abs/2110.08207
- EncDec vs Dec - https://arxiv.org/abs/2204.05832
#### Retrieval
- RETRO - https://arxiv.org/abs/2112.04426
- KNN-LM - https://openreview.net/forum?id=HklBjCEKvH
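A hedged sketch of the KNN-LM idea from the paper above: the final next-token distribution interpolates the base LM's softmax with a distribution aggregated from retrieved nearest neighbors (helper names and toy numbers are illustrative; the interpolation weight lam is tuned on validation data in the paper):

```python
import numpy as np

def knn_distribution(distances, neighbor_tokens, vocab_size):
    """Turn retrieved (distance, token) pairs into a vocabulary distribution:
    softmax over negative distances, with mass aggregated per token."""
    w = np.exp(-distances - np.max(-distances))  # numerically stable softmax
    w /= w.sum()
    p = np.zeros(vocab_size)
    np.add.at(p, neighbor_tokens, w)             # sum weights of repeated tokens
    return p

def knn_lm_interpolate(p_lm, p_knn, lam=0.25):
    """Interpolate the base LM distribution with the kNN distribution."""
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy example: vocab of 3, three retrieved neighbors
p_lm = np.array([0.6, 0.3, 0.1])
p_knn = knn_distribution(np.array([1.0, 1.5, 1.0]), np.array([2, 2, 0]), 3)
p = knn_lm_interpolate(p_lm, p_knn)
print(p)  # still a valid distribution over the 3-token vocab
```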
#### Sparse LMs
- Switch Transformers - https://arxiv.org/abs/2101.03961
### Scaling Laws
- Kaplan Scaling Laws - https://arxiv.org/abs/2001.08361
- Routed LM Scaling Laws - https://arxiv.org/abs/2202.01169
- Chinchilla - https://arxiv.org/abs/2203.15556
- Scaling transformer efficiently - https://arxiv.org/abs/2109.10686
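As a quick illustration of the Chinchilla entry above, a rule-of-thumb sketch: the roughly 20-tokens-per-parameter ratio is a common approximation of the paper's compute-optimal analysis, not an exact formula, and the helper name is illustrative:

```python
def chinchilla_optimal_tokens(n_params: float) -> float:
    """Rough compute-optimal token budget from the Chinchilla paper:
    about 20 training tokens per model parameter (an approximation of
    the paper's fitted scaling law)."""
    return 20 * n_params

# Chinchilla itself: 70B parameters trained on ~1.4T tokens
print(chinchilla_optimal_tokens(70e9) / 1e12)  # 1.4 (trillion tokens)
```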
### Multimodal
- CLIP - https://arxiv.org/abs/2103.00020
### Inference Strategies
#### Candidate Generation
- Beam Search
- Top-k - https://arxiv.org/abs/1805.04833
- Top-p sampling - https://arxiv.org/abs/1904.09751
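The top-k and top-p (nucleus sampling) papers above both truncate the next-token distribution before sampling; a minimal NumPy sketch of the two filters (function names and the toy distribution are illustrative):

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    idx = np.argsort(probs)[::-1][:k]
    out = np.zeros_like(probs)
    out[idx] = probs[idx]
    return out / out.sum()

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of top tokens whose
    cumulative probability reaches p, then renormalize."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1   # smallest prefix with mass >= p
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])  # toy next-token distribution
tk = top_k_filter(probs, 2)    # mass only on the top 2 tokens
tp = top_p_filter(probs, 0.8)  # smallest nucleus covering >= 0.8 mass
```

Beam search, by contrast, is a deterministic search over whole sequences rather than a per-step filter, which is why it appears without a sampling paper above.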
#### Re-rankers
- RL Human Feedback - https://arxiv.org/abs/2009.01325
- Decision Transformer - https://arxiv.org/abs/2106.01345
### Training
- Megatron-LM / GPT-J - https://arxiv.org/abs/1909.08053
### Evaluation
- Efficiency Misnomer - https://arxiv.org/abs/2110.12894
- Grokking - https://arxiv.org/abs/2201.02177
### Data
- PILE - https://arxiv.org/abs/2101.00027