Large Language Models and Beyond

“Today a reader, tomorrow a leader.”

― Margaret Fuller

Daily Papers on Hugging Face

What Is ChatGPT Doing … and Why Does It Work?

By Stephen Wolfram (February 14, 2023)

A highly recommended article for a basic understanding of how ChatGPT works from the perspective of complex systems.

Seminal Papers about Large Language Models

"Improving Language Understanding by Generative Pre-Training" by Radford et al. (2018): This is the paper that introduced the first version of the GPT model. It laid the foundation for the use of transformer-based models in natural language processing.
"Language Models are Unsupervised Multitask Learners" by Radford et al. (2019): This paper presents GPT-2, an extension of the original GPT model, with significantly more parameters and trained on a larger dataset.
"Language Models are Few-Shot Learners" by Brown et al. (2020): This paper introduces GPT-3, the third iteration in the GPT series. It highlights the model's few-shot learning capabilities, where it performs tasks with minimal task-specific data.
"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al. (2018): While not a GPT paper, this work by researchers at Google is a seminal paper in the field of LLMs. BERT pre-trains bidirectional language representations with a masked language modeling objective, an approach that was revolutionary in the field.
"Attention Is All You Need" by Vaswani et al. (2017): Although it predates GPT, this paper is crucial because it introduced the transformer architecture, which is the backbone of models like GPT-2 and GPT-3.
"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" by Raffel et al. (2019): This paper from Google researchers presents the T5 model, which treats every language problem as a text-to-text problem, providing a unified framework for various NLP tasks.
"XLNet: Generalized Autoregressive Pretraining for Language Understanding" by Yang et al. (2019): XLNet is another important model in the LLM domain, which outperformed BERT on several benchmarks by using a generalized autoregressive pretraining method.
"ERNIE: Enhanced Representation through Knowledge Integration" by Sun et al. (2019): Developed by Baidu, ERNIE is an LLM that integrates lexical, syntactic, and semantic information effectively, showing significant improvements over BERT in various NLP tasks.

Other seminal papers on language models and generative AI

The Utility of Large Language Models and Generative AI for Education Research: This paper explores the integration of NLP feature extraction techniques with machine learning models like SVMs and Decision Trees for educational applications like automated grading.

Science in the Age of Large Language Models: Published in Nature Reviews Physics, this article discusses the critical stage of generative AI (GenAI) in scientific research and the importance of integrating GenAI responsibly into scientific practice.

An editorial from MIT Press, "What Have Large-Language Models and Generative AI Got to Do With It?", delves into the implications of generative algorithms and the ethical use of AI-generated text in various contexts.

What ChatGPT and Generative AI Mean for Science: This Nature article provides insights into the role of ChatGPT and generative AI in the scientific community, highlighting potential impacts and considerations.

Fundamentals of Generative Large Language Models and Perspectives in Cyber-Defense: This paper discusses the text generation capabilities of LLMs, including various sampling strategies like maximum likelihood and top-K, crucial for understanding the functioning of these models.

Large Language Models for Generative Information Extraction: This survey looks at the application of LLMs in information extraction, showcasing various models and techniques in this domain.

Autonomous Chemical Research with Large Language Models: Published in Nature, this paper discusses the application of LLMs in automating chemical research, highlighting the integration of models like GPT-4 with robotic systems for laboratory tasks.

Large Language Models and Robotics [2024]

Research Papers

Blogs

Large Language Models and Cognitive Science [2024]

Research Papers

Turning large language models into cognitive models: This paper discusses whether large language models can be turned into cognitive models. It finds that after fine-tuning them on data from psychological experiments, these models offer accurate representations of human behavior.
Cognitive Effects in Large Language Models: This work tested GPT-3 on a range of cognitive effects, which are systematic patterns usually found in human cognitive tasks. It found that LLMs are indeed prone to several human cognitive effects.
Large language models meet cognitive science: LLMs as tools, models, and participants: This paper presents innovative research on the possible interactions between cognitive science and large language models.

Blogs