Reading List on Large Language Models (LLM) and Generative AI
Masoud Makrehchi
In the previous post, I discussed my perspective on staying relevant in the era of Large Language Models (LLMs). Now I'd like to share my reading list on LLMs. Please note that I haven't read all of these papers yet, and the grouping may be imprecise, with some overlap between groups. Nevertheless, I plan to gradually post my summaries on Medium. If you have read any of the following papers and have already summarized them, I would greatly appreciate it if you could send me the links to add to this list. Please be aware that the papers are not ordered by any specific criterion.
How to read and review papers (Especially in AI, ML and NLP domains)
How to Read Research Papers: A Pragmatic Approach for ML Practitioners
How to read Machine Learning and Deep Learning Research papers
Best Practices for Using AI When Writing Scientific Manuscripts
Approaching literature review for academic purposes: The Literature Review Checklist
Natural Language Processing (NLP)
Embedding and Language Modeling
Distributed Representations of Words and Phrases and their Compositionality
Efficient Estimation of Word Representations in Vector Space
Attention Mechanism and Transformers
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Generative AI and Language Models
Choose Your Weapon: Survival Strategies for Depressed AI Academics
Improving Language Understanding by Generative Pre-Training [code]
Universal Intelligence: A Definition of Machine Intelligence
A Glimpse in ChatGPT Capabilities and its impact for AI research
Public Perception of Generative AI on Twitter: An Empirical Study Based on Occupation and Usage
Capturing Humans’ Mental Models of AI: An Item Response Theory Approach
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Inseq: An Interpretability Toolkit for Sequence Generation Models [code]
Introduction to LLMs
Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Fundamentals of LLMs and Foundation Models
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Systematic Rectification of Language Models via Dead-end Analysis
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Sparks of Artificial General Intelligence: Early experiments with GPT-4 (*)
GPT is becoming a Turing machine: Here are some ways to program it
Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models (*)
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
ChatGPT is not all you need. A State of the Art Review of large Generative AI models
What’s the Meaning of Superhuman Performance in Today’s NLU?
Neural networks learn to magnify areas near decision boundaries
Retentive Network: A Successor to Transformer for Large Language Models
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Emergent autonomous scientific research capabilities of large language models
Several categories of Large Language Models (LLMs): A Short Survey (*)
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
A jargon-free explanation of how AI large language models work
The Dawn of LMMs: Preliminary Explorations with GPT-4V(vision)
Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF (*)
What are LLMs?
Large Language Models are not Models of Natural Language: they are Corpus Models (*)
Language Models can Solve Computer Tasks → RCI (Recursive Criticism and Improvement) (*)
Self-Supervised Contextual Data Augmentation for Natural Language Processing
Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference
What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization
Don’t stop pretraining: Adapt language models to domains and tasks
Large Language Models are Built-in Autoregressive Search Engines
Large Language Models are Zero-Shot Rankers for Recommender Systems
Davinci the Dualist: the mind-body divide in large language models and in human learners
Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners. [Code]
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Understanding Large Language Models: The Physics of (Chat)GPT and BERT
Theory of Mind May Have Spontaneously Emerged in Large Language Models
Toolformer: Language Models Can Teach Themselves to Use Tools
Task Contamination: Language Models May Not Be Few-Shot Anymore
In-context learning
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work (S)
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning
Iterative Forward Tuning Boosts In-context Learning in Language Models
Learning to Retrieve In-Context Examples for Large Language Models (S)
In-context Autoencoder for Context Compression in a Large Language Model
What learning algorithm is in-context learning? Investigations with linear models
Pre-Training, Fine-Tuning and Instruction-Tuning
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Simple and Scalable Strategies to Continually Pre-train Large Language Models
TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning
Prompt Engineering
Prompting Is Programming: A Query Language For Large Language Models (S)
Prefix-Tuning: Optimizing Continuous Prompts for Generation (S)
P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks
GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts (S)
Privacy-Preserving Prompt Tuning for Large Language Model Services (S)
Satisfiability-Aided Language Models Using Declarative Prompting
Chain-of-Symbol Prompting Elicits Planning in Large Language Models (*) (S)
Chain-of-thought prompting for responding to in-depth dialogue questions with LLM (S)
HELMA: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks
Measuring and Narrowing the Compositionality Gap in Language Models
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm (S)
MemPrompt: Memory-assisted Prompt Editing with User Feedback (S)
PromptChainer: Chaining Large Language Model Prompts through Visual Programming (S)
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT (S)
The Power of Scale for Parameter-Efficient Prompt Tuning (S)
Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts (S)
PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting
AutoHint: Automatic Prompt Optimization with Hint Generation
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Evaluating LLMs
METEOR: An automatic metric for MT evaluation with improved correlation with human judgments
SemEval-2017 Task 1: Semantic Textual Similarity — Multilingual and Cross-lingual Focused Evaluation
Best practices for the human evaluation of automatically generated text
Beyond accuracy: Behavioral testing of NLP models with CheckList
Bleu: a method for automatic evaluation of machine translation
Xtreme: A massively multilingual multi-task benchmark for evaluating cross-lingual generalization
GEMv2: Multilingual NLG benchmarking in a single line of code
Automatically constructing a corpus of sentential paraphrases
Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction
Automatic Evaluation of Attribution by Large Language Models
Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study
Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation
ANALOGICAL — A New Benchmark for Analogy of Long Text for Large Language Models
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets (*)
Measuring Attribution in Natural Language Generation Models (*)
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (*)
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs (*)
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models (*)
Exploring the Robustness of Large Language Models for Solving Programming Problems (*)
Style Over Substance: Evaluation Biases for Large Language Models
Comparing Traditional and LLM-based Search for Consumer Choice: A Randomized Experiment
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Zero-shot NLG evaluation through Pairware Comparisons with LLMs
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Evaluating Large Language Models on Controlled Generation Tasks
An Empirical Evaluation of LLMs for Solving Offensive Security Challenges
Evalverse: Unified and Accessible Library for Large Language Model Evaluation
A User-Centric Benchmark for Evaluating Large Language Models
Reinforcement Learning with Human Feedback
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment
Training language models to follow instructions with human feedback [*]
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Limitations and Risks of LLMs and the Mitigation Strategies
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
Learnings from Data Integration for Augmented Language Models
All the News That’s Fit to Fabricate: AI-Generated Text as a Tool of Media Misinformation
The Radicalization Risks of GPT-3 and Advanced Neural Language Models
Dissociating language and thought in large language models: a cognitive perspective
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Are Emergent Abilities of Large Language Models a Mirage? (*)
Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge
Beyond the Safeguards: Exploring the Security Risks of ChatGPT
Dual Use Concerns of Generative AI and Large Language Models
Knowledge Refinement via Interaction Between Search Engines and Large Language Models
A Drop of Ink may Make a Million Think: The Spread of False Information in Large Language Models
Bias Detection for Large Language Models in the context of candidate screening
Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility (*)
AI pioneer Yoshua Bengio: Governments must move fast to ‘protect the public’
The Web Can Be Your Oyster for Improving Large Language Models
Large Language Models can be Guided to Evade AI-Generated Text Detection
Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
Multiscale Positive-Unlabeled Detection of AI-Generated Texts
Resolving Intent Ambiguities by Retrieving Discriminative Clarifying Questions
RARR: Researching and Revising What Language Models Say, Using Language Models
CLAM: Selective Clarification for Ambiguous Questions with Generative Language Models
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
On the Risk of Misinformation Pollution with Large Language Models
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Risks and Benefits of Large Language Models for the Environment
Exploring the Impact of Data Poisoning Attacks on Machine Learning Model Reliability
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
Language (Technology) is Power: A Critical Survey of “Bias” in NLP
TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI (*)
Citation: A Key to Building Responsible and Accountable Large Language Models
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
A Critical Review of Large Language Models: Sensitivity, Bias, and the Path Toward Specialized AI
Fighting Fire with Fire: Can ChatGPT Detect AI-generated Text?
Three Bricks to Consolidate Watermarks for Large Language Models
Neural Authorship Attribution: Stylometric Analysis on Large Language Models
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice
Unraveling the Risks: Cybersecurity and Large Language Models (LLMs)
Overcoming Catastrophic Forgetting in Massively Multilingual Continual Learning
An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning (*)
The Curse of Recursion: Training on Generated Data Makes Models Forget (*)
Emergent and Predictable Memorization in Large Language Models
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework
Hallucination
Large language models and the perils of their hallucinations
API-Bank: A Benchmark for Tool-Augmented LLMs → mitigating LLM hallucination
Mitigating Language Model Hallucination with Interactive Question-Knowledge Alignment
Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models → Hallucination mitigation
On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?
The Curious Case of Hallucinations in Neural Machine Translation
Retrieval Augmentation Reduces Hallucination in Conversation
Chain-of-Verification Reduces Hallucination in Large Language Models (*)
Zero-Resource Hallucination Prevention for Large Language Models
Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Red-Teaming LLM and Adversarial Attacks
Universal and Transferable Adversarial Attacks on Aligned Language Models. [code]
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Seeing Seeds Beyond Weeds: Green Teaming Generative AI for Beneficial Uses
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
Towards best practices in AGI safety and governance: A survey of expert opinion
Can Large Language Models Change User Preference Adversarially?
LLM Sandboxing: Early Lessons Learned (AntiGPT)
Universal Adversarial Triggers for Attacking and Analyzing NLP
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [data poisoning]
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Prompt Injection Attacks and Defenses in LLM-Integrated Applications
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Reasoning with LLMs
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
ReAct: Synergizing Reasoning and Acting in Language Models (*)
MathPrompter: Mathematical Reasoning using Large Language Models
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions
Chain of Logic: Rule-Based Reasoning with Large Language Models
Large Language Models for Mathematical Reasoning: Progresses and Challenges
Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts
Least-to-most Prompting Enables Complex Reasoning In Large Language Models
Teaching Large Language Models to Reason with Reinforcement Learning
Data Generation using LLMs and LLM-Crowd-Sourcing
Retrieval-Augmented Generation for AI-Generated Content: A Survey
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort → even with language models, training data is still a big issue; some models create training data with minimal human effort
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
DATED: Guidelines for Creating Synthetic Datasets for Engineering Design Applications
Targeted Data Generation: Finding and Fixing Model Weaknesses
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Power-up! What Can Generative Models Do for Human Computation Workflows?
Generating Efficient Training Data via LLM-based Attribute Manipulation
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
Active Learning Principles for In-Context Learning with Large Language Models
Using ChatGPT for Annotation of Attitude within the Appraisal Theory: Lessons Learned
Retrieval Augmented Generation (RAG)
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. [Math]
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Applications of LLMs
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark
StructGPT: A General Framework for Large Language Model to Reason over Structured Data
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing
Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning
Leveraging Large Language Models in Conversational Recommender Systems
CodeT5+: Open Code Large Language Models for Code Understanding and Generation
NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models
Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach
Large Language Models Can Be Used To Effectively Scale Spear Phishing Campaigns
MemoryBank: Enhancing Large Language Models with Long-Term Memory
CooK: Empowering General-Purpose Language Models with Modular and Collaborative Knowledge
Self-Agreement: A Framework for Fine-tuning Language Models to Find Agreement among Diverse Opinions
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Breaking Language Barriers with a LEAP: Learning Strategies for Polyglot LLMs
ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing
ModuleFormer: Learning Modular Large Language Models From Uncurated Data
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models
MathPrompter: Mathematical Reasoning using Large Language Models
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models
Revolutionizing Cyber Threat Detection with Large Language Models
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions
Full Automation of Goal-driven LLM Dialog Threads with And-Or Recursors and Refiner Oracles
Large Language Models as Sous Chefs: Revising Recipes with GPT-3
Voyager: An Open-Ended Embodied Agent with Large Language Models
DR. AI: How ChatGPT and other AI tools will change healthcare
Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review
Building Cooperative Embodied Agents Modularly with Large Language Models
Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media
Abstractions, Scenarios, and Prompt Definitions for Process Mining with LLMs: A Case Study
Recommender Systems in the Era of Large Language Models (LLMs)
Named entity recognition using GPT for identifying comparable companies
Investigating ChatGPT’s Potential to Assist in Requirements Elicitation Processes
Intelligent Mutations in Genetic Programming: OpenAI Proposes Evolution Through Large Models
To Infinity and Beyond: SHOW-1 and Showrunner Agents in Multi-Agent Simulations
Matching Patients to Clinical Trials with Large Language Models
Leveraging Large Language Models (LLMs) for Process Mining (Technical Report)
Enhancing Job Recommendation through LLM-based Generative Adversarial Networks
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
In-Context Learning for Text Classification with Many Labels
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Self-Retrieval: Building an Information Retrieval System with One Large Language Model
Shortcut learning in LLMs
Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey (*)
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models: shortcut learning
Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning
LLM and Toxicity
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models.
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
HateCheck: Functional Tests for Hate Speech Detection Models
Applications in Legal Domain
Legal Prompt Engineering for Multilingual Legal Judgement Prediction
A Short Survey of Viewing Large Language Models in Legal Aspect
Legal Prompting: Teaching a Language Model to Think Like a Lawyer
A Brief Report on LawGPT 1.0: A Virtual Legal Assistant Based on GPT-3
Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models [legal]
Applications in Software Engineering and Coding
Introducing Code Llama, a state-of-the-art large language model for coding
CodeCoT and Beyond: Learning to Program and Test like a Developer [code]
Scope is all you need: Transforming LLMs for HPC Code [code]
CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors: Code Generation and CODEX
On the Use of GPT-4 for Creating Goal Models: An Exploratory Study
Language Models of Code are Few-Shot Commonsense Learners: Code Generation and CODEX
Enabling Programming Thinking in Large Language Models Toward Code Generation: Code Generation and CODEX
Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation
The potential of LLMs for coding with low-resource and domain-specific programming languages
Software Testing with Large Language Model: Survey, Landscape, and Vision
Large Language Models for Software Engineering: Survey and Open Problems
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
Question-Answering
PDFTriage: Question Answering over Long, Structured Documents
Evaluating Open-Domain Question Answering in the Era of Large Language Models
Integrating ChatGPT with internal knowledge base and question-answer platform
Towards Expert-Level Medical Question Answering with Large Language Models
Large Language Models Need Holistically Thought in Medical Conversational QA
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
Summarization
Improved Beam Search for Hallucination Mitigation in Abstractive Summarization
ODSum: New Benchmarks for Open Domain Multi-Document Summarization
LLM applications in biomedical, healthcare and pharma
Utilizing Large Language Models for Natural Interface to Pharmacology Databases
ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback
A Platform for the Biomedical Application of Large Language Models
Large language model AI chatbots require approval as medical devices
Dangerous chatbots: Prof. Stephen Gilbert calls for AI chatbots to be approved as medical devices
Matching Patients to Clinical Trials with Large Language Models
Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery
CancerGPT: Few-shot Drug Pair Synergy Prediction using Large Pre-trained Language Models
Generative AI in Life Sciences: Use Cases & Examples in 2023
How to Safely Integrate Large Language Models Into Health Care
MedEdit: Model Editing for Medical Question Answering with External Knowledge Bases
Towards Accurate Differential Diagnosis with Large Language Models
Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data
Graphs + LLMs
Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs
Can Language Models Solve Graph Problems in Natural Language? (*)
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Talk Like a Graph: Encoding Graphs for Large Language Models
Knowledge Graphs + LLMs
Large Language Models and Knowledge Graphs: Opportunities and Challenges
Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding → Hallucination mitigation and knowledge graph
Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs
LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT
Generating Faithful Text From a Knowledge Graph with Noisy Reference Text
Unifying Large Language Models and Knowledge Graphs: A Roadmap (*)
GraphCare: Enhancing Healthcare Predictions with Open-World Personalized Knowledge Graphs
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning
Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey
On the Evolution of Knowledge Graphs: A Survey and Perspective
Intent Classification
Exploring Zero and Few-shot Techniques for Intent Classification
Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding
Prompt Learning With Knowledge Memorizing Prototypes For Generalized Few-Shot Intent Detection
Scaling LLMs, Compute Cost and SLMs
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
An empirical analysis of compute-optimal large language model training
Reducing Activation Recomputation in Large Transformer Models
Scalable Training of Language Models using JAX pjit and TPUv4
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Generative AI: Plagiarism and Education
Detecting LLM-Generated Text in Computing Education: A Comparative Study for ChatGPT Cases
Large language models challenge the future of higher education
Limits of Detecting Text Generated by Large-Scale Language Models
How Useful are Educational Questions Generated by Large Language Models?
LLM (Machine) Unlearning
Measuring and Modifying Factual Knowledge in Large Language Models
Machine Unlearning: its nature, scope, and importance for a “delete culture”
In-Context Unlearning: Language Models as Few Shot Unlearners
Responsible AI, LLM ethics and AI regulation
CoCoMo Computational Consciousness Modeling for Generative and Ethical AI
The Dark Side of ChatGPT: Legal and Ethical Challenges from Stochastic Parrots and Hallucination
Language (Technology) is Power: A Critical Survey of “Bias” in NLP (*)
Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets
Demographic Dialectal Variation in Social Media: A Case Study of African-American English
Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc
Frontier AI Regulation: Managing Emerging Risks to Public Safety
LLM Agents
LLaMA
LMM: Large Multimodal Models
Chain-of-Thought Prompt Distillation for Multimodal Named Entity and Multimodal Relation Extraction
Contextual Object Detection with Multimodal Large Language Models
The Multimodal and Modular Ai Chef: Complex Recipe Generation From Imagery
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
The Dawn of LMMs: Preliminary Explorations with GPT-4V(vision)
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
What is next?
What Should Data Science Education Do with Large Language Models
Choose Your Weapon: Survival Strategies for Depressed AI Academics
Cognition is All You Need -- The Next Layer of AI Above Large Language Models
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Now, Later, and Lasting: Ten Priorities for AI Research, Policy, and Practice
Books
Resources and Blogs
Prompt Engineering Institute
Want to run a large language model on your laptop? It's possible now! - [Part 1]
Top 10 List of Large Language Models Reshaping the Open-Source Arena in 2023
Large Language Models and the Future of Custom, Fine-tuned LLMs
Understanding the Power of Transformers: A Guide to Sentence Embeddings in Spark NLP
A Very Gentle Introduction to Large Language Models without the Hype
Chain-of-Thought Prompting — Improve Accuracy by Getting LLMs to Reason
Prompt Injection Threat is Real, Will Turn LLMs into Monsters
Generative AI — Protect your LLM against Prompt Injection in Production
The Internet’s New Favorite AI Proposes Torturing Iranians and Surveilling Mosques
Researchers found a command that could ‘jailbreak’ chatbots like Bard and GPT
Cybersecurity experts are warning about a new type of AI attack
Prompt injection explained, with video, slides, and a transcript
The ELI5 Guide to Prompt Injection: Techniques, Prevention Methods & Tools
ChainForge: an open-source visual programming environment for prompt engineering
llamafile is the new best way to run a LLM on your own computer
A practical guide to deploying Large Language Models Cheap, Good *and* Fast
How to secure your RAG application against LLM prompt injection attacks
GPT4All: A free-to-use, locally running, privacy-aware chatbot.
Model alignment protects against accidental harms, not intentional ones
Planning red teaming for large language models (LLMs) and their applications
Announcing Purple Llama: Towards open trust and safety in the new world of generative AI
Pre-training vs Fine-Tuning vs In-Context Learning of Large Language Models
A Comprehensive Study of Trustworthiness in Large Language Models
Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive
Understanding LoRA: Low-rank Adaption of Large Language Models
Tiny but mighty: The Phi-3 small language models with big potential
Unlearning Copyrighted Data From a Trained LLM – Is It Possible?
How AI copyright lawsuits could make the whole industry go extinct