Class Time: Friday 2:10-4:00 pm, Location: MUD 337
Instructor: Baishakhi Ray, E-mail: rayb@cs.columbia.edu, Office: CEPSR 604, Office Hour: By Appointment
Head TA: Marcus Min, E-mail: jm5025@columbia.edu, Office Hour: Friday 4:10-6:00 pm, CEPSR 6LE1
Participation: 2.5%
Paper Discussion Summaries: 2.5%
Paper Presentations & Critiques: 35%
Course Project: 60%
Sketching Stencils https://dl.acm.org/doi/10.1145/1250734.1250754
Program Synthesis from Polymorphic Refinement Types https://dl.acm.org/doi/10.1145/2908080.2908093
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications https://dl.acm.org/doi/10.1145/3183713.3196891
Combinatorial Sketching for Finite Programs https://dl.acm.org/doi/abs/10.1145/1168857.1168907
Programming by Sketching for Bit-Streaming Programs https://dl.acm.org/doi/10.1145/1065010.1065045
The Sketching Approach to Program Synthesis https://people.csail.mit.edu/asolar/papers/Solar-Lezama09.pdf
Program Synthesis by Sketching https://people.csail.mit.edu/asolar/papers/thesis.pdf
Refinement Types: A Tutorial https://arxiv.org/abs/2010.07763
Program Synthesis by Type-Guided Abstraction Refinement https://dl.acm.org/doi/abs/10.1145/3371080
Leveraging Parallel Data Processing Frameworks with Verified Lifting https://arxiv.org/abs/1611.07623
Optimizing Data-Intensive Applications Automatically By Leveraging Parallel Data Processing Frameworks https://dl.acm.org/doi/10.1145/3035918.3056440
Program Synthesis https://www.nowpublishers.com/article/Details/PGL-010
Syntax-Guided Synthesis https://www.cis.upenn.edu/~alur/SyGuS13.pdf
FlashFill++ https://dl.acm.org/doi/abs/10.1145/3571226
AlphaCode https://arxiv.org/abs/2203.07814
CodeGen2 https://arxiv.org/abs/2305.02309
Code Llama https://arxiv.org/abs/2308.12950
AlphaCode 2 https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf
AlphaCodium https://arxiv.org/abs/2401.08500
CodeGen https://arxiv.org/abs/2203.13474
InCoder https://arxiv.org/abs/2204.05999
SantaCoder https://arxiv.org/abs/2301.03988
Llama 2 https://arxiv.org/abs/2307.09288
StarCoder https://arxiv.org/abs/2305.06161
DeepSeek Coder https://deepseekcoder.github.io/
CodeFuse https://arxiv.org/abs/2310.06266
Large Language Models Meet NL2Code https://aclanthology.org/2023.acl-long.411/
A Survey on Language Models for Code https://arxiv.org/abs/2311.07989
Deep Learning for Source Code Modeling and Generation https://arxiv.org/abs/2002.05442
UniXcoder (Unified LM) https://arxiv.org/abs/2203.03850
CodeT5+ (Encoder-Decoder Models) https://arxiv.org/abs/2305.07922
CodeFusion (Diffusion Models) https://www.microsoft.com/en-us/research/publication/codefusion-a-pre-trained-diffusion-model-for-code-generation/
Unified LM https://arxiv.org/abs/1905.03197
UniLMv2 https://arxiv.org/abs/2002.12804
CodeBERT https://arxiv.org/abs/2002.08155
SPT-Code https://arxiv.org/abs/2201.01549
DALL-E 2 https://arxiv.org/abs/2204.06125
A Survey of Diffusion Models in Natural Language Processing https://arxiv.org/abs/2305.14671
Diffusion-LM https://arxiv.org/abs/2205.14217
DiffuSeq https://arxiv.org/abs/2210.08933
HumanEval/Codex (Accuracy) https://arxiv.org/abs/2107.03374
ReCode: Robustness Evaluation of Code Generation Models (Trustworthiness) https://arxiv.org/abs/2212.10264
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (Practicality) https://arxiv.org/abs/2310.06770
Multi-lingual Evaluation of Code Generation Models https://arxiv.org/abs/2210.14868
HumanEvalPlus https://arxiv.org/abs/2305.01210
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code https://arxiv.org/abs/2302.05527
CodeScore: Evaluating Code Generation by Learning Code Execution https://arxiv.org/abs/2301.09043
A Static Evaluation of Code Completion by Large Language Models https://arxiv.org/abs/2306.03203
Can ChatGPT Replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation https://arxiv.org/abs/2308.10335
Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain https://arxiv.org/abs/2310.14053
Do Large Code Models Understand Programming Concepts? A Black-box Approach https://arxiv.org/abs/2402.05980
Toward Trustworthy Neural Program Synthesis https://arxiv.org/abs/2210.00848
PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts https://arxiv.org/abs/2306.04528
DevBench: A Comprehensive Benchmark for Software Development https://arxiv.org/abs/2403.08604
DevEval: Evaluating Code Generation in Practical Software Projects https://arxiv.org/abs/2401.06401
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion https://arxiv.org/abs/2310.11248
Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT https://arxiv.org/abs/2304.10778
CodeXGLUE https://arxiv.org/abs/2102.04664
CodeContest/AlphaCode https://arxiv.org/abs/2203.07814
DS-1000 https://arxiv.org/abs/2211.11501
xCodeEval https://arxiv.org/abs/2303.03004
What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code https://arxiv.org/abs/2202.06840
Evidence of Meaning in Language Models Trained on Programs https://arxiv.org/abs/2305.11169
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach https://arxiv.org/abs/2310.06680
Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work? https://arxiv.org/abs/2211.12821
Naturalness of Attention: Revisiting Attention in Code Language Models https://arxiv.org/abs/2311.13508
Traces of Memorisation in Large Language Models for Code https://arxiv.org/abs/2312.11658
A Structural Probe for Finding Syntax in Word Representations https://aclanthology.org/N19-1419/
Designing and Interpreting Probes with Control Tasks https://arxiv.org/abs/1909.03368
Benchmarking Causal Study to Interpret Large Language Models for Source Code https://arxiv.org/abs/2308.12415
Towards Causal Deep Learning for Vulnerability Detection https://arxiv.org/abs/2310.07958
Rethinking Interpretability in the Era of Large Language Models https://arxiv.org/abs/2402.01761
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges https://arxiv.org/abs/2103.11251
Teaching Large Language Models to Self-Debug (Self-Refinement) https://arxiv.org/abs/2304.05128
Magicoder (Instruction Tuning) https://arxiv.org/abs/2312.02120
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback (Reinforcement Learning) https://arxiv.org/abs/2402.01391
Is Self-Repair a Silver Bullet for Code Generation? https://arxiv.org/abs/2306.09896
CYCLE: Learning to Self-Refine Code Generation https://arxiv.org/abs/2403.18746
LeTI: Learning to Generate from Textual Interactions https://arxiv.org/abs/2305.10314
Self-Refine: Iterative Refinement with Self-Feedback https://arxiv.org/abs/2303.17651
Large Language Models Cannot Self-Correct Reasoning Yet https://arxiv.org/abs/2310.01798
WizardCoder https://arxiv.org/abs/2306.08568
Improving Code Style for Accurate Code Generation https://openreview.net/forum?id=maRYffiUpI
WizardLM https://arxiv.org/abs/2304.12244
Let's Verify Step by Step https://arxiv.org/abs/2305.20050
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning https://arxiv.org/abs/2207.01780
Execution-based Code Generation using Deep Reinforcement Learning https://arxiv.org/abs/2301.13816
RLTF: Reinforcement Learning from Unit Test Feedback https://arxiv.org/abs/2307.04349
Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis https://arxiv.org/abs/2310.13669
GraphCodeBERT (Data Flow) https://arxiv.org/abs/2009.08366
CODE-MVP (Source Code + AST + CFG + Transformation) https://arxiv.org/abs/2205.02029
TRACED (Execution Trace) https://arxiv.org/abs/2306.07487
CodeExecutor (Execution Trace) https://arxiv.org/abs/2305.05383
RepoCoder (Repository) https://arxiv.org/abs/2303.12570
TreeBERT (AST) https://arxiv.org/abs/2105.12485
SynCoBERT (AST) https://arxiv.org/abs/2108.04556
StructCoder (AST + Data Flow) https://arxiv.org/abs/2206.05239
CugLM (Type) https://arxiv.org/abs/2012.14631
DOBF (Transformation) https://arxiv.org/abs/2102.07492
Deep Learning Type Inference https://dl.acm.org/doi/10.1145/3236024.3236051
CRUXEval https://arxiv.org/abs/2401.03065
CodeMind https://arxiv.org/abs/2402.09664
CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context https://arxiv.org/abs/2212.10007
RepoFusion: Training Code Models to Understand Your Repository https://arxiv.org/abs/2306.10998
Guiding Language Models of Code with Global Context using Monitors https://arxiv.org/abs/2306.10763
CodePlan: Repository-level Coding using LLMs and Planning https://arxiv.org/abs/2309.12499
A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware https://arxiv.org/abs/2312.05772
REPOFUSE: Repository-Level Code Completion with Fused Dual Context https://arxiv.org/abs/2402.14323
Generation-Augmented Retrieval for Open-domain Question Answering https://arxiv.org/abs/2009.08553
Query2doc: Query Expansion with Large Language Models https://arxiv.org/abs/2303.07678
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions https://arxiv.org/abs/2212.10561
Prompting Is Programming: A Query Language for Large Language Models https://arxiv.org/abs/2212.06094
DeepCoder https://arxiv.org/abs/1611.01989
RobustFill https://arxiv.org/abs/1703.07469
LambdaBeam https://arxiv.org/abs/2306.02049
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs https://arxiv.org/abs/2210.12283
Communicating Natural Programs to Humans and Machines https://arxiv.org/abs/2106.07824
LangChain https://python.langchain.com/docs/get_started/introduction
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models https://arxiv.org/abs/2109.05093
Synchromesh: Reliable Code Generation from Pre-trained Language Models https://arxiv.org/abs/2201.11227
OpenPrompt: An Open-source Framework for Prompt-learning https://arxiv.org/abs/2111.01998
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts https://arxiv.org/abs/2202.01279
Chain of Code https://arxiv.org/abs/2312.04474
Scallop: A Language for Neurosymbolic Programming https://arxiv.org/abs/2304.04812
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment https://arxiv.org/abs/2402.12275
Binding Language Models in Symbolic Languages https://arxiv.org/abs/2210.02875
Program of Thought https://arxiv.org/abs/2211.12588
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning https://arxiv.org/abs/2309.10814
ViperGPT: Visual Inference via Python Execution for Reasoning https://arxiv.org/abs/2303.08128
Toolformer: Language Models Can Teach Themselves to Use Tools https://arxiv.org/abs/2302.04761
Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning https://proceedings.neurips.cc/paper/2021/hash/d367eef13f90793bd8121e2f675f0dc2-Abstract.html
DeepProbLog: Neural Probabilistic Logic Programming https://arxiv.org/abs/1805.10872
SATNet: Bridging Deep Learning and Logical Reasoning using a Differentiable Satisfiability Solver https://arxiv.org/abs/1905.12149
World Models https://arxiv.org/abs/1803.10122
Neurosymbolic Grounding for Compositional World Models https://arxiv.org/abs/2310.12690
Language Models Meet World Models: Embodied Experiences Enhance Language Models https://arxiv.org/abs/2305.10626
ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629
Imitation-Projected Programmatic Reinforcement Learning https://arxiv.org/abs/1907.05431
Code as Policies: Language Model Programs for Embodied Control https://arxiv.org/abs/2209.07753
Bootstrapping Cognitive Agents with a Large Language Model https://arxiv.org/abs/2403.00810