Class Time: Friday 2:10-4:00 pm, Location: MUD 337
Instructor: Baishakhi Ray, E-mail: rayb@cs.columbia.edu, Office: CEPSR 604, Office Hour: By Appointment
Head TA: Marcus Min, E-mail: jm5025@columbia.edu, Office Hour: Friday 4:10-6:00 pm, CEPSR 6LE1
Participation: 2.5%
Paper Discussion Summaries: 2.5%
Paper Presentations & Critiques: 35%
Course Project: 60%
Sketching Stencils https://dl.acm.org/doi/10.1145/1250734.1250754
Program Synthesis from Polymorphic Refinement Types https://dl.acm.org/doi/10.1145/2908080.2908093
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications https://dl.acm.org/doi/10.1145/3183713.3196891
Combinatorial Sketching for Finite Programs https://dl.acm.org/doi/abs/10.1145/1168857.1168907
Programming by Sketching for Bit-Streaming Programs https://dl.acm.org/doi/10.1145/1065010.1065045
The Sketching Approach to Program Synthesis https://people.csail.mit.edu/asolar/papers/Solar-Lezama09.pdf
Program Synthesis by Sketching https://people.csail.mit.edu/asolar/papers/thesis.pdf
Refinement Types: A Tutorial https://arxiv.org/abs/2010.07763
Program Synthesis by Type-Guided Abstraction Refinement https://dl.acm.org/doi/abs/10.1145/3371080
Leveraging Parallel Data Processing Frameworks with Verified Lifting https://arxiv.org/abs/1611.07623
Optimizing Data-Intensive Applications Automatically By Leveraging Parallel Data Processing Frameworks https://dl.acm.org/doi/10.1145/3035918.3056440
Program Synthesis https://www.nowpublishers.com/article/Details/PGL-010
Syntax-Guided Synthesis https://www.cis.upenn.edu/~alur/SyGuS13.pdf
FlashFill++ https://dl.acm.org/doi/abs/10.1145/3571226
AlphaCode https://arxiv.org/abs/2203.07814
CodeGen2 https://arxiv.org/abs/2305.02309
Code Llama https://arxiv.org/abs/2308.12950
AlphaCode 2 https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf
AlphaCodium https://arxiv.org/abs/2401.08500
CodeGen https://arxiv.org/abs/2203.13474
InCoder https://arxiv.org/abs/2204.05999
SantaCoder https://arxiv.org/abs/2301.03988
Llama 2 https://arxiv.org/abs/2307.09288
StarCoder https://arxiv.org/abs/2305.06161
DeepSeek Coder https://deepseekcoder.github.io/
CodeFuse https://arxiv.org/abs/2310.06266
Large Language Models Meet NL2Code https://aclanthology.org/2023.acl-long.411/
A Survey on Language Models for Code https://arxiv.org/abs/2311.07989
Deep Learning for Source Code Modeling and Generation https://arxiv.org/abs/2002.05442
UniXcoder (Unified LM) https://arxiv.org/abs/2203.03850
CodeT5+ (Encoder-Decoder Models) https://arxiv.org/abs/2305.07922
CodeFusion (Diffusion Models) https://www.microsoft.com/en-us/research/publication/codefusion-a-pre-trained-diffusion-model-for-code-generation/
Unified LM https://arxiv.org/abs/1905.03197
UniLMv2 https://arxiv.org/abs/2002.12804
CodeBERT https://arxiv.org/abs/2002.08155
SPT-Code https://arxiv.org/abs/2201.01549
DALL-E 2 https://arxiv.org/abs/2204.06125
A Survey of Diffusion Models in Natural Language Processing https://arxiv.org/abs/2305.14671
Diffusion-LM https://arxiv.org/abs/2205.14217
DiffuSeq https://arxiv.org/abs/2210.08933
HumanEval/Codex (Accuracy) https://arxiv.org/abs/2107.03374
ReCode: Robustness Evaluation of Code Generation Models (Trustworthiness) https://arxiv.org/abs/2212.10264
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (Practicality) https://arxiv.org/abs/2310.06770
Multi-lingual Evaluation of Code Generation Models https://arxiv.org/abs/2210.14868
HumanEvalPlus https://arxiv.org/abs/2305.01210
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code https://arxiv.org/abs/2302.05527
CodeScore: Evaluating Code Generation by Learning Code Execution https://arxiv.org/abs/2301.09043
A Static Evaluation of Code Completion by Large Language Models https://arxiv.org/abs/2306.03203
Can ChatGPT Replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation https://arxiv.org/abs/2308.10335
Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain https://arxiv.org/abs/2310.14053
Do Large Code Models Understand Programming Concepts? A Black-box Approach https://arxiv.org/abs/2402.05980
Toward Trustworthy Neural Program Synthesis https://arxiv.org/abs/2210.00848
PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts https://arxiv.org/abs/2306.04528
DevBench: A Comprehensive Benchmark for Software Development https://arxiv.org/abs/2403.08604
DevEval: Evaluating Code Generation in Practical Software Projects https://arxiv.org/abs/2401.06401
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion https://arxiv.org/abs/2310.11248
Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT https://arxiv.org/abs/2304.10778
CodeXGLUE https://arxiv.org/abs/2102.04664
CodeContest/AlphaCode https://arxiv.org/abs/2203.07814
DS-1000 https://arxiv.org/abs/2211.11501
xCodeEval https://arxiv.org/abs/2303.03004
What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code https://arxiv.org/abs/2202.06840
Evidence of Meaning in Language Models Trained on Programs https://arxiv.org/abs/2305.11169
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach https://arxiv.org/abs/2310.06680
Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work? https://arxiv.org/abs/2211.12821
Naturalness of Attention: Revisiting Attention in Code Language Models https://arxiv.org/abs/2311.13508
Traces of Memorisation in Large Language Models for Code https://arxiv.org/abs/2312.11658
A Structural Probe for Finding Syntax in Word Representations https://aclanthology.org/N19-1419/
Designing and Interpreting Probes with Control Tasks https://arxiv.org/abs/1909.03368
Benchmarking Causal Study to Interpret Large Language Models for Source Code https://arxiv.org/abs/2308.12415
Towards Causal Deep Learning for Vulnerability Detection https://arxiv.org/abs/2310.07958
Rethinking Interpretability in the Era of Large Language Models https://arxiv.org/abs/2402.01761
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges https://arxiv.org/abs/2103.11251
Teaching Large Language Models to Self-Debug (Self-Refinement) https://arxiv.org/abs/2304.05128
Magicoder (Instruction Tuning) https://arxiv.org/abs/2312.02120
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback (Reinforcement Learning) https://arxiv.org/abs/2402.01391
Is Self-Repair a Silver Bullet for Code Generation? https://arxiv.org/abs/2306.09896
CYCLE: Learning to Self-Refine Code Generation https://arxiv.org/abs/2403.18746
LeTI: Learning to Generate from Textual Interactions https://arxiv.org/abs/2305.10314
Self-Refine: Iterative Refinement with Self-Feedback https://arxiv.org/abs/2303.17651
Large Language Models Cannot Self-Correct Reasoning Yet https://arxiv.org/abs/2310.01798
WizardCoder https://arxiv.org/abs/2306.08568
Improving Code Style for Accurate Code Generation https://openreview.net/forum?id=maRYffiUpI
WizardLM https://arxiv.org/abs/2304.12244
Let's Verify Step by Step https://arxiv.org/abs/2305.20050
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning https://arxiv.org/abs/2207.01780
Execution-based Code Generation using Deep Reinforcement Learning https://arxiv.org/abs/2301.13816
RLTF: Reinforcement Learning from Unit Test Feedback https://arxiv.org/abs/2307.04349
Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis https://arxiv.org/abs/2310.13669
GraphCodeBERT (Data Flow) https://arxiv.org/abs/2009.08366
CODE-MVP (Source Code + AST + CFG + Transformation) https://arxiv.org/abs/2205.02029
TRACED (Execution Trace) https://arxiv.org/abs/2306.07487
CodeExecutor (Execution Trace) https://arxiv.org/abs/2305.05383
RepoCoder (Repository) https://arxiv.org/abs/2303.12570
TreeBERT (AST) https://arxiv.org/abs/2105.12485
SynCoBERT (AST) https://arxiv.org/abs/2108.04556
StructCoder (AST + Data Flow) https://arxiv.org/abs/2206.05239
CugLM (Type) https://arxiv.org/abs/2012.14631
DOBF (Transformation) https://arxiv.org/abs/2102.07492
Deep Learning Type Inference https://dl.acm.org/doi/10.1145/3236024.3236051
CRUXEval https://arxiv.org/abs/2401.03065
CodeMind https://arxiv.org/abs/2402.09664
CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context https://arxiv.org/abs/2212.10007
RepoFusion: Training Code Models to Understand Your Repository https://arxiv.org/abs/2306.10998
Guiding Language Models of Code with Global Context using Monitors https://arxiv.org/abs/2306.10763
CodePlan: Repository-level Coding using LLMs and Planning https://arxiv.org/abs/2309.12499
A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware https://arxiv.org/abs/2312.05772
REPOFUSE: Repository-Level Code Completion with Fused Dual Context https://arxiv.org/abs/2402.14323
Generation-Augmented Retrieval for Open-domain Question Answering https://arxiv.org/abs/2009.08553
Query2doc: Query Expansion with Large Language Models https://arxiv.org/abs/2303.07678
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions https://arxiv.org/abs/2212.10561
Prompting Is Programming: A Query Language for Large Language Models https://arxiv.org/abs/2212.06094
DeepCoder https://arxiv.org/abs/1611.01989
RobustFill https://arxiv.org/abs/1703.07469
LambdaBeam https://arxiv.org/abs/2306.02049
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs https://arxiv.org/abs/2210.12283
Communicating Natural Programs to Humans and Machines https://arxiv.org/abs/2106.07824
LangChain https://python.langchain.com/docs/get_started/introduction
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models https://arxiv.org/abs/2109.05093
Synchromesh: Reliable Code Generation from Pre-trained Language Models https://arxiv.org/abs/2201.11227
OpenPrompt: An Open-source Framework for Prompt-learning https://arxiv.org/abs/2111.01998
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts https://arxiv.org/abs/2202.01279
Chain of Code https://arxiv.org/abs/2312.04474
Scallop: A Language for Neurosymbolic Programming https://arxiv.org/abs/2304.04812
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment https://arxiv.org/abs/2402.12275
Binding Language Models in Symbolic Languages https://arxiv.org/abs/2210.02875
Program of Thought https://arxiv.org/abs/2211.12588
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning https://arxiv.org/abs/2309.10814
ViperGPT: Visual Inference via Python Execution for Reasoning https://arxiv.org/abs/2303.08128
Toolformer: Language Models Can Teach Themselves to Use Tools https://arxiv.org/abs/2302.04761
Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning https://proceedings.neurips.cc/paper/2021/hash/d367eef13f90793bd8121e2f675f0dc2-Abstract.html
DeepProbLog: Neural Probabilistic Logic Programming https://arxiv.org/abs/1805.10872
SATNet: Bridging Deep Learning and Logical Reasoning using a Differentiable Satisfiability Solver https://arxiv.org/abs/1905.12149
World Models https://arxiv.org/abs/1803.10122
Neurosymbolic Grounding for Compositional World Models https://arxiv.org/abs/2310.12690
Language Models Meet World Models: Embodied Experiences Enhance Language Models https://arxiv.org/abs/2305.10626
ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629
Imitation-Projected Programmatic Reinforcement Learning https://arxiv.org/abs/1907.05431
Code as Policies: Language Model Programs for Embodied Control https://arxiv.org/abs/2209.07753
Bootstrapping Cognitive Agents with a Large Language Model https://arxiv.org/abs/2403.00810