Class Time: Thursday 2:10-4:00 pm, Location: 963 Ext Schermerhorn Hall
Instructor: Baishakhi Ray, E-mail: rayb@cs.columbia.edu, Office: CEPSR 604, Office Hour: By Appointment
Head TA: Bowen Yang, E-mail: by2365@columbia.edu, Office Hour: By Appointment
Participation: 5%
Paper Presentations & Critiques: 35%
Course Project: 60%
Popular Generative Models For Code
Code Llama https://arxiv.org/abs/2308.12950
StarCoder https://arxiv.org/abs/2305.06161
DeepSeek Coder https://arxiv.org/pdf/2401.14196
Alternative Generative Models For Code
CodeSage https://arxiv.org/pdf/2402.01935
Llama 2 https://arxiv.org/abs/2307.09288
CodeFuse https://arxiv.org/abs/2310.06266
Causal Masking https://arxiv.org/abs/2201.07520
SantaCoder https://arxiv.org/abs/2301.03988
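The causal-masking paper above builds on the standard decoder-only objective, in which each position may attend only to itself and earlier positions. A minimal sketch of a plain causal attention mask (the paper's causal-masked infilling objective generalizes this):

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular boolean mask: position i may attend to positions <= i.
    Attention scores at False entries are set to -inf before the softmax."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

m = causal_mask(4)
print(m.astype(int))  # 4x4 lower-triangular matrix of ones
```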
The Stack https://arxiv.org/pdf/2211.15533
8K token context length https://arxiv.org/html/2402.17463v2
Fill-in-the-middle https://arxiv.org/pdf/2207.14255
Multi-Query-Attention https://arxiv.org/pdf/1911.02150
DeepSeek Coder Repo https://deepseekcoder.github.io/
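Multi-query attention, linked above, keeps separate query heads but shares a single key/value head across all of them, shrinking the KV cache at inference time by a factor of the head count. A toy NumPy sketch (shapes and projections are illustrative, not any model's actual configuration):

```python
import numpy as np

def multi_query_attention(x, Wq, Wk, Wv, n_heads):
    """Multi-query attention: n_heads query projections, but one shared
    key head and one shared value head. x has shape (seq, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    q = (x @ Wq).reshape(seq, n_heads, d_head)  # per-head queries
    k = x @ Wk                                  # single shared key head
    v = x @ Wv                                  # single shared value head
    out = np.empty_like(q)
    for h in range(n_heads):
        scores = q[:, h, :] @ k.T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        out[:, h, :] = weights @ v
    return out.reshape(seq, d_model)

rng = np.random.default_rng(0)
seq, d_model, n_heads = 4, 8, 2
x = rng.normal(size=(seq, d_model))
y = multi_query_attention(
    x,
    rng.normal(size=(d_model, d_model)),
    rng.normal(size=(d_model, d_model // n_heads)),  # K/V are d_head wide
    rng.normal(size=(d_model, d_model // n_heads)),
    n_heads,
)
print(y.shape)  # (4, 8)
```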
CodeFuse https://arxiv.org/abs/2310.06266
Fill-in-the-middle https://arxiv.org/pdf/2207.14255
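Fill-in-the-middle training, linked above, rearranges each document into (prefix, suffix, middle) order so a left-to-right model learns to infill code given both sides. A minimal sketch of the PSM transform (the sentinel strings here are illustrative; real models use tokenizer-specific special tokens such as StarCoder's <fim_prefix>):

```python
import random

# Illustrative sentinels, not any model's exact token strings.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def fim_transform(doc: str, rng: random.Random) -> str:
    """Split a document at two random points and emit it in
    prefix-suffix-middle order: the model sees prefix and suffix,
    then is trained to generate the middle."""
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

rng = random.Random(0)
print(fim_transform("def add(a, b):\n    return a + b\n", rng))
```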
CodeGen https://arxiv.org/abs/2203.13474
Large Language Models Meet NL2Code https://aclanthology.org/2023.acl-long.411/
A Survey on Language Models for Code https://arxiv.org/abs/2311.07989
Deep Learning for Source Code Modeling and Generation https://arxiv.org/abs/2002.05442
CodeT5+ (Encoder-Decoder Models) https://arxiv.org/abs/2305.07922
CodeFusion (Diffusion Models) https://www.microsoft.com/en-us/research/publication/codefusion-a-pre-trained-diffusion-model-for-code-generation/
DALL-E 2 https://arxiv.org/abs/2204.06125
LiveCodeBench https://arxiv.org/abs/2403.07974
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? https://arxiv.org/abs/2310.06770
LiveCodeBench Repo https://livecodebench.github.io/
HumanEval/Codex (Accuracy) https://arxiv.org/abs/2107.03374
ReCode: Robustness Evaluation of Code Generation Models (Trustworthiness) https://arxiv.org/abs/2212.10264
DevBench: A Comprehensive Benchmark for Software Development https://arxiv.org/abs/2403.08604
DevEval: Evaluating Code Generation in Practical Software Projects https://arxiv.org/abs/2401.06401
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion https://arxiv.org/abs/2310.11248
Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT https://arxiv.org/abs/2304.10778
ReCode: Robustness Evaluation of Code Generation Models (Trustworthiness) https://arxiv.org/abs/2212.10264
CodeXGLUE https://arxiv.org/abs/2102.04664
CodeContest/AlphaCode https://arxiv.org/abs/2203.07814
DS-1000 https://arxiv.org/abs/2211.11501
xCodeEval https://arxiv.org/abs/2303.03004
BigCode Eval Harness https://github.com/bigcode-project/bigcode-evaluation-harness
BigCodeBench https://huggingface.co/blog/leaderboard-bigcodebench
LMSYS Coding https://lmarena.ai/?leaderboard
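Several of the benchmarks above (HumanEval among them) report pass@k. The unbiased estimator introduced in the Codex paper, given n sampled completions of which c pass the tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from the Codex paper: the probability that at
    least one of k samples drawn (without replacement) from the n
    generated completions passes, given that c of the n passed.
    pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:  # fewer than k failures: some draw must include a pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=20, c=5, k=1))  # 0.25
```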
AutoCodeRover https://arxiv.org/abs/2404.05427
TBD
CODEDPO https://arxiv.org/pdf/2410.05605
LintSeq https://arxiv.org/pdf/2410.02749
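CODEDPO applies preference optimization to code generation. As a reference point, the generic DPO loss for a single (chosen, rejected) pair can be sketched as follows; this is the standard formulation, not the paper's exact recipe:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    logp_w / logp_l are the policy's log-probs of the chosen and
    rejected completions, ref_* the frozen reference model's. The loss
    pushes the policy's margin over the reference toward the winner."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

print(round(dpo_loss(-5.0, -9.0, -6.0, -8.0), 4))  # 0.5981
```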
CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context https://arxiv.org/abs/2212.10007
RepoFusion: Training Code Models to Understand Your Repository https://arxiv.org/abs/2306.10998
Guiding Language Models of Code with Global Context using Monitors https://arxiv.org/abs/2306.10763
CodePlan: Repository-level Coding using LLMs and Planning https://arxiv.org/abs/2309.12499
A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware https://arxiv.org/abs/2312.05772
REPOFUSE: Repository-Level Code Completion with Fused Dual Context https://arxiv.org/abs/2402.14323
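A recurring recipe in the repository-level papers above is retrieving relevant cross-file context and prepending it to the completion prompt. A toy sketch using identifier overlap as the retrieval score (purely illustrative; the papers use richer retrieval, static analysis, and fusion strategies):

```python
import re

def retrieve_cross_file_context(query: str, repo_files: dict, top_k: int = 1):
    """Rank repository files by identifier overlap with the local
    (in-file) context and return snippets to prepend to the prompt."""
    tokens = lambda s: set(re.findall(r"\w+", s))
    q = tokens(query)
    scored = sorted(repo_files.items(),
                    key=lambda kv: len(q & tokens(kv[1])), reverse=True)
    return [f"# from {path}\n{src}" for path, src in scored[:top_k]]

repo = {
    "utils/math_ops.py": "def add(a, b):\n    return a + b",
    "utils/io_ops.py": "def read_file(path):\n    return open(path).read()",
}
ctx = retrieve_cross_file_context("total = add(x, y)", repo)
print(ctx[0].splitlines()[0])  # → # from utils/math_ops.py
```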
Generation-Augmented Retrieval for Open-domain Question Answering https://arxiv.org/abs/2009.08553
Query2doc: Query Expansion with Large Language Models https://arxiv.org/abs/2303.07678
Explainable AI https://www.mdpi.com/1099-4300/23/1/18
Explainable AI https://www.mdpi.com/1099-4300/23/1/18
Note: Explainable AI is a heavy paper; both presenting groups should focus on the same paper.
Rethinking Interpretability in the Era of Large Language Models https://arxiv.org/abs/2402.01761
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges https://arxiv.org/abs/2103.11251
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach https://arxiv.org/abs/2310.06680
Benchmarking Causal Study to Interpret Large Language Models for Source Code https://arxiv.org/abs/2308.12415
Towards Causal Deep Learning for Vulnerability Detection https://arxiv.org/abs/2310.07958
SafeCoder https://arxiv.org/pdf/2402.09497
LLM on Program Invariants https://drive.google.com/file/d/1t8Veh-JX7xCRtcHcHPmFtnfM38zXK31D/view