CQ Tang (Chunqiang Tang)

Email: tangchq AT gmail. Chinese Name: 唐春强. LinkedIn. GitHub. Google Scholar. Hiring visiting professors.

I am a Senior Director of Engineering at Meta/Facebook. I joined Facebook in 2013 and have worked on a wide range of production systems used by billions of people, encompassing AI, GPU/ASIC/Accelerator, custom ARM CPU, LLM/Llama, hardware/software co-design, High Performance Computing (HPC), Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and databases. Prior to Facebook, I was a research scientist and manager at IBM T.J. Watson Research Center.

The publications below shed light on some aspects of my work. All of them describe hyperscale production systems we have built at Meta, not merely research prototypes.

If you have time to read only one paper, I recommend this article I published in Communications of the ACM: "Meta's Hyperscale Infrastructure: Overview and Insights."

Recent Best Papers and Highlights

[CACM'25 Research Highlights] TMO: Transparent Memory Offloading in Datacenters.
Companion article: "Technical Perspective: Memory Efficiency via Offloading in Warehouse-Scale Datacenters," by Parthasarathy Ranganathan of Google.
[SOSP'24 Best Paper] FBDetect: Catching Tiny Performance Regressions at Hyperscale through In-Production Monitoring
[OSDI'24 Best Paper] ServiceLab: Preventing Tiny Performance Regressions at Hyperscale through Pre-Production Testing
[ISCA'23 Best Paper] Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters
[ASPLOS'22 Best Paper] TMO: Transparent Memory Offloading in Datacenters
[IEEE Micro Top Picks'24] Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters
[IEEE Micro Top Picks'23] IOCost: Block IO Control for Containers in Datacenters
[SOSP'23, Best Serverless Paper of 2023] XFaaS: Hyperscale and Low Cost Serverless Functions at Meta. The 9th Workshop on Serverless Computing selected this paper as the best of all serverless papers published in 2023.

Selected Publications

[Meta AI Blog] Four MTIA Chips in Two Years: Scaling AI Experiences for Billions
[ISCA'26] MTIA-300: Meta's Training Chip with Embedded NIC Chiplets and Communication Offloading Engine
[ISCA'26] Vistara: Making CXL Real—Full Path from ASIC Design and OS Support to Hyperscale Deployment
[ISCA'26] LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
[MLSys'26] Optimizing Deployment Configurations for LLM Inference
[MLSys'26] Sparing Strategy to Minimize Reliability Impact on Large Scale Training Jobs
[TOCS] Detecting Tiny Performance Regressions at Hyperscale
[ISCA'25] Scaling Llama 3 Training with Efficient Parallelism Strategies
[ISCA'25] Meta's Second Generation AI Chip: Model-Chip Co-Design and Productionization Experiences
[ISCA'25] DCPerf: An Open-Source, Battle-Tested Performance Benchmark Suite for Datacenter Workloads
[Communications of the ACM] Meta's Hyperscale Infrastructure: Overview and Insights
[OSDI'24] MAST: Global Scheduling of ML Training across Geo-Distributed Datacenters at Hyperscale
[OSDI'24] Optimizing Resource Allocation in Hyperscale Datacenters: Scalability, Usability, and Experiences
[NSDI'24] MobileConfig: Remote Configuration Management for Mobile Apps at Hyperscale
[OSDI'23] Conveyor: One-Tool-Fits-All Continuous Software Deployment at Meta
[OSDI'23] ServiceRouter: Hyperscale and Minimal Cost Service Mesh at Meta
[OSDI'23] Global Capacity Management With Flux
[ASPLOS'22] IOCost: Block IO Control for Containers in Datacenters
[SOSP'21] Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications
[SOSP'21] RAS: Continuously Optimized Region-Wide Datacenter Resource Allocation
[OSDI'20] Twine: a Unified Cluster Management System for Shared Infrastructure
[SOSP'15] Holistic Configuration Management at Facebook

Full list of my publications