
HPC and Distributed Model Training

Courses:

Introduction to Parallel Computing (Stanford)

Videos:

Neural Networks: Zero to Hero, by Andrej Karpathy

Tutorials:

Efficient Training on Multiple GPUs  (HuggingFace)

Distributed data parallel training using PyTorch on AWS

Transformer Math 101 (EleutherAI); a back-of-the-envelope compute sketch based on its rules of thumb follows this list

Techniques for Training Large Neural Networks (older version: How to Train Really Large Models on Many GPUs?)

Harmonizing Multi-GPUs: Efficient Scaling of LLM Inference

Blog post series tagged "llms-in-production" (7 posts)

Energy Efficiency in High-Performance Computing
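
The rules of thumb covered in Transformer Math 101 are easy to turn into quick estimates. Below is a minimal back-of-the-envelope sketch using the widely cited C ≈ 6·N·D approximation for training compute; the model size, token count, cluster size, peak throughput, and MFU figures are illustrative assumptions, not numbers taken from the post.

```python
# Back-of-the-envelope training-compute estimate in the spirit of
# "Transformer Math 101" (EleutherAI). All example numbers below are
# hypothetical, chosen only to show the arithmetic.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs via the common C ~ 6 * N * D rule
    (~2ND for the forward pass, ~4ND for the backward pass)."""
    return 6.0 * n_params * n_tokens

def training_days(n_params: float, n_tokens: float, n_gpus: int,
                  peak_flops_per_gpu: float, mfu: float = 0.4) -> float:
    """Wall-clock estimate given cluster size and an assumed model FLOPs
    utilization (MFU); 0.3-0.5 is a typical range for large transformers."""
    effective_throughput = n_gpus * peak_flops_per_gpu * mfu
    return training_flops(n_params, n_tokens) / effective_throughput / 86_400

if __name__ == "__main__":
    # Hypothetical example: a 7B-parameter model trained on 1T tokens
    # across 256 GPUs, each with ~312 TFLOP/s peak bf16 throughput.
    flops = training_flops(7e9, 1e12)
    days = training_days(7e9, 1e12, n_gpus=256,
                         peak_flops_per_gpu=312e12, mfu=0.4)
    print(f"~{flops:.2e} FLOPs, ~{days:.0f} days of training")
```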


Presentations:

Outro To Parallel Computing

Videos:

Efficient Multi-GPU Strategies for Faster Deep Learning
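
As a concrete companion to the multi-GPU material above, here is a minimal single-node PyTorch DistributedDataParallel sketch; the toy model, batch size, and training loop are illustrative assumptions rather than code from any of the linked resources.

```python
# Minimal single-node DistributedDataParallel (DDP) sketch.
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_demo.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # replicas kept in sync via gradient all-reduce
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for step in range(10):
        # In practice each rank would read its own shard of the dataset
        # (e.g. via DistributedSampler); random data keeps the sketch self-contained.
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # DDP overlaps the gradient all-reduce with backprop
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each launched process drives one GPU; DDP keeps the model replicas identical by averaging gradients across ranks during the backward pass.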

Papers:


BlackMamba: Mixture of Experts for State-Space Models

Comparative Study of Large Language Model Architectures on Frontier

The Case for Co-Designing Model Architectures with Hardware

Near-linear scaling of gigantic-model training on AWS 

Which scaling rule applies to large artificial neural networks 

Computation vs. Communication Scaling for Future Transformers on Future Hardware
