Courses:
Videos:
Neural Networks: From Zero to Hero, by Andrej Karpathy
Tutorials:
Efficient Training on Multiple GPUs (HuggingFace)
Distributed data parallel training using PyTorch on AWS
Transformer Math 101 (EleutherAI)
Techniques for Training Large Neural Networks (older version: How to Train Really Large Models on Many GPUs?)
Harmonizing Multi-GPUs: Efficient Scaling of LLM Inference
Posts tagged "llms-in-production"
Energy Efficiency in High-Performance Computing
Presentations:
Videos:
Papers:
BlackMamba: Mixture of Experts for State-Space Models
Comparative Study of Large Language Model Architectures on Frontier
The Case for Co-Designing Model Architectures with Hardware
Near-linear scaling of gigantic-model training on AWS
Which scaling rule applies to large artificial neural networks
Computation vs. Communication Scaling for Future Transformers on Future Hardware