
HPC and Distributed Model Training

Courses:

Introduction to Parallel Computing (Stanford)

Videos:

Neural Networks: Zero to Hero, by Andrej Karpathy

Tutorials:

Efficient Training on Multiple GPUs  (HuggingFace)

Distributed data parallel training using PyTorch on AWS

Transformer Math 101 (EleutherAI); a back-of-the-envelope compute sketch based on its rules of thumb follows this list

Techniques for Training Large Neural Networks (older version: How to Train Really Large Models on Many GPUs?)

Harmonizing Multi-GPUs: Efficient Scaling of LLM Inference

Blog post series tagged "llms-in-production" (7 posts)

Energy Efficiency in High-Performance Computing
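
The rules of thumb covered in Transformer Math 101 are easy to turn into quick estimates. Below is a minimal back-of-the-envelope sketch using the widely cited C ≈ 6·N·D approximation for training compute; the model size, token count, cluster size, peak throughput, and MFU figures are illustrative assumptions, not numbers taken from the post.

```python
# Back-of-the-envelope training-compute estimate in the spirit of
# "Transformer Math 101" (EleutherAI). All example numbers below are
# hypothetical, chosen only to show the arithmetic.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs via the common C ~ 6 * N * D rule
    (~2ND for the forward pass, ~4ND for the backward pass)."""
    return 6.0 * n_params * n_tokens

def training_days(n_params: float, n_tokens: float, n_gpus: int,
                  peak_flops_per_gpu: float, mfu: float = 0.4) -> float:
    """Wall-clock estimate given cluster size and an assumed model FLOPs
    utilization (MFU); 0.3-0.5 is a typical range for large transformers."""
    effective_throughput = n_gpus * peak_flops_per_gpu * mfu
    return training_flops(n_params, n_tokens) / effective_throughput / 86_400

if __name__ == "__main__":
    # Hypothetical example: a 7B-parameter model trained on 1T tokens
    # across 256 GPUs, each with ~312 TFLOP/s peak bf16 throughput.
    flops = training_flops(7e9, 1e12)
    days = training_days(7e9, 1e12, n_gpus=256,
                         peak_flops_per_gpu=312e12, mfu=0.4)
    print(f"~{flops:.2e} FLOPs, ~{days:.0f} days of training")
```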


Presentations:

Outro To Parallel Computing

Videos:

Efficient Multi-GPU Strategies for Faster Deep Learning
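
As a concrete companion to the multi-GPU material above, here is a minimal single-node PyTorch DistributedDataParallel sketch; the toy model, batch size, and training loop are illustrative assumptions rather than code from any of the linked resources.

```python
# Minimal single-node DistributedDataParallel (DDP) sketch.
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_demo.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # replicas kept in sync via gradient all-reduce
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for step in range(10):
        # In practice each rank would read its own shard of the dataset
        # (e.g. via DistributedSampler); random data keeps the sketch self-contained.
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # DDP overlaps the gradient all-reduce with backprop
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each launched process drives one GPU; DDP keeps the model replicas identical by averaging gradients across ranks during the backward pass.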

Papers:


BlackMamba: Mixture of Experts for State-Space Models

Comparative Study of Large Language Model Architectures on Frontier

The Case for Co-Designing Model Architectures with Hardware

Near-linear scaling of gigantic-model training on AWS 

Which scaling rule applies to large artificial neural networks 

Computation vs. Communication Scaling for Future Transformers on Future Hardware
