batch size - a larger batch size reduces the number of optimizer steps that need to be performed per epoch
learning rate - a higher learning rate can improve the speed of learning, though too high a value can make training unstable
optimizer class - optimizers modify the weights using different strategies, and some strategies can converge faster than others (a combined sketch for these three knobs follows below)
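A minimal sketch of where these three knobs live in a typical PyTorch training loop (PyTorch is assumed here, since DataParallel is mentioned later; the model, dataset, and hyperparameter values are placeholders):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# placeholder dataset/model just to make the sketch runnable
dataset = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
loader = DataLoader(dataset, batch_size=256, shuffle=True)  # larger batch -> fewer optimizer steps per epoch

model = nn.Linear(32, 1)
# swapping the optimizer class is a one-line change, e.g. SGD -> AdamW
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr controls the step size

loss_fn = nn.MSELoss()
for x, y in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```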
residual connections - vanishing, exploding, ... gradients can cause poor training performance; adding skip connections to the nn graph helps gradients flow and stabilizes training -> so you can achieve the same results while running fewer experiments
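For illustration, a minimal residual block in PyTorch; `ResidualBlock` and its sizes are made up for this sketch:

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    # output = body(x) + x, i.e. a skip connection around the block body
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # the identity path lets gradients flow around self.body,
        # mitigating vanishing/exploding gradients in deep stacks
        return x + self.body(x)

block = ResidualBlock(64)
print(block(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```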
model split architecture - we can run different parts of the model on multiple GPUs (model parallelism), as sketched below
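A minimal sketch of such a split, assuming two GPUs are visible as cuda:0 and cuda:1 (layer sizes are placeholders):

```python
import torch
from torch import nn

class TwoStageModel(nn.Module):
    # manual model parallelism: each stage lives on its own GPU
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Linear(32, 64).to("cuda:0")
        self.stage2 = nn.Linear(64, 1).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage1(x.to("cuda:0"))
        # activations are copied between devices at the stage boundary
        return self.stage2(x.to("cuda:1"))

model = TwoStageModel()
out = model(torch.randn(16, 32))
```

Note the device-to-device copy at the stage boundary: without pipelining batches, the two GPUs take turns idling, so this split pays off mainly when the model does not fit on one device.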
data split flow - we can shard each batch of data across multiple GPUs (data parallelism; see Partitioning at the end of this list)
number and size of hidden layers - fewer and smaller hidden layers mean fewer weights to update, so each training step is faster
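To make the knob concrete, a hypothetical `make_mlp` helper that exposes depth and width as hyperparameters:

```python
from torch import nn

def make_mlp(in_dim: int, hidden_dim: int, n_hidden: int, out_dim: int) -> nn.Sequential:
    # build an MLP whose depth (n_hidden) and width (hidden_dim) are tunable
    layers = [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
    for _ in range(n_hidden - 1):
        layers += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU()]
    layers.append(nn.Linear(hidden_dim, out_dim))
    return nn.Sequential(*layers)

small = make_mlp(32, 64, n_hidden=2, out_dim=1)   # cheaper per training step
large = make_mlp(32, 512, n_hidden=8, out_dim=1)  # more capacity, slower steps
```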
location of data versus location of training infrastructure - training models close to where the data is stored increases the throughput of the system by avoiding slow network transfers
mixed precision policy - improves speed by allowing computation in different precisions: float16/float32/...
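A self-contained sketch of one common way to apply such a policy in PyTorch, via torch.cuda.amp (a CUDA GPU is assumed; the model and data are placeholders):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1)), batch_size=256)
model = nn.Linear(32, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so float16 gradients don't underflow

for x, y in loader:
    x, y = x.cuda(), y.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # eligible ops run in float16, the rest stay in float32
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)            # unscales the gradients, then steps
    scaler.update()
```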
apply batch processing - minimize RAM-to-VRAM communication cycles: instead of sending samples one by one, send them in batches
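A hypothetical helper illustrating the idea (`predict_batched` is an assumption for this sketch, not an existing API):

```python
import torch
from torch import nn

@torch.no_grad()
def predict_batched(model: nn.Module, samples: torch.Tensor, batch_size: int = 64) -> torch.Tensor:
    # one RAM->VRAM transfer per batch instead of one per sample
    model.eval()
    outputs = []
    for i in range(0, len(samples), batch_size):
        batch = samples[i:i + batch_size].cuda()
        outputs.append(model(batch).cpu())
    return torch.cat(outputs)

preds = predict_batched(nn.Linear(32, 1).cuda(), torch.randn(1000, 32))
```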
Replication - when serving the model, we can use k8s to replicate it so that we can serve more requests (see the example manifest below). All we have to do is:
pick a specific node selector
set up the GPU Operator
[optional] set up MIG devices
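A hedged manifest sketch of these steps; every name, label, and the image are placeholders, and it assumes the GPU Operator is already installed so that nvidia.com/gpu (or MIG) resources and its node labels exist:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server            # placeholder name
spec:
  replicas: 3                   # serve more requests by replicating the pod
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"   # schedule only onto GPU nodes
      containers:
        - name: server
          image: registry.example.com/model-server:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1          # or a MIG slice, e.g. nvidia.com/mig-1g.5gb: 1
```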
Partitioning (data split) - we can split the data across several GPUs with DataParallel; this enables multi-GPU training and inference (sketch below).
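A minimal sketch of that one-liner (placeholder model and batch; at least two visible GPUs are assumed for an actual split):

```python
import torch
from torch import nn

model = nn.Linear(32, 1)
if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across the visible GPUs,
    # runs the replicas in parallel, and gathers the outputs on device 0
    model = nn.DataParallel(model)
model = model.cuda()

out = model(torch.randn(256, 32).cuda())  # the 256-sample batch is sharded across GPUs
```

For multi-node or higher-throughput training, PyTorch's documentation recommends DistributedDataParallel over DataParallel, but DataParallel is the simpler drop-in shown here.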