an example of a deep model is a deep neural network (at least one hidden layer)
in such cases, some of the trainable parameters of the cost function J(x_i, ...) depend on others (x_i -> x_j)
in this case, when we run gradient descent, some parameters will train first because the gradient from the others is computed from a different place
if we take a neural network as an example, the first layers will be trained first
we are able to compute the gradients needed to update the weights thanks to the chain rule: dJ/dx_i = dJ/dx_n * dx_n/dx_(n-1) * ... * dx_(i+1)/dx_i
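
the chain rule product can be checked numerically on a toy chain. this is only a sketch under assumed definitions that are not in the notes: each step is x_(k+1) = sin(x_k), the cost is J = x_n^2, and the names forward, cost and grad_chain are made up for illustration.

```python
import math

def forward(x1, n):
    """Assumed toy chain: x_(k+1) = sin(x_k). Returns [x_1, ..., x_n]."""
    xs = [x1]
    for _ in range(n - 1):
        xs.append(math.sin(xs[-1]))
    return xs

def cost(x1, n):
    """Assumed toy cost: J = x_n ** 2."""
    return forward(x1, n)[-1] ** 2

def grad_chain(x1, n, i):
    """dJ/dx_i as the product dJ/dx_n * dx_n/dx_(n-1) * ... * dx_(i+1)/dx_i.

    i is 1-based, matching the notation in the notes.
    """
    xs = forward(x1, n)
    g = 2.0 * xs[-1]  # dJ/dx_n for J = x_n ** 2
    # Walk backwards from x_n to x_i, multiplying one local derivative per step.
    for k in range(n - 1, i - 1, -1):
        # Local factor dx_(k+1)/dx_k = cos(x_k); xs is 0-based, so x_k is xs[k - 1].
        g *= math.cos(xs[k - 1])
    return g

# Compare the chain rule result for dJ/dx_1 with a central finite difference.
h = 1e-6
numeric = (cost(0.5 + h, 4) - cost(0.5 - h, 4)) / (2 * h)
analytic = grad_chain(0.5, 4, 1)
print(analytic, numeric)
```

the loop starts from dJ/dx_n and multiplies local derivatives while moving back towards x_i, so the gradient for a variable far from the cost is a product of more factors than the gradient for one close to it.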