example - decoder-only transformer with training loop
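
A minimal sketch of what such an example could look like, written in dependency-free Python so it runs anywhere. Everything in it is an assumption made for illustration and is not taken from the source: the micrograd-style `Value` autodiff class, the helper names (`matrix`, `linear`, `log_softmax`, `forward`), the hyperparameters, and the toy task (predict the next token of a repeating 0,1,2,3 sequence). The model is deliberately stripped down to one layer and one attention head with residual connections; it omits layer normalization, dropout, and batching. Causality is obtained by processing positions in order, so position t can only attend to keys and values from positions 0..t. Plain gradient descent stands in for the optimizers normally used.

```python
import math
import random


class Value:
    """Scalar that records how it was computed, for reverse-mode autodiff."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data, self.grad = data, 0.0
        self.parents, self.local_grads = parents, local_grads

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__, __rmul__ = __add__, __mul__

    def __neg__(self):
        return self * -1.0

    def __sub__(self, other):
        return self + (-other)

    def exp(self):
        e = math.exp(self.data)
        return Value(e, (self,), (e,))

    def log(self):
        return Value(math.log(self.data), (self,), (1.0 / self.data,))

    def relu(self):
        return Value(max(0.0, self.data), (self,), (float(self.data > 0),))

    def backward(self):
        order, seen = [], set()

        def visit(v):  # depth-first topological sort of the compute graph
            if v not in seen:
                seen.add(v)
                for parent in v.parents:
                    visit(parent)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):  # chain rule, outputs before inputs
            for parent, g in zip(v.parents, v.local_grads):
                parent.grad += g * v.grad


def matrix(rows, cols):
    return [[Value(random.gauss(0, 0.2)) for _ in range(cols)] for _ in range(rows)]


def linear(x, w):  # one output per row of w, no bias
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def log_softmax(xs):
    m = max(x.data for x in xs)  # constant shift for numerical stability
    lse = sum((x - m).exp() for x in xs).log() + m
    return [x - lse for x in xs]


random.seed(0)
V, D, T = 4, 8, 8  # vocabulary size, model width, context length (all assumed)
tok_emb, pos_emb, w_out = matrix(V, D), matrix(T, D), matrix(V, D)
wq, wk, wv, wo = (matrix(D, D) for _ in range(4))
w1, w2 = matrix(2 * D, D), matrix(D, 2 * D)
weights = (tok_emb, pos_emb, w_out, wq, wk, wv, wo, w1, w2)
params = [p for w in weights for row in w for p in row]


def forward(tokens):
    """Mean next-token cross-entropy over one sequence of token ids."""
    keys, values, losses = [], [], []
    for t in range(len(tokens) - 1):
        x = [a + b for a, b in zip(tok_emb[tokens[t]], pos_emb[t])]
        q = linear(x, wq)
        keys.append(linear(x, wk))
        values.append(linear(x, wv))
        # Causal attention: only keys/values from positions 0..t exist yet.
        scores = [sum(a * b for a, b in zip(q, k)) * D**-0.5 for k in keys]
        probs = [lp.exp() for lp in log_softmax(scores)]
        attn = [sum(pr * v[j] for pr, v in zip(probs, values)) for j in range(D)]
        x = [a + b for a, b in zip(x, linear(attn, wo))]  # residual
        hidden = [h.relu() for h in linear(x, w1)]
        x = [a + b for a, b in zip(x, linear(hidden, w2))]  # residual
        losses.append(-log_softmax(linear(x, w_out))[tokens[t + 1]])
    return sum(losses) * (1.0 / len(losses))


tokens = [i % V for i in range(T + 1)]  # toy data: 0,1,2,3,0,1,2,3,0
history = []
for step in range(40):
    loss = forward(tokens)
    for p in params:
        p.grad = 0.0  # reset gradients left over from the previous step
    loss.backward()
    for p in params:
        p.data -= 0.5 * p.grad  # plain gradient descent, assumed step size
    history.append(loss.data)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

With the small random initialization the first loss should sit near ln 4 (uniform guessing over four tokens) and then fall as training proceeds. A practical version would more likely use a tensor library with batched matrix operations, multiple layers and heads, normalization, and an adaptive optimizer; the scalar autodiff here only keeps the moving parts visible.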