Emergence of Skills via a Few Parameters
Goal: Determine whether a skill that emerges in a language model can be causally attributed to only a handful of its parameters.
We want to find out whether skills emerge as the model is scaled.
If they do, we want to identify which model parameters are responsible for that emergence.
Grafting can help with this analysis.
Notation
Task specific localization via grafting equation
We will learn grafting parameters for both the small and the large model by modifying the base grafting equation as follows:
Learn for the small model (Obj 1) (M = small model)
= indicates sparsity
: validation loss on a task
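
To make the idea of grafting concrete, here is a minimal sketch in Python. It assumes a common form of grafting in which a binary mask selects which parameters are taken from a fine-tuned model, with the rest taken from the pre-trained model. The names `theta_pre`, `theta_ft`, `mask`, `graft`, `top_k_mask`, and `val_loss` are hypothetical and are not the notation of these notes. The mask here comes from a simple top-k movement heuristic rather than being learned by optimizing an objective, and the "model" is a toy linear regressor standing in for a language model.

```python
import numpy as np

def graft(theta_pre, theta_ft, mask):
    """Take parameters from theta_ft where mask == 1, else from theta_pre (assumed form)."""
    return mask * theta_ft + (1.0 - mask) * theta_pre

def top_k_mask(theta_pre, theta_ft, k):
    """Hypothetical heuristic: select the k parameters that moved most during fine-tuning."""
    movement = np.abs(theta_ft - theta_pre)
    mask = np.zeros_like(theta_pre)
    mask[np.argsort(movement)[-k:]] = 1.0
    return mask

def val_loss(theta, X, y):
    """Toy stand-in for validation loss on a task: mean squared error of a linear model."""
    return float(np.mean((X @ theta - y) ** 2))

rng = np.random.default_rng(0)
d = 20
theta_pre = rng.normal(size=d)   # "pre-trained" parameters
theta_ft = theta_pre.copy()
theta_ft[:3] += 2.0              # pretend fine-tuning changed only three parameters

X = rng.normal(size=(50, d))     # toy validation inputs
y = X @ theta_ft                 # targets produced by the fine-tuned parameters

mask = top_k_mask(theta_pre, theta_ft, k=3)
theta_graft = graft(theta_pre, theta_ft, mask)

sparsity = int(mask.sum())       # number of grafted parameters
loss_pre = val_loss(theta_pre, X, y)
loss_graft = val_loss(theta_graft, X, y)
print(f"sparsity={sparsity}, loss_pre={loss_pre:.4f}, loss_graft={loss_graft:.4f}")
```

In this toy setting, grafting only the three changed parameters recovers the fine-tuned behavior, which is the kind of evidence one would look for when asking whether a skill is localized to a few parameters. Whether the same holds for a real language model, and at which scale, is exactly the open question of these notes.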