Looped Transformers for Length Generalization

Ying Fan, Yilun Du, Kannan Ramchandran, Kangwook Lee

UW-Madison, MIT, UC Berkeley
