Learning the RoPEs: Better 2D and 3D Position Encodings with STRING