Purely parallax-oriented: if two cameras differ greatly in position, it is more feasible to extract distance information from the images they shot.
They then devise a procedure to estimate how far apart the two cameras that shot an image pair are: they try to fit the pair with a rotation-only model.
If the pair fits well with few outliers, the difference in camera position is small.
If the fit is poor with obvious outliers, the difference in camera position is relatively large, and the pair is a good candidate for the initial pair.
Fitting image pairs with this rotation-only model is not cheap: you need to solve the matrix equation, reproject the points, and run RANSAC to eliminate outliers. And you have to repeat this procedure for every image pair.
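To make the cost concrete, here is a minimal sketch of such a rotation-only test. It is an illustration under assumptions, not the exact procedure criticized above: matched features are assumed to be given as unit bearing vectors (`b1`, `b2`), the rotation is fitted with the Kabsch/SVD method on two-point samples, and the function names, the 0.5 degree threshold, and the iteration count are all hypothetical.

```python
import numpy as np

def fit_rotation(a, b):
    """Least-squares rotation R with b ~ R a (Kabsch); a and b are (N, 3) unit vectors."""
    u, _, vt = np.linalg.svd(a.T @ b)          # SVD of the 3x3 correlation matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def rotation_only_inlier_ratio(b1, b2, iters=200, thresh_deg=0.5, seed=0):
    """RANSAC over 2-point samples; returns the best fraction of matches
    that a pure rotation explains within thresh_deg (assumed threshold)."""
    rng = np.random.default_rng(seed)
    cos_thresh = np.cos(np.deg2rad(thresh_deg))
    n = len(b1)
    best = 0
    for _ in range(iters):
        idx = rng.choice(n, size=2, replace=False)
        r = fit_rotation(b1[idx], b2[idx])
        # Cosine of the angle between each rotated bearing and its match.
        cos_err = np.sum((b1 @ r.T) * b2, axis=1)
        best = max(best, int(np.count_nonzero(cos_err >= cos_thresh)))
    return best / n
```

A ratio close to 1 suggests the two cameras hardly moved apart; a low ratio suggests noticeable parallax. Even in this small form, every image pair costs `iters` SVD solves plus a full pass over its matches, which is the expense noted above.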
The parallax-oriented starting point is questionable: although you can achieve great accuracy for the feature points in this initial pair, the principle contributes little to the registration of the images that come afterwards.
The intuition: we humans try to understand an environment starting from the part we are already familiar with. That is to say, let's start the reconstruction from the hottest spot in the scene.
In this case we gain in two ways:
no time is wasted on model fitting
each time a new image is registered, fewer new feature points are introduced, which means:
less pressure on triangulating new 3D points
less pressure on bundle adjustment
Chain the feature points shared among all images into global point tracks.
The image pair that shares the most tracks is the pair that captured the hottest spot in the whole scene.
Order the remaining images according to their affinity with the selected initial pair.
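The three steps above can be sketched as follows. This is a minimal illustration under assumptions: pairwise matches are assumed to arrive as a dictionary mapping an image pair to a list of matched feature indices, tracks are chained with a union-find, and "affinity" is taken to be the number of tracks an image shares with either image of the initial pair (the notes do not define it more precisely). The names `build_tracks` and `plan_registration` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_tracks(matches):
    """Chain pairwise matches into tracks; each track is the set of images seeing it."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (img_a, img_b), pairs in matches.items():
        for feat_a, feat_b in pairs:
            ra, rb = find((img_a, feat_a)), find((img_b, feat_b))
            if ra != rb:
                parent[ra] = rb             # merge the two partial tracks

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node[0])     # keep only the image id
    return [imgs for imgs in groups.values() if len(imgs) >= 2]

def plan_registration(matches):
    """Return (initial pair, remaining images ordered by assumed affinity)."""
    tracks = build_tracks(matches)
    shared = Counter()
    for imgs in tracks:
        for pair in combinations(sorted(imgs), 2):
            shared[pair] += 1               # tracks seen by both images
    init_pair = max(shared, key=shared.get)

    def affinity(img):
        return sum(shared[tuple(sorted((img, s)))] for s in init_pair)

    images = {img for imgs in tracks for img in imgs}
    rest = sorted(images - set(init_pair), key=lambda i: (-affinity(i), i))
    return init_pair, rest

# Toy input: image ids are strings, features are integer indices.
matches = {
    ("A", "B"): [(0, 0), (1, 1), (2, 2)],
    ("B", "C"): [(0, 0), (1, 1)],
    ("C", "D"): [(5, 5)],
}
init_pair, order = plan_registration(matches)
```

The sketch ignores inconsistent tracks (two different features of the same image ending up in one track), which a real pipeline would need to filter out. Note that no model fitting is involved: the whole selection is counting over matches that feature matching has already produced.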