Neighbors

Neighbors is, currently, just an algorithm since the code is not yet finished (it was originally written in C++, but I haven't ported it to Java yet). The idea behind Neighbors is this: Given a set of geographical points (each point defined by a latitude and longitude), then for each point in that set, find all of the other points that are within some distance of it. If the distance between two points is less than the allowed distance, then the two points are neighbors.

Let arrayPoints be the complete set of geographical points, let currPoint be the current point of interest (that we're looking for neighbors of), and let nDistance be the maximum allowed distance between currPoint and its neighbors. The algorithm behind Neighbors is this:

  1. Create a list of the indices in arrayPoints, sorted by the points' latitude (call this arrayPointsLat)
  2. Create a list of the indices in arrayPoints, sorted by the points' longitude (call this arrayPointsLon)
  3. For the area (a circle) around currPoint, where the circle has a radius of nDistance, compute the geographical coordinates (the corners) of two squares: one that has a side length of nDistance*2 (so it would lay just outside the circle), and a second square that would lay inside the circle, but be the largest possible square that would fit inside the circle. The two squares will be referred to as outerSquare and innerSquare, respectively.
  4. For all points in arrayPoints other than currPoint (the one whose neighbors we're searching for), use arrayPointsLat and arrayPointsLon to (quickly) eliminate from contention those points that fall outside of outerSquare, since if a point is outside of outerSquare, it must be outside the circle that would indicate it is a neighbor of currPoint.
  5. For the remaining points to consider (those not eliminated in the previous step), check if each point lies within innerSquare. If it does, it must lie within the circle and so is a neighbor.
  6. For all remaining points, use the Great Circle method to check the distance from the current point. If the distance is less than nDistance, it's a neighbor.
The use of innerSquare and outerSquare is because it's much faster to check if a point lies within a square than whether it lies within a circle.

The nice thing about this algorithm is that it scales very well, and doesn't have the overhead that would be associated with computing all of the distances and storing in a lookup table. For a set of n points, the number of distances to compute is ((0.5) * (n) * (n - 1)).With the approach described above, it's easily possible that the distance for a lot of the points would never need to be computed (if they fall inside innerSquare or outside outerSquare). For a large number of points, this approach can be very fast, depending on the nature of the input data.

The code for this algorithm will be made available shortly. If anyone is interested in a copy, let me know and that will motivate me to finish it sooner.

Comments