Occlusion Order Accuracy (ACCOO)

A novel UOAIS evaluation metric

About

Due to the subjectivity of objects’ invisible (occluded) masks, evaluating unseen object amodal instance segmentation (UOAIS) model performance solely on the overlap and boundary P/R/F of the segmented objects can be misleading. Existing UOAIS occlusion metrics measure how well a model predicts whether individual objects are occluded. However, these metrics do not measure the occlusion ordering of objects in a scene, which is valuable for scene understanding in robotic grasp planning. The Occlusion Order Adjacency Matrix (OOAM) can represent a scene’s occlusion order, and the Occlusion Order Directed Graph (OODG) can be derived from the OOAM. By knowing the occlusion order in a scene, a robot can topologically sort the predicted OODG to plan the order of grasps needed to reach an occluded object of interest. Thus, to evaluate the accuracy of the predicted object occlusion order in an image, we introduce a new metric called Occlusion Order Accuracy (ACCOO).

Occlusion Order Adjacency Matrix (OOAM)

Due to the subjectivity of objects’ invisible mask boundaries, evaluating UOAIS model performance solely on the overlap and boundary P/R/F of segmented objects may not be very informative. This is especially true when an object has a large occluded region: the region could plausibly take many different shapes and forms, each of which would yield a poor overlap and boundary P/R/F score. Existing UOAIS occlusion classification metrics measure how well the model predicts whether individual objects are occluded. However, these metrics do not measure the object occlusion ordering from a viewpoint, which would be helpful for scene understanding in robotic grasp planning. The object occlusion order determines which objects are occluders [1] (they occlude other objects) and which objects are occludees [2] (they are occluded by other objects). In other words, there is no effective metric to accurately evaluate a perception model’s ability to determine the overall occlusion order relationship of objects in a viewpoint.

A viewpoint’s object occlusion ordering can be represented as an adjacency matrix, which we denote the Occlusion Order Adjacency Matrix (OOAM). Using the objects’ visible and occlusion masks, we propose Algorithm 1 to generate the OOAM of a viewpoint (Figure 1). The algorithm rests on the heuristic that if object i’s visible mask intersects object j’s occlusion mask, then object i must occlude object j. For a scene with M objects, the OOAM contains M x M elements, where element (i, j) is a binary value indicating whether object i occludes object j. The diagonal of the OOAM is always zero, as an object cannot occlude itself.

[1] An object that can occlude another object from the viewpoint.

[2] An object that is occluded by another object from the viewpoint.

Algorithm 1: Proposed pseudo-code for generating the Occlusion Order Adjacency Matrix (OOAM) of a viewpoint
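Algorithm 1’s full pseudo-code appears in the figure above. As a minimal Python sketch of the heuristic it describes (not the author’s implementation; the function name and the assumption that masks arrive as binary NumPy arrays of shape (M, H, W) are illustrative):

```python
import numpy as np

def generate_ooam(visible_masks, occlusion_masks):
    """Build the M x M Occlusion Order Adjacency Matrix (OOAM).

    visible_masks, occlusion_masks: binary arrays of shape (M, H, W).
    OOAM[i, j] = 1 if object i's visible mask intersects object j's
    occlusion mask, i.e. object i occludes object j.
    """
    M = len(visible_masks)
    ooam = np.zeros((M, M), dtype=np.uint8)
    for i in range(M):
        for j in range(M):
            if i == j:
                continue  # an object cannot occlude itself
            if np.logical_and(visible_masks[i], occlusion_masks[j]).any():
                ooam[i, j] = 1
    return ooam
```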

Figure 1: A visualization of the annotations generated by SynTable for a cluttered tabletop viewpoint.

To further illustrate the idea of the OOAM, Figure 1 shows a visualization of the annotations for a tabletop scene generated by SynTable. In the OOAM derived using Algorithm 1, a non-zero entry at (i, j) indicates that object i (the occluder) occludes object j (the occludee). The OOAM in Figure 1 shows a non-zero entry at (i, j) = (14, 3), where i and j are the object indices (as seen in the objects’ visible bounding box labels in Figure 1).

Occlusion Order Directed Graph (OODG)

Given the OOAM, the Occlusion Order Directed Graph (OODG) can be derived to help visualize the occlusion order in the viewpoint. For each non-zero entry (i, j) in the OOAM, a directed edge is drawn from node i to node j; for each zero entry, no edge is drawn. In Figure 1, the OOAM entry at (i, j) = (14, 3) means that object 14 occludes object 3, so a directed edge points from node 14 to node 3 in the OODG.
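A sketch of this construction, reusing the OOAM from the previous snippet; the use of networkx is an assumption for illustration, not the author’s tooling:

```python
import networkx as nx

def ooam_to_oodg(ooam):
    """Derive the Occlusion Order Directed Graph (OODG) from an OOAM."""
    oodg = nx.DiGraph()
    oodg.add_nodes_from(range(len(ooam)))
    for i in range(len(ooam)):
        for j in range(len(ooam)):
            if ooam[i][j]:
                oodg.add_edge(i, j)  # directed edge: object i occludes object j
    return oodg
```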

From the generated OODG, we can also check whether the graph is cyclic or acyclic using cycle detection methods such as Depth First Search (DFS) and Breadth First Search (BFS). Only if the graph has no directed cycles, i.e. it is an Occlusion Order Directed Acyclic Graph (OODAG), can topological sorting be applied to find a picking sequence that safely grasps all objects in the scene. By knowing the occlusion order in a scene, a robot can topologically sort the predicted OODAG to plan the order of grasps needed to reach an occluded object of interest.
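Continuing the sketch above, the cycle check and topological sort might look as follows (networkx’s acyclicity test is DFS-based, matching the methods mentioned; the function name is illustrative):

```python
def grasp_order(oodg):
    """Return a safe picking sequence via topological sort of the OODG.

    Valid only when the graph has no directed cycles (i.e. it is an OODAG).
    Occluders sort before their occludees, so unoccluded objects come first.
    """
    if not nx.is_directed_acyclic_graph(oodg):
        raise ValueError("OODG contains a directed cycle; no safe grasp order exists")
    return list(nx.topological_sort(oodg))
```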

In the generated OODG, we further classify objects into three occlusion order layers - Top, Intermediate, and Bottom - to establish a hierarchy of object accessibility and grasping order. Objects in the top layer are not occluded by any other object and can be grasped directly. Objects in the intermediate layers are occluded but also occlude other objects. Objects in the bottom layer are occluded and do not occlude any other object.
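These three layers follow directly from node degrees: an edge i -> j means i occludes j, so a node’s in-degree counts its occluders and its out-degree counts its occludees. A minimal sketch under that reading:

```python
def classify_layers(oodg):
    """Assign each node of the OODG to the Top, Intermediate, or Bottom layer."""
    layers = {}
    for n in oodg.nodes:
        is_occluded = oodg.in_degree(n) > 0      # some object occludes n
        occludes_others = oodg.out_degree(n) > 0  # n occludes some object
        if not is_occluded:
            layers[n] = "top"           # directly graspable
        elif occludes_others:
            layers[n] = "intermediate"  # occluded and occluding
        else:
            layers[n] = "bottom"        # occluded only
    return layers
```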

Calculation of Occlusion Order Accuracy, ACCOO

To evaluate the accuracy of the predicted object occlusion order in an image, we contribute a new metric called Occlusion Order Accuracy (ACCOO). ACCOO evaluates the model’s ability to accurately determine the order of occlusions in a clutter of objects by comparing the model’s predicted OOAM (predOOAM) to the ground truth OOAM (gtOOAM) using Algorithm 2.


Algorithm 2: Evaluating Occlusion Order Accuracy, ACCOO

Algorithm 2 requires the ground truth and predicted masks for the visible and occluded portions of the objects in the viewpoint. The ground truth and predicted mask arrays (after Hungarian matching) are denoted gtVisible, gtOcclusion, predVisible, and predOcclusion respectively.

Figure 2: Visualization of Hungarian matching of predicted and ground truth masks on image v. Blue lines denote visible mask assignments. Crosses denote unassigned predicted visible masks.

Given an image v, we obtain the ground truth-prediction assignment pairs after Hungarian matching, as illustrated in Figure 2. The predicted masks are then re-indexed to match the IDs of the ground truth masks. Following that, the predVisible and predOcclusion masks belonging to the assigned pairs are extracted, and the gtOOAM and predOOAM are obtained using Algorithm 1.
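A minimal sketch of such a matching step, assuming visible-mask IoU as the assignment score and SciPy’s Hungarian solver; the scoring choice is an assumption for illustration, not necessarily the author’s exact criterion:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_masks(gt_visible, pred_visible):
    """Hungarian matching of predicted to ground truth visible masks.

    Returns (gt_id, pred_id) assignment pairs maximizing total IoU;
    leftover predictions (the crosses in Figure 2) stay unassigned.
    """
    iou = np.zeros((len(gt_visible), len(pred_visible)))
    for i, gt in enumerate(gt_visible):
        for j, pred in enumerate(pred_visible):
            inter = np.logical_and(gt, pred).sum()
            union = np.logical_or(gt, pred).sum()
            iou[i, j] = inter / union if union else 0.0
    gt_ids, pred_ids = linear_sum_assignment(-iou)  # negate to maximize IoU
    return list(zip(gt_ids, pred_ids))
```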

Figure 3: Calculating Occlusion Order Accuracy of image v

Figure 3 illustrates the calculation of occlusion order accuracy for the earlier example of image v. The similarityMatrix is obtained by an element-wise equality comparison between the ground truth OOAM (gtOOAM) and the predicted OOAM (predOOAM). ACCOO is then calculated using Equation (3.5).
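Equation (3.5) itself is not reproduced in this excerpt; a reconstruction consistent with the quantities described in the next paragraph is:

\[
\mathrm{ACC_{OO}}
= \frac{\#\,\mathrm{correctPredictedOcclusionNodes}}{\#\,\mathrm{groundtruthOcclusionNodes}}
= \frac{\sum \mathrm{similarityMatrix} - \mathrm{gtOOAMDiagonalSize}}{\mathrm{gtOOAMSize} - \mathrm{gtOOAMDiagonalSize}}
\]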

In Equation (3.5), ACCOO is the ratio of #correctPredictedOcclusionNodes to #groundtruthOcclusionNodes. #correctPredictedOcclusionNodes denotes the number of correct occluder and occludee predictions for all objects in the viewpoint (the green highlighted cells in Figure 3); it is obtained by summing all the elements of the similarityMatrix. #groundtruthOcclusionNodes denotes the number of ground truth occluder and occludee nodes in the viewpoint; it is obtained by counting the number of elements in the ground truth OOAM (gtOOAMSize). As an object cannot occlude itself, the diagonal of any OOAM is always 0 and the diagonal of any similarityMatrix is always 1 (the grey highlighted cells in Figure 3). Thus, we subtract the number of elements along the diagonal of the gtOOAM (denoted gtOOAMDiagonalSize) from both #correctPredictedOcclusionNodes and #groundtruthOcclusionNodes.
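Putting Equation (3.5) into code, a minimal sketch assuming the two OOAMs are square NumPy arrays of equal size (the function name is illustrative):

```python
def occlusion_order_accuracy(gt_ooam, pred_ooam):
    """ACCOO: fraction of off-diagonal OOAM entries predicted correctly."""
    similarity_matrix = (gt_ooam == pred_ooam)     # element-wise equality
    diag_size = gt_ooam.shape[0]                   # always-correct diagonal
    correct = similarity_matrix.sum() - diag_size  # #correctPredictedOcclusionNodes
    total = gt_ooam.size - diag_size               # #groundtruthOcclusionNodes
    return correct / total
```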