While using a Mask R-CNN to develop tools for detecting satellites, it was discovered that training exhibited an inherent degree of nondeterminism. In light of this discovery, further research was performed to understand its cause, the extent of its impact, and potential solutions to mitigate or eliminate it.
This nondeterminism was found to arise from two sets of factors. First, the model introduced nondeterminism through random number generators, stochastic elements of the training process (data augmentation, weight initialization, dropout, and gradient descent), and CUDA settings, including benchmark testing and algorithm selection. Second, the hardware introduced nondeterminism. Because floating-point operations have finite precision, they are not associative, so the result of a calculation depends on the order in which it is carried out, and atomic operations do not guarantee a consistent order from run to run. This effect is exacerbated by the increased parallelism of a Graphics Processing Unit (GPU).
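The hardware-side effect can be illustrated with a small Python sketch. It first shows that regrouping the same additions changes the rounded result, then sums the same three numbers in two different orders, as might happen when parallel threads complete their atomic additions in a different order on each run. The helper `ordered_sum` is hypothetical: it simulates completion order with an explicit index list and does not model a real GPU.

```python
import math


def ordered_sum(values, order):
    """Sum `values` in the given index order, mimicking the order in which
    parallel threads happen to complete their atomic additions."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False

# The same three numbers accumulated in two different completion orders.
values = [1e16, 1.0, -1e16]
print(ordered_sum(values, [0, 1, 2]))  # 0.0: the 1.0 is rounded away next to 1e16
print(ordered_sum(values, [0, 2, 1]))  # 1.0
print(math.fsum(values))               # 1.0 (correctly rounded reference)
```

Neither ordering is "wrong"; each is a valid rounding of the same mathematical sum, which is why run-to-run variation appears even when every input is identical.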
A simple-to-implement procedure was developed in this work that produces perfectly reproducible results by configuring the software sources of randomness and performing computations on a CPU instead of a GPU. However, this significantly increases training time, so an alternative solution is also provided that reduces, but does not eliminate, nondeterminism without a significant increase in training time.
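A minimal sketch of what such a configuration might look like, assuming a PyTorch-based Mask R-CNN (the source does not say which framework or implementation was used). The function `make_reproducible` and its parameters are hypothetical; it seeds the random number generators, disables cuDNN benchmark testing, restricts algorithm selection to deterministic implementations, and returns the device to train on. NumPy and PyTorch are configured only if installed, so the sketch runs either way.

```python
import os
import random


def make_reproducible(seed: int = 0, use_cpu: bool = True) -> str:
    """Hypothetical helper: pin software randomness and pick a device.

    Returns "cpu" or "cuda". Assumes a PyTorch-based training pipeline.
    """
    # Needed by cuBLAS for deterministic results on CUDA >= 10.2; it should
    # be set before CUDA is initialized.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    # Seed the generators behind data augmentation, weight initialization,
    # dropout, and batch shuffling.
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch
    except ImportError:
        return "cpu"

    torch.manual_seed(seed)  # seeds the CPU and all CUDA generators

    # Disable benchmark testing, which times candidate cuDNN algorithms and
    # may pick a different one on each run.
    torch.backends.cudnn.benchmark = False
    # Restrict algorithm selection to deterministic implementations; an op
    # with no deterministic implementation raises an error instead.
    torch.backends.cudnn.deterministic = True
    torch.use_deterministic_algorithms(True)

    # Training on the CPU avoids the GPU's highly parallel atomic reductions,
    # at the cost of much longer training time.
    if use_cpu or not torch.cuda.is_available():
        return "cpu"
    return "cuda"
```

Usage would be along the lines of `device = make_reproducible(seed=42)` before building the model and data loaders. Under these assumptions, passing `use_cpu=False` keeps the seeding and deterministic-algorithm settings but trains on the GPU, which is faster but may still leave some residual nondeterminism.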
This work was accepted by the 3rd International Conference on Pattern Recognition and Artificial Intelligence. It will be presented in Paris this summer (2022) and published in the conference proceedings.