We introduced a novel method for training machine learning models in the presence of noisy labels, which are prevalent in domains such as medical diagnosis and autonomous driving. Noise in the training data can degrade a model's generalization performance. Motivated by established literature showing that deep learning models tend to overfit noisy samples in the later epochs of training, we proposed an approach that leverages the distance to class centroids in the latent space. It incorporates a discounting mechanism that diminishes the influence of samples lying far from all class centroids, thereby counteracting the adverse effects of noisy labels. The foundational premise of our approach is that samples situated further from their respective class centroid in the initial stages of training are more likely to carry noisy labels. Our methodology is grounded in theoretical analysis and has been validated empirically through extensive experiments on several benchmark datasets. The results show that our method consistently outperforms existing state-of-the-art techniques, achieving significant improvements in classification accuracy in the presence of noisy labels. The code for our proposed loss function is available at https://github.com/wanifarooq/NCOD
A new loss using the learned similarity between a sample and its class representation as a soft label.
A novel, theoretically grounded regularization term that determines a per-sample discount factor for label noise.
Comprehensive testing on datasets with both synthetic and natural noise highlights our method’s edge over SOTA.
Our approach eliminates the need to know the dataset’s noise rate or use anchor points, facilitating straightforward real-world applications.
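To make the discounting idea concrete, below is a minimal sketch of how a per-sample weight derived from latent-space similarity to class centroids can down-weight likely-noisy samples. This is an illustrative approximation rather than the released NCOD implementation (see the repository linked above for the actual loss); the function names, the cosine-similarity choice, and the centroid computation are our own assumptions.

```python
import torch
import torch.nn.functional as F


def class_centroids(features, labels, num_classes):
    """Mean latent representation per class (illustrative helper)."""
    dim = features.size(1)
    centroids = torch.zeros(num_classes, dim, device=features.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centroids[c] = features[mask].mean(dim=0)
    return centroids


def centroid_discounted_loss(logits, features, labels, centroids):
    """Cross-entropy weighted by each sample's cosine similarity to its
    labelled class centroid: samples far from the centroid (more likely
    to be noisy) contribute less. Illustrative, not the paper's exact loss."""
    feats = F.normalize(features, dim=1)
    cents = F.normalize(centroids, dim=1)
    sim = (feats * cents[labels]).sum(dim=1).clamp(min=0.0)  # discount in [0, 1]
    per_sample_ce = F.cross_entropy(logits, labels, reduction="none")
    return (sim.detach() * per_sample_ce).mean()
```

In a training loop, the centroids would typically be recomputed (or updated with a running average) from the current latent representations each epoch, and the similarity is detached so that it acts purely as a weight rather than an extra gradient path.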
Graph Neural Networks (GNNs) have achieved promising results on supervised learning tasks on graphs, such as graph classification. Despite this success, many real-world graphs are sparsely and noisily labeled, which can significantly degrade the performance of GNNs, as the noisy information can propagate across nodes through the graph structure.
Our goal is to engineer the Graph Neural Network (GNN) architecture and augment our loss function to enhance the robustness of GNNs against label noise in supervised graph classification. This involves refining and extending the existing GNN structure and modifying the loss function to mitigate the impact of label noise, ultimately improving performance on supervised graph classification tasks.
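For context, the per-graph latent representation from which centroids and discounts could be computed (as in the earlier sketch) can be obtained with a standard message-passing encoder followed by pooling. The architecture below is a generic PyTorch Geometric sketch, not the specific model used in this work; the layer sizes and pooling choice are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool


class GraphClassifier(torch.nn.Module):
    """Small GCN encoder with mean pooling and a linear head (illustrative)."""

    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        g = global_mean_pool(h, batch)  # one latent vector per graph
        return self.head(g), g          # logits and per-graph features
```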
This ongoing investigation has already yielded intriguing insights. Initial observations suggest that Graph Neural Networks (GNNs) are more resilient to label noise than traditional neural networks (NNs). Moreover, throughout the training phase, the latent-space representations of individual samples spread out noticeably, increasing intra-class distances; surprisingly, this does not significantly alter inter-class distances. We also found that when the GNN overfits the noise, causing overall accuracy to decline, our custom loss function consistently outperforms cross-entropy. However, when the model lacks the capacity to fit the noise, or the dataset is complex enough to prevent noise fitting, the two losses achieve comparable accuracy. We are continuing to investigate how GNNs learn, especially in the presence of noise.
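The latent-space observation above can be quantified by tracking intra-class and inter-class distances over training. The following diagnostic is a sketch under our own assumptions (Euclidean distances, per-epoch class centroids); logging these two numbers each epoch would show whether intra-class spread grows while inter-class separation stays roughly constant.

```python
import torch


def latent_distance_stats(features, labels, num_classes):
    """Mean distance of samples to their own class centroid (intra-class)
    and mean pairwise distance between class centroids (inter-class).
    Illustrative diagnostic meant to be logged once per epoch."""
    centroids, intra = [], []
    for c in range(num_classes):
        mask = labels == c
        if not mask.any():
            continue
        class_feats = features[mask]
        centroid = class_feats.mean(dim=0)
        centroids.append(centroid)
        intra.append((class_feats - centroid).norm(dim=1).mean())
    intra_class = torch.stack(intra).mean()
    inter_class = torch.pdist(torch.stack(centroids)).mean()
    return intra_class.item(), inter_class.item()
```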
The double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then worsens, and then improves again as model size, data size, or training time increases. This effect is often avoided through careful regularization. While the behavior appears to be fairly universal, it is not yet fully understood why it happens, and we therefore view further study of this phenomenon as an important research direction.
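One concrete way to study the phenomenon is a capacity sweep: train the same architecture at increasing widths on a fixed dataset and record test error for each width, looking for the characteristic improve, worsen, improve curve. The sketch below is a generic outline; the MLP, the width grid, and the `train_fn` training routine are placeholders, not an experiment from this work.

```python
import torch
import torch.nn as nn


def make_mlp(width, in_dim=784, num_classes=10):
    """Two-layer MLP whose hidden width controls model capacity."""
    return nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(),
                         nn.Linear(width, num_classes))


def test_error(model, loader, device="cpu"):
    """Fraction of misclassified test samples."""
    model.eval()
    wrong, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device).flatten(1)).argmax(dim=1)
            wrong += (pred != y.to(device)).sum().item()
            total += y.numel()
    return wrong / total


def capacity_sweep(train_fn, train_loader, test_loader, widths):
    """Train one model per width and record its test error; plotting the
    resulting errors against width is where double descent would appear."""
    errors = {}
    for w in widths:
        model = make_mlp(w)
        train_fn(model, train_loader)  # placeholder training routine
        errors[w] = test_error(model, test_loader)
    return errors
```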
Our findings here are preliminary, as our efforts in this area have only recently begun. We find it particularly intriguing to explore the task space and its augmentation, and we view this as a promising avenue. If we can establish an equivalent of double descent within the task space, which we believe is likely, it could provide insight into explaining the phenomenon. This exploration could also pave the way toward the sought-after capability of direct learning in neural networks, a direction that, to the best of our knowledge, remains unexplored in current practice.