This study proposes an integrated age estimation system based on the Mixture of Experts (MoE) architecture to tackle two main challenges: label noise in training data and the lack of adequate metrics for evaluating age transformation quality. The MoE model is employed for label correction, improving both data quality and the training stability of the age transformation model. To better evaluate age transformation quality, we design age verification and identity consistency metrics. We also optimize our existing age transformation model and integrate a Vision-Language Model (VLM) with Stable Diffusion to refine aging details in hair regions, enhancing visual realism. Experiments on the AgeDB and Cross-Age Face (CAF) datasets demonstrate that our age estimation system surpasses several state-of-the-art approaches in accuracy. With MoE-based label correction, the age transformation model shows significant gains in accuracy, and the enhanced hair aging details markedly improve visual realism, confirming the method's practical effectiveness and application potential.
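To make the two evaluation metrics concrete, the sketch below computes identity consistency as the cosine similarity between face embeddings of the source and age-transformed images, and an age verification error as the absolute gap between the target age and the age predicted for the transformed image. The `face_encoder` and `age_estimator` callables are placeholders standing in for the actual networks; this is an illustrative sketch, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def identity_consistency(face_encoder, src_img, gen_img):
    """Cosine similarity between face embeddings of the source image and the
    age-transformed image; higher means identity is better preserved."""
    with torch.no_grad():
        e_src = face_encoder(src_img)   # (B, D) embeddings from a placeholder encoder
        e_gen = face_encoder(gen_img)
    return F.cosine_similarity(e_src, e_gen, dim=-1)  # (B,)

def age_verification_error(age_estimator, gen_img, target_age):
    """Absolute difference between the target age and the age predicted for the
    transformed image; lower means the transformation hits the target age better."""
    with torch.no_grad():
        pred_age = age_estimator(gen_img).squeeze(-1)  # (B,) predicted ages
    return (pred_age - target_age).abs()
```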
In the proposed framework, the input image is first processed by the age experts, their predictions are bias-corrected by a mini-MLP, and a gating network then assigns age-dependent weights that are softmax-normalized to produce the final age prediction.
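A minimal sketch of this forward pass is given below, assuming each expert maps an image to a scalar age estimate, a mini-MLP applies an additive bias correction, and the gating network produces softmax weights from the corrected ages. The layer sizes and number of experts are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAgeEstimator(nn.Module):
    """Mixture-of-Experts age estimator: per-expert predictions are bias-corrected
    by a mini-MLP, then combined via softmax gate weights into a single age."""

    def __init__(self, experts):
        super().__init__()
        self.experts = nn.ModuleList(experts)   # each expert: image -> (B, 1) age estimate
        n = len(experts)
        self.bias_mlp = nn.Sequential(          # mini-MLP producing per-expert bias corrections
            nn.Linear(n, 32), nn.ReLU(),
            nn.Linear(32, n),
        )
        self.gate = nn.Linear(n, n)             # gating network over the corrected ages

    def forward(self, image):
        preds = torch.cat([e(image) for e in self.experts], dim=-1)  # (B, n) expert ages
        preds = preds + self.bias_mlp(preds)                         # additive bias correction
        weights = F.softmax(self.gate(preds), dim=-1)                # softmax expert weights
        return (weights * preds).sum(dim=-1)                         # (B,) final age prediction
```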
The proposed framework demonstrates how VLM-guided hair editing via Stable Diffusion enhances visual realism, particularly for aging results with target ages above 30.
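One possible realization of this stage, sketched under assumptions: a hypothetical `vlm_describe_hair` helper stands in for the VLM step that turns the target age into an aging prompt, a pre-computed hair mask restricts edits to the hair region, and an off-the-shelf Stable Diffusion inpainting checkpoint (here `stabilityai/stable-diffusion-2-inpainting`, chosen purely for illustration) repaints only the masked area.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Illustrative checkpoint; the paper's actual model and weights may differ.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

def vlm_describe_hair(face_img: Image.Image, target_age: int) -> str:
    # Placeholder for the VLM step: in the full pipeline a vision-language model
    # would describe how the hair should look at the target age.
    return f"photo of the same person at age {target_age}, realistic thinning gray hair"

def age_hair(face_img: Image.Image, hair_mask: Image.Image, target_age: int) -> Image.Image:
    """Repaint only the hair region so it matches the target age."""
    prompt = vlm_describe_hair(face_img, target_age)
    return pipe(prompt=prompt, image=face_img, mask_image=hair_mask).images[0]
```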
We propose a facial age transformation framework integrating robust age estimation, data relabeling, augmentation, and visual enhancement. At its core, a Mixture of Experts (MoE) combining semantic and structural experts improves accuracy and demographic fairness and corrects label noise, yielding more reliable dataset labels. The MoE age estimator outperforms state-of-the-art models on AgeDB and CAF. When the training data are relabeled with the MoE age estimator and then used to train the age transformation model, evaluations on FFHQ-Aging and CAF show that relabeling and synthetic data generation further boost age prediction accuracy. VLM-guided hair editing via Stable Diffusion enhances perceptual realism, especially for older age groups. This multi-stage pipeline addresses label noise, demographic bias, and visual inconsistency, delivering superior performance with high interpretability and flexibility for real-world applications.
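A hedged sketch of the relabeling stage described above: the trained MoE estimator re-predicts an age for every training image, and the original label is replaced whenever it deviates from the prediction by more than a tolerance. The tolerance value and the dataset layout are illustrative assumptions, not the paper's exact procedure.

```python
import torch
from torch.utils.data import DataLoader

@torch.no_grad()
def relabel_dataset(moe_estimator, dataset, tolerance=5.0, batch_size=64):
    """Replace noisy age labels with MoE predictions when they disagree by more
    than `tolerance` years (the threshold here is an illustrative choice)."""
    moe_estimator.eval()
    loader = DataLoader(dataset, batch_size=batch_size)
    corrected = []
    for images, labels in loader:
        preds = moe_estimator(images)                      # (B,) predicted ages
        noisy = (preds - labels.float()).abs() > tolerance # flag implausible labels
        new_labels = torch.where(noisy, preds, labels.float())
        corrected.append(new_labels.round().long())
    return torch.cat(corrected)                            # corrected labels in dataset order
```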