This study proposes a three-stage synthetic data generation framework for face recognition that mitigates reliance on large-scale real-world datasets
while enhancing attribute controllability and recognition accuracy. In the first stage, the intra-class distribution of an existing dataset is optimized to
construct RepSet-DC, a compact yet recognition-effective dataset used to train a baseline model. In the second stage, a source face generator and a
Dual-Modal Diffusion Model (DMD) are employed to simultaneously control age and pose variations. A baseline-guided sample selection mechanism
identifies the most recognition-beneficial synthetic images, which are then merged with RepSet-DC to form the expanded RepSet-X. In the third stage,
knowledge distillation is applied to train a lightweight student model on RepSet-X, improving the recognition performance achievable with synthetic training data and narrowing the gap to models trained on real data. This framework addresses key limitations of existing synthesis methods, including insufficient intra-class diversity, the lack of effective sample selection, and the performance disparity between synthetic and real training data, achieving results on par with state-of-the-art approaches.
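For concreteness, the three stages can be summarized in pseudocode. The sketch below is purely illustrative: every name it uses (optimize_intra_class, train_recognition_model, SourceFaceGenerator, DMD, sample_pose_age_grid, select_by_baseline, distill) is a hypothetical placeholder standing in for the components described above, not a released implementation.

```python
# Illustrative pseudocode for the three-stage pipeline (all helpers hypothetical).

def build_repset_x(real_dataset, n_synthetic_ids, samples_per_id):
    # Stage 1: optimize the intra-class distribution of an existing dataset
    # to obtain the compact RepSet-DC and train a baseline model on it.
    repset_dc = optimize_intra_class(real_dataset)
    baseline = train_recognition_model(repset_dc)

    # Stage 2: generate source identities, expand each with pose/age variations
    # from the DMD, and keep only the samples the baseline model deems
    # beneficial for recognition.
    generator, dmd = SourceFaceGenerator(), DMD()
    selected = []
    for _ in range(n_synthetic_ids):
        source = generator.sample()                    # synthetic identity I_s
        variants = [dmd.transform(source, pose, age)   # pose/age-varied copies
                    for pose, age in sample_pose_age_grid(samples_per_id)]
        selected += select_by_baseline(baseline, source, variants)
    repset_x = list(repset_dc) + selected              # merge RepSet-DC with selections

    # Stage 3: distill the baseline (teacher) into a lightweight student model
    # trained on the expanded RepSet-X.
    student = distill(teacher=baseline, dataset=repset_x)
    return repset_x, student
```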
The core of this research is the proposed Dual-Modal Diffusion Model (DMD). Its primary objective is to transform synthetic identity images, denoted I_s, produced by a source face generator, into images that exhibit variations in pose and age while preserving the original identity features. Through controllable pose and age transformation, the DMD significantly enhances the intra-class variation of a single identity in terms of pose and age, thereby improving overall face recognition performance. Unlike previous face generation models that are limited to manipulating a single attribute, the proposed model can simultaneously control both pose and age.
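To make this transformation concrete, the following minimal PyTorch-style sketch shows one plausible interface for such a conditional diffusion model: a denoising network conditioned jointly on an identity embedding of I_s and on target pose and age codes. The class structure, the denoiser and id_encoder components, and the denoise_step helper are assumptions introduced for illustration and do not describe the actual DMD architecture.

```python
import torch

class DMD(torch.nn.Module):
    """Sketch of a dual-modal conditional diffusion interface (illustrative only)."""

    def __init__(self, denoiser, id_encoder, num_steps=50):
        super().__init__()
        self.denoiser = denoiser        # noise-prediction network (assumed component)
        self.id_encoder = id_encoder    # frozen face-recognition encoder (assumed component)
        self.num_steps = num_steps

    @torch.no_grad()
    def transform(self, source_img, target_pose, target_age):
        # Identity condition: an embedding of the source face I_s, held fixed
        # so that the generated variant preserves the original identity.
        id_emb = self.id_encoder(source_img)

        # Start from Gaussian noise and iteratively denoise, conditioning every
        # step on the identity embedding plus the desired pose and age codes.
        x = torch.randn_like(source_img)
        for t in reversed(range(self.num_steps)):
            eps = self.denoiser(x, t, id_emb, target_pose, target_age)
            x = denoise_step(x, eps, t)  # standard DDPM/DDIM update (assumed helper)
        return x
```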
The DMD operates in two distinct modes. The first mode, shown in Fig. (b) above, allows flexible control over age variations, readily achieving both age progression and age regression. The second mode, shown in Fig. (c) above, enables free control over the facial angle. By combining these two modes, the model increases the intra-class variation of an identity, ultimately leading to enhanced recognition performance.
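As a hypothetical usage example of the two modes on one source identity, following the sketch interface above (the age values and yaw angles are illustrative only):

```python
# Mode 1: fix the pose and sweep the age code (age progression / regression).
source = generator.sample()
aged = [dmd.transform(source, target_pose=0.0, target_age=a)
        for a in (10, 30, 50, 70)]

# Mode 2: fix the age and sweep the yaw angle (frontal to profile views).
posed = [dmd.transform(source, target_pose=yaw, target_age=30)
         for yaw in (-60, -30, 0, 30, 60)]

# Together the two modes enlarge the intra-class variation for this identity.
intra_class_samples = aged + posed
```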