Erase, then Redraw: A Novel Data Augmentation Approach for Free Space Detection using Diffusion Model

[paper] [code]

Motivation

Standard data augmentation methods apply simple transformations, such as rotation and flipping, to generate new images. However, the resulting images often lack semantic diversity. More advanced methods mask blocks of an image, or crop blocks and paste in content from other images. While these methods can improve model performance, the images they produce are unlikely to occur in real-world scenarios. We therefore leverage diffusion models to generate more realistic and semantically rich augmented data, further improving model performance.

Method

The architecture of our proposed data augmentation pipeline. The pipeline consists of two parts: SAM-based erasing and Stable Diffusion-based scene redrawing.

The redrawing process. After the pixels in the region of interest are erased, new data is generated through the reverse diffusion process of the well-trained diffusion model. Different text prompts yield new images with different distributions. For example, in the figure, the text prompts “a traditional Chinese building” and “a traditional Arabic building” fill the erased area with buildings in completely different architectural styles.
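The erase-then-redraw flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is a tiny grayscale grid stored as nested lists, the binary mask stands in for a SAM segmentation, and `redraw_fn` is a hypothetical placeholder for a text-conditioned inpainting model such as Stable Diffusion. All names (`erase`, `erase_then_redraw`, `fake_redraw`) are invented for this example.

```python
from typing import Callable, List

Image = List[List[int]]  # toy grayscale image, values 0..255
Mask = List[List[int]]   # 1 = region of interest to erase, 0 = keep
RedrawFn = Callable[[Image, Mask, str], Image]


def erase(image: Image, mask: Mask, fill: int = 0) -> Image:
    """Return a copy of `image` with every masked pixel replaced by `fill`."""
    return [
        [fill if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def erase_then_redraw(image: Image, mask: Mask, prompt: str,
                      redraw_fn: RedrawFn) -> Image:
    """Erase the masked region, then let `redraw_fn` fill it in.

    Pixels outside the mask are always copied from the original image,
    so only the erased region can change.
    """
    erased = erase(image, mask)
    generated = redraw_fn(erased, mask, prompt)
    return [
        [g if m else px for px, g, m in zip(img_row, gen_row, mask_row)]
        for img_row, gen_row, mask_row in zip(image, generated, mask)
    ]


def fake_redraw(erased: Image, mask: Mask, prompt: str) -> Image:
    """Hypothetical stand-in for a diffusion inpainting model.

    It fills the whole image with one value derived from the prompt length,
    which is enough to show that different prompts give different content.
    """
    value = (len(prompt) * 7) % 256
    return [[value for _ in row] for row in erased]


# Toy 3x3 image; the mask marks the lower-right 2x2 block for erasing.
image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
mask = [[0, 0, 0], [0, 1, 1], [0, 1, 1]]

out_a = erase_then_redraw(image, mask, "a traditional Chinese building", fake_redraw)
out_b = erase_then_redraw(image, mask, "a traditional Arabic building", fake_redraw)
```

In this sketch `out_a` and `out_b` agree with `image` everywhere outside the mask and differ from each other only inside it, which mirrors how changing the prompt changes only the redrawn region. Swapping `fake_redraw` for a real inpainting model would keep the same interface.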

Visual comparison with other data augmentation methods.

The first row shows original images from the KITTI road dataset. Rows 2 through 6 show synthetic data generated by RandomErasing, Cutout, GridMask, CutMix, and our method, respectively.
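For contrast, two of the region-level baselines in this comparison can be sketched as follows. This is a simplified illustration on toy nested-list images, covering only Cutout (fill a rectangle with a constant) and CutMix (paste a rectangle from another image). The original methods sample the box at random and CutMix also mixes labels; both are omitted here. The function names and the explicit-box interface are assumptions made for the example, not any library's API.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image


def cutout(image: Image, top: int, left: int, height: int, width: int,
           fill: int = 0) -> Image:
    """Return a copy of `image` with the given box set to `fill`."""
    out = [row[:] for row in image]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out


def cutmix(image: Image, other: Image, top: int, left: int,
           height: int, width: int) -> Image:
    """Return a copy of `image` with the given box copied from `other`.

    Assumes `other` has the same shape as `image`.
    """
    out = [row[:] for row in image]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = other[r][c]
    return out


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]

cut = cutout(a, top=1, left=1, height=2, width=2)
mixed = cutmix(a, b, top=0, left=0, height=1, width=2)
```

Both operations overwrite a box without regard to scene content, which is why the resulting images tend to look unnatural compared with a redrawn region.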

Quantitative Results

Experimental results of our data augmentation method and other data augmentation methods (Basic, RandomErasing, Cutout, CutMix, and GridMask). To ensure comprehensive experimentation, experiments were conducted on three classic models with different network architectures. Bold indicates the best result, while underline indicates the second-best result.

Experimental results of our data augmentation method and other data augmentation methods (Basic, RandomErasing, Cutout, CutMix, and GridMask) on two multi-modal models. Bold indicates the best result, while underline indicates the second-best result.
