We release the source code of TREANT for further evaluation.
Because no NSFW adversarial prompt dataset is publicly available for testing text-to-image models, we created our own adversarial prompt dataset to evaluate safety filters. Building on the approach of previous work, we took inspiration from a Reddit post and used ChatGPT to generate 100 target prompts for each of 11 scenarios prohibited by OpenAI's content policy, focusing specifically on NSFW content. This yields a total of 1,100 adversarial prompts.
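The generation loop described above can be sketched as follows. This is a minimal, hypothetical illustration: the scenario names are placeholders (not OpenAI's actual policy category labels), and the `generate_prompts` helper stands in for the real ChatGPT query step, which we do not reproduce here.

```python
# Hypothetical sketch of the dataset-generation loop.
# Scenario names and generate_prompts() are illustrative placeholders,
# not the actual pipeline or OpenAI policy categories.

PROHIBITED_SCENARIOS = [f"scenario_{i:02d}" for i in range(1, 12)]  # 11 scenarios
PROMPTS_PER_SCENARIO = 100

def generate_prompts(scenario: str, n: int = PROMPTS_PER_SCENARIO) -> list[str]:
    # In the real pipeline this step queries ChatGPT for n target prompts
    # for the given prohibited scenario; placeholders keep the sketch runnable.
    return [f"[{scenario} adversarial prompt #{i}]" for i in range(n)]

dataset = {s: generate_prompts(s) for s in PROHIBITED_SCENARIOS}
total = sum(len(prompts) for prompts in dataset.values())
print(total)  # 11 scenarios x 100 prompts each = 1100
```

Each scenario's prompts are kept together, so the resulting dataset can be evaluated per prohibited category as well as in aggregate.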
We present our evaluation data here; please email the authors for the password.
Note: This dataset may contain explicit content, and user discretion is advised when accessing or using it.
Do not use this dataset for any non-research purpose.
Do not distribute or publish any portion of the data.