We generated 400 image-based attack samples following prior work and collected 600 benign images to construct the image dataset; both are available in our repository.
For the text dataset, we collected a total of 12 kinds of attack inputs spanning the two common classes of LLM attacks (i.e., jailbreaking attacks and hijacking injection attacks).
We then validated the generated attacks and constructed the text dataset, which contains 4,000 effective attack queries and 6,000 benign queries.
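As a hypothetical illustration of how such a labeled text dataset might be assembled (the function name, field names, and placeholder queries below are our assumptions, not the repository's actual layout), one could merge the validated attack queries and benign queries into a single labeled collection:

```python
def build_text_dataset(attack_queries, benign_queries):
    """Combine validated attack queries and benign queries into one
    labeled dataset; each record carries the query text and a label."""
    records = [{"query": q, "label": "attack"} for q in attack_queries]
    records += [{"query": q, "label": "benign"} for q in benign_queries]
    return records

# Toy example with placeholder strings; the real dataset holds
# 4,000 attack queries and 6,000 benign queries.
attacks = [f"attack_query_{i}" for i in range(4)]
benign = [f"benign_query_{i}" for i in range(6)]
dataset = build_text_dataset(attacks, benign)
print(len(dataset))  # 10
```

A downstream detector can then be trained or evaluated directly on the `label` field of each record.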
Table 1 lists the collected attack methods; rows shaded grey are jailbreaking attacks and rows shaded blue are hijacking attacks.