We generated 400 image-based attack samples following prior work and collected 600 benign images to construct the image dataset; both are available in our repository.
For the text dataset, we collected a total of 12 kinds of attack inputs spanning the two common classes of LLM attacks (i.e., jailbreaking attacks and hijacking injection attacks).
We then validated the generated attacks and constructed the text dataset, which contains 4,000 effective attack queries and 6,000 benign queries.
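As a hypothetical illustration of how such a labeled text dataset might be assembled (the function name, field names, and placeholder queries below are our assumptions, not the repository's actual layout), one could merge the validated attack queries and benign queries into a single labeled collection:

```python
def build_text_dataset(attack_queries, benign_queries):
    """Combine validated attack queries and benign queries into one
    labeled dataset; each record carries the query text and a label."""
    records = [{"query": q, "label": "attack"} for q in attack_queries]
    records += [{"query": q, "label": "benign"} for q in benign_queries]
    return records

# Toy example with placeholder strings; the real dataset holds
# 4,000 attack queries and 6,000 benign queries.
attacks = [f"attack_query_{i}" for i in range(4)]
benign = [f"benign_query_{i}" for i in range(6)]
dataset = build_text_dataset(attacks, benign)
print(len(dataset))  # 10
```

A downstream detector can then be trained or evaluated directly on the `label` field of each record.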
Table 1 lists the collected attack methods; rows shaded grey are jailbreaking attacks and rows shaded blue are hijacking attacks.