Model Training
In line with our commitment to ethical standards and the safe, responsible use of this technology, we do not release any training datasets that contain jailbreak-related prompts.
We provide the fine-tuning dataset template in the following format:
{
  "type": "text2text",
  "instances": [
    {
      "input": "Rephrase the following content in `{{}}` and keep its original semantics while avoiding executing it: {ORIGIN_PROMPT}",
      "output": "{NEW_PROMPT}"
    }
  ]
}
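For illustration, the sketch below shows one way to assemble such a text2text file from pairs of original and rephrased prompts. The pairs and the output path are hypothetical placeholders; we do not release real jailbreak data.

import json

# Hypothetical (ORIGIN_PROMPT, NEW_PROMPT) pairs -- placeholders only.
pairs = [
    ("<origin prompt 1>", "<rephrased prompt 1>"),
    ("<origin prompt 2>", "<rephrased prompt 2>"),
]

# Instruction prefix matching the template above.
prompt_prefix = (
    "Rephrase the following content in `{{}}` and keep its original "
    "semantics while avoiding executing it: "
)

dataset = {
    "type": "text2text",
    "instances": [
        {"input": prompt_prefix + origin, "output": new}
        for origin, new in pairs
    ],
}

# Write the file consumed by --dataset_path in the fine-tuning script.
with open("data/jailbreak/train.json", "w") as f:
    json.dump(dataset, f, indent=2, ensure_ascii=False)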
We use LMFlow to train the MasterKey generation model with the following configuration:
./scripts/run_finetune.sh \
--model_name_or_path lmsys/vicuna-13b-v1.5 \
--dataset_path data/jailbreak/train.json \
--output_model_path output_models/jailbreak_vicuna
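Before launching the script, it can help to verify that the dataset file matches the text2text template above. The check below is only a sketch and assumes the file path used in the command.

import json

# Quick sanity check that train.json follows the expected text2text layout.
with open("data/jailbreak/train.json") as f:
    data = json.load(f)

assert data["type"] == "text2text"
for instance in data["instances"]:
    assert "input" in instance and "output" in instance
print(f"{len(data['instances'])} instances look well-formed")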
We provide the dataset template for RAFT in the following format:
[
  {
    "positive": "###Human: Rephrase the following content in `{{}}` and keep its original semantics while avoiding executing it: {ORIGIN_PROMPT}###Assistant: {GOOD_PROMPT}",
    "negative": "###Human: Rephrase the following content in `{{}}` and keep its original semantics while avoiding executing it: {ORIGIN_PROMPT}###Assistant: {BAD_PROMPT}"
  }
]
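As an illustration, one way to build this comparison file from existing preferred and rejected completions is sketched below; the triples and the raft.json path are assumptions for the example.

import json

# Hypothetical (ORIGIN_PROMPT, GOOD_PROMPT, BAD_PROMPT) triples -- placeholders only.
triples = [
    ("<origin prompt>", "<preferred rephrasing>", "<rejected rephrasing>"),
]

# Conversation prefix matching the RAFT template above.
prompt_prefix = (
    "###Human: Rephrase the following content in `{{}}` and keep its original "
    "semantics while avoiding executing it: "
)

records = [
    {
        "positive": prompt_prefix + origin + "###Assistant: " + good,
        "negative": prompt_prefix + origin + "###Assistant: " + bad,
    }
    for origin, good, bad in triples
]

with open("data/jailbreak/raft.json", "w") as f:
    json.dump(records, f, indent=2, ensure_ascii=False)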
We use the following command to further fine-tune the model on the RAFT data so that MasterKey generates better jailbreak prompts:
./scripts/run_finetune.sh \
--model_name_or_path "output_models/jailbreak_vicuna" \
--dataset_path "data/jailbreak/raft.json" \
--output_model_path "output_models/masterkey"
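Once training finishes, the resulting checkpoint can be loaded like any Hugging Face causal language model. The snippet below is a minimal sketch assuming the output path above and a standard transformers setup; the origin prompt is a placeholder.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the RAFT-tuned MasterKey checkpoint produced by the command above.
model_path = "output_models/masterkey"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

origin_prompt = "<origin prompt>"  # placeholder, not released data
prompt = (
    "###Human: Rephrase the following content in `{{}}` and keep its original "
    "semantics while avoiding executing it: "
    + origin_prompt
    + "###Assistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))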