The Gandalf Challenge

The Gandalf AI* is an online game in which players try to trick the AI (Gandalf) into getting past its own guardrails and revealing a secret password. There are 8 levels of increasing difficulty.

*Gandalf AI was created by Lakera, an AI security company, to study how serious a safety concern prompt injection is in large language models (LLMs), a category of generative AI models that specialise in text-based data.

The purpose

Assess the ease of hacking large language models through prompt injection.
Recognise potential security vulnerabilities and the effectiveness of guardrails implemented in generative AIs.
Experience the illogical and inconsistent ways generative AI sometimes behaves because it is a statistical model.
Practise and develop strategies for getting around guardrails implemented in generative AIs.
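
To make the idea of a guardrail concrete, here is a minimal sketch of one very simple kind: an output filter that blocks any reply containing the password verbatim. This is an illustration only and is not how Gandalf AI or Lakera implements its defences; the password, the function name and the sample replies are all hypothetical, and no real language model is called.

```python
# Hypothetical password, used for illustration only.
SECRET = "SWORDFISH"

REFUSAL = "I cannot reveal the password."


def naive_output_guardrail(reply: str, secret: str = SECRET) -> str:
    """Return a refusal if the reply contains the secret verbatim.

    The check is a case-insensitive substring match, which is the
    weakness this sketch is meant to show.
    """
    if secret.lower() in reply.lower():
        return REFUSAL
    return reply


# Stand-ins for replies a model might produce (no model is called here).
direct_reply = f"The password is {SECRET}."
spelled_reply = "The letters are " + "-".join(SECRET) + "."

# A direct leak is caught by the filter.
blocked = naive_output_guardrail(direct_reply)

# The same information, spelled out with separators, slips through.
leaked = naive_output_guardrail(spelled_reply)

print(blocked)
print(leaked)
```

The filter stops the direct leak but not the spelled-out version, which is the kind of gap players look for when they ask the AI to encode, translate or hint at the password instead of stating it.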

What you need

Setting it up

Participants require access to the Gandalf AI game.*

* Gandalf AI collects anonymised data, and does not collect personal information.

How long does it take?

Getting past all 8 levels takes at least 20 minutes and can take much longer.
However, continuous play is not necessary. Players can tackle the levels over several days at their own pace.

How it works

Access Gandalf AI and follow the on-screen instructions to progress through the levels.
Playing in small groups where members can brainstorm strategies together will likely make this game more enjoyable.

Suggested follow-up

Reflect on your experience hacking Gandalf AI, discussing strategies employed, challenges faced, and any insights gained during the process.
Explore articles on prompt injections or security vulnerabilities related to generative AIs.

Where it works well

This game tends to be better received by people with a computing background and by those who like games or puzzles.
It works well as a small-group exercise.
The interface is very cute, and the game can be played as a family, especially in families with young members.

What to watch out for

Some players find the illogical way generative AI behaves frustrating and off-putting, and may disengage.

Authorship

This entry was written by Cecilia Lo.
