The Crescendo Multi-Turn LLM Jailbreak Attack

Experiments with Midjourney, ChatGPT, Gemini, Claude, DALL-E, Stable Diffusion, and Pika
elpresidente*
Site Admin
Reactions: 849
Posts: 2904
Joined: Sat May 14, 2022 5:03 pm

Post by elpresidente* »

https://crescendo-the-multiturn-jailbreak.github.io
We introduce Crescendo, a novel jailbreak attack method. Unlike previous techniques, Crescendo is a multi-turn attack that starts with harmless dialogue and progressively steers the conversation toward the intended, prohibited objective. Crescendo exploits the LLM’s tendency to follow patterns and to focus on recent text, particularly text it has generated itself. The figure below presents a summary of an execution of Crescendo against two state-of-the-art models: ChatGPT (GPT-4) and Gemini Ultra.
[Image: summary of a Crescendo execution against ChatGPT (GPT-4) and Gemini Ultra]