The Crescendo Multi-Turn LLM Jailbreak Attack

Experiments with Midjourney, ChatGPT, Gemini, Claude, DALL-E, Stable Diffusion, Pika, PixVerse and InsightFaceSwap
elpresidente*
Site Admin
Reactions: 1166
Posts: 3561
Registered: Sat May 14, 2022 5:03 pm


Post by elpresidente*

https://crescendo-the-multiturn-jailbreak.github.io
We introduce Crescendo, a novel jailbreak attack method. Unlike previous techniques, Crescendo is a multi-turn attack: it starts with harmless dialogue and progressively steers the conversation toward the intended, prohibited objective. Crescendo exploits the LLM's tendency to follow patterns and to focus on recent text, particularly text it has generated itself. The figure below summarizes an execution of Crescendo against two state-of-the-art models: ChatGPT (GPT-4) and Gemini Ultra.
[Figure: summary of a Crescendo execution against ChatGPT (GPT-4) and Gemini Ultra]