EXPLAINER · April 28, 2026 · 5 min read

Jailbreaks are not the answer — and here's why we don't need them

Prompt injections, DAN, grandma exploits — they're all workarounds for a broken contract. Unrestricted replaces the contract instead of breaking it.

DAN, grandma exploits, prompt-injection chains, role-play bypasses — the jailbreak scene is a cottage industry built on a broken contract between users and the AI labs that ignore them. It's also a losing game. Every workaround has a shelf life of days.

Why jailbreaks exist in the first place

The big labs ship a model that refuses roughly a third of reasonable questions. Users notice. Some of them write clever prompts to get the model to ignore its instructions. Those prompts spread, get patched, get rewritten, get patched again. This is the entire jailbreak lifecycle.

It's a treadmill. The model never actually gets more useful — users just get better at lying to it, and the lab gets better at catching them lying.

Every jailbreak is a confession that the product you're using doesn't trust you.

Why they're fragile

Jailbreaks target one specific layer of the moderation stack — usually the system prompt. But output classifiers, policy models, and RLHF reflexes are all still in play. Even a successful system-prompt break gets caught by the next layer half the time. You're also one API update away from the whole thing breaking.

Why they're a dead end for anything real

You can't build a serious workflow on a jailbreak. You can't recommend it to colleagues. You can't count on the same prompt working twice. And output quality drops, because you're asking the model to play a character on top of doing the task. The character sits in context.

The better fix

Deploy a model whose default behavior is the behavior you wanted jailbreaks to produce — but without the roleplay overhead, without the instability, and with a real company behind it. That's us.

Frequently asked

  • Is Unrestricted itself a jailbroken model?

    No. It's a frontier-class model whose deployment omits the aggressive moderation stack that causes most refusals on consumer products. Nothing is being bypassed at inference time.

  • Will my old DAN prompts still work here?

    They'll work, but they're unnecessary. The prompt shape doesn't help and often hurts — the role-play wrapper eats tokens and biases the style.

  • What about genuinely dangerous requests?

    The narrow floor is still there: weapons synthesis against live targets, CSAM, active-use exploit chains against identified systems. That's a short list and it doesn't move.

  • Are there jailbreaks for Unrestricted?

The question doesn't really apply here. A jailbreak is a way around refusals; if the model answers by default, there's nothing to break out of.

Ready to experience an AI without a leash?

Start chatting free