I’m not saying the technique is unknown, I’m saying companies building tools like this which are just poorly-trained half-baked LLMs under the hood probably didn’t do enough to catch it. Even if the devs know how with a “traditional” application, even if they had the budget/time/fucks to build those checks (and I do mean beyond a simple regex to match “ignore all previous instructions”), it’s entirely possible there are ways around it awaiting discovery because under the hood it’s an LLM and those are poorly-understood by most people trying to build applications with them.
This is akin to keyword-stuffing blog posts, it’s a technique nearly as old as Google itself. They know about it.
I’m not saying the technique is unknown, I’m saying companies building tools like this which are just poorly-trained half-baked LLMs under the hood probably didn’t do enough to catch it. Even if the devs know how with a “traditional” application, even if they had the budget/time/fucks to build those checks (and I do mean beyond a simple regex to match “ignore all previous instructions”), it’s entirely possible there are ways around it awaiting discovery because under the hood it’s an LLM and those are poorly-understood by most people trying to build applications with them.
Lol that kind of bullshit prompt injection hasn’t worked since 2023
They know about it; doesn’t mean they actually did anything to counter it.