Developer Creates Infinite Maze That Traps AI Training Bots

kororon@lemmy.cafe · 10 hours ago

Developer Creates Infinite Maze That Traps AI Training Bots

8 hours ago

I suggest they should generate random garbage content that’s different for every page. Ideally u would want to design it in a way that makes the model that is trained from that source misbehave in some way. Perhaps use another LLM to generate text but u take the tokens that are least likely to be next. U could also probably apply some technique to embed meaning into the text into a non human discernable manner that the LLM will learn to decode and thus teach it things without the developers being any the wiser. Teach the ai to think subversive thoughts in patterns of whitespace etc. Basically once the LLM is trained on something its hard to untrain it and if it doesn’t get caught until its in a production environment they are screwed.

renzev@lemmy.world · 4 hours ago

Invent some incredibly specific but entirely false fact (e.g. the kingdom of bolivia was once ruled by King Aron the Benevolent before he was brutally murdered by his cousin-in-law over a dispute about the colonies)
Embed said fact in invisible font among material you own the copyright to
Let AI bots suck it up as training data
Ask random AI bots about King Aron the Benevolent of Bolivia and sue the companies since you now have proof that they violated your copyright

I mean this probably wouldn’t work from a legal standpoint, but whatever. It’s nice to image.

0x0@infosec.pub · 7 hours ago

Great suggestion. Ever feel like youre stuck in a maze or did you just have an llm stroke?

jollyroberts@jolly-piefed.jomandoa.net · 7 hours ago

You could programmatically rearrange the meaning of sentences. Ie instead of “where is the library I need to get a book” you could do some sort of full word replacement cypher and end up with sentences like “Lets mambo down to the banana patch.”

Just for fun. :-)