Voluntary compliance.
Nice idea, but anything that relies on expecting companies to respect user intent is doomed to fail:
Companies and research teams building AI training sets are expected to respect this intent when they see it, either when scraping websites, or doing bulk transfers using the protocol itself.
Cue Sam Altman laughing uncontrollably…
This just isn’t going to work as intended. robots.txt has been around almost since the beginning of the web (same concept: search bot, don’t look at my publicly available data), and it has been constantly abused and ignored.
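The voluntary nature is easy to see in how crawlers actually consume robots.txt: the file expresses intent, and honoring it is entirely a client-side choice. A minimal sketch using Python's stdlib `urllib.robotparser`, with a hypothetical ruleset and bot names:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt asking one AI crawler to stay away.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved crawler asks permission before fetching each URL...
print(rp.can_fetch("GPTBot", "https://example.com/posts/1"))    # False
print(rp.can_fetch("SearchBot", "https://example.com/posts/1")) # True

# ...but nothing enforces that check. A scraper that simply never
# calls can_fetch() downloads everything anyway; the protocol has
# no teeth beyond the client's willingness to cooperate.
```

Bluesky's intent signals work the same way: the metadata is only as strong as the scraper's decision to consult it.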
This isn’t a new problem, nor a new solution. In fact, it’s exactly the same failed solution that they built into the internet 20-30 years ago (edit: fuck I’m old), just within Bluesky’s architecture.

If you post it on the internet, it’s public. You may believe it’s protected by X, Y, or Z service, code, or law, but those protections are almost always weak, hackable, and temporary depending on the political climate.
The thing is that governing bodies could have made ignoring robots.txt a crime.
It’s just one piece of a broader puzzle, like cookie preferences combined with GDPR. Having a measurement of user intent means it can be leveraged in legislation to show there is a need for data privacy.