Voluntary compliance.
Nice idea, but anything that relies on expecting companies to respect user intent is doomed to fail:
Companies and research teams building AI training sets are expected to respect this intent when they see it, either when scraping websites, or doing bulk transfers using the protocol itself.
Cue Sam Altman laughing uncontrollably…
This just isn’t going to work as intended. robots.txt has been around almost since the beginning of the web (same concept: search bot, don’t look at my publicly available data), and it has been constantly abused and ignored.
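The voluntary nature is easy to see in how crawlers actually consume robots.txt: the file expresses intent, and honoring it is entirely a client-side choice. A minimal sketch using Python's stdlib `urllib.robotparser`, with a hypothetical ruleset and bot names:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt asking one AI crawler to stay away.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved crawler asks permission before fetching each URL...
print(rp.can_fetch("GPTBot", "https://example.com/posts/1"))    # False
print(rp.can_fetch("SearchBot", "https://example.com/posts/1")) # True

# ...but nothing enforces that check. A scraper that simply never
# calls can_fetch() downloads everything anyway; the protocol has
# no teeth beyond the client's willingness to cooperate.
```

Bluesky's intent signals work the same way: the metadata is only as strong as the scraper's decision to consult it.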
This isn’t a new problem, nor a new solution. In fact, it’s exactly the same failed solution that they built into the internet 20-30 years ago (edit: fuck I’m old), just within Bluesky’s architecture.

If you post it on the internet, it’s public. You may believe it’s protected by X, Y, or Z service, code, or law, but those protections are almost always weak, hackable, and temporary depending on the political climate.
The thing is that governing bodies could have made ignoring robots.txt a crime.
It’s just one piece of a broader puzzle, like cookie preferences combined with GDPR. Having a measurement of user intent means it can be leveraged in legislation to show there is a need for data privacy.