The more that content on the web is “locked down” behind stricter API restrictions and identity verification, e.g. Twitter, the more I wonder if I should be archiving every single HTTP request my browser makes. Or, rather, I wonder if in the future there will be an Archive Team style decentralized network of hoarders who, as they naturally browse the web, collectively establish and maintain an archive, creating a “shadow” database of content. This shadow archive would be owned entirely by the collective, so requests to it would not be subject to the limitations set by the source service.
The main point is that the hoarding is indistinguishable from regular browsing from the source website’s perspective, so the hoarding system can’t be shut down without also cutting off regular users.
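As a rough illustration of what the capture side could look like, here is a minimal sketch of a Python addon for mitmproxy, assuming the browser’s traffic is routed through it (a browser extension using the webRequest API would be another route). The archive directory and on-disk layout are invented for illustration; the point is that it only records what the browser already fetched during normal use, so nothing looks different to the site.

```python
# shadow_archive.py; a minimal capture sketch, run with: mitmdump -s shadow_archive.py
# The directory name and on-disk layout below are invented for illustration.
import hashlib
import json
import time
from pathlib import Path

from mitmproxy import http

ARCHIVE_DIR = Path.home() / ".shadow-archive"   # hypothetical location
ARCHIVE_DIR.mkdir(exist_ok=True)

def response(flow: http.HTTPFlow) -> None:
    """Called by mitmproxy for every completed response the browser receives."""
    body = flow.response.content or b""
    digest = hashlib.sha256(body).hexdigest()
    # Store each body once, keyed by its hash, plus a small metadata record.
    (ARCHIVE_DIR / digest).write_bytes(body)
    record = {
        "url": flow.request.pretty_url,
        "status": flow.response.status_code,
        "sha256": digest,
        "fetched_at": time.time(),
    }
    with (ARCHIVE_DIR / "index.jsonl").open("a") as index:
        index.write(json.dumps(record) + "\n")
```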
Verification that the content actually came from the real service could probably be derived from the HTTPS traffic itself, and some sort of reputation system could keep the source websites from trying to poison the collective with spam.
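One caveat on the verification part: a plain TLS capture isn’t self-authenticating to a third party, since record integrity relies on symmetric keys the client also holds, so something like TLSNotary-style proofs or simple cross-checking between independent participants would probably be needed. On the reputation side, here is a very rough sketch of what I have in mind, with the function name, weights, and threshold all invented for illustration: accept a URL’s content hash only once enough reputation-weighted participants report the same hash.

```python
from collections import defaultdict

def consensus_hash(reports, reputation, quorum_weight=3.0):
    """Pick the content hash for a URL that the most reputation backs.

    `reports` maps participant id -> sha256 hash that participant observed;
    `reputation` maps participant id -> a positive trust weight earned over time.
    Returns the winning hash, or None if no hash reaches `quorum_weight`.
    All names and the threshold here are illustrative, not a worked-out protocol.
    """
    weight_by_hash = defaultdict(float)
    for participant, digest in reports.items():
        weight_by_hash[digest] += reputation.get(participant, 0.0)
    if not weight_by_hash:
        return None
    best_hash, best_weight = max(weight_by_hash.items(), key=lambda kv: kv[1])
    return best_hash if best_weight >= quorum_weight else None

# Example: three established members agree, one brand-new account disagrees.
print(consensus_hash(
    {"alice": "abc123", "bob": "abc123", "carol": "abc123", "mallory": "fff000"},
    {"alice": 2.0, "bob": 1.5, "carol": 1.0, "mallory": 0.1},
))  # -> "abc123"
```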
Clearly, not all of the collected data should be shared, and without differential privacy techniques and fingerprint resistance, participating accounts could be linked to the content they contribute.
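To make the privacy point concrete, here is one standard local differential privacy mechanism, randomized response, applied to a participant answering whether they hold a copy of a given URL. The collective can still estimate how many copies exist without ever being certain of any individual’s true answer. This is only a sketch of one small piece; actually sharing content bodies safely would take considerably more than this.

```python
import math
import random

def randomized_response(holds_copy: bool, epsilon: float) -> bool:
    """Report the true answer with probability e^eps / (1 + e^eps),
    otherwise report its negation (classic local differential privacy)."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return holds_copy if random.random() < p_truth else not holds_copy

def estimate_holders(noisy_reports: list[bool], epsilon: float) -> float:
    """Unbiased estimate of how many participants truly answered yes,
    recovered from the noisy reports collected by the aggregator."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    n = len(noisy_reports)
    return (sum(noisy_reports) - n * (1.0 - p)) / (2.0 * p - 1.0)

# Example: 1,000 participants, 300 of whom really hold a copy.
reports = [randomized_response(i < 300, epsilon=1.0) for i in range(1000)]
print(round(estimate_holders(reports, epsilon=1.0)))  # roughly 300
```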
Has anything like this been attempted before? I’ve never participated in Archive Team, but from what I’ve read their approach seems similar.