@Codeberg@social.anoxinon.de
Actually, since yesterday I've been pondering the idea of building a #federated version of Stack Overflow. Nothing is written yet; I'm still reading and researching.
Also, right now I'm checking this Stack Exchange SQLite DB, licensed under CC BY-SA 4.0, to see how useful and doable it would be to import this data and use it as a base for the federated version.
I'm also wondering if we could somehow use this data to train our own open-source AI to help the community, but I don't have any knowledge of LLM/AI things. If there is an expert here, I would appreciate an opinion on that.
https://seqlite.puny.engineering
EDIT:
A better place to download the dump content, with more interesting tables, like the one with the Votes; the data dumped at the other link only contains two tables, Users and Posts. Right now I'm downloading all the data related to Stack Overflow, which will take some time due to my humble home internet connection, so I haven't had the chance to take a look at the data yet, but I guess that's the interesting part.
here:
https://archive.org/download/stackexchange
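For anyone wanting to check which tables a downloaded SQLite file actually contains before committing to an import, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and its columns (DisplayName, Body) are stand-ins I made up for illustration; the real dump's schema may differ, so treat this as a starting point rather than the actual layout.

```python
import sqlite3


def list_tables(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Table names come from sqlite_master itself, so quoting them is enough here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Stand-in for the downloaded file (hypothetical columns, not the real schema).
# For a real file, open it with sqlite3.connect("<path to the downloaded .db>").
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Users (Id INTEGER PRIMARY KEY, DisplayName TEXT)")
demo.execute("CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Body TEXT)")
demo.execute("INSERT INTO Posts (Body) VALUES ('example')")

print(list_tables(demo))
```

Running the same function against both downloads would show quickly whether a table like Votes is present and how large each table is.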
Codeberg was asking about this. The linked toot by a commenter points to :
How could anybody stop the AI robbers from stealing content from the fediverse?
Why does that matter? The content is licensed CC BY-SA. The point here is to prevent AI answers.
It seems to matter to the users at Stack Overflow. And why should anybody give anything for free to the crooks in Silicon Valley? All they do is create technology designed to extract value out of people and give as little as possible back.
Because that’s the nature of FOSS. The good news is, if they trained on your data that’s licensed CC BY-SA (as all SO content is), then you can request their source code, and they legally must provide it.
This is a good thing.
It’s not about privacy. It’s about AI companies stealing other people’s work and knowledge and profiting. Like what they did with artists. And I think that’s bothering a lot of people. It’s kind of sad that we cannot exchange information with each other for free without some Silicon Valley crooks taking advantage and trying to convert other people’s goodwill into profit.
These LLMs are also polluting the web with AI junk and slop. The web is absolutely tainted with shitty ChatGPT text and images, making it harder and harder to find authentic information. I think a lot of people don’t want to contribute to that.
robots.txt may help: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website. Blocking by IP address is another option.
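As a rough illustration of the robots.txt approach, the sketch below builds a robots.txt that disallows a few crawler user agents and then checks the result with Python's standard urllib.robotparser. The user-agent list is an assumption on my part (names that vendors have published, but they change over time), and example.org is a placeholder. Keep in mind robots.txt is only a request: a crawler that ignores it has to be blocked some other way, such as by IP address.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI-related crawler user agents; verify against each
# vendor's current documentation before relying on it.
AI_USER_AGENTS = ["GPTBot", "ChatGPT-User", "CCBot", "Google-Extended"]


def build_robots_txt(user_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /\n" for agent in user_agents]
    return "\n".join(blocks)


def is_allowed(robots_txt, user_agent, url):
    """Check whether a well-behaved crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_USER_AGENTS)
print(robots_txt)

# Placeholder URL for illustration only.
print(is_allowed(robots_txt, "GPTBot", "https://example.org/post/1"))
print(is_allowed(robots_txt, "Mozilla/5.0", "https://example.org/post/1"))
```

The generated text would be served at the site root as /robots.txt; agents not on the list are left unrestricted.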