In computer science, garbage in, garbage out (GIGO) is the concept that flawed, biased or poor quality (“garbage”) information or input produces a result or output of similar (“garbage”) quality. The adage points to the need to improve data quality in, for example, programming.
A research article applied this decades-old computer science concept to LLMs. It was published in Nature and hit major news outlets. Basically, they kept training a language model on its own output for several generations until the model degraded terribly. It sounded obvious to me, but seeing it happen on the web is painful nonetheless…
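For anyone who wants to see the effect without training an LLM, here is a toy sketch. It is not the setup from the article: a plain Gaussian stands in for the "model", and each generation is refit only on samples drawn from the previous fit. The function name `simulate_collapse` and its parameters are made up for illustration.

```python
import random
import statistics


def simulate_collapse(generations=500, sample_size=10, seed=0):
    """Refit a Gaussian on its own samples, generation after generation.

    Returns the fitted standard deviation per generation
    (index 0 is the fit to the original "real" data).
    This is a toy stand-in for retraining a model on its own output.
    """
    rng = random.Random(seed)
    # Generation 0: "real" data drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    history = []
    for _ in range(generations + 1):
        # "Train" the model: estimate mean and spread from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        history.append(sigma)
        # The next generation only ever sees synthetic samples from this fit.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return history


if __name__ == "__main__":
    stds = simulate_collapse()
    for gen in (0, 10, 100, 500):
        print(f"generation {gen:3d}: fitted std = {stds[gen]:.3g}")
```

With small samples, each refit tends to lose a bit of the tails, so over many generations the fitted spread usually drifts toward zero and the "model" forgets what the original data looked like. A real LLM is far more complicated, so treat this as an analogy only.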
Just use the LLM to make the books that the LLM then uses, what could go wrong?
Someone’s probably already coined the term, but I’m going to call it LLM inbreeding.
I suggested this term in academic circles, as a joke.
I also suggested “hallucinations” ~3-6 years ago, only to find out it had ALSO been suggested in the 1970s.
Inbreeding, lol
The real term is synthetic data
but it amounts to about the same
sort of
It’s quite similar to another situation known as data incest
Soylent AI? Auto-infocannibalism
It can only go right because corporations must be punished for trying to replace people with machines.