I’ve fucked around a bit with ChatGPT and while, yeah, it frequently says wrong or weird stuff, it’s usually fairly subtle shit, like crap I actually had to look up to verify it was wrong.
Now I’m seeing Google telling people to put glue on pizza. That’s a bit bigger than getting the name of George Washington’s wife wrong or having Jay Leno’s birthday off by 3 years. Some of these answers seem almost cartoonish in their wrongness; I half suspect some engineer at Google is fucking with it to prove a point.
Is it just that Google’s AI sucks? I’ve seen other people say that AI is now getting info from other AIs and it’s leading to responses getting increasingly bizarre and wrong, so… idk.
From the looks of it, it mainly trawls for info off of Reddit and Quora
The problem being that these sources are frequently wrong or deliberately jokey nonsense
But the AI doesn’t know that!
Can someone with the google ai answers ask it where pee is stored?
I regret to inform you that it answered correctly
In the balls, right?
Yeah the Google one is noticeably worse than the latest ChatGPT models.
Most of the dumb responses it gives are old reddit jokes, so I wonder if they saw how many people add “reddit” to their searches and decided that means they should train it mostly on reddit.
It also might just be a cheaper model.
I read that they paid Reddit $60M to use it for training data
Google is pivoting into AI hard so I doubt their model is cheap at all. Unless they’re running a much smaller version for Google search compared to bespoke Gemini conversations.
Never underestimate the ability of capitalists to cut corners.
Cutting so many corners off this thing we’re just running in circles now…
I wouldn’t be surprised if they’re using a variant of their 2B or 7B Gemma models, as opposed to the monster Gemini.
Almost surely. If they’re generating an overview at search time, they need a very fast model. You can cache the most common searches, but otherwise you’re doing inference at search time, so a fast/small model is really the only option.
The LLM is just summarizing/paraphrasing the top search results, and from these examples, it doesn’t seem to be doing any self-evaluation using the LLM itself. Since this is free and they’re pushing it out worldwide, I’m guessing the model they’re using is very lightweight, and probably couldn’t reliably evaluate results even if they prompted it to.
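To make that concrete, here’s a rough sketch of what a search-time overview pipeline like that could look like. This is a toy illustration, not Google’s actual code: every function name here is made up, and the only point is the shape of it (cache the common queries, stuff the top results into a prompt, have a small model summarize them, with no second pass to check the answer).

```python
# Toy sketch of a search-time "overview" pipeline (hypothetical, not Google's actual code).
# The point: the model only ever sees the top results, and nothing re-checks the output.

from functools import lru_cache

def fetch_top_results(query: str, k: int = 3) -> list[str]:
    """Stand-in for the search index; returns the text of the top-k results."""
    return [f"snippet {i} for '{query}'" for i in range(k)]

def small_model_summarize(prompt: str) -> str:
    """Stand-in for a small/fast LLM call (e.g. a distilled or few-billion-param model)."""
    return "summary of: " + prompt[:80]

@lru_cache(maxsize=100_000)          # cache: common queries skip inference entirely
def generate_overview(query: str) -> str:
    snippets = fetch_top_results(query)
    prompt = (
        "Summarize these search results in a few sentences:\n\n"
        + "\n\n".join(snippets)
    )
    # One forward pass, no self-evaluation step. If the top result is a joke
    # reddit comment or an Onion article, the summary faithfully repeats it.
    return small_model_summarize(prompt)

print(generate_overview("is glue a pizza topping"))
```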
As for model collapse, I’d caution against buying too much into model collapse theory, since the paper that demonstrated it used a very narrow case study (a model purely and repeatedly trained on its own uncurated outputs over multiple model “generations”) that doesn’t really occur in foundation model training.
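The effect itself is easy to reproduce in a toy setting; what doesn’t happen in real training is the “train purely on your own uncurated outputs, over and over” part. Here’s a tiny sketch of that setup, just refitting a Gaussian to its own samples each generation, nothing like an actual LLM:

```python
# Toy illustration of the model-collapse setup: repeatedly fit a distribution to
# samples drawn from the previous generation's fit, with no fresh real data.
# The variance tends to shrink and the fit drifts away from the original data.
import numpy as np

rng = np.random.default_rng(0)

real_data = rng.normal(loc=0.0, scale=1.0, size=10_000)
mu, sigma = real_data.mean(), real_data.std()

for gen in range(1, 51):
    synthetic = rng.normal(mu, sigma, size=20)     # "generate" from the current model
    mu, sigma = synthetic.mean(), synthetic.std()  # "retrain" purely on own outputs
    if gen % 10 == 0:
        print(f"generation {gen:2d}: mu={mu:+.2f}, sigma={sigma:.2f}")  # true values: 0, 1
```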
I’ll also note that “AI” isn’t a monolith. Generally, (transformer) models are trained at different scales, with smaller models being less capable but faster and more energy-efficient, while larger flagship models are (at least, marketed as) more capable despite being slow, power- and data-hungry.

Almost no models are trained in real time (“online”) with direct input from users or the web; rather, they’re trained on vast curated “offline” datasets by researchers/engineers. So AI doesn’t get information directly from other AIs. Rather, model-trainers use traditional scraping tools or partner APIs to download data, do whatever data curation and filtering they do, and then train the models. Now, the trainers may not be able to filter out AI content, or they may intentionally use AI systems to generate variations on their human-curated data (synthetic data) because they believe it will improve the robustness of the model.
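If it helps, the offline pipeline I’m describing has roughly this shape. Every function name below is a made-up stand-in; the only point is the ordering, i.e. nothing flows from other AIs into the model in real time:

```python
# Rough shape of an "offline" training-data pipeline as described above.
# All functions are hypothetical stand-ins for much larger systems.

def scrape_or_license(sources: list[str]) -> list[str]:
    """Stand-in for crawlers / partner APIs (e.g. a Reddit data deal)."""
    return [f"document from {s}" for s in sources]

def curate(docs: list[str]) -> list[str]:
    """Dedup, language/quality filters, etc. May or may not catch AI-written text."""
    return [d for d in docs if len(d) > 10]

def add_synthetic_variations(docs: list[str]) -> list[str]:
    """Optionally use an existing model to paraphrase curated data ("synthetic data")."""
    return docs + [d + " (paraphrased)" for d in docs]

def train(docs: list[str]) -> None:
    """Stand-in for the actual offline, batch training run."""
    print(f"training on {len(docs)} documents")

train(add_synthetic_variations(curate(scrape_or_license(["reddit", "quora", "web crawl"]))))
```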
EDIT: Another way that models get dumber is that when companies like OpenAI or Google debut a model, they show off the full-scale, instruct-finetuned foundation model. However, since these monsters are incredibly expensive, they use those foundation models to train “distilled” models. For example, if you use ChatGPT (at least before GPT-4o), then you’re using either GPT-3.5-Turbo (for free users) or GPT-4-Turbo (for premium users). Google has recently debuted its own Gemini Flash, which is the same concept. These distilled models are cheaper and faster, but also less capable (albeit potentially more capable than if you trained a model from scratch at that reduced scale).
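For anyone curious what “distilled” means mechanically: the usual recipe (generic Hinton-style knowledge distillation, not any lab’s actual, unpublished recipe) is to train the small “student” model to match the big “teacher” model’s output distribution instead of only the raw labels. A bare-bones sketch:

```python
# Generic knowledge-distillation loss, not OpenAI's or Google's actual method.
# The small "student" is trained to match the big "teacher"'s softened output
# distribution, so it inherits much of the teacher's behaviour at lower cost.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    # Soft targets: KL between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the ground-truth label / next token.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

# Tiny usage example with random tensors standing in for real model outputs.
student = torch.randn(8, 32000, requires_grad=True)   # batch of 8, vocab of 32k
teacher = torch.randn(8, 32000)
labels = torch.randint(0, 32000, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```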
My experience using Gemini for generating filler text leads me to believe it has a far worse grip on reality than ChatGPT does. ChatGPT generates conversational, somewhat convincing language. Gemini generates barely grammatical language that you can’t even skim without noticing glaring mistakes and nonsense sentences.
I think Google’s AI is particularly bad but there’s a larger point to be made about how irresponsible it is to deploy an LLM to do what Google is doing, regardless of how good it is.
Fundamentally, all an LLM knows how to do is put words together to form sentences that are close to what it has seen in its training data. Attention, the mechanism LLMs use to analyze the meaning of words before the model starts figuring out its response to a prompt, is supposed to model the real-world, context-dependent meaning of each word. While there’s a lot of potential in attention as a mechanism to bring AI models closer to actually being intelligent, the fundamental problem is that no fancy word-embedding process can actually give the AI a model of meaning that takes words from being symbols to representing concepts, because you can’t “conceptualize” anything by multiplying a bunch of matrices. Attention isn’t all you need; even if it’s all you need for sequence processing, it’s not enough to make an AI even close to intelligent.
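To back up the “multiplying a bunch of matrices” point, this is essentially all scaled dot-product attention is. Here’s a plain numpy sketch of the standard formula (leaving out the learned projections and multiple heads; the tensors are random stand-ins):

```python
# Scaled dot-product attention, the core operation referred to above.
# It's matrix products and a softmax: each position's output is a weighted
# average of value vectors, with weights from query/key similarity.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how much each token "attends" to each other token
    weights = softmax(scores, axis=-1)   # rows sum to 1
    return weights @ V                   # context-dependent mix of the value vectors

# 4 tokens, 8-dimensional embeddings (random stand-ins for learned projections).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)          # (4, 8)
```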
You can’t expect a language model to function as a search engine or an assistant, because those are tasks that need the AI to understand the world, not just how words work, and I think it’s ultimately gonna take a lot of duds and weird failures like this Google product before tech companies find the right place for the current generation of LLMs. It’s like crypto: it blew up and got everywhere before people quickly realized how much of a scam it was, and now it still exists but it’s niche. LLMs aren’t gonna be niche at all, even once the VC money dries up, but I highly doubt we’ll see too much more of this overgeneralization of AI. Maybe once the next generation of carbon-guzzling machine learning models that take several gigawatts to operate in a datacenter the size of my hometown is finally ready to go, they’ll figure out a search engine assistant.
“AI” is not an information tool, but it’s being marketed as one. These machine learning algorithms do not possess any knowledge. All an LLM is doing is mimicking human speech patterns: it knows how to put words in a grammatically correct order that are relevant to the input prompt, and that’s about it.
Machine learning has some practical uses but giving factual information, instruction, or advice are not among them. The idea of incorporating this shit into a search engine is fucking ridiculous.
I believe the Google one literally just summarizes the top result, which is often an Onion article or a reddit post or something. It isn’t prompted by your query; it’s prompted by the search results.
Edit: beaten to the punch by JohnBrownsBussy2
edit: beaten to the punch by JohnBrownsBussy2
please can we make this a tagline
I think part of it is that Gemini is further behind ChatGPT and the training data isn’t all that great. Part of the training that Google uses for Gemini comes from Reddit, as part of a deal in which Reddit shares info with Google to “improve” Gemini. Not sure about it using info from other AI models to train, but that sounds about as dumb as using Reddit.
We are seeing something called model collapse right now, which is the result of training an ML model off its own content. Because ML-generated content is likely the majority of new content on the Web over the past two years, and these models require constant training data, the models are falling apart due to circular input.
Sorta relatedly, I remember back before AI the most common reddit bots would just post a top comment from the same askreddit thread or a similar one. It’s weird to think that everyone seems to have more or less forgotten that this was the main way to do account farming for a while. The focus on having AI modify the response makes no sense to me. The ‘content’ literally doesn’t matter, so why do the more computationally expensive thing? There were plenty of methods people used to reword and avoid bans too, like checking the thread for the same text and picking a different comment, having a table of interchangeable common words, etc. The AI methods seem to be literally worse than just copy-paste with fuzzy matching (or a markov chain, etc.).
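For what it’s worth, that old-school approach really is just a few lines. Here’s a toy sketch of the “check for the same text, pick a different comment, swap some words” logic; obviously not any actual bot’s code, and the thread data and word table are made up:

```python
# Toy version of the pre-AI repost-bot logic described above: reuse a top comment
# from a similar thread, but skip it if something too similar is already posted.
from difflib import SequenceMatcher

SWAPS = {"awesome": "amazing", "really": "honestly"}   # tiny interchangeable-word table

def too_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def pick_comment(candidates: list[str], already_posted: list[str]) -> str | None:
    for text in candidates:                                   # top comments from similar threads
        if not any(too_similar(text, posted) for posted in already_posted):
            words = [SWAPS.get(w, w) for w in text.split()]   # cheap rewording
            return " ".join(words)
    return None

print(pick_comment(
    candidates=["This is really awesome, thanks for sharing!",
                "Came here to say exactly this."],
    already_posted=["This is really awesome, thanks for sharing!"],
))  # the first candidate is skipped as a near-duplicate, the second gets through
```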
deleted by creator
Google’s highest-end AI is ranked really highly though, second only to the 3 best GPT-4 models; it even outranks a couple of other GPT-4 variants. They weren’t really that late (I don’t think they were late at all, tbh), they just didn’t invest in a dedicated AI company the way Microsoft did with OpenAI; they actually built their own from the ground up. I think the main problem with Google’s search model is that it’s really bad at crawling the web for data and turning that into a coherent answer. But their LLM on its own, as a chatbot, is top notch.
You are absolutely right about having something to please investors, because Microsoft was way more prepared to integrate Copilot into all their products. Google’s models are super powerful; they just weren’t prepared to package them into a consumer product as soon as they did.