I don’t really find the waste argument terribly convincing myself. The amount of waste depends on how many tries it needs to get the answer, and how much previous work it can reuse. The quality of output has already improved dramatically, and there’s no reason to expect that it won’t keep getting better over time. Meanwhile, there’s every reason to expect that the iterative loop will continue to be optimized as well.
In a broader sense, we waste power all the time on all kinds of things. Think of all the ads, crypto, or consumerism in general. There’s nothing uniquely wasteful about LLMs, and at least they can be put towards producing something of value, unlike many things our society wastes energy on.
I do think there’s something uniquely wasteful about floating point arithmetic, which is why we need specialized processors for it, and there is something uniquely wasteful about crypto and LLMs, both in terms of electricity and in terms of waste heat. I agree that generative AI for solving problems is definitely better than crypto, and it’s better than using generative AI to produce creative works, do advertising and marketing, etc.
But it’s not without its externalities, and putting that in an unmonitored iterative loop at scale requires us to at least consider the costs.
Eventually we most likely will see specialized chips for this, and there are already analog chips being produced for neural networks which are a far better fit. There are selection pressures to improve this tech even under capitalism, since companies running models end up paying for the power usage. And then we have open source models with people optimizing them to run things locally. Personally, I find it mind blowing that we can already run local models on a laptop that perform roughly as well as models that required a whole data centre to run just a year ago. It’s hard to say whether improvements will start to plateau once all the low hanging fruit is picked, but so far it’s been really impressive to watch.
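Just to make the local angle concrete, here’s a minimal sketch of what running a quantized model on a laptop can look like with llama-cpp-python; the model filename is a placeholder, any GGUF file you’ve downloaded would go there:

```python
# Minimal local-inference sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder -- point it at whatever quantized GGUF file you have.
from llama_cpp import Llama

llm = Llama(model_path="./models/some-7b-instruct.Q4_K_M.gguf", n_ctx=2048)
result = llm("Explain in one sentence why quantized models can run on a laptop.",
             max_tokens=64)
print(result["choices"][0]["text"])
```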
Yeah, there is something to be said for changing the hardware. Producing the models is still expensive even if running the models is becoming more efficient. But DeepSeek shows us even production is becoming more efficient.
What’s impressive to me is how useful the concept of the stochastic parrot is turning out to be. It doesn’t seem to make a lot of sense, at first or even second glance, that choosing the most probable next word in a sentence based on the statistical distribution of word usages across a training set would actually be all that useful.
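To make that concrete, here’s a toy sketch of the “most probable next word” step: literally just counting which word most often follows another in a tiny made-up corpus and picking the most frequent one:

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, purely to show the mechanics of "most probable next word".
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word is followed by each other word.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

# The statistically most likely continuation of "the" in this corpus.
print(next_counts["the"].most_common(1))  # -> [('cat', 2)]
```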
I’ve used it for coding before and it’s obvious that these things are most useful at reproducing code tutorials or code examples and not at all for reasoning, but there’s a lot of code examples and tutorials out there that I haven’t read yet and never will read. The ability of a stochastic parrot to reproduce that code using human language as its control input is impressive.
I’ve been amazed by this idea ever since I learned about Markov chains, and arguably LLMs aren’t fundamentally different in nature. It’s simply a huge token space encoded in a multidimensional matrix, but the fundamental idea is the same. It’s really interesting how you start getting emergent properties when you scale something conceptually simple up. It might say something about the nature of our own cognition as well.
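Here’s roughly what that looks like as a toy word-level Markov chain generator, using a two-word state to hint at how the token space blows up as the context grows (the corpus is made up):

```python
import random
from collections import defaultdict

# Toy word-level Markov chain: the "model" is just a table of observed transitions,
# keyed on a two-word state to hint at how the state space grows with context length.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat ate the fish on the mat").split()

transitions = defaultdict(list)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    transitions[(a, b)].append(c)

# Generate by repeatedly sampling a successor of the current two-word state.
random.seed(0)
state = ("the", "cat")
output = list(state)
for _ in range(12):
    nxt = random.choice(transitions[state])
    output.append(nxt)
    state = (state[1], nxt)
print(" ".join(output))
```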
You mentioned Markov Chains; for a layman with regards to mathematics (one would need to brush up on basic calculus), would you know any good books (I was thinking textbooks?) or resources to better understand maths, with a view to gaining a better understanding of LLMs/GenAI later down the line?
Here are a few books that are fairly accessible, depending on your math level.
Basic Math for AI is written for people with no prior AI or advanced math knowledge. It aims to demystify the essential mathematics needed for AI, and gives a broad beginner-friendly introduction.
https://www.goodreads.com/book/show/214340546-basic-math-for-ai
Mathematics for Machine Learning is a bit more academic than the first book, and it covers linear algebra, vector calculus, probability, and optimization, which are the pillars of LLM math.
https://www.goodreads.com/book/show/50419441-mathematics-for-machine-learning
Naked Statistics: Stripping the Dread from the Data is phenomenal for building an intuitive understanding of probability and statistics, which are often the most intimidating subjects for beginners.
https://www.goodreads.com/book/show/17986418-naked-statistics
Thank you for taking the time for that reply and reading list; very much appreciated!
no prob