• ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP · 14 points · 4 days ago

    I don’t really find the waste argument terribly convincing myself. The amount of waste depends on how many tries it takes to get the answer, and how much previous work can be reused. The quality of output has already improved dramatically, and there’s no reason to expect that improvement to stop. Meanwhile, there’s every reason to expect the iterative loop to keep being optimized as well.

    In a broader sense, we waste power all the time on all kinds of things. Think of all the ads, crypto, or consumerism in general. There’s nothing uniquely wasteful about LLMs, and at least they can be put towards producing something of value, unlike many things our society wastes energy on.

    • freagle@lemmygrad.ml · 10 points · 4 days ago

      I do think there’s something uniquely wasteful about floating point arithmetic, which is why we need specialized processors for it, and there is something uniquely wasteful about crypto and LLMs, both in terms of electricity and in terms of waste heat. I agree that using generative AI to solve problems is definitely better than crypto, and it’s better than using generative AI to produce creative works, do advertising and marketing, etc.

      But it’s not without its externalities, and putting it in an unmonitored iterative loop at scale requires us to at least consider the costs.

      • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP · 10 points · 4 days ago

        Eventually we’ll most likely see specialized chips for this, and there are already analog chips being produced for neural networks, which are a far better fit. There are selection pressures to improve this tech even under capitalism, since companies running models end up paying for the power usage. And then we have open source models, with people optimizing them to run locally. Personally, I find it mind-blowing that we can already run local models on a laptop that perform roughly as well as models that required a whole data centre just a year ago. It’s hard to say whether improvements will plateau once the low-hanging fruit is picked, but so far it’s been really impressive to watch.

        • freagle@lemmygrad.ml · 4 points · 3 days ago

          Yeah, there is something to be said for changing the hardware. Producing the models is still expensive even if running the models is becoming more efficient. But DeepSeek shows us even production is becoming more efficient.

          What’s impressive to me is how useful the concept of the stochastic parrot is turning out to be. It doesn’t seem to make a lot of sense, at first or even second glance, that choosing the most probable next word in a sentence, based on the statistical distribution of word usages across a training set, would actually be all that useful.

          I’ve used it for coding before, and it’s obvious that these things are most useful at reproducing code tutorials or examples, not at reasoning. But there are a lot of code examples and tutorials out there that I haven’t read yet and never will. The ability of a stochastic parrot to reproduce that code using human language as its control input is impressive.
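          The "most probable next word" idea can be sketched in a few lines. This is my own toy illustration of a bigram model, not how any real LLM works: it counts which word follows which in a tiny training set, then greedily picks the most common successor.

```python
from collections import Counter, defaultdict

# Toy "training set": count which word follows which (a bigram model).
corpus = "the cat sat on the mat the cat ate the fish".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_probable_next(word):
    # Greedily pick the statistically most common successor.
    return follows[word].most_common(1)[0][0]

print(most_probable_next("the"))  # "cat" (follows "the" twice; "mat" and "fish" once each)
```

          A real model conditions on a long context window rather than one word, and on learned embeddings rather than raw counts, but the parrot's trick is recognizably the same.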

          • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP · 4 points · 3 days ago

            I’ve been amazed by this idea ever since I learned about Markov chains, and arguably LLMs aren’t fundamentally different in nature. It’s simply a huge token space encoded in a multidimensional matrix, but the fundamental idea is the same. It’s really interesting how you start getting emergent properties when you scale something conceptually simple up. It might say something about the nature of our own cognition as well.
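            A minimal sketch of that Markov-chain idea, with a made-up transition table (the states and probabilities here are invented for illustration): each state maps to a distribution over next tokens, and generation is just repeated weighted sampling. An LLM's "matrix" plays the same role at vastly larger scale.

```python
import random

random.seed(0)  # deterministic sampling for the demo

# A Markov chain is just a table of transition probabilities:
# chain[state] maps each possible next token to its probability.
chain = {
    "I":     {"think": 0.6, "know": 0.4},
    "think": {"LLMs": 1.0},
    "know":  {"LLMs": 1.0},
    "LLMs":  {"scale": 0.5, "work": 0.5},
}

def generate(start, steps):
    tokens = [start]
    for _ in range(steps):
        options = chain.get(tokens[-1])
        if not options:  # terminal state: no known successors
            break
        nxt = random.choices(list(options), weights=list(options.values()))[0]
        tokens.append(nxt)
    return " ".join(tokens)

print(generate("I", 3))  # e.g. "I know LLMs work"
```

            The emergent-properties point is exactly that nothing like this toy predicts what happens when the state space becomes billions of learned parameters conditioned on long contexts.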