• modeler@lemmy.world · 4 months ago

    Typically you need about 1GB graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B parameter model. Ouch.

    Edit: you can try quantizing it. This reduces the amount of memory required per parameter to 4 bits, 2 bits or even 1 bit. As you reduce the size, the performance of the model can suffer. So in the extreme case you might be able to run this in under 64GB of graphics RAM.
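
    As a rough back-of-the-envelope check, here's what 405B parameters works out to at different bit widths (weights only; activations, KV cache and framework overhead come on top):

    ```python
    # Weight-memory estimate for a 405B-parameter model at different bit widths.
    # Weights only: activations, KV cache and framework overhead are not included.
    PARAMS = 405e9  # 405 billion parameters

    for bits in (16, 8, 4, 2, 1):
        gb = PARAMS * bits / 8 / 1e9  # bits -> bytes -> GB (decimal)
        print(f"{bits:>2}-bit: ~{gb:,.0f} GB")
    ```

    At 16-bit that's roughly 810 GB, and even at 1 bit it's still around 51 GB, which is why fitting it under 64 GB is the extreme case.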

    • cheddar@programming.dev · 4 months ago

      Typically you need about 1GB graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B parameter model.

    • Siegfried@lemmy.world · 4 months ago

      At work we have a small cluster totalling around 4 TB of RAM.

      It has 4 cooling units, a cubic metre of PSUs, and it must take up something like 30 m² of space.

    • 1984@lemmy.today · 4 months ago

      Can you run this in a distributed manner, like with Kubernetes and lots of smaller machines?
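
      One common way to do this is not Kubernetes itself but tensor parallelism inside the serving framework, which shards the weights across GPUs (and, with a Ray cluster, across machines). A minimal sketch, assuming vLLM and a placeholder model id and GPU count:

      ```python
      # Minimal multi-GPU inference sketch using vLLM tensor parallelism.
      # Model id and GPU count are placeholders; a multi-node setup additionally
      # needs a Ray cluster spanning the machines.
      from vllm import LLM, SamplingParams

      llm = LLM(
          model="meta-llama/Llama-3.1-405B-Instruct",  # placeholder model id
          tensor_parallel_size=8,                      # shard weights across 8 GPUs
      )
      outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
      print(outputs[0].outputs[0].text)
      ```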

    • obbeel · 4 months ago

      According to Hugging Face, you can run a 34B model using at most 22.4 GB of RAM. That fits on an RTX 3090 Ti (24 GB).
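
      A minimal sketch of what that looks like in practice, assuming the transformers + bitsandbytes stack and an example 34B model id (a placeholder, not necessarily the model behind the Hugging Face figure above):

      ```python
      # Sketch: loading a ~34B model in 4-bit so the weights fit in ~24 GB of VRAM.
      # Model id is an example; requires the bitsandbytes package and a CUDA GPU.
      from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

      model_id = "codellama/CodeLlama-34b-hf"  # example 34B model
      quant = BitsAndBytesConfig(load_in_4bit=True)

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          quantization_config=quant,
          device_map="auto",  # place layers across available GPUs (and CPU if needed)
      )
      ```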

    • Longpork3@lemmy.nz · 4 months ago

      Hmm, I probably have that much distributed across my network… maybe I should look into some way of distributing it across multiple GPUs.

      Frak, I just counted and I only have 270 GB installed. Approx. 40 GB more if I install some of the deprecated cards in any spare PCIe slots I can find.
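
      A quick way to tally what is actually installed per machine, assuming PyTorch with CUDA:

      ```python
      # Tally the total VRAM across the GPUs visible on one machine (run per host).
      import torch

      total_gb = 0.0
      for i in range(torch.cuda.device_count()):
          props = torch.cuda.get_device_properties(i)
          gb = props.total_memory / 1e9
          total_gb += gb
          print(f"GPU {i}: {props.name}, {gb:.1f} GB")
      print(f"Total VRAM: {total_gb:.1f} GB")
      ```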