Article: https://proton.me/blog/deepseek

Calls it “Deepsneak”, failing to make it clear that the reason people love Deepseek is that you can download it and run it securely on any of your own private devices or servers - unlike most of the competing SOTA AIs.

I can’t speak for Proton, but the last couple of weeks have shown some very clear biases coming out.

  • lily33@lemm.ee
    19 hours ago

    To be fair, most people can’t actually self-host Deepseek, but there are already other providers offering API access to it.
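
    Most of those providers expose an OpenAI-compatible endpoint, so using a hosted R1 looks roughly like this sketch (the base URL and model name are placeholders for whatever provider you pick, not a recommendation of any particular one):

    # Minimal sketch of calling a hosted DeepSeek R1 through an
    # OpenAI-compatible API. base_url and model are placeholders;
    # substitute the values your chosen provider documents.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://example-provider.com/v1",  # hypothetical endpoint
        api_key="YOUR_API_KEY",
    )

    response = client.chat.completions.create(
        model="deepseek-r1",  # provider-specific model identifier
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )
    print(response.choices[0].message.content)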

    • halcyoncmdr@lemmy.world
      19 hours ago

      There are plenty of step-by-step guides to run Deepseek locally. Hell, someone even had it running on a Raspberry Pi. It seems to be much more efficient than other current alternatives.

      That’s about as openly available to self host as you can get without a 1-button installer.
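
      If you want to see what that looks like in practice, here’s a rough sketch against Ollama’s local HTTP API (assuming Ollama is running on its default port and you’ve already pulled one of the distilled R1 models):

      # Rough sketch: query a locally hosted, distilled R1 model via
      # Ollama's HTTP API. Assumes Ollama is running on localhost:11434
      # and the model (e.g. deepseek-r1:8b) has already been pulled.
      import requests

      resp = requests.post(
          "http://localhost:11434/api/generate",
          json={
              "model": "deepseek-r1:8b",
              "prompt": "Why is the sky blue?",
              "stream": False,  # return a single JSON object, not a stream
          },
          timeout=600,
      )
      resp.raise_for_status()
      print(resp.json()["response"])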

      • Aria@lemmygrad.ml
        2 hours ago

        Running R1 locally isn’t realistic. But you can rent a server and run it privately on someone else’s computer. It costs about 10 per hour to run. You can run it on CPU for a little less. You need about 2TB of RAM.

        If you want to run it at home, even quantized in 4 bit, you need 20 4090s. And since you can only have 4 per computer for normal desktop mainboards, that’s 5 whole extra computers too, and you need to figure out networking between them. A more realistic setup is probably running it on CPU, with some layers offloaded to 4 GPUs. In that case you’ll need 4 4090s and 512GB of system RAM. Absolutely not cheap or what most people have, but technically still within the top top top end of what you might have on your home computer. And remember this is still the dumb 4 bit configuration.

        Edit: I double-checked and 512GB of RAM is unrealistic. In fact, anything higher than 192GB is unrealistic. (High-end) AM5 mainboards support up to 256GB, but 64GB RAM sticks are much more expensive than 48GB ones. Most people will probably opt for 48GB or lower sticks. You need a Threadripper to be able to use 512GB. Very unlikely for your home computer, but maybe it makes sense with something else you do professionally. In which case you might also have 8 RAM slots. And such a person might then think it’s reasonable to spend 3000 Euro on RAM. If you spent 15K Euro on your home computer, you might be able to run a reduced version of R1 very slowly.
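
        As a rough sanity check on those numbers, here’s a back-of-the-envelope sketch, treating R1 as roughly 671B parameters and counting weights only (KV cache and runtime overhead push the real GPU counts above the figures this prints):

        # Back-of-the-envelope memory estimate for the full R1 model
        # (~671B parameters). Weights only: KV cache, activations and
        # runtime overhead push the real requirements higher.
        PARAMS = 671e9
        BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

        for precision, nbytes in BYTES_PER_PARAM.items():
            total_gb = PARAMS * nbytes / 1e9
            gpus = -(-total_gb // 24)  # ceiling division over 24GB cards (e.g. 4090s)
            print(f"{precision}: ~{total_gb:.0f} GB of weights -> at least {gpus:.0f} x 24GB GPUs")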

      • tekato@lemmy.world
        18 hours ago

        You can run an imitation of the DeepSeek R1 model, but not the actual one unless you literally buy a dozen of whatever NVIDIA’s top GPU is at the moment.

      • Dyf_Tfh@lemmy.sdf.org
        18 hours ago

        Those are not DeepSeek R1. They are unrelated models, like Llama 3 from Meta or Qwen from Alibaba, “distilled” by DeepSeek.

        This is a common method for making a smaller model smarter by learning from a larger one.

        Ollama should have never labelled them deepseek:8B/32B. Way too many people misunderstood that.
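
        For anyone curious what “distilled” means here, this is a toy sketch of the classic logit-matching form of knowledge distillation, with tiny made-up models. (DeepSeek’s distills were reportedly produced by fine-tuning Qwen/Llama on R1-generated reasoning traces rather than on logits, but the small-model-learns-from-a-big-model idea is the same.)

        # Toy sketch of knowledge distillation: a small "student" network
        # is trained to match the output distribution of a larger, frozen
        # "teacher". Models and data are made up purely for illustration.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
        student = nn.Linear(16, 10)  # much smaller model
        optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
        T = 2.0  # softening temperature

        for _ in range(100):
            x = torch.randn(32, 16)              # stand-in for real inputs
            with torch.no_grad():
                teacher_logits = teacher(x)      # the teacher's "knowledge"
            student_logits = student(x)
            # KL divergence between softened teacher and student distributions
            loss = F.kl_div(
                F.log_softmax(student_logits / T, dim=-1),
                F.softmax(teacher_logits / T, dim=-1),
                reduction="batchmean",
            ) * T * T
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()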

        • pcalau12i@lemmygrad.ml
          15 hours ago

          The 1.5B/7B/8B/13B/32B/70B models are all officially DeepSeek R1 models; that is what DeepSeek themselves refer to those models as. It is DeepSeek themselves who produced those models, released them to the public, and gave them their names. And their names are correct; it is just factually false to say they are not DeepSeek R1 models. They are.

          The “R1” in the name means “reasoning version one” because it does not just spit out an answer but reasons through it with an internal monologue. For example, here is a simple query I asked DeepSeek R1 13B:

          Me: can all the planets in the solar system fit between the earth and the moon?

          DeepSeek: Yes, all eight planets could theoretically be lined up along the line connecting Earth and the Moon without overlapping. The combined length of their diameters (approximately 379,011 km) is slightly less than the average Earth-Moon distance (about 384,400 km), allowing them to fit if placed consecutively with no required spacing.

          However, on top of its answer, I can expand an option to see the internal monologue it went through before generating the answer; I’ve linked the internal monologue here because it’s too long to paste.

          What makes these consumer-oriented models different is that rather than being trained on raw data, they are trained on synthetic data from pre-existing models. That’s what the “Qwen” or “Llama” parts mean in the name. The 7B model is trained on synthetic data produced by Qwen, so it is effectively a compressed version of Qwen. However, neither Qwen nor Llama can “reason”; they do not have an internal monologue.

          This is why it is just incorrect to claim that something like DeepSeek R1 7B Qwen Distill has no relevance to DeepSeek R1 but is just a Qwen model. If it’s supposedly a Qwen model, why is it that it can do something that Qwen cannot do but only DeepSeek R1 can? It’s because, again, it is a DeepSeek R1 model; they add the R1 reasoning to it during the distillation process as part of its training. (I think they use the original R1 to produce the data related to the internal monologue, which it learns to copy.)
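
          If you run one of these locally, the internal monologue typically comes back wrapped in <think>…</think> tags at the start of the output, so you can split it from the final answer with something like this rough sketch:

          # Rough sketch: split an R1-style response into its internal
          # monologue (the <think>...</think> block) and the final answer.
          # Assumes the model wraps its reasoning in <think> tags, as the
          # local R1 distills typically do.
          import re

          def split_reasoning(raw_output: str) -> tuple[str, str]:
              match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
              if match is None:
                  return "", raw_output.strip()   # no reasoning block found
              return match.group(1).strip(), raw_output[match.end():].strip()

          example = "<think>The diameters sum to ~379,000 km...</think>Yes, they fit."
          reasoning, answer = split_reasoning(example)
          print("Reasoning:", reasoning)
          print("Answer:", answer)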

          • lily33@lemm.ee
            7 hours ago

            > What makes these consumer-oriented models different is that rather than being trained on raw data, they are trained on synthetic data from pre-existing models. That’s what the “Qwen” or “Llama” parts mean in the name. The 7B model is trained on synthetic data produced by Qwen, so it is effectively a compressed version of Qwen. However, neither Qwen nor Llama can “reason”; they do not have an internal monologue.

            You got that backwards. They’re other models - qwen or llama - fine-tuned on synthetic data generated by Deepseek-R1. Specifically, reasoning data, so that they can learn some of its reasoning ability.

            But the base model - and so the base capability there - is that of the corresponding qwen or llama model. Calling them “Deepseek-R1-something” doesn’t change what they fundamentally are, it’s just marketing.
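
            In other words, the pipeline is roughly: have R1 generate reasoning traces, then fine-tune the existing Qwen/Llama base model on them. A very stripped-down sketch of that idea (the model name and data below are placeholders, and the real recipe is obviously far more involved):

            # Stripped-down sketch of distillation-by-fine-tuning: take an
            # existing small base model and fine-tune it on reasoning traces
            # generated by a larger teacher. Model name and data are
            # placeholders for illustration only.
            import torch
            from transformers import AutoModelForCausalLM, AutoTokenizer

            base = "Qwen/Qwen2.5-0.5B"  # stand-in for the real base model
            tokenizer = AutoTokenizer.from_pretrained(base)
            student = AutoModelForCausalLM.from_pretrained(base)
            optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

            # Pretend these came from the teacher (R1): prompt + reasoning + answer
            traces = [
                "Q: Is 17 prime? <think>Check divisors up to 4...</think> Yes.",
            ]

            student.train()
            for text in traces:
                batch = tokenizer(text, return_tensors="pt")
                # Standard causal-LM fine-tuning: the labels are the inputs
                out = student(**batch, labels=batch["input_ids"])
                out.loss.backward()
                optimizer.step()
                optimizer.zero_grad()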

        • ☆ Yσɠƚԋσʂ ☆@lemmy.ml
          16 hours ago

          I’m running deepseek-r1:14b-qwen-distill-fp16 locally and it produces really good results I find. Like yeah it’s a reduced version of the online one, but it’s still far better than anything else I’ve tried running locally.

            • ☆ Yσɠƚԋσʂ ☆@lemmy.ml
              4 hours ago

              The main difference is speed and memory usage. Qwen is a full-sized, high-parameter model while qwen-distill is a smaller model created using knowledge distillation to mimic qwen’s outputs. If you have the resources to run qwen fast then I’d just go with that.

          • stink@lemmygrad.ml
            14 hours ago

            It’s so cute when Chinese is sprinkled in randomly hehe, my little bilingual robot in my PC