• motorwerks@sopuli.xyz
    1 year ago

    I like to compare AI language model training to the early days of music sampling in hip-hop. If they can prove that their works were used without approval, I'm guessing the same result will occur.

  • Bezerker03@lemmy.bezzie.world
    1 year ago

    I mean, if I watch something and profit off it, or even start my own business based on it, that's not anything you can sue for.

    Dunno why these folks think they can sue a model trainer.

    • Flying Squid@lemmy.worldM
      1 year ago

      She claims it regurgitated passages from her book word-for-word. If she has proof of this, it sounds like infringement to me.

    • Wicker@lemmy.world
      1 year ago

      Because it’s their work being used algorithmically to support someone else’s.

      Regardless of how you feel about AI, the training for these models has to exclude copyrighted works to keep this from happening, because otherwise it is absolutely true that AI keeps a record of everything fed into it, and if you don't have the rights to what was fed into it, then there's a copyright issue. Even if it's being reworked and influenced by other works, it is still using other people's stuff to do it. It is, in many ways, an overgrown randomization and automation tool.

      The problem is that people don't see AIs as a tool that companies are using; they see them almost like a person learning. It's not like a person learning, and it can't be treated the same as, say, a consumer reading the book referenced (in this example) for enjoyment.

      • MartianSands@sh.itjust.works
        1 year ago

        it is absolutely true that AI keeps a record of everything fed into it

        No it isn’t.

        A properly trained deep learning system will ultimately be far smaller than all of the data it's been trained on. It's simply impossible for it to have retained a record of very much of it at all.
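A back-of-the-envelope sketch of the size argument, using hypothetical, illustrative figures (a 7-billion-parameter model with 16-bit weights, 2 trillion training tokens, and roughly 4 bytes of raw text per token; none of these are measurements of any particular model). It only compares average capacity: it does not rule out a model memorizing some heavily repeated passages, which is the overfitting case mentioned further down.

```python
# Hypothetical, illustrative figures; not measurements of any specific model.
NUM_PARAMS = 7e9             # 7 billion parameters (assumed)
BYTES_PER_PARAM = 2          # 16-bit weights (assumed)
TRAINING_TOKENS = 2e12       # 2 trillion training tokens (assumed)
BYTES_PER_TOKEN_OF_TEXT = 4  # rough average for English text (assumed)


def model_size_bytes(num_params, bytes_per_param):
    """Total storage taken by the model's weights."""
    return num_params * bytes_per_param


def corpus_size_bytes(tokens, bytes_per_token):
    """Approximate size of the raw training text."""
    return tokens * bytes_per_token


model = model_size_bytes(NUM_PARAMS, BYTES_PER_PARAM)
corpus = corpus_size_bytes(TRAINING_TOKENS, BYTES_PER_TOKEN_OF_TEXT)

print(f"model:  {model / 1e9:.0f} GB")
print(f"corpus: {corpus / 1e12:.0f} TB")
print(f"model / corpus ratio: {model / corpus:.5f}")
print(f"weight bytes per training token: {model / TRAINING_TOKENS:.4f}")
```

Under these assumptions the weights are on the order of a few thousandths of a byte per training token, far too little to hold a verbatim copy of the text in general.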

        When everything is working correctly it shouldn’t have any of the actual text stored at all. Certainly every single piece of training data will have left some impression on the model, but that’s a very long way from actually storing the training data. The model consists of statistical relationships, not a copy-paste of the inputs.

        Strictly speaking there is something resembling text in the model, but it’s made up of the smallest possible units of language (unless there’s been overfitting, in which case the training has gone wrong and there probably would be a case to answer).

        The model builds sentences from a list of "phrases" which don't even need to line up with word boundaries. Things like "is a" might be treated as a "word", as might "ing", if the model finds that to be a useful snippet.
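To illustrate how sub-word pieces like "ing" can emerge, here is a minimal sketch of byte-pair encoding (BPE), one common way such vocabularies are built: repeatedly merge the most frequent adjacent pair of symbols. This is a simplified toy, not any specific model's tokenizer; it splits on whitespace, so unlike byte-level BPE (as used by GPT-style tokenizers) it cannot learn units that span a space, such as "is a". The function names are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges; each word starts as a tuple of characters."""
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair[0] + pair[1])
        words = merge_pair(words, pair)
    return merges, words


merges, words = learn_bpe("singing ringing bringing thinking king", 2)
print(merges)  # learned units, e.g. "in" then "ing"
print(words)   # each word now split into the learned pieces
```

On this tiny corpus the first two merges produce "in" and then "ing", so "ing" becomes a single reusable unit even though it is not a word. Note that the model stores only the merge rules and the resulting vocabulary, not the corpus itself.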

  • root@precious.net
    1 year ago

    We're all influenced by the things we've experienced. Unless it's quoting things verbatim as its own content, I don't see the issue.