I’m using espeak (from F-Droid) for text to speech, and it’s working great. I’d like an app that does speech to text though, ideally supporting Swedish as well as English for Duolingo purposes, but even just English would be more than I have now.

  • Pantherina@feddit.de
    5 months ago

    Packaging is a big thing. On Android the model needs to be integrated in a surrounding modern app using modern libraries.

    I wouldn’t be too hyped about training an AI on very little data, but if the amount is substantial, this is probably crazy cool.

    • rufus@discuss.tchncs.de
      5 months ago

      Maybe I should have worded things a bit differently. The 5-second snippets are for style transfer. I think it only picks up on the frequency spectrum of a new voice and knows how to handle that because it’s been trained on several other voices. I suppose one or two sentences aren’t enough to get the pacing right, along with all the distinct features of a human speaker. I didn’t get good results anyway. Tools like the one from ElevenLabs recommend you upload 30 minutes to 3 hours of speech.

      I’ve managed to get the TTS running. The German thorsten/tacotron2-DDC model is very good in my opinion. It could be the thing I was looking for. It gets all the abbreviations and names wrong, but the flow of the voice is quite good. And it’s fast, even on my laptop. Sadly, I also read that Coqui-AI are shutting down. It seems to be difficult to compete against the big tech companies, who integrate their proprietary TTS tech into their platforms for free.
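
One possible workaround for the abbreviation problem is to expand known abbreviations in the text before handing it to the TTS engine. The following is only a minimal sketch of that idea: the `ABBREVIATIONS` table and the `expand_abbreviations` helper are hypothetical names made up for illustration, not part of Coqui or any other TTS package, and a real table would have to be much larger. It does nothing for names.

```python
import re

# Hypothetical lookup table (an assumption for illustration only);
# a usable one would be much larger and specific to the language.
ABBREVIATIONS = {
    "z.B.": "zum Beispiel",
    "usw.": "und so weiter",
    "ca.": "circa",
    "Dr.": "Doktor",
}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace known abbreviations with their spoken form before synthesis."""
    # Try longer abbreviations first so they win over shorter ones.
    keys = sorted(table, key=len, reverse=True)
    # (?<!\w) prevents matches inside a longer word, e.g. the "ca." in "circa."
    pattern = re.compile(r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")")
    return pattern.sub(lambda m: table[m.group(1)], text)


if __name__ == "__main__":
    print(expand_abbreviations("Dr. Meier kommt ca. um 8 Uhr, z.B. morgen."))
```

The expanded string would then be passed to whatever synthesis call is in use, in place of the raw text.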

      I agree. Packaging and integration are some of the most important aspects if you want to actually use something. A research project is nice too, but those don’t solve my everyday tasks. And I can’t maintain too many development environments with complex dependencies and copy and paste everything into the command line. It absolutely needs to be available on the platform, and there needs to be a wrapper that integrates it into the other software I use. We have that for espeak, flite and all the old-fashioned tools. But it’s completely missing for the last 5 years of technological advancements…