A new, open source text-to-speech model called Dia has arrived to challenge ElevenLabs, OpenAI and more

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 5 days ago

BountifulEggnog [she/her]@hexbear.net · edit-2 5 days ago

https://yummy-fir-7a4.notion.site/dia

Demo with clips. It sounds pretty good, though I’m not very familiar with TTS, so I’m not sure what to expect. 10GB for the full-sized model is very reasonable for consumer hardware though. Also really cool that this was made by just two people.

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 5 days ago

Yeah, this is something you can easily run locally.

darkcalling [comrade/them, she/her]@hexbear.net · 5 days ago

Cool. Though I’m still waiting for a good open-source, locally runnable speech-to-text transcription model (one that doesn’t need 64GB of RAM for itself alone) that isn’t awful, which I could use to generate subtitles from less-than-perfect audio. Apparently it doesn’t exist, as even YouTube’s transcription isn’t great (though Apple Podcasts’ transcription is actually really good by comparison).

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 5 days ago

Looks like Dia already runs with just 10GB. From the README:

“The full version of Dia requires around 10GB of VRAM to run. We will be adding a quantized version in the future.”

https://github.com/nari-labs/dia/