speechd-discuss

TTS algorithms (Re: Comments on the Text to Speech "algorithm")


From: Klaus Knopper
Subject: TTS algorithms (Re: Comments on the Text to Speech "algorithm")
Date: Sun, 28 Feb 2010 06:11:02 +0100

Hi,

Maybe I'm missing something, but as far as I understood the question,
Marc just asked about a common procedure called "unit selection", which
is the algorithm used by many text-to-speech synthesizers.

With unit selection, larger snippets from prerecorded texts are glued
together with some pitch and melody correction, from a really huge
database of voice data. This usually produces the most "natural"
sounding output, but the effort for recording and post-editing is
tremendous, not to mention the legal issues with copyrights on the
personal voice data of the selected speaker(s).
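
To make the idea concrete, here is a minimal sketch of the selection
step in Python. It is only an illustration under simple assumptions, not
how any particular synthesizer does it: the Unit fields, the two cost
functions (pitch mismatch and pitch jump) and the name select_units are
all made up for this example. It picks one recording per wanted sound so
that the total of "how well does the unit fit" and "how well do
neighbours join" is as small as possible.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    label: str     # which sound this recording contains (illustrative)
    pitch: float   # average pitch of the recording in Hz
    clip_id: int   # hypothetical index into the voice database

def target_cost(unit, wanted_pitch):
    # How far the recording is from the pitch we want at this position.
    return abs(unit.pitch - wanted_pitch)

def join_cost(prev, nxt):
    # How audible the seam between two neighbouring recordings would be.
    return abs(prev.pitch - nxt.pitch)

def select_units(targets, database):
    """targets: list of (label, wanted_pitch). Returns the cheapest chain."""
    candidates = [[u for u in database if u.label == label]
                  for label, _ in targets]
    if any(not c for c in candidates):
        raise ValueError("no recording available for some target")

    # best[i][j] = (cost of the cheapest chain ending in candidate j,
    #               index of the predecessor candidate in row i-1)
    best = [[(target_cost(u, targets[0][1]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, targets[i][1]), back))
        best.append(row)

    # Walk the back pointers from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][j])
        j = best[i][j][1]
    return chain[::-1]
```

A real system would use many more features than pitch and a far larger
database, which is exactly where the recording effort and the I/O load
come from.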

espeak uses the "pure mathematical" approach (formant synthesis) and
(almost) no real recordings, which makes it very fast and small, but the
voice does not sound as "real" as a prerecorded one. Its quality is
still very good, and in some cases even superior to recording-based
voice synthesis in terms of intelligibility.

festival and mbrola (mostly) use diphone synthesis, where short
recordings of the transitions between pairs of adjacent natural voice
sounds (phones) are glued together.
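
As a rough illustration of what "diphone" means, the following Python
sketch turns a sequence of phones into the list of diphones that would
have to be looked up. The phone symbols, the "_" silence marker and the
"a-b" naming are assumptions made for this example; festival and mbrola
voices use their own phone sets and naming schemes.

```python
def to_diphones(phones):
    """Return the diphone names for a phone sequence.

    Each diphone covers the transition from one phone to the next, so a
    word with n phones needs n + 1 diphones once silence ("_") is added
    at both ends.
    """
    padded = ["_"] + list(phones) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]
```

For the phones h, e, l, o this gives _-h, h-e, e-l, l-o and o-_.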

speechd itself does no text-to-speech synthesis. It just collects texts
and queues them for processing with external synthesizers.

Back to unit selection: Because it is time-critical, selecting and
processing real recordings requires a lot of I/O throughput. You will
need a very fast hard disk (maybe RAID) or database, possibly cached in
RAM. Otherwise you have to accept that the output is generated
"offline", with playback starting a few seconds or even minutes after
the original text was sent. The output would then be a WAV, Ogg or the
aforementioned MP3 file, if you don't mind using a patented format with
its problematic legal issues.
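
One way to reduce the I/O cost is to keep all clips in a single file
with an index and to cache recently used clips in RAM. The Python sketch
below only illustrates that idea; build_bundle, ClipStore and the
(offset, length) index layout are hypothetical names and choices for
this example, not part of speechd or of any synthesizer.

```python
import io
from functools import lru_cache

def build_bundle(clips):
    """Pack {name: bytes} into one blob plus an index of (offset, length)."""
    blob, index = io.BytesIO(), {}
    for name, data in clips.items():
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index

class ClipStore:
    """Reads clips from one open bundle file and caches them in RAM."""

    def __init__(self, fileobj, index, cache_size=1024):
        self._file = fileobj
        self._index = index
        self.reads = 0  # number of reads that actually hit the file
        # Wrap the reader so repeated requests are served from memory.
        self.get = lru_cache(maxsize=cache_size)(self._read)

    def _read(self, name):
        offset, length = self._index[name]
        self._file.seek(offset)
        self.reads += 1
        return self._file.read(length)
```

With a real voice the file object would be the opened bundle on disk.
One seek and one read per clip replace opening thousands of small files,
and frequently used clips never touch the disk again.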

"Mary" from the DFKI uses unit selection. It is open source and written
in Java, but there is no plugin for speechd yet. It may still be
possible to use it as a command-line external program for speechd.

http://mary.dfki.de/

Regards
-Klaus Knopper

On Sun, Feb 28, 2010 at 06:21:03AM +0200, A wrote:
> Leaving aside that mp3 is a bad choice, why should file access be so
> bad? If the Windows file system can't keep up, then some way to bundle
> the files in a single (or a few) data structure should do the trick,
> especially if the speech engine starts playing a file as soon as there
> are enough bits to do so instead of reading the whole file first.
> I think it's more a problem of latency optimizations than anything
> else on the current dual and multicore CPUs.
> 
> On Sun, Feb 28, 2010 at 1:23 AM, Kenny Hitt <kenny at hittsjunk.net> wrote:
> > Hi. That would probably be ok for reading books, but it
> > would suck for a screen reader. One reason I haven't
> > used Cepstral Swift much even though I own several voices, is it's not
> > responsive enough for daily screen reading.
> > The file access alone for so many mp3s would be awful.
> >
> >          Kenny
> >
> > On Sat, Feb 27, 2010 at 10:47:13PM +0100, marc wrote:
> >> Hello,
> >>
> >> I made this remark at the http://rmll.info last summer in Nantes.
> >>
> >> If you have Text to Speech (TTS), the "old" way is to invent some
> >> mathematical function and to generate a "sound" which is "close" (in
> >> Hausdorff distance?) to the spoken words.
> >>
> >> But these mathematical formulas date from times when computers
> >> didn't have the capacity to hold about 60,000 MP3s from a
> >> human speaker. If we could organise it that way, the concatenation
> >> of the words would be better than the mathematical construction. And
> >> if you learned how to make a higher sound at the end of a question,
> >> you should be able to adapt the mp3 too.
> >>
> >> Problem is: we will have to throw away a lot of work by
> >> mathematicians... Mathematicians never had patents (the Greek would
> >> be rich ;-). But we throw away a lot of stuff in computer science
> >> ...
> >>
> >>
> >> Marc
> >>
> >>
> >>
> >>
> >> --
> >> What's on Shortwave guide: choose an hour, go!
> >> http://shortwave.tk
> >> 700+ Radio Stations on SW http://swstations.tk
> >> 300+ languages on SW http://radiolanguages.tk
> >>
> >> _______________________________________________
> >> Speechd mailing list
> >> Speechd at lists.freebsoft.org
> >> http://lists.freebsoft.org/mailman/listinfo/speechd


