Re: [gnuspeech-contact] introduction and gnuspeech questions (fwd)


From: D.R. Hill
Subject: Re: [gnuspeech-contact] introduction and gnuspeech questions (fwd)
Date: Sun, 20 Nov 2005 14:51:53 -0700 (MST)

Hi Eric,

I notice that your messages haven't been going out to the gnuspeech mailing list. We need to keep other members in the loop I think (and that includes me not sending humungous emails to the list, but keeping the messages going through -- thus I have to get the big files properly organised for access).

Anyway, I'm sending this to everyone, and (if it's OK with you) I can forward all the stuff that hasn't gone out yet -- about 6 emails I think.

More below.

All good wishes.

david
------
David Hill, Prof. Emeritus, Computer Science  |  Imagination is more       |
U. Calgary, Calgary, AB, Canada T2N 1N4       |  important than knowledge  |
address@hidden OR address@hidden   |         (Albert Einstein)  |
http://www.cpsc.ucalgary.ca/~hill             |  Kill your television      |

On Sun, 20 Nov 2005, Eric Zoerner wrote:

With Steve's email and its couple of small modifications to the source code, I was able to build Monet on my Mac. I have been playing around with it and reading the manual, trying to learn as much as I can about it. I was happy to discover that it successfully outputs audio "out of the box". Let me say right away that so far my experience with Monet has been extremely positive in comparison to using the Festival system (of which I have a very low opinion). With respect to ease of use, gnuspeech is light-years ahead of Festival.

Interesting. Thanks for the feedback. I think articulatory synthesis is the way forward for a variety of reasons, and, as you may be aware, computer-human interaction is one of my main areas of research interest (along with speech synthesis and recognition and AI), so I strongly believe that software should be user-friendly in a non-trivial, non-patronising sense. (I personally hate little dogs in my help screen that wag their tails ;-) but at least you can send the creature packing, which is as it should be.) But "The parameter is wrong": what kind of error message is that, especially if you never provided a parameter and there's no indication as to what such a parameter might be? The one I like best is really a caricature: "Your mouse moved. Reboot your system to continue." But it does capture some of the idiocy that is rampant in software development.


The quality of the speech output for the default utterances in monet.diphones was a little disappointing in comparison to the hello demo you sent me and the pat-a-pan demo. Were those demos hand-tuned in some way to provide a higher quality output? In any case, I am eagerly awaiting the lumberjack demo and any other audio files you can send me.

Synthesising sung words is a great deal easier than synthesising normal speech because the rhythm and intonation are largely determined (there are things like "lento" and "rallentando", I guess, which can ask for unspecified rhythmic variation, but notes have quantity and values that are well defined). That, plus the addition of multipart voices and the acoustic imaging resulting from a correctly modelled virtual room, make Pat-a-pan an excellent demo, despite some missing fricatives. Leonard Manzara's Ph.D. was actually a *music* Ph.D. (!!) so he's ideally placed to do that kind of work, given his skill with mathematics and programming. Of course, Perry Cook and Julius Smith at the Stanford Center for Computer Research in Music and Acoustics were equally interdisciplinary, and that is where Len got a lot of his inspiration.

As I said before, the "Hello" comparison was done with the so-called "software synthesiser" which is available as source code on the CVS site and (being written in "C" -- "tube.c") can run anywhere. But it needs properly formatted input parameters as produced within Monet -- both static (defining the tube and the atmosphere) and dynamic (controlling the postures, and pitch within the time framework). The software synthesiser avoided the short-cuts and simplifications needed to get the DSP version running in real-time on the old NeXT DSP56001 digital signal processor. Even then, the shortest tube we could emulate was around 15 cm (the sample rate goes up as the tube gets shorter). Not having the real-time constraint, the tube model could do everything properly and still handle a child tube or even a baby tube (say, 7 cm). So yes, the "Hello" comparison, with hand-applied intonation and greater accuracy, does sound better -- but that indicates how good the Mac synthesis should be. The compute cycle constraint is largely removed, the rhythm model is pretty good, and the intonation model, although about the best around, could be greatly improved with access to the results of grammatical analysis.

I'll be interested to know what you think of "lumberjack" which is a true representation of text-to-speech, without interference, as produced by the original NeXT TTS system. I have another test piece which I'll send (or upload, or put on the web site), "thechaos" which is actually a "poem" published in the UK magazine "Punch" many years ago by a Dutchman who wanted to point out the obstacles to learning English presented by the disconnect between spelling and pronunciation. I used it because it is a good demonstration of the ability of the TTS system we produced to deal with "difficult" words. Of course, the biggest advantage is the use of a dictionary, falling back to letter-to-sound rules only as a last resort, but it does provide a wide variety of words and contexts which are fairly demanding at the segmental/suprasegmental levels. I attach a script for both "lumberjack" and "thechaos". Apart from providing a script for what you hear (you may wish to postpone reading the scripts till you've had a chance to listen to the audio a few times), they probably give an idea of why punctuation is important, and what "good" punctuation really means. Well, I suppose you can always read "Eats, shoots and leaves" too ;-)


I see that the application GnuSpeech is also in the "current" directory for the Mac. I have not yet built that application; at this point text-to-speech is a lower priority for me than being able to directly input a phonetic transcription.

I understand that, but using GnuSpeech (not the best choice of name since it causes confusion for now, though later it can have the bits added to do everything) does provide a quick way of generating a correct Monet input syntax and you can hand edit the phonetic content.


Eric


On 20 Nov 2005, at 07:20, D.R. Hill wrote:



Let me know how you are getting on with the compilation. If you get stuck, I can always email you an older binary, produced before Greg Casamento started modifying the source for GNU/Linux compatibility.

You'll find the program "GnuSpeech" (in the "Applications" folder on the savannah site) -- written by Steve -- provides a direct conversion of text input to the input format needed by Monet. It is a 0.9 version though, and is a little picky about input format. For example, it wants spaces before punctuation and things like that, and provides a "raspberry" (bzzzzt) if it doesn't like the text input.

Hope this all helps.  Keep bugging me!

All good wishes.


On Wed, 16 Nov 2005, Eric Zoerner wrote:

When you get a chance, could you please send me more sound samples including full sentences?
Thank you in advance!
Eric
- Are there any sound files available that give samples of the speech output?
Yes, I have attached a comparative synthesis of "Hello" by male, female and child voices from the system. I can send more stuff, including sentences and so on if you like.
  [ The following attachments were DELETED when this message was saved:  ]
  [ A Audio/BASIC segment of about 10,822,618 bytes,                     ]

Attachment: lumberjack.txt
Description: Text document

Attachment: the-chaos.txt
Description: Text document

