Re: [gnuspeech-contact] introduction and gnuspeech questions (fwd)

gnuspeech-contact

From:	D.R. Hill
Subject:	Re: [gnuspeech-contact] introduction and gnuspeech questions (fwd)
Date:	Sun, 20 Nov 2005 14:51:53 -0700 (MST)

Hi Eric,

I notice that your messages haven't been going out to the gnuspeech mailing list. We need to keep other members in the loop, I think (and that includes me not sending humungous emails to the list, but keeping the messages going through -- thus I have to get the big files properly organised for access).

Anyway, I'm sending this to everyone, and (if it's OK with you) I can forward all the stuff that hasn't gone out yet -- about 6 emails I think.


More below.

All good wishes.

david
------
David Hill, Prof. Emeritus, Computer Science  |  Imagination is more       |
U. Calgary, Calgary, AB, Canada T2N 1N4       |  important than knowledge  |
address@hidden OR address@hidden   |         (Albert Einstein)  |
http://www.cpsc.ucalgary.ca/~hill             |  Kill your television      |

On Sun, 20 Nov 2005, Eric Zoerner wrote:

With Steve's email with the couple of small modifications to the source code, I was able to build Monet on my Mac. I have been playing around with it and reading the manual trying to learn as much as I can about it. I was happy to discover that it successfully outputs audio "out of the box". Let me say right away that so far my experience with Monet has been extremely positive in comparison to using the Festival system (which I have a very low opinion of). With respect to ease-of-use, gnuspeech is light years ahead of Festival.

Interesting. Thanks for the feedback. I think articulatory synthesis is the way forward for a variety of reasons, and, as you may be aware, computer-human interaction is one of my main areas of research interest (along with speech synthesis and recognition and AI), so I strongly believe that software should be user-friendly in a non-trivial, non-patronising sense (I personally hate little dogs in my help screen that wag their tail ;-) [but at least you can send the creature packing, which is as it should be]. But "The parameter is wrong" -- what kind of error message is that, especially if you never provided a parameter and there's no indication as to what such a parameter might be? The one I like best is really a caricature: "Your mouse moved. Reboot your system to continue." But it does capture some of the idiocy that is rampant in software development.

The quality of the speech output for the default utterances in monet.diphones was a little disappointing in comparison to the hello demo you sent me and the pat-a-pan demo. Were those demos hand-tuned in some way to provide a higher quality output? In any case, I am eagerly awaiting the lumberjack demo and any other audio files you can send me.

Synthesising sung words is a great deal easier than synthesising normal speech because the rhythm and intonation are largely determined (there are things like "lento" and "rallentando" I guess which can ask for unspecified rhythmic variation, but notes have quantity and values that are well defined). That, plus the addition of multipart voices and the acoustic imaging resulting from a virtual room correctly modelled make Pat-a-pan an excellent demo, despite some missing fricatives. Leonard Manzara's Ph.D. was actually a *music* PhD (!!) so he's ideally placed to do that kind of work, given his skill with mathematics and programming. Of course, Perry Cook and Julius Smith at the Stanford Center for Computer Research in Music and Acoustics were equally interdisciplinary and that's from whence Len got a lot of his inspiration.

As I said before, the "Hello" comparison was done with the so-called "software synthesiser" which is available as source code on the CVS site and (being written in "C" -- "tube.c") can run anywhere. But it needs properly formatted input parameters as produced within Monet -- both static (defining the tube and the atmosphere) and dynamic (controlling the postures, and pitch within the time framework). The software synthesiser avoided the short-cuts and simplifications needed to get the DSP version running in real-time on the old NeXT DSP56001 digital signal processor. Even then, the shortest tube we could emulate was around 15 cm (the sample rate goes up as the tube gets shorter). Not having the real-time constraint, the tube model could do everything properly and still handle a child tube or even a baby tube (say, 7 cm). So yes, the "Hello" comparison, with hand-applied intonation and greater accuracy, does sound better -- but that indicates how good the Mac synthesis should be. The compute cycle constraint is largely removed, the rhythm model is pretty good, and the intonation model, although about the best around, could be greatly improved with access to the results of grammatical analysis.
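
As a rough illustration of why the sample rate rises for shorter tubes, here is a minimal sketch (not taken from tube.c). It assumes the tube is modelled as a fixed number of equal sections, each exactly one sample of delay long. The section count of 10, the 25 degree C air temperature, the linear speed-of-sound approximation and the function names are all assumptions made for this sketch.

```python
# Sketch only: the section count, temperature and speed-of-sound formula
# are illustrative assumptions, not values quoted in this message.

def speed_of_sound(temperature_c: float) -> float:
    """Approximate speed of sound in air (m/s), linear in temperature."""
    return 331.4 + 0.6 * temperature_c


def required_sample_rate(tube_length_cm: float,
                         sections: int = 10,
                         temperature_c: float = 25.0) -> float:
    """Sample rate (Hz) at which sound crosses one tube section per sample.

    With a fixed number of sections, a shorter tube means shorter
    sections, so sound crosses each one faster and the model has to be
    stepped more often per second.
    """
    if tube_length_cm <= 0 or sections <= 0:
        raise ValueError("tube length and section count must be positive")
    section_length_m = (tube_length_cm / 100.0) / sections
    return speed_of_sound(temperature_c) / section_length_m


if __name__ == "__main__":
    # 15 cm and 7 cm are the two tube lengths mentioned in the text.
    for length_cm in (15.0, 7.0):
        rate = required_sample_rate(length_cm)
        print(f"{length_cm:4.1f} cm tube -> {rate:6.0f} Hz")
```

Under these assumptions a 15 cm tube needs roughly 23 kHz and a 7 cm tube roughly 49 kHz, which is the sense in which a shorter tube costs more computation per second of speech.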

I'll be interested to know what you think of "lumberjack" which is a true representation of text-to-speech, without interference, as produced by the original NeXT TTS system. I have another test piece which I'll send (or upload, or put on the web site), "thechaos" which is actually a "poem" published in the UK magazine "Punch" many years ago by a Dutchman who wanted to point out the obstacles to learning English presented by the disconnect between spelling and pronunciation. I used it because it is a good demonstration of the ability of the TTS system we produced to deal with "difficult" words. Of course, the biggest advantage is the use of a dictionary, falling back to letter-to-sound rules only as a last resort, but it does provide a wide variety of words and contexts which are fairly demanding at the segmental/suprasegmental levels. I attach a script for both "lumberjack" and "thechaos". Apart from providing a script for what you hear (you may wish to postpone reading the scripts till you've had a chance to listen to the audio a few times), they probably give an idea of why punctuation is important, and what "good" punctuation really means. Well, I suppose you can always read "Eats, shoots and leaves" too ;-)
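
The dictionary-first strategy described above can be sketched roughly as follows. Everything here is hypothetical: the entries, the phonetic notation and the one-symbol-per-letter fallback are placeholders for illustration, not the dictionary or letter-to-sound rules actually used in the TTS system.

```python
# Toy data: hypothetical entries in a made-up notation, chosen only to
# show spellings that share letters but not sounds.
TOY_DICTIONARY = {
    "tough": "t uh f",
    "though": "dh oh",
    "through": "th r oo",
}


def letter_to_sound(word: str) -> str:
    """Placeholder fallback: emit one symbol per letter.

    Real letter-to-sound rules are context-sensitive; this stand-in only
    marks where such rules would be applied.
    """
    return " ".join(word)


def pronounce(word: str, dictionary: dict) -> tuple:
    """Return (pronunciation, source), trying the dictionary first."""
    key = word.lower()
    if key in dictionary:
        return dictionary[key], "dictionary"
    # Last resort: the word is unknown, so fall back to the rules.
    return letter_to_sound(key), "rules"


if __name__ == "__main__":
    for word in ("Tough", "though", "zorp"):
        print(word, "->", pronounce(word, TOY_DICTIONARY))
```

The point of the ordering is that irregular spellings are handled by lookup, so the rules only ever see words the dictionary does not cover.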

I see that the application GnuSpeech is also in the "current" directory for the Mac. I have not yet built that application; at this point text-to-speech is a lower priority for me right now than being able to directly input a phonetic transcription.

I understand that, but using GnuSpeech (not the best choice of name since it causes confusion for now, though later it can have the bits added to do everything) does provide a quick way of generating a correct Monet input syntax and you can hand edit the phonetic content.

Eric


On 20 Nov 2005, at 07:20, D.R. Hill wrote:
Let me know how you are getting on with the compilation. If you get stuck, I can always email you an older binary, produced before Greg Casamento started modifying the source for GNU/Linux compatibility.
You'll find the program "GnuSpeech" (in the "Applications" folder on the savannah site) -- written by Steve -- provides a direct conversion of text input to the input format needed by Monet. It is a 0.9 version though, and is a little picky about input format. E.g. it wants spaces before punctuation and things like that, and provides a "raspberry" (bzzzzt) if it doesn't like the text input.
Hope this all helps.  Keep bugging me!

All good wishes.
On Wed, 16 Nov 2005, Eric Zoerner wrote:
When you get a chance, could you please send me more sound samples including full sentences?
Thank you in advance!
Eric
- Are there any sound files available that give samples of the speech output?
Yes, I have attached a comparative synthesis of "Hello" by male, female and child voices from the system. I can send more stuff, including sentences and so on if you like.
  [ The following attachments were DELETED when this message was saved:  ]
  [ A Audio/BASIC segment of about 10,822,618 bytes,                     ]

lumberjack.txt
Description: Text document

the-chaos.txt
Description: Text document
