gnuspeech-contact

Re: [gnuspeech-contact] Early GNUSpeech observations


From: David Hill
Subject: Re: [gnuspeech-contact] Early GNUSpeech observations
Date: Wed, 8 Apr 2009 15:37:09 -0700

Hi Jason,

On Apr 8, 2009, at 1:24 AM, Jason White wrote:

Having run the (very early) GNU/Linux version, I wish to congratulate the
authors of GNUSpeech for having advanced the porting effort this far.

Thank you.


I notice that the version which I built doesn't pronounce names and relatively uncommon words - perhaps it is restricted to pronouncing words that are in its
dictionary. I hear a "zzz" sound in place of each omitted word.

This sound was put in there deliberately by Steve Nygard to make it clear that the system was not dealing with parts of the input because, as you guess, the parser is by no means completely ported. The parser does all kinds of things, including handling dictionary derivatives and arranging numbers and dates to be spoken the way people speak them. Porting it is probably the very next job, because it makes a big difference to the overall quality of the spoken output.

There is a letter-to-sound component in there that should be functioning, and I don't think it is what causes the funny "zzzzzt" noises. The rules are not very good, though, and are not normally used much: with a 70,000-word dictionary of hand-crafted pronunciations, plus facilities for a lot of derivative words, the letter-to-sound rules are rarely called in the complete system. They are based on work by McIlroy at Bell Labs.
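
As a rough illustration of the lookup order described above (dictionary first, then derivative words, with letter-to-sound rules as a last resort), here is a minimal Python sketch. Every name, table entry and pronunciation string in it is hypothetical and not taken from the Gnuspeech source. The letter-by-letter fallback merely stands in for real rules, and the "zzz" marker only mimics the idea of flagging input the system did not handle.

```python
from typing import Callable, Optional

# Toy stand-in for the hand-crafted pronunciation dictionary (made-up entries).
DICTIONARY = {
    "speech": "s p ee ch",
    "port": "p aw t",
}

# Toy suffix table standing in for the "derivative words" facility.
SUFFIXES = {
    "ing": "i ng",
    "ed": "d",
    "s": "z",
}


def lookup(word: str) -> Optional[str]:
    """Return a dictionary pronunciation, trying simple suffix stripping."""
    if word in DICTIONARY:
        return DICTIONARY[word]
    for suffix, sound in SUFFIXES.items():
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in DICTIONARY:
            return f"{DICTIONARY[stem]} {sound}"
    return None


def letter_to_sound(word: str) -> str:
    """Crude placeholder for real rules: spell the word out letter by letter."""
    return " ".join(word)


def pronounce(
    word: str,
    fallback: Optional[Callable[[str], str]] = letter_to_sound,
) -> str:
    """Try the dictionary, then derivatives, then the fallback rules.

    With fallback=None, an unknown word yields an audible marker instead,
    so that skipped input is obvious to the listener.
    """
    key = word.lower()
    found = lookup(key)
    if found is not None:
        return found
    if fallback is not None:
        return fallback(key)
    return "zzz"
```

In this sketch pronounce("porting") is resolved through the suffix table, while a name such as "Calgary" falls through to the fallback, which is the rare path in a system with a large dictionary.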


Have the letter to sound rules not been ported yet, or is it just a bug? I think it is important for any synthesizer to have good letter to sound rules,
since there will inevitably be words in the text that aren't in the
dictionary.

Also, the dictionary should be expanded -- a project that got put on hold when the NeXT & NeXT software disappeared. All sorts of proper names/nouns need to be added, including city and country names, people's names, and so on. It has been more important recently to get the basic software up on GNU/Linux and the Mac.


I also find the intonation pattern interesting, and quite different from the original samples, but I'm sure that improving it is on the list of tasks to be completed. It also seems to me that the tonal quality of the voice is better than that of the sound files that David generously supplied on his Web site,
but this might be entirely my imagination.

Again, the intonation rules, based on M.A.K. Halliday's intonation scheme for British English, were being refined. Craig [Taube-]Schock wrote his thesis on the topic under my supervision ("Intonation for Computer Speech Output", University of Calgary Dept. of Computer Science, 1993) and received the Governor General's Gold Medal for it, but the method had already been greatly improved when we released the new articulatory synthesis software in 1994-5.

The "Lumberjack" and "The Chaos" speech samples on my university web site under "Gnuspeech material" are the untouched results of putting punctuated text into the original NeXTStep version of Gnuspeech (known then as the Trillium TextToSpeech kit). The "Pat-a-pan" sample was a Christmas teaser composed by our PhD musician Leonard Manzara for Christmas 1994. There are no instruments in the piece, which is a simulation of singing an old Burgundian carol in four parts, with 16 voices, set in an auditorium 30 feet square, with reverberations supplied by Leonard's acoustic imaging software (part of his PhD work). I attach a short write-up on that piece for convenience.

Hope this helps.



Thank you, again, for the excellent work so far.



Your encouragement is much appreciated.

Warm regards.

david

---------

Pat-a-pan (only the first verse of this old Burgundian carol is synthesised)

Note that there is no instrumental accompaniment in this synthesis, only voice harmony.

[The sound files are on my university web site: http://pages.cpsc.ucalgary.ca/~hill under "Gnuspeech material"]

God and man this day are one,
Even more than fife and drum;
So these instruments we play,
Tu-re-lu-re-lu, pat-a-pat-a-pan,
So these instruments we play
For a joyful Christmas day!

This synthesis was produced as a pre-Christmas teaser for advertising purposes for Trillium Sound Research Inc in 1994. There are 16 unaccompanied male voices in four parts, arranged by Leonard Manzara and located in a virtual hall 20 metres by 30 metres using acoustic imaging software developed by Leonard for the technical part of his doctoral thesis in music from SUNY at Buffalo (Manzara 1990). Because it is a carol, the rhythm and intonation for the four parts are musically determined and not composed by the rhythm and intonation rules used for the other examples. Some variation was introduced between voices singing the same parts. Only the sopranos sing the lyrics above; the other parts sing "pat-a-pan" in various ways. The composition was completed before the system was finalised, so there are some deficiencies, notably in the balance between voiced and unvoiced sound. The sixteen different voices and acoustic imaging required significant effort, which has not been repeated since the system achieved release status.

References

Manzara LC (1990) The simulation of acoustical space by means of physical modeling. PhD Dissertation, Faculty of the Graduate School of the State University of New York at Buffalo

