Re: [gnuspeech-contact] Early GNUSpeech observations
From: David Hill
Subject: Re: [gnuspeech-contact] Early GNUSpeech observations
Date: Wed, 8 Apr 2009 15:37:09 -0700
Hi Jason,
On Apr 8, 2009, at 1:24 AM, Jason White wrote:
Having run the (very early) GNU/Linux version, I wish to congratulate the
authors of GNUSpeech for having advanced the porting effort this far.
Thank you.
I notice that the version which I built doesn't pronounce names and relatively
uncommon words - perhaps it is restricted to pronouncing words that are in its
dictionary. I hear a "zzz" sound in place of each omitted word.
This sound was put in there deliberately by Steve Nygard to make sure
it was clearly understood that the system was not dealing with parts
of the input, because (as you guess) the parser (which does all kinds
of things including dictionary derivatives, arranging numbers and
dates to be spoken in the way people speak them, and so on) is by no
means completely ported. It is probably the very next job because it
makes a big difference to the overall quality of the spoken output.
There is a letter-to-sound component in there that should be
functioning. I don't think that's what causes the funny "zzzzzt"
noises, though they are not very good rules and are not normally
used much because, with a 70,000-word dictionary of hand-crafted
pronunciations and facilities for a lot of derivative words,
the letter-to-sound rules are rarely called in the complete system.
They are based on work by McIlroy at Bell Labs.
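As a rough illustration of the lookup order described above (dictionary first, then derivatives of dictionary words, then letter-to-sound rules, with the audible placeholder as a last resort), here is a minimal Python sketch. It is not the Gnuspeech code: the dictionary entries, the suffix table, the phoneme spellings, and every function name are hypothetical, and the letter-to-sound fallback is a toy that simply spells the word out.

```python
# Illustrative sketch only: all names, entries, and rules are hypothetical
# and are not taken from the Gnuspeech sources.

# Toy stand-in for the hand-crafted pronunciation dictionary.
DICTIONARY = {
    "speech": "s p ee ch",
    "port": "p aw t",
}

# Toy stand-in for the "derivative words" facility:
# suffix -> symbols appended to the pronunciation of the base word.
SUFFIXES = {
    "es": "i z",
    "s": "s",
    "ed": "i d",
}

PLACEHOLDER = "zzz"  # marks a word the system could not handle


def letter_to_sound(word):
    """Toy fallback: one symbol per letter (real rules are far richer)."""
    return " ".join(word)


def pronounce(word, use_letter_to_sound=True):
    """Return a pronunciation string, trying the cheapest source first."""
    word = word.lower()
    # 1. Exact dictionary hit.
    if word in DICTIONARY:
        return DICTIONARY[word]
    # 2. Derivative of a dictionary word (base word plus a known suffix).
    for suffix, tail in SUFFIXES.items():
        base = word[: -len(suffix)]
        if word.endswith(suffix) and base in DICTIONARY:
            return DICTIONARY[base] + " " + tail
    # 3. Letter-to-sound rules, when that component is available.
    if use_letter_to_sound:
        return letter_to_sound(word)
    # 4. Nothing applied: emit the audible placeholder.
    return PLACEHOLDER
```

With a large dictionary and good derivative handling, step 3 is reached only rarely, which is why weak letter-to-sound rules matter less in the complete system than in a partial port.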
Have the letter to sound rules not been ported yet, or is it just a bug? I
think it is important for any synthesizer to have good letter to sound rules,
since there will inevitably be words in the text that aren't in the
dictionary.
Also, the dictionary should be expanded -- a project that got put on
hold when the NeXT & NeXT software disappeared. All sorts of proper
names/nouns need to be added, including city and country names,
people's names, and so on. It has been more important recently to
get the basic software up on GNU/Linux and the Mac.
I also find the intonation pattern interesting, and quite different from the
original samples, but I'm sure that improving it is on the list of tasks to be
completed. It also seems to me that the tonal quality of the voice is better
than that of the sound files that David generously supplied on his Web site,
but this might be entirely my imagination.
Again, the intonation rules, based on M.A.K. Halliday's
intonation scheme for British English, were being refined. Craig
[Taube-] Schock wrote his thesis on the topic under my supervision
("Intonation for Computer Speech Output" -- University of Calgary
Dept. of Computer Science 1993) and received the Governor General's
Gold Medal for it, but the method had already been greatly improved
when we released the new articulatory synthesis software in 1994-5.
The "Lumberjack" and "The Chaos" speech samples on my university web
site under "Gnuspeech material" are the untouched results of putting
punctuated text into the original NeXTStep version of Gnuspeech
(known then as the Trillium TextToSpeech kit). The "Pat-a-pan"
sample was a Christmas teaser composed by our PhD musician Leonard
Manzara for Christmas 1994. There are no instruments in the piece,
which is a simulation of singing an old Burgundian carol in four
parts, with 16 voices, and set in an auditorium 30 feet square, with
reverberations supplied by Leonard's acoustic imaging software (part
of his PhD work). I attach a short write up on that piece for
convenience.
Hope this helps.
Thank you, again, for the excellent work so far.
Your encouragement is much appreciated.
Warm regards.
david
---------
Pat-a-pan (only the first verse of this old Burgundian carol is
synthesised)
Note that there is no instrumental accompaniment in this synthesis,
only voice harmony.
[The sound files are on my university web site:
http://pages.cpsc.ucalgary.ca/~hill under "Gnuspeech material"]
God and man this day are one,
Even more than fife and drum;
So these instruments we play,
Tu-re-lu-re-lu, pat-a-pat-a-pan,
So these instruments we play
For a joyful Christmas day!
This synthesis was produced as a pre-Christmas teaser for advertising
purposes for Trillium Sound Research Inc in 1994. There are 16
unaccompanied male voices in four parts—arranged by Leonard Manzara—
and located in a virtual hall 20 metres by 30 metres using acoustic
imaging software developed by Leonard for the technical part of his
doctoral thesis in music from the SUNY at Buffalo (Manzara 1990).
Because it is a carol, the rhythm and intonation for the four parts
are musically determined and not composed by the rhythm and
intonation rules used for the other examples. Some variation was
introduced between voices singing the same parts. Only the sopranos
sing the lyrics above, the other parts sing “pat-a-pan” in various
ways. The composition was completed before the system was finalised,
so there are some deficiencies, notably in the balance between voiced
and unvoiced sound. The sixteen different voices and acoustic imaging
required significant effort which has not been repeated since the
system achieved release status.
References
Manzara LC (1990) The simulation of acoustical space by means of
physical modeling. PhD Dissertation, Faculty of the Graduate School
of the State University of New York at Buffalo