Re: [gnuspeech-contact] TRM as backend for festival


From: Nickolay V. Shmyrev
Subject: Re: [gnuspeech-contact] TRM as backend for festival
Date: Mon, 12 Feb 2007 18:36:53 +0300

On Sun, 11/02/2007 at 16:31 -0800, David Hill wrote:
> It would probably help your understanding if you were to read the
> Monet manual.  You wrote (see below):
> 
> 
> > But I have no idea how Monet
> > reproduces consonants. There are examples, but no trm files for
> > them.
> 
> 
> The .trm files are associated strictly with the tube model ("trm" =
> "tube resonance model") and are saved and used by the "Synthesiser"
> application (which is a GUI application for playing with the tube --
> but only steady-state configurations).  (You should probably read that
> manual as well).  Consonants are mostly created by the dynamics of
> vocal tract changes, though there are also some continuant sounds
> involving frication (e.g. /s/), but even for these the transitional
> cues are important.  Thus it is impossible to create consonants from
> .trm files alone.  They were really only useful for exploring the
> vocal tract configurations needed to create the vocal tract "postures"
> used as anchor points (loosely related to "phones") for the varying
> speech parameters.  The dynamic information needed for complete speech is
> created from these quasi-steady-state values representing vocal tract
> postures, plus context-sensitive rules for moving from posture to
> posture, according to timing information that reflects the rhythmic
> character of British English.  This information is all held within
> "diphones.monet" (the rules are actually more complex than diphones in
> many cases and include triphones & even tetraphones).  Monet has the
> algorithms to use this information appropriately.  The intonation is
> applied to the varying stream of tube parameters generated on this
> basis by varying the pitch (F0) parameter, according to a model of
> British English intonation based on work by M.A.K. Halliday and
> elaborated by our own studies.  These variations are added to the
> small pitch changes created at the posture (segmental) level by
> constrictions in the vocal tract -- so-called "micro-intonation" --
> which provide additional cues for the identification of consonants.
> Many of the relevant papers are available on my university web site.
> 
> 
> The "oi" sound is just a succession of vowel sounds with a varying
> pitch, so a series of what appear to be .trm values will work.  To
> produce speech, you need to be able to construct a more complex set of
> varying parameters reflecting the reality of speech.  This is what
> Monet does.  This is the part of Monet that needs to be extracted if
> all you wish to do is convert sound specifications to a speech
> waveform specification.  The current Monet does much more since it
> allows you to create the databases as well as listen to the speech
> that can then be produced.  The extracted part (non-interactive) that
> would simply use the databases to convert streams of posture symbols
> to an output waveform is what we call "Real-time Monet".  It has not
> been ported from the original NeXT implementation yet.
> 
> 
> david

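For a concrete picture of the data flow described above -- quasi-steady-state
posture targets turned by rules and timing into a continuous parameter stream
for the tube model -- here is a deliberately simplified Python sketch.  The
names, frame rate and plain linear interpolation are invented for
illustration; Monet's actual machinery (diphone/triphone/tetraphone
transition rules, rhythm, intonation) is far richer than this.

    # Illustrative only -- not Monet's real algorithm, data format or API.
    FRAME_RATE = 250  # hypothetical parameter frames per second

    def interpolate_postures(postures, durations_ms):
        """Move linearly between successive posture targets, emitting one
        frame of tube parameters per tick over the given durations."""
        frames = []
        pairs = zip(postures, postures[1:])
        for (cur, nxt), dur in zip(pairs, durations_ms):
            n = max(1, int(FRAME_RATE * dur / 1000.0))
            for i in range(n):
                t = i / float(n)
                frames.append({k: cur[k] + t * (nxt[k] - cur[k]) for k in cur})
        frames.append(dict(postures[-1]))
        return frames

    # Two made-up postures with made-up parameters:
    aa = {"tubeSection3": 1.2, "velum": 0.0}
    ss = {"tubeSection3": 0.2, "velum": 0.1}
    stream = interpolate_postures([aa, ss], [120])

The point is only the shape of the pipeline: symbolic postures in, a dense
time-varying parameter stream out, with the tube resonance model consuming
that stream as the final stage.
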
Heh, excuse my ignorance. I was really confused by the picture on the
GnuSpeech homepage:

http://www.gnu.org/software/gnuspeech/

It shows the TRM as the last stage before sound output, which is why I
thought it was possible to create a .trm file, process it with the tube
model, and get output :(
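
Re-reading the explanation, the picture is not actually wrong: the tube model
is the last stage, but at synthesis time it consumes a stream of time-varying
parameter frames (like the one interpolate_postures produces in the sketch
above), whereas a saved .trm file holds a single steady-state configuration.
A rough sketch of the distinction, again with invented names rather than
gnuspeech's actual API:

    # Illustrative only -- invented names, not the gnuspeech code base.
    def play_static_trm(config, seconds, frame_rate=250):
        """Roughly what the Synthesiser app does with a .trm file: hold
        one steady-state configuration for the whole duration."""
        return [dict(config) for _ in range(int(seconds * frame_rate))]

    def play_monet_stream(frames):
        """Roughly what Monet hands to the tube model: a different
        parameter frame every tick, produced by posture/rule machinery."""
        return list(frames)  # already time-varying; nothing is held constant

    vowel = {"tubeSection3": 1.2, "velum": 0.0}
    steady = play_static_trm(vowel, 0.5)  # a steady drone: fine for vowels/"oi"
    # consonants need the varying frames that only the rule machinery supplies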

Although I still don't understand where this Real-time Monet is located
and which code would need to be ported.  And what is the difference
between diphones.monet and diphones.mxml?


Attachment: signature.asc
Description: This part of the message is digitally signed

