Re: [gnuspeech-contact] TRM as backend for festival
From: Nickolay V. Shmyrev
Subject: Re: [gnuspeech-contact] TRM as backend for festival
Date: Mon, 12 Feb 2007 18:36:53 +0300
On Sun, 11/02/2007 at 16:31 -0800, David Hill wrote:
> It would probably help your understanding if you were to read the
> Monet manual. You wrote (see below):
>
>
> > But I have no idea how Monet
> > reproduces consonants. There are examples, but no trm files for
> > them.
>
>
> The .trm files are associated strictly with the tube model ("trm" =
> "tube resonance model") and are saved and used by the "Synthesiser"
> application (which is a GUI application for playing with the tube --
> but only steady state configurations). (You should probably read that
> manual as well). Consonants are mostly created by the dynamics of the
> vocal tract changes, though there are some continuant sounds such as
> frication as well (e.g. /s/), but even for these, transitional cues are
> important. Thus it is impossible to create consonants from .trm files
> alone. They were really only useful for exploring the vocal tract
> configurations that define the vocal tract "postures" used as
> anchor points (loosely related to "phones") for the varying speech
> parameters. The dynamic information needed for complete speech is
> created from these quasi-steady-state values representing vocal tract
> postures, plus context sensitive rules for moving from posture to
> posture, according to timing information that reflects the rhythmic
> character of British English. This information is all held within
> "diphones.monet" (the rules are actually more complex than diphones in
> many cases and include triphones & even tetraphones). Monet has the
> algorithms to use this information appropriately. The intonation is
> applied to the varying stream of tube parameters generated on this
> basis according to a model of British English intonation based on work
> by M.A.K. Halliday and elaborated by our own studies by varying the
> pitch (F0) parameter, but these variations are added to small pitch
> changes created at the posture (segmental) level by constrictions in
> the vocal tract -- so-called "micro-intonation" -- which provide
> additional cues for the identification of consonants. Many of the
> relevant papers are available on my university web site.
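As a rough illustration of the intonation idea described above, the final pitch track can be thought of as a broad tone-group contour with small posture-level micro-intonation perturbations added to it. The following is a minimal sketch; the function name, contour shapes, and Hz values are invented for illustration and are not the actual Monet data:

```python
# Hypothetical sketch: final F0 = macro intonation contour + small
# posture-level "micro-intonation" offsets. All numbers are invented.

def combine_pitch(contour, micro):
    """Add per-frame micro-intonation offsets (Hz) to a macro F0 contour."""
    assert len(contour) == len(micro)
    return [c + m for c, m in zip(contour, micro)]

macro = [120.0, 118.0, 115.0, 112.0]   # falling tone-group contour (Hz)
micro = [0.0, -3.0, -3.0, 0.0]         # dip from a vocal-tract constriction
f0 = combine_pitch(macro, micro)
print(f0)  # [120.0, 115.0, 112.0, 112.0]
```

The point is only that the two pitch sources are independent and summed per frame, so consonant cues from micro-intonation survive whatever sentence-level contour is applied.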
>
>
> The "oi" sound is just a succession of vowel sounds with a varying
> pitch, so a series of what appear to be .trm values will work. To
> produce speech, you need to be able to construct a more complex set of
> varying parameters reflecting the reality of speech. This is what
> Monet does. This is the part of Monet that needs to be extracted if
> all you wish to do is convert sound specifications to a speech
> waveform specification. The current Monet does much more since it
> allows you to create the databases as well as listen to the speech
> that can then be produced. The extracted part (non-interactive) that
> would simply use the databases to convert streams of posture symbols
> to an output waveform is what we call "Real-time Monet". It has not
> been ported from the original NeXT implementation yet.
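To make the "streams of posture symbols to an output waveform" idea concrete, here is a minimal sketch of such a non-interactive pipeline, assuming simple linear transitions. The posture table, parameter names, and frame counts are all invented; the real diphones.monet databases hold far richer context-sensitive rules, and the resulting frames would be fed to the tube resonance model to produce sound:

```python
# Hypothetical sketch of a posture-symbol-to-parameter-frame pipeline.
# POSTURES and its values are invented for illustration only.

POSTURES = {  # invented quasi-steady-state tube parameter values
    "a": {"r1": 0.9, "r2": 1.4},
    "s": {"r1": 0.2, "r2": 0.5},
}

def postures_to_frames(symbols, frames_per_transition=4):
    """Convert a stream of posture symbols into parameter frames by
    interpolating linearly across each posture-to-posture transition."""
    frames = []
    for prev, cur in zip(symbols, symbols[1:]):
        a, b = POSTURES[prev], POSTURES[cur]
        for i in range(frames_per_transition):
            t = i / (frames_per_transition - 1)
            frames.append({k: a[k] + t * (b[k] - a[k]) for k in a})
    return frames

frames = postures_to_frames(["a", "s", "a"])
print(len(frames))  # 2 transitions x 4 frames = 8
```

In the real system the transitions are not simple linear ramps: the context-sensitive rules (diphone, triphone, tetraphone) choose the transition shapes and timings, which is exactly the machinery a "Real-time Monet" port would need to carry over.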
>
>
> david
Heh, excuse my ignorance. I was really confused by the diagram on the
GnuSpeech homepage:
http://www.gnu.org/software/gnuspeech/
It shows the TRM as the last stage before sound output, which is why I
thought it would be possible to create a .trm file, process it with the
tube model, and get output :(
I still don't understand where this Real-time Monet code is located
and what would need to be ported. Also, what is the difference between
diphones.monet and diphones.mxml?
signature.asc
Description: This part of the message is digitally signed