On 29/03/2016 00:59, David Hill wrote:
Dear Alex,
Yes, indeed. There have been too many alligators in the swamp
recently, and something in the speech synthesis line is way
overdue. Thanks for the prod! :-)
Real soon now!
All good wishes.
david
--------
David Hill
--------
Simplicity, patience, compassion. These three are your
greatest treasures (Tao Te Ching #67)
---------
Thanks. My query was prompted by the absence of any "free" voices
that I could use for free content development, namely the use of
synthetic speech for audio-drama. Not many of the voices in Espeak
(http://espeak.sourceforge.net/; most work is now being done on the
Espeak-ng fork at https://github.com/espeak-ng/espeak-ng), which was
about the only "free" synthetic speech generator I could find for
Windows XP, sound that naturalistic, which may be a limitation of
the speech generation model it uses (formant synthesis). MBROLA (a
diphone-based system) sounded better, but can't be used in
commercial instances, which means I couldn't really use it for
audio-drama purposes for licensing reasons.
Free content audio-drama needs free resources, like sound effects,
and this includes free voices for synthetic speech.
Please note I am not in speech research or any related fields
professionally.
Here are some issues with dramatic speech I noted, which I am not
sure current synthetic speech systems handle that well.
1. Different prosody for the same text phrase.
"You will come with us, Miss Jones?" spoken (as British RP) in a
comedy sounds very different from "You WILL come WITH US, Miss
Jones!" spoken as a command in a heavily German-accented voice in a
thriller, despite being nominally the same text.
Another example I can think of is the way that different actors have
approached Shakespearean passages. I've heard at least 3 versions of
some of them, which sounded very different.
A phrase like "We have seen the outcome of the dozens of missed
opportunities for the government establishment to resolve this
unfortunate situation..." is another which has different prosody
depending on the context and who's speaking, which is also related
to the next issue I ran up against.
2. Accented and dialect styles.
In an audio-drama script I was working on, one of the characters,
although speaking in English, has a Germanic accent for dramatic
purposes. In order to get this working in Espeak, I was in effect
having to manually re-code English words into an appropriately
sounding phonetic form. This could be somewhat automated, but would
need information about phoneme mapping between different
languages.
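A minimal sketch of what such automation might look like, assuming the English text has already been turned into a list of phoneme symbols. The symbols and the substitution table here are hypothetical illustrations (ASCII symbols in a loosely Kirshenbaum-like style) and are not taken from Espeak's actual phoneme inventory or from any real mapping between English and German.

```python
# Hypothetical English -> "staged German accent" phoneme substitutions.
# These entries are illustrative assumptions, not a real phoneme table.
ACCENT_MAP = {
    "w": "v",  # "will" -> "vill", "work" -> "vork"
    "T": "s",  # voiceless "th" (as in "think") -> "s"
    "D": "z",  # voiced "th" (as in "the") -> "z"
}


def apply_accent(phonemes, accent_map):
    """Replace each phoneme that has a mapping; leave the rest unchanged."""
    return [accent_map.get(p, p) for p in phonemes]


# "with", hand-coded as a phoneme list for the example.
print(apply_accent(["w", "I", "D"], ACCENT_MAP))
```

A plain symbol-for-symbol table like this would only be a starting point; context-dependent changes would need rules that look at neighbouring phonemes as well.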
In addition, different accents run at different speeds. The example
phrase, "We have seen the outcome of the dozens of missed
opportunities for the government establishment to resolve this most
unfortunate situation!", can be read at different speeds even in
English. A standard BBC journalistic voice would read it slightly
faster than an Irish Republican might, but considerably slower
than a British Asian.
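As a rough sketch, per-accent speed could be held as a multiplier on a baseline speaking rate. The baseline and the multipliers below are placeholder values chosen only so that their ordering matches the comparison above; they are not measurements, and the style names are hypothetical labels.

```python
BASE_WPM = 160  # placeholder baseline in words per minute, not a measured figure

# Placeholder multipliers: only their ordering reflects the comparison above.
RATE_FACTOR = {
    "irish_republican": 0.95,
    "bbc_journalistic": 1.0,
    "british_asian": 1.25,
}


def words_per_minute(style, base_wpm=BASE_WPM):
    """Scale the baseline rate by the multiplier for the given style."""
    return round(base_wpm * RATE_FACTOR[style])


def seconds_to_read(text, style):
    """Estimate reading time from a simple whitespace word count."""
    return len(text.split()) * 60.0 / words_per_minute(style)


phrase = ("We have seen the outcome of the dozens of missed opportunities "
          "for the government establishment to resolve this most "
          "unfortunate situation!")
for style in RATE_FACTOR:
    print(style, words_per_minute(style), round(seconds_to_read(phrase, style), 1))
```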
There are many "staged" accents/dialects which are not necessarily
representative of the language they stereotype, and thus to do
"staged" voices, looking at the original language group may not be
the whole story. Some examples (nominally all speaking in
English) are (I've tried to give some sample phrases):
The "Indian doctor" (invented by Peter Sellers among others): "Well,
I think you'll find it's not as usual as you think..."
Nanette (the French maid in farces): "B-B-ut, I has only juzt
feeneeshed ze floors!"
The mad scientist: "You vill understand the importance of my vork,
even if it kills me vurst!"
'Strine (as in broad but clearly staged Australian): "I don't
know about you mate, but that fella was a darned lucky one!"
Mummerset (a generic rural dialect used by the BBC amongst other
producers, based on a British West Country dialect): "I don't know
what the squire was on about. That bull never left the field."
"Pirate" (which is supposedly Bristol/Plymouth but, according to the
internet, seems to have been a complete invention for a film):
"I'll be thinkin', you'll be more respectful, when the Cap'n brings
his friends."
"Countess" (a Central European/Slavic dialect trope used in a lot
of vampire films): "So you, wish to view the cryp-t? I would
advise against dist-urbing the sleep of my anc-es-tors..."
Whilst professional research has understandably focused on actual
languages and dialects, for audio drama work, especially that aiming
to emulate earlier production styles, synthetic speech that follows
"staged styles" should be considered in my view.
3. Cut-in and echoing.
In dramatic speech, there are situations where one character cuts in
on, or echoes the phrasing of, another. This may present an issue
for multi-voice synthetic speech generation, as without additional
coding (such as marking cut-in and echo points) there isn't an easy
way of knowing where a second voice should be overlaid on the first.
An example (both voices are New York, possibly Brooklyn):
DANIELS: "So you went to the Gallery, Joey?"
JOEY: "Yes.. Mr Daniels"
DANIELS: "And?"
JOEY: "And I didn't fi(nd it!)"
DANIELS (cutting in angrily): "I don't want excuses!"
JOEY: "You want me to take a second look?"
DANIELS (nods): "A second, third look until you (find the "artefact")"
JOEY (echoing over DANIELS): "find the "artefact""
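One possible way of marking the cut-in and echo points is to reuse the convention in the exchange above, where the parenthesised tail of a spoken line is the part another voice speaks over. The sketch below assumes that convention and assumes the speaker label and stage direction have already been stripped, so only the spoken text is passed in; the function name is hypothetical.

```python
import re

# Assumed convention: a trailing "(...)" in the spoken text marks the
# words that are overlapped by the other voice.
OVERLAP = re.compile(r"^(?P<solo>.*?)\((?P<overlap>.*)\)\s*$")


def split_overlap(spoken):
    """Return (solo_text, overlapped_text); overlapped_text is '' if unmarked."""
    match = OVERLAP.match(spoken)
    if match is None:
        return spoken, ""
    return match.group("solo"), match.group("overlap")


print(split_overlap("And I didn't fi(nd it!)"))
print(split_overlap("You want me to take a second look?"))
```

The length of the solo part gives a character position for the cut-in; turning that into a time offset would still depend on how the synthesiser reports timing.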
I'm not personally aware of synthetic speech generators that can
handle cut-in and phrase echoing, even though they occur in normal
speech. I am not sure if it could even be done in real-time because
of the need to mix two audio sequences.
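For the offline case at least, the mixing step itself can be sketched quite simply, assuming each voice has already been rendered to a list of samples in the range -1.0 to 1.0 and that the cut-in point is known as a sample index. The function and the short sample lists are hypothetical stand-ins, not output from any real synthesiser.

```python
def overlay(first, second, cut_in_at):
    """Mix `second` onto `first`, starting at sample index `cut_in_at`."""
    length = max(len(first), cut_in_at + len(second))
    mixed = [0.0] * length
    for i, sample in enumerate(first):
        mixed[i] += sample
    for i, sample in enumerate(second):
        mixed[cut_in_at + i] += sample
    # Clamp so that summed samples stay in the valid range (a crude limiter).
    return [max(-1.0, min(1.0, s)) for s in mixed]


joey = [0.5, 0.5, 0.5, 0.5]  # stand-in for "And I didn't find it!"
daniels = [0.7, 0.7, 0.7]    # stand-in for "I don't want excuses!"
print(overlay(joey, daniels, cut_in_at=2))
```

This only covers mixing once both voices exist as audio; it says nothing about whether the two could be generated and combined in real time.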
I said earlier that I am not in speech research professionally, so
these issues may have already been dealt with in certain systems.
Alex Farlie