
Speech Dispatcher roadmap discussion.


From: Bohdan R. Rau
Subject: Speech Dispatcher roadmap discussion.
Date: Mon, 13 Oct 2014 10:45:05 +0200

On 2014-10-10 02:13, Luke Yelavich wrote:
> On Thu, Oct 09, 2014 at 10:50:35PM AEDT, Bohdan R. Rau wrote:
>>
>> I also have another suggestion, but that's a topic for the next mail :)
>
> Looking forward to hearing about it.

PART ONE

First:

Since some SSIP responses may change in the new version, we should 
provide compatibility with current versions of speech-dispatcher (both 
in the SSIP protocol and in libspeechd). So the first step would be a 
new SSIP command, something like:

COMPAT_MODE On|Off

(the default is On, which means the server emulates the current version 
of the SSIP protocol, even with its known bugs)

This should be safe: old applications will still work with the new 
version, and a new application can decide either to continue if the 
COMPAT_MODE command is not implemented (i.e. it is connected to an old 
version of speech-dispatcher) or to die (because some vital functions 
are not available).

The new libspeechd should also have a function:

int spd_compat_mode(SPDConnection *conn, int compatible);

The function should return 0 on success or -1 if an error occurs.

Alternatively, we could provide spd_open_new and spd_open2_new 
functions; future applications would then use only the _new functions.
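
As an illustration, here is a minimal sketch of how a new application 
might negotiate compatibility mode. It assumes the proposed 
spd_compat_mode() on top of today's libspeechd API; the function does 
not exist yet:

#include <stdio.h>
#include <stdlib.h>
#include <libspeechd.h>

int main(void)
{
    SPDConnection *conn = spd_open("myapp", "main", NULL, SPD_MODE_SINGLE);
    if (!conn) {
        fprintf(stderr, "cannot connect to speech-dispatcher\n");
        return EXIT_FAILURE;
    }
    /* Ask for the new SSIP behavior; fails on an old server */
    if (spd_compat_mode(conn, 0) < 0) {
        /* COMPAT_MODE not implemented: continue in the old mode or die */
        fprintf(stderr, "old server, staying in compatibility mode\n");
    }
    spd_say(conn, SPD_TEXT, "Hello world");
    spd_close(conn);
    return EXIT_SUCCESS;
}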

1. Extending current event notifications

There is nothing new in the protocol here. After the CHAR, KEY and 
SOUND_ICON commands the server should simply answer the same way as 
after SPEAK, i.e.:

> CHAR x
< 225-msg_id
< 225 OK MESSAGE_QUEUED

Analogously, the library functions spd_key, spd_char, spd_wchar and 
spd_sound_icon should return the message id - but in compatibility mode 
they must return zero on success, because of existing code like:

if (spd_char(conn, character)) {
     error();
}
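
Under the new behavior (compatibility mode off), the same call could be 
written as follows; this is only a sketch of the proposed return 
convention, where spd_char() returns the message id:

int msg_id = spd_char(conn, character);
if (msg_id < 0) {
    /* proposed: -1 signals an error instead of any non-zero value */
    error();
}
/* msg_id can later be matched against event notifications */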

2. New events

SYNC:

706-msg_id
706-client_id
706-start_index
706-end_index
706 SYNCHRONIZED

The event is fired in SSML mode when the module has SYNC mode enabled. 
It is similar to INDEX_MARK, but returns a pair of index mark names (or 
an empty string at the start and at the end of the text). It may be 
useful for applications that highlight the currently spoken text (book 
readers or applications for people with dyslexia). Both index names are 
used because the module may ignore some marks.
If the module's SYNC mode is disabled (for example, the module has no 
SYNC capability), the event must be fired with empty start_index and 
end_index names.

Alternatively, there may be reserved mark names - for example 
"__begin__" and "__end__".


AUTOSYNC:

707-msg_id
707-client_id
707-start_offset-end_offset
707 SYNCHRONIZED

The event is fired in TEXT mode when the module has AUTOSYNC mode 
enabled. It is similar to the SYNC event, but relies on the module's 
ability to split the spoken text into smaller parts. It returns the 
offsets (in bytes) of the beginning and end of the spoken part, counted 
from the start of the message given in the SPEAK command.
If the module's AUTOSYNC mode is disabled (for example, the module has 
no AUTOSYNC capability), the event must be fired with 0 as both 
start_offset and end_offset.
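
For illustration (ids and offsets invented), speaking a two-sentence 
plain-text message might produce:

> SPEAK
< 230 OK RECEIVING DATA
> Hello world. How are you?
> .
< 225-22
< 225 OK MESSAGE_QUEUED
< 707-22
< 707-1
< 707-0-12
< 707 SYNCHRONIZED
< 707-22
< 707-1
< 707-13-25
< 707 SYNCHRONIZED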

AUTOPAUSE:

708-msg_id
708-client_id
708-offset
708 STOPPED

The event is fired in TEXT mode when the module has AUTOPAUSE mode 
enabled and we explicitly request the autopause response from the 
server with:

SET self AUTOPAUSE On

It returns the length of the already spoken part of the text (in bytes).
If we don't request AUTOPAUSE, the server should automatically use the 
module's AUTOPAUSE response, store the remaining part of the text 
internally, and respond 704 PAUSED to the client.
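
For illustration (id and offset invented, intermediate server replies 
omitted), pausing in the middle of "Hello world. How are you?" might 
produce:

> PAUSE self
< 708-23
< 708-1
< 708-13
< 708 STOPPED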

MOUTH:

709-msg_id
709-client_id
709-width-height
709 MOUTH

The event is fired when, for example, a graphical application should 
redraw the mouth of a displayed face. Width and height are given in the 
range 0..100.
Today's modules have no idea about mouth shapes, but as far as I know 
it's possible. The module must have MOUTH mode enabled.
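
For illustration (all values invented), a wide open mouth during speech 
might produce:

< 709-24
< 709-1
< 709-80-60
< 709 MOUTH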

In libspeechd it will be necessary to rewrite the callback system. My 
suggestion:

typedef void (*SPD_Callback)(int msg, int id, int event, void *user_data, ...);

with the values retrieved via varargs.

For example:
INDEX_MARK - one char * value
SYNC - two char * values
AUTOSYNC - two integers
MOUTH - two integers
AUTOPAUSE - one integer

There must also be functions like:

SPD_Callback *spd_register_callback(SPDConnection *conn, int event, SPD_Callback *callback, void *user_data);
SPD_Callback *spd_unregister_callback(SPDConnection *conn, int event);

Of course, these functions are valid only in non-compatibility mode!
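
Here is a minimal sketch of how a client could unpack SYNC event 
arguments with stdarg under the proposed typedef. The event constant, 
its value and the dispatch in main are invented for illustration; none 
of this exists in current libspeechd:

#include <stdarg.h>
#include <stdio.h>

typedef void (*SPD_Callback)(int msg, int id, int event, void *user_data, ...);

#define SPD_EVENT_SYNC 706  /* assumed: event code mirrors the SSIP reply code */

static void sync_cb(int msg, int id, int event, void *user_data, ...)
{
    va_list ap;

    if (event != SPD_EVENT_SYNC)
        return;
    va_start(ap, user_data);
    /* SYNC carries two char * values: start and end index mark names */
    const char *start_mark = va_arg(ap, const char *);
    const char *end_mark = va_arg(ap, const char *);
    va_end(ap);
    printf("msg %d, client %d: between '%s' and '%s'\n",
           msg, id, start_mark, end_mark);
}

int main(void)
{
    /* Simulate the library dispatching one SYNC event to the callback;
       in real code the callback would be installed with
       spd_register_callback(conn, SPD_EVENT_SYNC, &sync_cb, NULL). */
    sync_cb(21, 1, SPD_EVENT_SYNC, NULL, "m1", "m2");
    return 0;
}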

3. Module output capabilities

SPEAK - module can speak
FETCH - module can return synthesized wave to server
FILE - module can save synthesized wave to file

4. Module input capabilities

SSML - module fully supports SSML and index marks.
FLAT - module internally translates SSML into plain text. Index marks 
are lost; pause/resume is not implemented.
PLAIN - module understands only plain text (no SSML). Extra features 
(like AUTOPAUSE and AUTOSYNC) are possible only in this mode.

FLAT and SSML capabilities are mutually exclusive.
The server should never send SSML data to a module reporting only the 
PLAIN capability.
The server should always send SSML data when the module does not report 
PLAIN.
The server should never internally encode plain text into SSML if the 
module reports PLAIN and any of the extra features (AUTOPAUSE, AUTOSYNC 
etc.) is enabled. Also, the server should never accept SSML data from 
an application while extra features are enabled (that is an application 
bug).

5. Module extended capabilities:

SYNC - valid only in SSML mode. 706 SYNCHRONIZED events will be fired 
only if SYNC mode is enabled.

AUTOSYNC - valid only in PLAIN mode. The 707 SYNCHRONIZED event will be 
fired only if AUTOSYNC mode is enabled. Requires a simple NLP in the 
module.

WORDSYNC - valid only in PLAIN mode. The 707 SYNCHRONIZED event will be 
fired at word boundaries instead of phrase/sentence boundaries if 
WORDSYNC mode is enabled.

AUTOPAUSE - valid only in PLAIN mode. The 708 STOPPED event will be 
fired only if AUTOPAUSE mode is enabled. This mode may be turned on 
automatically by the server when responding 704 PAUSED to the client. 
Requires a simple NLP in the module.

SPEECH_MOUTH - module can fire 709 MOUTH events while speaking.
FETCH_MOUTH - module can return mouth shapes to the server together 
with the speech wave.
FILE_MOUTH - module can save mouth shapes together with the speech wave.

The MOUTH capabilities are separate because it's relatively simple to 
add MOUTH events to an Mbrola-based module in FETCH/FILE mode, but 
synchronization in real time may be difficult.

A simple NLP (Natural Language Processor) must be able to automatically 
split the given text into sentences (or, if the synthesizer can also 
speak parts of sentences, phrases). It may be trivial (splitting after 
each dot, exclamation mark or question mark followed by a space) or 
more sophisticated (like the phraser in Milena NLP, which understands 
the context of dots and won't split after an abbreviation or a 
positional number in Polish text). A trivial NLP should be part of the 
projected library for speech-dispatcher modules.
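
As an illustration, a minimal sketch of such a trivial splitter (the 
function names and callback interface are invented, not part of any 
existing module API):

#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Calls emit() once per sentence with (start, length). Splits after
   '.', '!' or '?' followed by whitespace or end of text. */
static void trivial_split(const char *text,
                          void (*emit)(const char *start, size_t len))
{
    const char *start = text;
    const char *p = text;

    while (*p) {
        if ((*p == '.' || *p == '!' || *p == '?') &&
            (p[1] == '\0' || isspace((unsigned char)p[1]))) {
            emit(start, (size_t)(p - start) + 1);
            p++;
            while (*p && isspace((unsigned char)*p))
                p++;
            start = p;
        } else {
            p++;
        }
    }
    if (p > start)  /* trailing text without final punctuation */
        emit(start, (size_t)(p - start));
}

static void print_sentence(const char *s, size_t len)
{
    printf("[%.*s]\n", (int)len, s);
}

int main(void)
{
    trivial_split("Hello world. How are you? Fine!", print_sentence);
    return 0;
}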

More about the FETCH/FILE modes in the next mail :)

ethanak
-- 
http://milena.polip.com/ - Pa pa, Ivonko!


