Idea: extending SSIP protocol with CAPABILITIES command

speechd-discuss

From:	Bohdan R . Rau
Subject:	Idea: extending SSIP protocol with CAPABILITIES command
Date:	Sat, 30 Oct 2010 10:08:42 +0200

Hi all.

It's my last little idea :)

Assumption:
Multi-language applications (like screenreaders) can perform particular
actions themselves, but a speech module (specialized for a particular
language) does them better when it implements them. For example, a
screenreader can replace punctuation characters with their spoken names.
If punctuation is implemented in a particular module/voice, the module
can be more precise: it can use proper grammatical forms (important in
inflected languages), mark the punctuation characters with pitch or a
sound icon (keeping the prosodic pitch envelope unchanged), or even
replace particular characters with sound icons depending on their
context. So: punctuation should be done at the speech-dispatcher level
if possible.

Goal:
Let the application choose the best way to perform those actions.

Syntax (capa list):

CAPABILITIES
(or the short form)
CAPA

Server returns each capability on a separate line, in the form:
255-{NAME} {DESCRIPTION}
where {NAME} is the capability name and {DESCRIPTION} depends on the
capability. Common descriptions are:
YES - capability is on
NO - server knows the capability, but it is disabled in the current voice
UNIMPLEMENTED or UNIMP - server knows the capability, but it is unimplemented

The last line begins with the response code without a dash, like:
255 OK CAPABILITIES SENT
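
A minimal sketch of how a client might read such a list reply, assuming
the format proposed above (lines starting with "255-" carry one
capability each, and a line starting with "255 " ends the reply). The
function name parse_capa_list is only illustrative; nothing like it
exists in speech-dispatcher.

```python
def parse_capa_list(lines):
    """Parse a proposed CAPABILITIES reply into {name: [description words]}."""
    caps = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("255-"):
            # Capability line: "255-{NAME} {DESCRIPTION}"
            name, _, description = line[4:].partition(" ")
            caps[name] = description.split()
        elif line.startswith("255 "):
            # Response code followed by a space (no dash): end of reply.
            return caps
        else:
            raise ValueError(f"unexpected line in CAPA reply: {line!r}")
    raise ValueError("reply ended without a final '255 ' line")
```

The description is kept as a list of words, so a reply such as
"255-PUNCTUATION ALL SOME NONE" can be tested with a simple membership
check.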

Syntax (single):

CAPABILITY name
or
CAPA name

Server returns a single line:
256 {NAME} {DESCRIPTION}
Another common description here is UNKNOWN, for a capability the server
does not know.

Examples:

CAPA
255-PAUSE YES
255-PUNCTUATION ALL SOME NONE
255-PROSODY RATE PITCH
255 OK CAPABILITIES SENT

Example 2 (module not supporting pauses):

CAPA PAUSE
256 PAUSE NO

Example 3 (speech-dispatcher knows the command,
but it is unimplemented at module or server level):

CAPA HISTORY
256 HISTORY UNIMPLEMENTED

Example 4 (unknown capability):

CAPA GETWAVE
256 GETWAVE UNKNOWN

Example 5 (voice supports rate setting, but not pitch):

CAPA PROSODY
256 PROSODY RATE
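
The single-line replies in the examples above could be classified by a
client roughly like this. It is only a sketch under the proposed
"256 {NAME} {DESCRIPTION}" format; the function name and the "VALUES"
label (used when the description is a list such as "RATE PITCH") are
made up for illustration.

```python
STATUS_WORDS = {"YES", "NO", "UNIMPLEMENTED", "UNIMP", "UNKNOWN"}

def parse_capa_single(line):
    """Split '256 {NAME} {DESCRIPTION}' into (name, status, values)."""
    code, _, rest = line.strip().partition(" ")
    if code != "256":
        raise ValueError(f"not a single-capability reply: {line!r}")
    name, _, description = rest.partition(" ")
    words = description.split()
    if len(words) == 1 and words[0] in STATUS_WORDS:
        # Treat the short form UNIMP the same as UNIMPLEMENTED.
        status = "UNIMPLEMENTED" if words[0] == "UNIMP" else words[0]
        return name, status, []
    # Anything else is taken as a list of supported values (assumption).
    return name, "VALUES", words
```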

Possible interaction with client (for example screenreader):

Orca has 4 punctuation levels (none, some, most, all). Today's
speech-dispatcher has 3; maybe future versions will have all 4 (or even
more), probably only partially implemented in particular modules.

After startup (or after switching voices) the screenreader should send
the CAPA command to get the current capabilities of that particular voice.

Let's assume 'some' punctuation is selected.

If the server sent 'SOME' in its response, the screenreader simply sends
the SET PUNCTUATION command and then the text to speak. If there was no
'SOME' in the server response, the screenreader can replace punctuation
characters with their spoken names itself.
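
A rough sketch of that decision on the client side. It assumes the list
of punctuation modes has already been read from the proposed CAPA
reply; the SPOKEN_NAMES table and the function name are hypothetical,
and a real screenreader would pick which characters to name for each
level. The command string uses the existing SSIP form
"SET SELF PUNCTUATION some".

```python
# Hypothetical table of spoken names; a real client would localize it.
SPOKEN_NAMES = {".": "dot", ",": "comma", "?": "question mark",
                "!": "exclamation mark"}

def prepare_text(text, wanted_mode, server_modes):
    """Return (ssip_commands, text_to_speak) for the wanted punctuation mode."""
    if wanted_mode.upper() in server_modes:
        # The voice handles this mode itself: just ask the server for it.
        return [f"SET SELF PUNCTUATION {wanted_mode.lower()}"], text
    # Fallback: the client replaces punctuation with spoken names.
    spoken = "".join(
        f" {SPOKEN_NAMES[ch]} " if ch in SPOKEN_NAMES else ch for ch in text
    )
    return [], " ".join(spoken.split())
```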

Another example:
An application has a long text to speak, which may be interrupted by
important messages. If the server supports pause, the application can
simply create an SSML string with markers and send it to the server. If
not, the application can break the long text into smaller parts and speak
each part separately.
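
The two strategies could look something like this on the client side.
This is only a sketch: the sentence splitting is deliberately naive,
the mark names s0, s1, ... are arbitrary, and pause_supported is
assumed to come from the proposed CAPA PAUSE reply.

```python
import re
from xml.sax.saxutils import escape

def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def plan_speech(text, pause_supported):
    """Return the list of messages the client would send to the server."""
    parts = split_sentences(text)
    if pause_supported:
        # One SSML message with a <mark/> before each sentence.
        body = "".join(
            f'<mark name="s{i}"/>{escape(p)} ' for i, p in enumerate(parts)
        )
        return ["<speak>" + body.rstrip() + "</speak>"]
    # No pause support: speak each part as a separate message.
    return parts
```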

What do you think?

ethanak
-- 
http://milena.polip.com/ - Pa pa, Ivonko!
