speechd-discuss

[Speechd] Client notification about message start/stop and reached index


From: Hynek Hanke
Subject: [Speechd] Client notification about message start/stop and reached index marks
Date: Mon Sep 4 09:59:48 2006

Hi everyone,

we have already discussed on this list that it would be very desirable to let
the client applications of Speech Dispatcher know when it starts to speak a
certain message, when it finishes speaking a message and when index marks
inside the message are reached.

It's easy to get this information inside Speech Dispatcher, and Festival even
provides support for index marking in SSML with festival-freebsoft-utils.
However, it's not clear how this information should be transmitted to the
client application, because SSIP is a synchronous protocol.

I think the following would be a reasonable solution.

* SSIP will be extended so that the reply to the SPEAK command includes a
message identification number. (Technically, this would not be necessary,
because it's already possible to obtain this information using the HISTORY
commands. However, it might not be practical to have to ask Dispatcher after
every SPEAK request.)

* Applications that want to make use of the notifications open a second
(synchronous) TCP/IP connection to Speech Dispatcher, on which Speech Dispatcher
is the sender and the application is the listener. The protocol used for this
would be very similar to SSIP or become a part of SSIP.

I can think of 4 or 6 types of events:
        1) started to speak message "msg-id"
        2) index mark "mark-id" reached in message "msg-id"
        3) end of speaking message "msg-id"
        4) message "msg-id" discarded (canceled)
        5) message "msg-id" paused (would this be useful for something?)
        6) message "msg-id" resumed (would this be useful for something?)
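
To make the list concrete, here is a minimal Python sketch of how a client
library might represent these six events. The names EventType and Event and
their fields are hypothetical and are not part of SSIP or of any existing
Speech Dispatcher API; only the six event kinds come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    """The six candidate event kinds, numbered as in the list above."""
    BEGIN = 1        # started to speak the message
    INDEX_MARK = 2   # an index mark inside the message was reached
    END = 3          # finished speaking the message
    DISCARDED = 4    # message discarded (canceled)
    PAUSED = 5       # message paused
    RESUMED = 6      # message resumed


@dataclass(frozen=True)
class Event:
    """One notification; mark_id is only meaningful for INDEX_MARK."""
    type: EventType
    msg_id: int
    mark_id: Optional[str] = None

    def __post_init__(self):
        # Reject a mark on non-mark events and a missing mark on mark events.
        has_mark = self.mark_id is not None
        if has_mark != (self.type is EventType.INDEX_MARK):
            raise ValueError("mark_id must be set for INDEX_MARK events only")


print(Event(EventType.INDEX_MARK, 213, "my_index_1"))
```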

* SSIP will be extended so that the application can specify which types of
events it wants to be notified about. These would be state switches valid for
all incoming messages until notification is switched off or changed to
some other mode. (This is similar to how priority switching is handled now in
Speech Dispatcher.)

I think receiving all notifications about all messages is unnecessary most of
the time. It would only cause more CPU load on both the server and the
application, and the application would typically receive a lot of useless
messages/callbacks (for example types (1), (3) and (4) for messages of priority
PROGRESS etc.). For this reason, I think it's good to have a way in SSIP for
the application to choose which messages it wants to receive these
notifications for.

Just to give an example, it could be something like
        SET self NOTIFICATION none
        SET self NOTIFICATION begin-end-discarded
        SET self NOTIFICATION index-marks-begin-end-discarded
        SET self NOTIFICATION all
of course with values of a different name.
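
As a rough illustration of how such a switch could be evaluated, the Python
sketch below maps each of the example values to the set of events it would
enable. The mode names are the placeholder values from the example; the event
names and the function should_notify are made up for this sketch, and it
assumes that "all" also covers the paused and resumed events.

```python
# Placeholder mode names from the example, mapped to the events they enable.
NOTIFICATION_MODES = {
    "none": frozenset(),
    "begin-end-discarded": frozenset({"begin", "end", "discarded"}),
    "index-marks-begin-end-discarded": frozenset(
        {"index-mark", "begin", "end", "discarded"}
    ),
    # Assumption: "all" includes the two optional event types as well.
    "all": frozenset(
        {"begin", "index-mark", "end", "discarded", "paused", "resumed"}
    ),
}


def should_notify(mode, event):
    """Return True if an event of this kind is reported under this mode."""
    try:
        return event in NOTIFICATION_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown notification mode: {mode!r}") from None


print(should_notify("begin-end-discarded", "index-mark"))
print(should_notify("all", "index-mark"))
```

Because the mode is a state switch, the server would presumably record the
mode in effect when each message is queued and check it for every event of
that message.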

An SSIP session on the main connection might then look like this

        SET self NOTIFICATION all
        2??-OK NOTIFICATION SET

        SPEAK
        <ssml>Hello world!<mark name="my_index_1"/> How are you?</ssml>
        .
        2??-213
        2??-OK MESSAGE QUEUED

        SET self NOTIFICATION none
        2??-OK NOTIFICATION SET

        SPEAK
        blabla
        .
        2??-213
        2??-OK MESSAGE QUEUED
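
On the client side, extracting the message id from such a reply could look
like the Python sketch below. It assumes the reply has exactly the two-line
shape shown in the session above, with a three-character status code and a
separator in front of each payload; the status codes are still the "2??"
placeholders, so the sketch does not interpret them. The function name
parse_speak_reply is hypothetical.

```python
def parse_speak_reply(lines):
    """Return the message id from a SPEAK reply shaped like the one above.

    Each line is assumed to be a 3-character status code, one separator
    character, and then the payload.
    """
    payloads = [line.strip()[4:] for line in lines if line.strip()]
    if len(payloads) != 2 or not payloads[1].startswith("OK"):
        raise ValueError(f"unexpected SPEAK reply: {lines!r}")
    return int(payloads[0])


reply = ["2??-213", "2??-OK MESSAGE QUEUED"]
print(parse_speak_reply(reply))
```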
                
* The language-specific APIs would offer a function that listens on the socket
and returns on each received event with the full information about the event.

* The language-specific APIs would offer a function that (maybe run from a
separate thread) would execute a user-specified callback on receiving each
event.
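
Both API styles could be built on the same reader, as in the Python sketch
below. The wire format is invented for the example (one event per line:
event name, message id, optional mark id), since the notification protocol is
not defined yet; read_events and listen_in_thread are hypothetical names. A
local socket pair stands in for the notification connection.

```python
import socket
import threading


def read_events(sock):
    """Blocking style: yield (event, msg_id, mark_id) for each line received."""
    with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
        for line in stream:
            fields = line.split()
            if not fields:
                continue  # ignore empty lines
            event, msg_id = fields[0], int(fields[1])
            mark_id = fields[2] if len(fields) > 2 else None
            yield event, msg_id, mark_id


def listen_in_thread(sock, callback):
    """Callback style: run callback(event, msg_id, mark_id) from a thread."""
    def run():
        for event in read_events(sock):
            callback(*event)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


# Demonstration: server_side plays the role of Speech Dispatcher.
server_side, client_side = socket.socketpair()
received = []
worker = listen_in_thread(client_side, lambda *event: received.append(event))
server_side.sendall(b"BEGIN 213\nINDEX_MARK 213 my_index_1\nEND 213\n")
server_side.close()  # end of stream lets the listener thread finish
worker.join(timeout=5)
client_side.close()
print(received)
```

The callback runs in the listener thread, so an application using this style
would have to make its callback thread-safe.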

This should all be quite easy to implement.


Question: How to ensure that both connections (main and notification) are
connected to the same client application?

Each client application could, on request, receive a hash-code through the main
connection and then use it to open the notification connection.
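
A minimal server-side sketch of this pairing in Python follows. It swaps the
"hash-code" for a random token from the secrets module, and it makes each
token valid for a single use; both choices are assumptions of the sketch, not
part of the proposal. SessionRegistry and its methods are hypothetical names.

```python
import secrets


class SessionRegistry:
    """Pairs a notification connection with an existing main connection."""

    def __init__(self):
        self._tokens = {}  # token -> id of the client on the main connection

    def issue_token(self, client_id):
        """Called when a client asks for a code on its main connection."""
        token = secrets.token_hex(16)  # 128 random bits, hex encoded
        self._tokens[token] = client_id
        return token

    def claim(self, token):
        """Called when a notification connection presents a token.

        Returns the matching client id, or None if the token is unknown.
        The token is removed, so it cannot be used a second time.
        """
        return self._tokens.pop(token, None)


registry = SessionRegistry()
token = registry.issue_token(client_id=7)
print(registry.claim(token))
print(registry.claim(token))
```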

Would that cause some security troubles?

(Of course, any program that could listen to the communication between client
and server on the main connection would learn the hash-code and could use it
to pose as the client. But would that be a problem? The program wouldn't get any
more sensitive data than what it already gets from listening to the
communication on the main connection -- which would already be very severe.)


Please let me know what you think about it. I'd especially like to know if
this is suitable from the client application's point of view or if some
different approach would be better (which one?), as I don't have much
experience with the application side.

With Regards,
Hynek Hanke

