speechd-discuss

[Speechd] KTTS and SpeechD integration


From: Milan Zamazal
Subject: [Speechd] KTTS and SpeechD integration
Date: Mon Sep 4 09:59:48 2006

Hello Gary,

it's nice to see you work actively on the TTS front-ends all the time!

>>>>> "GC" == Gary Cramblitt <address@hidden> writes:

    GC> Hynek and I have been discussing integration of the KDE
    GC> Text-to-Speech System (KTTS) and Speech Dispatcher.  If this
    GC> could be done, it would offer several advantages:

It's an excellent idea!

    GC> Towards this goal, I sat down to write a SpeechD plugin for
    GC> KTTS, but immediately ran into some roadblocks.  I'd like to
    GC> explain these roadblocks so the SpeechD team can consider
    GC> possible changes to SpeechD.

Hynek is the one who should provide authoritative answers.  But I can
speak from a different point of view, the client side, as an author of a
client extensively using Speech Dispatcher features (speechd-el for
Emacs).

    GC> SpeechD doesn't fall into any of these models.  It does not
    GC> return a wav file.  

I think this is because Speech Dispatcher is not intended to operate as
a plugin to another message manager.  I think its functionality is
analogous to that of KTTS, which probably doesn't return a wave file
either.

    GC> More seriously, it always runs asynchronously but does not
    GC> notify when speech of a message has completed.

This is a feature which should definitely be provided by Speech
Dispatcher.  We have had it in mind for a long time, but I don't
know what the current state is.  Hynek?

    GC> Now SpeechD has its own priority and queueing system, so my next
    GC> approach was to forego these capabilities and immediately send
    GC> all messages to SpeechD.  In addition to losing the capabilities
    GC> listed above, this would also mean that KTTS users could not
    GC> combine SpeechD with other KTTS plugins, as speech from the
    GC> other plugins would either block while SpeechD is speaking, or
    GC> talk simultaneously, depending upon their PC's audio
    GC> capabilities.

Again, I don't think using Speech Dispatcher as a plugin to another
message manager is a good idea (although it can be useful for
experiments right now).  If the two systems were merged in the future,
some things might work better.
    
    GC> Now it is possible I'm not reading the SpeechD API correctly.
    GC> It may be that I am misinterpreting the word "cancel" in the
    GC> docs.  Under 'Important', it says

    GC> -- When a new message of level `important' comes during a
    GC> message of another priority is being spoken, this message other
    GC> message is canceled and the message with priority `important' is
    GC> said instead. Other messages of lower priorities are either
    GC> postponed (priority `message' and `text') until there are no
    GC> messages of priority important waiting or canceled (priority
    GC> `notification' and `progress'.  --

Yes, unless I'm very mistaken IMPORTANT just interrupts the messages and
doesn't actually cancel them.  The wording in the documentation is
confusing.

    GC> Then under 'Message' type it says

    GC> -- If there are messages of priority `notification', `progress'
    GC> or `text' waiting in the queue or being spoken when a message of
    GC> priority `message' comes, these are canceled.  --

This is right, in this case the messages are really cancelled.
    
    GC> So what I need is a message type like 'Important', but which
    GC> interrupts and discards itself.  I thought about trying to use
    GC> the SSIP CANCEL command to simulate such a message type, but
    GC> since I have no way of knowing what kind of message SpeechD is
    GC> currently speaking, that won't work.

You can do it by opening a separate Speech Dispatcher connection for the
Screen Reader messages.  Then CANCEL can apply only to messages sent
through this connection, while the interruption still applies globally.
But see below.
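
To illustrate the separate-connection idea, here is a minimal toy sketch
in Python.  It does not talk to Speech Dispatcher at all: ToyServer,
speak and cancel_self are hypothetical names invented for this example,
and the only thing modelled is that a cancel issued on one connection
leaves messages from other connections untouched.

```python
from typing import List, Tuple


class ToyServer:
    """Hypothetical stand-in for a speech server with one shared queue."""

    def __init__(self) -> None:
        # Each queued message remembers which connection sent it.
        self.queue: List[Tuple[int, str]] = []

    def speak(self, conn: int, text: str) -> None:
        self.queue.append((conn, text))

    def cancel_self(self, conn: int) -> None:
        # Remove only the messages that were sent through `conn`.
        self.queue = [m for m in self.queue if m[0] != conn]


# Two connections opened by the same client (ids are arbitrary).
READER, BOOK = 1, 2

server = ToyServer()
server.speak(BOOK, "Chapter one ...")
server.speak(READER, "button OK")
server.cancel_self(READER)  # discards only the screen reader message
```

After the cancel, only the message sent through the BOOK connection is
still queued.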
    
    GC> Stopping for a moment and reflecting on these issues, I came to
    GC> the realization that SpeechD has a priority system that is ideal
    GC> for Screen Readers, but not so good for speaking longer texts,
    GC> such as web pages, pdf documents, or ebooks, while still
    GC> providing interruption by higher-priority messages.  The 'Text',
    GC> 'Notification', and 'Progress' types are ideal for screen
    GC> readers, but strangely are of lower priority than 'Important' or
    GC> 'Message'.  What seems to be missing is a "long text" type that
    GC> is of lower priority than 'Text', 'Notification', and
    GC> 'Progress', but is never discarded (unless application
    GC> specifically cancels it.)

Maybe it would be useful if we started with an explanation of the
background ideas of our message priority systems.

If I understand the KTTS system right (please correct or complement me
as necessary), it is based on the assumption that a user basically uses
a screen reader for the purpose of navigating and controlling
applications.  This is a "single-task", highly interactive activity with
immediate TTS feedback.  Apart from this, the system and applications
can emit various more (Warning) or less (Message) urgent messages.
Additionally, a user can start reading a longer text (the Text priority)
in the background and continue doing other things.

If I am correct above, I have some questions about the KTTS model:

- When a user performs intensive screen reading, he can miss important
  messages like "the system is going down for reboot" or "phone call
  from Joe Hacker".  The messages can be masked by the screen reading
  and the user can hear them only after it's too late.  Do you rely on
  sound events?

- How do you solve the problem of a growing queue when new messages are
  coming more quickly than they can be spoken?  E.g. when a user types
  on the keyboard too quickly to be able to hear the character just
  typed in?

- How do you ensure that a user doesn't hear too many uninteresting
  messages, i.e. messages which are only interesting now and not later
  and only in case nothing more important is to be spoken
  (e.g. reporting the current time in the background or a sequence of
  progress messages)?

These questions provide motivation for the Speech Dispatcher priority
system.  The IMPORTANT priority serves exactly the purpose of signalling
important system events (reboot, phone call, perhaps new mail)
immediately, without breaking other messages.  The other priorities are
focused on common reading; they all work quite well together.  Typically
NOTIFICATION is used for low priority events which may be safely
discarded if there is anything more interesting to say (e.g. typed
characters, periodic clock or weather reporting).  MESSAGE is for
messages which should be spoken.  TEXT is most common for regular
reading.  PROGRESS is good for sequences of progress messages, which you
definitely don't want to listen to in full, but which should be
presented if there is nothing else to be spoken and after the progress
report is finished.
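
The interplay of the priorities can be sketched with a small toy model
in Python.  This is only an illustration of the behaviour described in
this message, not Speech Dispatcher's actual implementation:
ToyDispatcher, submit and finish are hypothetical names, PROGRESS is
simplified to behave like NOTIFICATION, and IMPORTANT is assumed to
postpone (not cancel) an interrupted MESSAGE or TEXT.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

IMPORTANT, MESSAGE, TEXT, NOTIFICATION, PROGRESS = (
    "important", "message", "text", "notification", "progress")

Msg = Tuple[str, str]  # (priority, text)


@dataclass
class ToyDispatcher:
    speaking: Optional[Msg] = None
    queue: List[Msg] = field(default_factory=list)

    def _drop(self, priorities) -> None:
        # Cancel queued and currently spoken messages of these priorities.
        self.queue = [m for m in self.queue if m[0] not in priorities]
        if self.speaking and self.speaking[0] in priorities:
            self.speaking = None

    def submit(self, priority: str, text: str) -> None:
        if priority == IMPORTANT:
            # Low-value messages are dropped; MESSAGE/TEXT are postponed.
            self._drop({NOTIFICATION, PROGRESS})
            if self.speaking and self.speaking[0] != IMPORTANT:
                self.queue.insert(0, self.speaking)
                self.speaking = None
        elif priority == MESSAGE:
            self._drop({NOTIFICATION, PROGRESS, TEXT})
        elif priority in (NOTIFICATION, PROGRESS):
            # Only worth saying if nothing else is going on.
            if self.speaking or self.queue:
                return
        if self.speaking is None:
            self.speaking = (priority, text)
        else:
            self.queue.append((priority, text))

    def finish(self) -> None:
        # Current message ended: IMPORTANT goes first, otherwise FIFO.
        self.speaking = None
        if self.queue:
            nxt = next((m for m in self.queue if m[0] == IMPORTANT),
                       self.queue[0])
            self.queue.remove(nxt)
            self.speaking = nxt


d = ToyDispatcher()
d.submit(TEXT, "chapter one")
d.submit(NOTIFICATION, "key: a")   # discarded, TEXT is being spoken
d.submit(IMPORTANT, "reboot")      # interrupts; TEXT is postponed
d.finish()                         # TEXT resumes
d.submit(MESSAGE, "new mail")      # TEXT is cancelled
```

In this model a NOTIFICATION never grows the queue, which is the answer
to the fast-typing problem raised in the questions above.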

Long text reading can be done simply with the TEXT priority, assuming
the user listens to the text and doesn't do anything else.  I think
"background" text reading is not possible in the Speech Dispatcher
model.  I like your "long text" priority idea; it might be useful to add
it in some form to Speech Dispatcher.

    GC> Thanks for listening.

Thanks for your observations and suggestions.

Regards,

Milan Zamazal

-- 
The world is not something you can wrap your head around without needing years
of experience.                              -- Kent M. Pitman in comp.lang.lisp

