speechd-discuss
[Speechd] KTTS and SpeechD integration


From: Gary Cramblitt
Subject: [Speechd] KTTS and SpeechD integration
Date: Mon Sep 4 09:59:48 2006

On Thursday 28 April 2005 04:30 pm, Milan Zamazal wrote:
> Hello Gary,
>
> it's nice to see you work actively on the TTS front-ends all the time!
>
> >>>>> "GC" == Gary Cramblitt <address@hidden> writes:
>
>     GC> Hynek and I have been discussing integration of the KDE
>     GC> Text-to-Speech System (KTTS) and Speech Dispatcher.  If this
>     GC> could be done, it would offer several advantages:
>
> It's an excellent idea!
>
>     GC> Towards this goal, I sat down to write a SpeechD plugin for
>     GC> KTTS, but immediately ran into some roadblocks.  I'd like to
>     GC> explain these roadblocks so the SpeechD team can consider
>     GC> possible changes to SpeechD.
>
> Hynek is the one who should provide authoritative answers.  But I can
> speak from a different point of view, the client side, as an author of a
> client extensively using Speech Dispatcher features (speechd-el for
> Emacs).
>
>     GC> SpeechD doesn't fall into any of these models.  It does not
>     GC> return a wav file.
>
> I think this is because Speech Dispatcher is not intended to operate as
> a plugin to another message manager.  

I understand.  Also, returning a wav file adds latency.  This is a tricky 
problem for which I have no good answer today.  KDE is going to be changing 
its multimedia architecture, and integrating that with Speech Dispatcher will 
present some challenges.  For example, I can't tell you whether SpeechD 
should use GStreamer, ALSA, or something else.  I don't think returning a 
wav file is essential at this time, but it might become essential in the 
future.  For now, I'm treating it as an issue that can't be solved today but 
must be resolved at some point.

> I think its functionality is 
> analogous to that of KTTS, which perhaps does not return a wave file either.

True, although KTTS could be easily enhanced to do that.

>
>     GC> More seriously, it always runs asynchronously but does not
>     GC> notify when speech of a message has completed.
>
> This is a feature which should definitely be provided by Speech
> Dispatcher.  We have had it in mind for a long time, but I don't
> know what the current state is.  Hynek?
>
>     GC> Now SpeechD has its own priority and queueing system, so my next
>     GC> approach was to forego these capabilities and immediately send
>     GC> all messages to SpeechD.  In addition to losing the capabilities
>     GC> listed above, this would also mean that KTTS users could not
>     GC> combine SpeechD with other KTTS plugins, as speech from the
>     GC> other plugins would either block while SpeechD is speaking, or
>     GC> talk simultaneously, depending upon their PC's audio
>     GC> capabilities.
>
> Again, I don't think using Speech Dispatcher as a plugin to another
> message manager is a good idea (although it can be useful for
> experiments right now).  If the two systems were merged in future, some
> things might work better.

Agreed.  Yes, the plugin is a step towards the longer-range goal of 
eliminating the KTTSD backend entirely, leaving a DCOP wrapper, a KDE 
notification interface, enhanced filtering and document conversion, and a GUI 
for configuring everything and controlling speech in real time (Pause, 
Resume, Cancel).  The KTTSD queueing and prioritization would be eliminated.  
KTTS would no longer be a message manager, but would provide additional 
capabilities.  It is even conceivable that KDE programmers might want to 
bypass the DCOP interface and talk directly to SpeechD.  Time will tell.

To the extent that writing a SpeechD plugin for KTTS raises issues, such as 
those we are discussing here, it is a good thing.  In a private email to 
Hynek I stated that the SpeechD plugin for KTTS would be distributed as 
"experimental" and not a final or even recommended solution.

>
>     GC> Now it is possible I'm not reading the SpeechD API correctly.
>     GC> It may be that I am misinterpreting the word "cancel" in the
>     GC> docs.  Under 'Important', it says
>
>     GC> -- When a new message of level `important' comes while a
>     GC> message of another priority is being spoken, this other
>     GC> message is canceled and the message with priority `important' is
>     GC> said instead. Other messages of lower priorities are either
>     GC> postponed (priority `message' and `text') until there are no
>     GC> messages of priority important waiting, or canceled (priority
>     GC> `notification' and `progress').  --
>
> Yes, unless I'm very mistaken IMPORTANT just interrupts the messages and
> doesn't actually cancel them.  The wording in the documentation is
> confusing.
>
>     GC> Then under 'Message' type it says
>
>     GC> -- If there are messages of priority `notification', `progress'
>     GC> or `text' waiting in the queue or being spoken when a message of
>     GC> priority `message' comes, these are canceled.  --
>
> This is right, in this case the messages are really cancelled.
>
>     GC> So what I need is a message type like 'Important', but which
>     GC> interrupts and discards itself.  I thought about trying to use
>     GC> the SSIP CANCEL command to simulate such a message type, but
>     GC> since I have no way of knowing what kind of message SpeechD is
>     GC> currently speaking, that won't work.
>
> You can do it by opening a separate Speech Dispatcher connection for the
> Screen Reader messages.  Then CANCEL can apply only to messages sent
> through this connection, while the interruption still applies globally.
> But see below.

If I did SCREEN READER in a separate connection, would the two connections 
speak simultaneously (not necessarily a bad thing)?
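
In any case, here is roughly what I understand you to be suggesting, sketched 
against the C client library.  The function names and signatures are taken 
from libspeechd as I understand it, so please treat the details (especially 
the spd_open() arguments) as my assumption rather than working code:

/* Sketch: screen-reader messages on their own connection. */
#include <stdio.h>
#include <libspeechd.h>

int main(void)
{
    /* One connection for ordinary KTTS output, a second one dedicated
     * to screen-reader messages. */
    SPDConnection *ktts = spd_open("kttsd", "main", NULL, SPD_MODE_SINGLE);
    SPDConnection *sr   = spd_open("kttsd", "screen-reader", NULL, SPD_MODE_SINGLE);
    if (!ktts || !sr) {
        fprintf(stderr, "could not connect to Speech Dispatcher\n");
        return 1;
    }

    /* Long text goes out over the ordinary connection... */
    spd_say(ktts, SPD_TEXT, "Chapter one.  It was a dark and stormy night.");

    /* ...while a screen-reader message goes out over the dedicated
     * connection at a higher priority, so it interrupts the text. */
    spd_say(sr, SPD_IMPORTANT, "Focus moved to the OK button.");

    /* A cancel issued through the screen-reader connection should
     * discard only the messages sent through that connection; the
     * long text is left alone. */
    spd_cancel(sr);

    spd_close(sr);
    spd_close(ktts);
    return 0;
}

If spd_cancel() indeed corresponds to CANCEL self, then only the 
screen-reader connection's messages are discarded, which is the behaviour I 
was after.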

>
>     GC> Stopping for a moment and reflecting on these issues, I came to
>     GC> the realization that SpeechD has a priority system that is ideal
>     GC> for Screen Readers, but not so good for speaking longer texts,
>     GC> such as web pages, pdf documents, or ebooks, while still
>     GC> providing interruption by higher-priority messages.  The 'Text',
>     GC> 'Notification', and 'Progress' types are ideal for screen
>     GC> readers, but strangely are of lower priority than 'Important' or
>     GC> 'Message'.  What seems to be missing is a "long text" type that
>     GC> is of lower priority than 'Text', 'Notification', and
>     GC> 'Progress', but is never discarded (unless the application
>     GC> specifically cancels it).
>
> Maybe it would be useful if we started with explanation of the
> background ideas of our message priority systems.
>
> If I understand the KTTS system right (please correct or complement me
> as necessary), it is based on the assumption that a user basically uses
> a screen reader for the purpose of navigating and controlling
> applications.  This is a "single-task" highly interactive activity with
> immediate TTS feedback.  Apart from this, the system and applications can
> emit various more (Warning) or less (Message) urgent messages.
> Additionally, a user can start reading a longer text (the Text priority)
> in the background and continue doing other things.

Yes, that's a good summary of the existing KTTS model.

>
> If I am correct above, I have some questions about the KTTS model:
>
> - When a user performs intensive screen reading, he can miss important
>   messages like "the system is going down for reboot" or "phone call
>   from Joe Hacker".  The messages can be masked by the screen reading
>   and the user can hear them only after it's too late.

Yes, that would be a problem.

>  Do you rely on  sound events?

There is a capability to play a sound when long text is interrupted by SCREEN 
READER, WARNING, or MESSAGE; the user must configure this, however.  Sound 
events are on the TODO list.

>
> - How do you solve the problem of a growing queue when new messages are
>   coming more quickly than they can be spoken?  E.g. when a user types
>   on the keyboard too quickly to be able to hear the character just
>   typed in?

I guess we don't.  If the keyboard app uses SCREEN READER, then fast typing 
means some of the letters are discarded rather than spoken.  If the keyboard 
app uses WARNING or MESSAGE, the letters are not discarded and the queue 
grows.  We will probably want to add a NOTIFICATION type to address this.
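
If we do add it, I imagine key echo would look something like this from the C 
library (same caveat as above about the exact libspeechd signatures, and 
spd_char() in particular is my assumption of the right call):

/* Sketch: key echo at NOTIFICATION priority. */
#include <libspeechd.h>

int main(void)
{
    SPDConnection *conn = spd_open("kttsd", "key-echo", NULL, SPD_MODE_SINGLE);
    if (!conn)
        return 1;

    /* Echo each typed character at NOTIFICATION priority: if the user
     * types faster than the synthesizer can speak, the backlog is
     * simply discarded instead of piling up in a queue. */
    const char *typed[] = { "h", "e", "l", "l", "o" };
    for (int i = 0; i < 5; i++)
        spd_char(conn, SPD_NOTIFICATION, typed[i]);

    spd_close(conn);
    return 0;
}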

>
> - How do you ensure that a user doesn't hear too many uninteresting
>   messages, i.e. messages which are only interesting now and not later
>   and only in case nothing more important is to be spoken
>   (e.g. reporting current time on background or a sequence of progress
>   messages)?

We will probably want to add a message type similar to PROGRESS.

>
> These questions provide motivation for the Speech Dispatcher priority
> system.  The IMPORTANT priority serves exactly the purpose of signalling
> important system events (reboot, phone call, perhaps new mail)
> immediately, without breaking other messages.  The other priorities are
> focused on common reading; they all work quite well together.  Typically
> NOTIFICATION is used for low priority events which may be safely
> discarded if there is anything more interesting to say (e.g. typed
> characters, periodical clock or weather reporting).  MESSAGE is for
> messages which should be spoken.  TEXT is most common for regular
> reading.  PROGRESS is good for sequences of progress messages, which you
> definitely don't want to listen to in full, but which should be presented
> when there is nothing else to be spoken and once the progress report is
> finished.

As KDE does not currently have a Screen Reader, we haven't had to address the 
issues it raises as well as Speech Dispatcher has.  I anticipate that we will 
want to incorporate much of the SpeechD model.  In fact, in the longer range 
solution, it will *be* the SpeechD model since SpeechD will do all the 
queueing and prioritization.
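
Just to check my understanding of that model, I would expect the five 
priorities to be used roughly like this (the example messages are mine, and 
the same caveat about the exact libspeechd signatures applies):

/* Sketch: one message per Speech Dispatcher priority. */
#include <libspeechd.h>

int main(void)
{
    SPDConnection *conn = spd_open("kttsd", "priorities", NULL, SPD_MODE_SINGLE);
    if (!conn)
        return 1;

    /* Important system events: interrupt, never discarded. */
    spd_say(conn, SPD_IMPORTANT,    "Phone call from Joe Hacker.");
    /* Messages that should always be spoken. */
    spd_say(conn, SPD_MESSAGE,      "Printing has finished.");
    /* Regular reading. */
    spd_say(conn, SPD_TEXT,         "Chapter one of the document being read.");
    /* Low-priority events, safely discarded if anything newer arrives. */
    spd_say(conn, SPD_NOTIFICATION, "It is five o'clock.");
    /* Progress reports: spoken only when nothing else is waiting. */
    spd_say(conn, SPD_PROGRESS,     "Forty-five percent completed.");

    spd_close(conn);
    return 0;
}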

>
> Long text reading can be done simply with the TEXT priority assuming the
> user listens to the text and doesn't do anything else.  I think
> "background" text reading is not possible in the Speech Dispatcher
> model.  I like your "long text" priority idea, it might be useful to add
> it in some form to Speech Dispatcher.

Yes please.

-- 
Gary Cramblitt (aka PhantomsDad)
KDE Text-to-Speech Maintainer
http://accessibility.kde.org/developer/kttsd/index.php

