
[Speechd] KTTS and SpeechD integration


From: Milan Zamazal
Subject: [Speechd] KTTS and SpeechD integration
Date: Mon Sep 4 09:59:48 2006

>>>>> "HH" == Hynek Hanke <address@hidden> writes:

    HH> 3) The TTS is able to start synthesizing the text from an
    HH> arbitrary index mark in the given (complete) text.

    HH> What do you Milan think about (3)?

I'm not sure what the point is here.  I assume you don't mean just
cutting off the prefix before the mark; you probably mean starting the
synthesis while taking the left context into account, e.g. correct
pronunciation ("I will <mark> read" vs. "I have <mark> read"),
intonation dependent on the preceding contents, or the surrounding SSML
markup.

I think this is a legitimate requirement on a TTS system.  But
retaining the complete left context means processing it, which may mean
the synthesizer doesn't start returning output "immediately".  Solving
the related problems may not be easy.  Anyway, yes, it should be the
job of the TTS system, which can understand the text to the extent
needed by its speech synthesis mechanism.
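
To make the homograph example above concrete, here is a toy Python
sketch (purely illustrative; `pronounce_read` and the `|MARK|` notation
are invented for this example, not part of any real synthesizer).  It
shows why naively cutting the text at the index mark loses information
the TTS needs:

```python
def resume_from_mark(text, marker="|MARK|"):
    """Split the text at an index mark into (left context, remainder)."""
    left, _, right = text.partition(marker)
    return left.strip(), right.strip()

def pronounce_read(left_context):
    """Toy homograph rule: 'read' after a perfect auxiliary is past tense."""
    words = left_context.lower().split()
    if words and words[-1] in {"have", "has", "had"}:
        return "/red/"    # past tense, as in "I have read"
    return "/ri:d/"       # present/future, as in "I will read"

for text in ("I will |MARK| read", "I have |MARK| read"):
    left, right = resume_from_mark(text)
    # A synthesizer that keeps the left context can disambiguate the
    # homograph; one that sees only `right` ("read") cannot.
    print(repr(right), "->", pronounce_read(left))
```

A real TTS faces the same problem for intonation and SSML state, which
is why resuming at a mark should still take the whole preceding text
into account.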

    HH> A screen reader is a general purpose tool which you can use to
    HH> do most of your work, and it should provide a reasonable level
    HH> of access to all applications that don't have some kind of
    HH> barrier in themselves.

    HH> A screen reader typically sees only the surface of what is
    HH> actually contained in the application. It might be customized to
    HH> some degree for that particular application, but it still
    HH> doesn't have access to some information the application knows.

    HH> So it might make sense to build accessibility into some
    HH> applications directly, so that these accessibility solutions can
    HH> take advantage of the particular details of the application and
    HH> of the information that is available to them when they operate
    HH> *inside* the application.

    HH> I'll give two examples.

    HH> speechd-el knows the origin of each message so it can decide
    HH> which information is and which isn't important for the user
    HH> (according to user configuration of course) and take advantage
    HH> of the advanced priority model. It knows the language and
    HH> character encoding of buffers, so it can switch languages
    HH> automatically. It can perform other actions that a general
    HH> purpose screen reader couldn't.

What you say here is probably somewhat confused.  speechd-el is a sort
of screen reader inside the Emacs desktop.  Generally it knows nothing
about the origins of messages, nor about particular applications
implemented in Emacs.  It just utilizes the information Emacs provides
about the text to produce a (mostly) proper alternative output.  This
is a very important design decision, which makes speechd-el work better
and without maintenance nightmares, unlike highly application-specific
accessibility solutions.  No Emacs application is modified for the
purpose of smooth interaction with speechd-el (not counting user
customizations).

From my point of view the difference between a "screen reader" and an
"application reader" is that a screen reader reads the contents of the
screen, while an application reader reads data provided by the
application.  From this point of view both speechd-el and what Gary was
talking about are application readers; but according to your definition
both are screen readers.

As for language and coding information: speechd-el doesn't know the
language of the text unless it receives some hint; this is not much
different from what any common screen (Hynek) / application (Milan)
reader could do.  speechd-el doesn't care about coding at all; it works
on Emacs *characters*.  speechd-el just asks Emacs to send the
characters to the alternative output device (Speech Dispatcher, BrlTTY)
in the given coding.
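
The character-versus-coding distinction can be shown with a small
sketch (in Python rather than Elisp, just for illustration): the text
itself is a sequence of characters, and a byte coding is applied only
when the text is handed over to the output device:

```python
# The same characters, encoded only at the point of output.
text = "Žluťoučký kůň"                    # characters, no fixed coding
utf8_bytes = text.encode("utf-8")         # coding for one output device
latin2_bytes = text.encode("iso-8859-2")  # coding for another
# Different byte sequences, identical characters underneath:
assert utf8_bytes != latin2_bytes
assert utf8_bytes.decode("utf-8") == latin2_bytes.decode("iso-8859-2") == text
```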

Please note that speechd-el is not the final accessibility solution for
the _application_ called Emacs.  It is a practical demonstration of a
general accessibility solution for the Emacs _desktop_.  Also (more
importantly) it is an intermediate *working* solution, providing a wide
range of accessible applications to users until a good, powerful and
stable general application accessibility interface is defined and
implemented.  Once that becomes available, speechd-el can turn into the
interface bindings (of the Emacs _application_) and will no longer
communicate with Speech Dispatcher itself.  An Emacs-independent
application reader will do that instead.

I think making applications accessible doesn't have much to do with
message priorities.  OTOH the Speech Dispatcher priority model has
proved very helpful in speechd-el, for both the speech and Braille
outputs.  So I think the priority model is suitable for that kind of
screen/application reader.
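
As a rough illustration of why such a priority model helps, here is a
much-simplified toy model of five SSIP-like priority levels.  The names
follow Speech Dispatcher, but the cancellation rules below are only an
approximation of the real semantics:

```python
from collections import deque
from enum import IntEnum

class Priority(IntEnum):
    # Ordered from most to least urgent, mirroring SSIP's five levels.
    IMPORTANT = 1
    MESSAGE = 2
    TEXT = 3
    NOTIFICATION = 4
    PROGRESS = 5

class PriorityModel:
    """Toy approximation of the Speech Dispatcher priority semantics."""

    def __init__(self):
        self.important = deque()  # never discarded, always spoken first
        self.message = deque()
        self.text = deque()       # ordinary text, queued in order
        self.notification = None  # only the newest notification survives
        self.progress = None      # only the latest progress report matters

    def say(self, priority, text):
        if priority is Priority.IMPORTANT:
            self.important.append(text)
        elif priority is Priority.MESSAGE:
            # A message cancels pending text, notification and progress.
            self.text.clear()
            self.notification = None
            self.progress = None
            self.message.append(text)
        elif priority is Priority.TEXT:
            self.text.append(text)
        elif priority is Priority.NOTIFICATION:
            self.notification = text
        else:
            self.progress = text

    def next_utterance(self):
        """Return the next thing to speak, or None when idle."""
        for queue in (self.important, self.message, self.text):
            if queue:
                return queue.popleft()
        if self.notification is not None:
            result, self.notification = self.notification, None
            return result
        if self.progress is not None:
            result, self.progress = self.progress, None
            return result
        return None
```

For example, a burst of progress reports never drowns out an important
announcement: only the latest report is spoken, and only once nothing
more urgent is queued.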

Regards,

Milan Zamazal

-- 
Free software is about freedom, not about free beer.  If you care only about
the latter, you'll end up with no freedom and no free beer.

