speechd-discuss
Design suggestion: The server just for synthesis


From: Hynek Hanke
Subject: Design suggestion: The server just for synthesis
Date: Mon, 15 Nov 2010 00:07:14 +0100

On 12.11.2010 23:27, Andrei Kholodnyi wrote:
> - Since TTS synthesis and audio are glued together now, it would be good
> to separate them and give people the possibility to retrieve TTS output
> only, if they want.
> - Besides that, at the moment one instance of a particular TTS engine is
> used for multiple clients, which makes it impossible to produce a separate
> audio stream per client.

Thanks Andrei, this is precisely what my own reply would have been :)

Other considerations are:

- Message dispatching needs to be coordinated with Braille,
and we need some new priorities as well (a long-message priority,
as requested by KTTSD/Jovie). We've also got requests for
concurrent streams (e.g. in different speakers). As message
coordination is getting more complex, it would be good to have
a better separation of this task from the tasks of speech synthesis
and handling of output modules.

- Not only are synthesis and audio currently glued
together, but so are speech and callbacks/index marks. In fact,
callbacks/index marks are dispatched in the audio subsystem,
so there is actually no reason for them to pass back through
the output modules and complicate the output modules with them.
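
Just to illustrate (all names here are hypothetical, not the
current code): the audio subsystem could report index marks
straight to the message dispatcher through a small callback
interface, so the output modules never see these events at all:

    /* Hypothetical sketch only: the audio subsystem reports
     * playback events directly to the message dispatcher. */
    typedef void (*AudioEventCallback)(unsigned int msg_id,
                                       const char *index_mark,
                                       void *user_data);

    typedef struct {
        AudioEventCallback on_index_mark; /* index mark reached */
        AudioEventCallback on_end;        /* playback finished  */
        void *user_data;
    } AudioEventListener;

    /* The dispatcher registers itself once; output modules are
     * not involved in event delivery. */
    void audio_set_event_listener(AudioEventListener *listener);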

- The emulation layer needs to go somewhere. Of course
it is possible to find a place for it in the current code as well,
but I don't think there is currently a place in the code where
it *belongs*.

These are too many, and too big, tasks for a single
piece of code.

So we propose that the following separation, with well-defined
interfaces between the parts, would be a good design:
     1) Message dispatching
     2) TTS management
     3) Audio subsystem

Please note that separation doesn't have to mean
different processes. TTS API / TTS API Provider can
very well be just a library or something.
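
To make this concrete, here is one possible shape of the three
interfaces if they were plain C libraries (a rough sketch under
my own naming, not a finished proposal):

    #include <stddef.h>

    /* 1) Message dispatching: accepts client messages, applies
     *    priorities, coordinates speech with Braille. */
    int dispatcher_queue_message(unsigned int client_id,
                                 int priority, const char *text);

    /* 2) TTS management: pure synthesis, no audio output, so a
     *    separate stream per client becomes possible. */
    typedef struct TTSStream TTSStream;
    TTSStream *tts_synthesize(const char *voice, const char *ssml);
    size_t tts_read_samples(TTSStream *s, short *buf, size_t n);

    /* 3) Audio subsystem: plays a stream and dispatches the
     *    index-mark events sketched above. */
    int audio_play(TTSStream *s, unsigned int msg_id);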

> - And finally, there was a TTS API developed some time ago; we could
> try to use it instead of the libspeechd API,
> and also between the server and the modules.

The libspeechd API is a high-level API for client applications;
using TTS API there would be too low-level. We need message
priorities and some higher-level constructs. But the libspeechd
API should be improved (even redesigned) based on the new
capabilities offered by TTS API.
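
For context, message priorities are already the central
higher-level construct in the current client API; a minimal
libspeechd client looks roughly like this (real spd_open() /
spd_say() calls, illustrative message text):

    #include <libspeechd.h>

    int main(void)
    {
        SPDConnection *conn = spd_open("demo", "main", NULL,
                                       SPD_MODE_SINGLE);
        if (!conn)
            return 1;
        /* The priority argument decides how messages interact:
         * higher priorities may interrupt or postpone lower ones. */
        spd_say(conn, SPD_MESSAGE, "An ordinary message.");
        spd_say(conn, SPD_IMPORTANT, "An urgent announcement.");
        spd_close(conn);
        return 0;
    }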

One more reason why I always point you towards TTS API: it was
designed and discussed among various parties: Gnome and KDE, TTS
system developers, the FSG. It was also already partly implemented
in both eSpeak and Festival. It reflects concerns and requirements
from various sides, so there are reasonable chances that these
sides will be happy if it is implemented.

Best regards,
Hynek Hanke



