speechd-discuss

Design suggestion: The server just for synthesis


From: Tomas Cerha
Subject: Design suggestion: The server just for synthesis
Date: Mon, 15 Nov 2010 15:01:47 +0100

On 15.11.2010 13:38, Andrei Kholodnyi wrote:
> Does it mean we want to provide to the application system wide
> capabilities instead of particular driver capabilities?

I'd say NO.

> e.g. if a particular driver does not implement SSML we would implement
> SSML for it inside provider?

I'd say YES.

> and deliver "can_parse_ssml" to the application?

If the applications care whether they get "emulated" or "real" SSML, I'd say
we should be able to tell them.  But this doesn't mean they need to care.

There is currently no specification of the client API, so it is up to this
discussion to decide which features of the lower level API we want to expose
to the client.  The current SSIP is a good start and we can extend it with
features needed by the clients.

> If yes, what about the capabilities which we can not implement e.g.
> generic drivers can not generate speech samples as output?

These would not be emulated.  There is a distinct set of capabilities which
cannot be emulated in principle, so the applications need to be able to
handle both situations in this case, or ignore the drivers which don't
support the capability if it is essential.
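
To make the distinction concrete, here is a minimal sketch of how a provider
layer might report capabilities.  Only the capability names "can_parse_ssml"
and "can_retrieve_audio" come from this thread; the Driver and Provider
classes, the three-valued answer and the tag-stripping "emulation" are
assumptions made up for illustration, not part of TTS API or SSIP.

```python
import re

# Capabilities the provider is able to emulate (hypothetical set).
EMULATABLE = {"can_parse_ssml"}


class Driver:
    """Stand-in for an output module; records what it was asked to speak."""

    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = set(capabilities)
        self.spoken = []

    def speak(self, text):
        self.spoken.append(text)


class Provider:
    """Hypothetical layer above the drivers that emulates what it can."""

    def __init__(self, driver):
        self.driver = driver

    def capability(self, name):
        # Returns "native", "emulated" or "unsupported".
        if name in self.driver.capabilities:
            return "native"
        if name in EMULATABLE:
            return "emulated"
        return "unsupported"

    def speak_ssml(self, ssml):
        if "can_parse_ssml" in self.driver.capabilities:
            self.driver.speak(ssml)
        else:
            # Crude emulation: drop the markup and speak the plain text.
            plain = re.sub(r"<[^>]+>", "", ssml)
            self.driver.speak(" ".join(plain.split()))


generic = Provider(Driver("generic", []))
generic.speak_ssml("<speak>Hello <emphasis>world</emphasis></speak>")
```

An application that does not care simply calls speak_ssml(); one that does
care can check capability() first and skip drivers which answer
"unsupported" for something it needs.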

> For me it is a key design question. someone shall aggregate/handle all
> these differences.
> If we do not do it, then each app shall do this job.

Sure. I believe we should do it wherever possible.

> E.g. app wants to know all voices it can get back as speech samples.
> currently it will probably do:
> for all drivers get capability "can_retrieve_audio"
> if can_retrieve_audio
>   list all voices
>   add them to my favorite list
> 
> whereas we can do it for the app with High level API like:
> list voices with capabilities can_retrieve_audio, i.e. hide particular
> driver capabilities
> 
> This I could imagine as a high level API on top of TTS API

If I understand what you mean, the difference is whether you think of a
driver as a property of a voice or vice versa.  Otherwise it is equivalent.
Both approaches can be implemented above TTS API.
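
A small sketch of why the two views are equivalent.  The driver table, the
voice names and both function names are hypothetical; only the capability
name "can_retrieve_audio" is taken from the quoted example.

```python
# Hypothetical driver table: each driver has capabilities and voices.
DRIVERS = {
    "driver_a": {"capabilities": {"can_retrieve_audio"}, "voices": ["en", "cs"]},
    "driver_b": {"capabilities": set(), "voices": ["default"]},
}


def voices_per_driver(drivers, capability):
    """Voices as a property of drivers: ask each driver, then list its voices."""
    result = []
    for name, info in drivers.items():
        if capability in info["capabilities"]:
            result.extend((name, voice) for voice in info["voices"])
    return result


def voices_flat(drivers, capability):
    """Driver as a property of voices: filter one flat list of voice records."""
    records = [
        {"voice": voice, "driver": name, "capabilities": info["capabilities"]}
        for name, info in drivers.items()
        for voice in info["voices"]
    ]
    return [
        (record["driver"], record["voice"])
        for record in records
        if capability in record["capabilities"]
    ]


per_driver = voices_per_driver(DRIVERS, "can_retrieve_audio")
flat = voices_flat(DRIVERS, "can_retrieve_audio")
```

Both functions return the same list of (driver, voice) pairs; the only
difference is which side of the API does the looping.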

>> An SSIP bridge can also be written on top of the new API for backwards
>> compatibility.  Libspeechd, Python library and other client libraries
>> could run without a change through this bridge.
> 
> the only difference in SSIP versus TTS API AFAIR are priority handling
> and history. Not sure how it can be smoothly integrated.
> probably it can be added on top of TTS API as well, but there are APIs
> missing for it,
> probably some tags can be incorporated in the messages?

Yes, TTS API is a low level API.  Priorities are handled within the layer
above it.  Thus the client API must have some features not present in the
TTS API specification.  Also many features present in the TTS API
specification do not need to be exposed to the client API.
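
As an illustration of priorities living in a layer above the driver API,
here is a much-simplified sketch.  The priority names are the ones SSIP
uses, but the plain "most urgent first" ordering, the PriorityLayer class
and the driver_say callback are assumptions for this example; real SSIP
priority handling also involves cancelling and postponing messages, which
is not modelled here.

```python
import heapq
import itertools

# SSIP priority names; lower rank is spoken first (simplified policy).
PRIORITY_RANK = {
    "important": 0,
    "message": 1,
    "text": 2,
    "notification": 3,
    "progress": 4,
}


class PriorityLayer:
    """Hypothetical layer that orders messages before handing them to a driver."""

    def __init__(self, driver_say):
        self._say = driver_say             # low-level call; knows no priorities
        self._queue = []
        self._counter = itertools.count()  # keeps FIFO order within a priority

    def submit(self, text, priority="text"):
        rank = PRIORITY_RANK[priority]
        heapq.heappush(self._queue, (rank, next(self._counter), text))

    def flush(self):
        # Pass queued messages to the driver, most urgent first.
        while self._queue:
            _, _, text = heapq.heappop(self._queue)
            self._say(text)


spoken = []
layer = PriorityLayer(spoken.append)
layer.submit("new mail", "notification")
layer.submit("battery critical", "important")
layer.submit("hello", "text")
layer.flush()
```

The driver side only ever sees plain calls in the final order, so nothing
about priorities has to exist in the low level API.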

If it was not clear from the previous discussion, the ambition of TTS API is
to become a standard API for access to TTS engines.  Speech Dispatcher would
be the consumer of this API, that is, the layer between the clients and the
drivers which implement TTS API.  Another speech service (like Speech
Dispatcher) should be able to use the same API and reuse the same drivers to
access speech engines.  This other service might have a different client
API, but we can also decide to standardize the client API.  Standardization
of the client API would be a benefit for assistive technologies and other
client applications.  On the other hand, TTS API is good for output modules
(TTS engine drivers).  One common driver API can be used by different speech
systems and the output drivers can be shared.  Both levels of
standardization make sense, but we believe the low level API is easier to
standardize since it is easier to agree on a common set of low level
features.  So we started with this one.

Thanks everyone for your valuable input.

Best regards, Tomas


