
Speech Dispatcher roadmap discussion.


From: Bohdan R . Rau
Subject: Speech Dispatcher roadmap discussion.
Date: Thu, 09 Oct 2014 13:50:35 +0200

On 2014-10-08 09:32, Luke Yelavich wrote:

> Hey folks.
> This has been a long time coming.

Better late than never :)
>
> * Assess whether the SSIP protocol needs to be extended to better
> support available synthesizer features

Yes!

Some years ago I proposed a CAPABILITY command...

>
> Two questions that often get asked in the wider community are:
> 1. Can I get Speech Dispatcher to write audio to a wav file?


Let's assume there are three possibilities:

a) The module can speak. Probably all modules can speak (excluding the 
dummy module, which should be removed and replaced by internal server 
functionality).

b) The module can write the wave to a file. Hardware synthesizers, for 
example, cannot.

c) The module can return the synthesized wave to the server without 
writing it to a file. As above, hardware synthesizers cannot.

So there are three possible answers (at least one) - for example:
SPEAK
FILE
FETCH

Analogously, the server should return its capabilities for the current 
module - but the server should do more.

For example: if the module can only speak, there is no room to dance :(
If the module can FILE or FETCH but not SPEAK, the server is still able 
to speak (for example by fetching the wave, or reading the waveform 
from a file, and playing it with its internal methods).

Commands like 'speak', 'char' and 'key' could be prefixed with:

a) FETCH - meaning we want to fetch the waveform from the 
communication socket.
b) FILE <filename> - we want to save the waveform to a file.
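
For example (the syntax and the reply codes here are my invention, just 
to illustrate the idea):

FILE /tmp/hello.wav SPEAK
230 OK RECEIVING DATA
Hello, world!
.
225 OK WAVEFORM SAVED

FETCH SPEAK would work the same way, except that the waveform data 
comes back on the socket instead of going to a file.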


Also, the module should return the possible modifications, like:
RATE
PITCH
VOLUME
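
Putting the above together, a hypothetical exchange between server and 
module could look like this (reply codes invented):

CAPABILITY
271-SPEAK
271-FETCH
271-RATE
271-PITCH
271-VOLUME
271 OK CAPABILITY LIST SENT

Here the module can speak and return the waveform over the socket (but 
not write files), and it accepts rate, pitch and volume changes.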


> 2. How can I use eSpeak's extra voices for various languages?

The SET SYNTHESIS_VOICE command should understand variants. There is 
no need to extend SSIP; for example, the 'name' could simply take three 
forms:

voice_name - set the voice and the default variant
voice_name:variant - set the voice and a variant
:variant - switch the variant of the current voice
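
With eSpeak and its variant mechanism this could look like:

SET self SYNTHESIS_VOICE english
SET self SYNTHESIS_VOICE english:whisper
SET self SYNTHESIS_VOICE :croak

(whisper and croak are existing eSpeak variants; the colon syntax is of 
course only my proposal.)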

Another solution: use predefined voice names in the module. I used this 
approach in one of my experimental (and now dead) modules (txt2pho + 
mbrola) for the German mbrola voices.

> * SystemD/LoginD integration

Is this a problem of speech-dispatcher, or of pulseaudio?

> * Rework of the settings mechanism to use DConf/GSettings

I agree: the current settings mechanism should go to a museum as fast 
as possible - but DConf and GSettings are the worst candidates. The 
configuration file should be as simple as possible - in practice we 
need nothing more than a hash table of strings. And hash tables are 
faster than GSettings...
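
A minimal sketch of what I mean, with GLib's GHashTable (the config_* 
names are mine, not a proposal for the real API):

#include <glib.h>
#include <stdio.h>
#include <string.h>

/* Read "key value" lines into a hash table of strings.
   Empty lines and lines starting with '#' are ignored. */
static GHashTable *config_read(const char *path)
{
    GHashTable *cfg = g_hash_table_new_full(g_str_hash, g_str_equal,
                                            g_free, g_free);
    FILE *f = fopen(path, "r");
    char line[1024];

    if (!f)
        return cfg;
    while (fgets(line, sizeof(line), f)) {
        char *key = g_strstrip(line);
        char *val;

        if (*key == '#' || *key == '\0')
            continue;
        val = strpbrk(key, " \t");
        if (!val)
            continue;
        *val++ = '\0';
        g_hash_table_replace(cfg, g_strdup(key),
                             g_strdup(g_strstrip(val)));
    }
    fclose(f);
    return cfg;
}

/* Lookup with a default value. */
static const char *config_get(GHashTable *cfg, const char *key,
                              const char *dflt)
{
    const char *val = g_hash_table_lookup(cfg, key);
    return val ? val : dflt;
}

Then something like config_get(cfg, "DefaultVoiceType", "MALE1") is all 
a module ever needs.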

I use a similar solution in my experimental module (used daily by 
several people, both completely blind and partially sighted): 
http://tts.polip.com/files/sd_milvona-0.1.9.tar.gz

>
> * Separate compilation and distribution of modules
>
> As much as many of us prefer open source synthesizers, there are
> instances where users would prefer to use proprietary synthesizers. 
> We
> cannot always hope to be able to provide a driver for all
> synthesizers, so Speech Dispatcher needs an interface to allow
> synthesizer driver developers to write support for Speech Dispatcher,
> and build it, outside the Speech Dispatcher source tree.


Yes, yes, yes!

Look above :)

Milena does not use proprietary software (except Mbrola), but it is 
specialized for a single (not very popular) language, and depends on 
open-source but intensively developed libraries (milena, ivolektor 
etc.) which should not be shipped together with speech-dispatcher 
(I have sometimes published several versions of the data files within 
a single month).

I can imagine similar modules specialized for languages like Mongolian, 
Nynorsk or even Quenya and Klingon... but as these modules are 
interesting only for a small group of users, there is no sense in 
putting them into the main speech-dispatcher distribution :)

As I have spent some time developing independent modules, in my opinion 
there should be something like:

a) something like libspeechdmodule - a C library containing all the 
needed functions and a skeleton of a module (see the sketch after this 
list).

b) a working solution for other languages (like Python). I tried to 
write a skeleton for Python, but I'm not very happy with the results...
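
To illustrate point a): the library could let the module author supply 
only the synthesizer-specific callbacks, while the library itself 
handles the protocol side, audio and capability negotiation. All the 
names below are invented by me for illustration - this is not an 
existing API:

#include <stddef.h>

/* Hypothetical libspeechdmodule interface - all names invented. */
typedef struct {
    const char *name;             /* module name reported to the server */
    int  (*init)(void);
    /* Synthesize text; call the supplied sink for each audio chunk. */
    int  (*synth)(const char *text,
                  void (*sink)(const short *samples, size_t count,
                               void *closure),
                  void *closure);
    int  (*set_rate)(int rate);   /* NULL if not supported */
    int  (*set_pitch)(int pitch); /* NULL if not supported */
    void (*close)(void);
} SpdModuleOps;

/* Provided by the library: runs the protocol loop and audio output,
   calling back into ops; the module's main() becomes one line. */
int spd_module_main(const SpdModuleOps *ops, int argc, char **argv);

The library could even derive the CAPABILITY answer automatically: 
set_rate != NULL means the module reports RATE, and so on.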


> * Consider refactoring client API code such that we only have one
> client API codebase to maintain, i.e python bindings wrapping the C
> library etc

For Python (cython):

As a low-level Python binding should provide only a direct interface to 
libspeechd, it is simple and - once created - does not need maintenance 
until the C API changes. In fact, it's a task for one person for two 
days (counting the morning coffee and a visit to the pub). If needed, I 
can provide a first version of the Python extension over a weekend.

In fact, I had a big problem with my simple application for Ubuntu and 
speech-dispatcher. I wrote my app in Python 2.7, and as there is only a 
Python 3 interface in Ubuntu... you can imagine the results. My first 
idea was "write a Python binding to libspeechd", but I decided to 
rewrite the app in C :)

GObject Introspection is a nice idea, but I cannot imagine this 
solution with the current version of the speech-dispatcher library...

Suggested "ctype" solution is worst - ctype is good for simple 
functions, but not for something more sophisticated - like 
get_synthesis_voices().
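
For comparison, in C this is trivial - if I remember the libspeechd API 
correctly, the voices come back as a NULL-terminated array of 
structures, which is exactly the kind of return value that is painful 
through ctypes:

#include <stdio.h>
#include <libspeechd.h>

int main(void)
{
    SPDConnection *conn = spd_open("voices-demo", NULL, NULL,
                                   SPD_MODE_SINGLE);
    SPDVoice **voices;
    int i;

    if (!conn)
        return 1;
    /* NULL-terminated list of SPDVoice (name, language, variant) */
    voices = spd_list_synthesis_voices(conn);
    for (i = 0; voices && voices[i]; i++)
        printf("%s (%s, %s)\n", voices[i]->name,
               voices[i]->language, voices[i]->variant);
    spd_close(conn);
    return 0;
}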

As I use only Python and C in my applications, I won't say anything 
about other languages.

> * Moving audio drivers from the modules to the server

A little upgrade:

Allow the module to use the server's audio output.

Your whole long story of audio problems affects only pulseaudio. For 
other audio systems there are different problems (for example, Alsa not 
working when loaded from a dynamically linked library - has this bug 
been fixed in Alsa?).

I assume the server audio system will be able to change the rate/pitch 
of the synthesized wave (with sonic)...
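
Something along these lines, as far as I remember the sonic API (only a 
sketch, error handling omitted; 22050 Hz mono is just an example):

#include <sonic.h>

/* Change the speed of a mono 16-bit buffer: feed it through
   a sonic stream and read the processed samples back. */
static int change_speed(const short *in, int in_samples,
                        short *out, int max_out, float speed)
{
    sonicStream s = sonicCreateStream(22050, 1); /* rate, channels */
    int produced;

    sonicSetSpeed(s, speed);          /* e.g. 2.0f = twice as fast */
    sonicWriteShortToStream(s, (short *)in, in_samples);
    sonicFlushStream(s);              /* no more input coming */
    produced = sonicReadShortFromStream(s, out, max_out);
    sonicDestroyStream(s);
    return produced;                  /* samples actually written */
}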

I also have other suggestions, but that's a topic for the next mail :)

ethanak
-- 
http://milena.polip.com/ - Pa pa, Ivonko!


