
Speech Dispatcher roadmap discussion.


From: Luke Yelavich
Subject: Speech Dispatcher roadmap discussion.
Date: Fri, 10 Oct 2014 11:13:37 +1100

On Thu, Oct 09, 2014 at 10:50:35PM AEDT, Bohdan R. Rau wrote:
> On 2014-10-08 09:32, Luke Yelavich wrote:
> 
> >Hey folks.
> >This has been a long time coming.
> 
> Better late than never :)
> >
> >* Assess whether the SSIP protocol needs to be extended to better
> >support available synthesizer features
> 
> Yes!
> 
> Some years ago I proposed a CAPABILITY command...

<Snip>

I really like this proposal; I think it makes sense.

> >2. How can I use eSpeak's extra voices for various languages?
> 
> The SET SYNTHESIS_VOICE command should understand variants. There is no need
> to extend SSIP; for example, the 'name' argument may simply take three forms:
> 
> voice_name - set voice and default variant
> voice_name:variant - set voice and variant
> :variant - switch variant of current voice
> 
> Another solution: use predefined voice names in the module. I used this
> solution in one of my experimental (and dead) modules (txt2pho + mbrola)
> for German mbrola voices.

I hadn't looked into this in much detail, but your first solution makes sense.
I am of the view that voice naming and listing should all be dynamic, so I am
not a fan of pre-defined voices.
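To make the first form concrete, here is how the proposed syntax might look in
an SSIP session. The voice and variant names below are just examples, and the
variant handling is the proposal above, not something current SSIP already does:

    SET self SYNTHESIS_VOICE english            (voice with its default variant)
    SET self SYNTHESIS_VOICE english:whisper    (voice plus an explicit variant)
    SET self SYNTHESIS_VOICE :croak             (keep the voice, switch variant)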

> >* SystemD/LoginD integration
> 
> Is it a problem of speech-dispatcher or of pulseaudio?

It is not so much a problem as a matter of making sure Speech Dispatcher can
continue to work properly in multi-session, multi-seat environments where
security is important, and of making sure Speech Dispatcher is not using any
resources when it should not be. As Hynek outlined in the documentation I
linked to and in the original discussion, Speech Dispatcher would not send any
text to be synthesized if it knew that the active session was not one it
belonged to. Sure, pulse does this already, but why spend CPU cycles sending
text to be synthesized only to have the audio dropped because the currently
active session on a seat is not one that the Speech Dispatcher server belongs
to?
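
For what it's worth, here is a minimal sketch of the kind of check logind
integration would enable, using libsystemd's sd-login API. The function name
is mine and this is not existing Speech Dispatcher code:

    /* Ask logind whether the session this process belongs to is the active
     * one on its seat; if we cannot tell, err on the side of speaking. */
    #include <stdlib.h>
    #include <unistd.h>
    #include <systemd/sd-login.h>

    static int our_session_is_active(void)
    {
        char *session = NULL;
        int active;

        if (sd_pid_get_session(getpid(), &session) < 0 || !session)
            return 1;  /* no session info available, fall back to speaking */

        active = sd_session_is_active(session) > 0;
        free(session);
        return active;
    }

The server could consult a check like this before handing text to a module,
which is exactly the "don't waste CPU cycles" point above.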
> 
> >* Rework of the settings mechanism to use DConf/GSettings
> 
> While I agree the current settings mechanism should go to a museum as fast as
> possible, DConf and GSettings are the worst candidates. The configuration file
> should be as simple as possible - in practice we need nothing more than a hash
> array of strings. Hash tables are faster than GSettings...

How do you propose we document what a setting is for? Does your approach allow
for translatable descriptions, so that users can learn what a setting is for in
their own language? Does the system you have in mind allow for locale-specific
default settings if desired? GSettings does all of this, and does it now.
GSettings also allows a setting to be changed and have that change acted upon
in real time.

It should also be noted that GSettings does support multiple backends. 
Currently the most commonly used backend is dconf, but you could write another 
backend if you so desired, and use that.
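
As an illustration of the real-time point, here is a minimal sketch of how a
settings change can be picked up at runtime through GSettings. The schema id
"org.freedesktop.speech-dispatcher" and the "default-rate" key are invented for
the example; nothing like this is wired into Speech Dispatcher today:

    /* Listen for changes to a (hypothetical) integer key and re-apply it. */
    #include <gio/gio.h>

    static void
    on_rate_changed(GSettings *settings, const gchar *key, gpointer user_data)
    {
        gint rate = g_settings_get_int(settings, key);
        g_print("default-rate changed to %d\n", rate);
        /* here the server would push the new rate to running modules */
    }

    int main(void)
    {
        GSettings *settings = g_settings_new("org.freedesktop.speech-dispatcher");
        GMainLoop *loop = g_main_loop_new(NULL, FALSE);

        g_signal_connect(settings, "changed::default-rate",
                         G_CALLBACK(on_rate_changed), NULL);
        g_main_loop_run(loop);
        return 0;
    }

Translatable summaries and locale-specific defaults live in the schema and
override files rather than in code, which is part of what makes this attractive.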

> >* Separate compilation and distribution of modules
> >
> >As much as many of us prefer open source synthesizers, there are
> >instances where users would prefer to use proprietary synthesizers. We
> >cannot always hope to be able to provide a driver for all
> >synthesizers, so Speech Dispatcher needs an interface to allow
> >synthesizer driver developers to write support for Speech Dispatcher,
> >and build it, outside the Speech Dispatcher source tree.
> 
> 
> Yes, yes, yes!
> 
> Look above :)
> 
> Milena does not use proprietary software (excluding Mbrola), but it is
> specialized for a single (not very popular) language, and depends on
> open-source but actively developed libraries (milena, ivolektor etc.) which
> should not be shipped together with speech-dispatcher (sometimes I have
> published several versions of the data files within one month).
> 
> I can imagine similar modules specialized for languages like Mongolian,
> Nynorsk or even Quenya and Klingon... but as these modules are interesting
> only to a small group of users, there is no sense in putting them into the
> main speech-dispatcher distribution :)
> 
> As I have spent some time developing independent modules, in my view there
> should be something like:
> 
> a) something like libspeechdmodule - a C library containing all the needed
> functions and a skeleton of a module.
> 
> b) a working solution for other languages (like Python). I tried to write a
> skeleton for Python, but I'm not very happy with the results...

This is along the lines of what I had in mind as well.
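
To sketch what I mean (purely hypothetical names, none of this exists today),
such a library could own the module protocol plumbing and hand module authors a
small set of callbacks to fill in:

    /* Hypothetical libspeechdmodule interface: the library would run the
     * module protocol loop on stdin/stdout and dispatch to these callbacks. */
    typedef struct {
        int   (*init)(void);
        int   (*speak)(const char *text, const char *voice);
        int   (*stop)(void);
        char **(*list_voices)(void);          /* NULL-terminated array */
        void  (*close)(void);
    } SpdModuleOps;

    /* Provided by the hypothetical library; a synthesizer module's main()
     * would just fill in SpdModuleOps and call this. */
    int spd_module_main(const SpdModuleOps *ops, int argc, char **argv);

A module built against something like this could be versioned, built and
packaged entirely outside the Speech Dispatcher tree, which is exactly what the
Milena case needs.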

> >* Consider refactoring client API code such that we only have one
> >client API codebase to maintain, i.e python bindings wrapping the C
> >library etc
> 
> For Python (cython):
> 
> As a low-level Python binding should provide only a direct interface to
> libspeechd, it is simple and - once created - does not need maintenance until
> the C API changes. In fact, it is a task for one person for two days (counting
> morning coffee and a visit to the pub). If needed, I can provide a first
> version of the Python extension over the weekend.
> 
> In fact, I had a big problem with my simple application for Ubuntu and
> speech-dispatcher. I wrote my app in Python 2.7, and as there is only a
> Python 3 interface in Ubuntu... you can imagine the results. My first idea was
> "write a Python binding to libspeechd", but I decided to rewrite the app in C
> :)

I must admit I did ponder whether to ship Python 2 and 3 bindings in
Debian/Ubuntu. Since Orca was the only consumer as far as packages went, I
wrongly assumed that I only needed to worry about Orca. Damn assumptions are
dangerous things. :)

> GObject Introspection is a nice idea, but I cannot imagine this solution with
> the current version of the speech-dispatcher library...

As I said, either a wrapper library would need to be written, or the current 
library refactored to be GObject based, and properly annotated.
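
For the curious, here is a minimal sketch of what "GObject based and properly
annotated" means in practice. SpdClient and the method are invented names; the
point is that once the API looks like this, g-ir-scanner can generate the
Python (and JavaScript, etc.) bindings automatically:

    #include <glib-object.h>

    #define SPD_TYPE_CLIENT spd_client_get_type()
    G_DECLARE_FINAL_TYPE(SpdClient, spd_client, SPD, CLIENT, GObject)

    struct _SpdClient { GObject parent_instance; };
    G_DEFINE_TYPE(SpdClient, spd_client, G_TYPE_OBJECT)

    static void spd_client_class_init(SpdClientClass *klass) { }
    static void spd_client_init(SpdClient *self) { }

    /**
     * spd_client_list_synthesis_voices:
     * @self: a #SpdClient
     *
     * Returns: (transfer full) (element-type utf8): available voice names
     */
    GList *
    spd_client_list_synthesis_voices(SpdClient *self)
    {
        /* a real wrapper would ask the server; return a dummy list here */
        return g_list_append(NULL, g_strdup("example-voice"));
    }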
> 
> Suggested "ctype" solution is worst - ctype is good for simple functions,
> but not for something more sophisticated - like get_synthesis_voices().

Interesting to hear; I don't know enough about ctypes to comment.

> >* Moving audio drivers from the modules to the server
> 
> A little upgrade:
> 
> Allow a module to use the server's audio output.
> 
> Your whole long story of audio problems affects only pulseaudio. For other
> audio systems there are different problems (for example, Alsa not working
> when loaded from a dynamically linked library - has this bug been fixed in
> Alsa?).

No idea.

> I assume the server audio system will make it possible to change the
> rate/pitch of the synthesized wave (with sonic)...

I don't see why not, worth considering.
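
If the audio path does move into the server, the sonic library makes that kind
of post-processing fairly cheap. A rough sketch, assuming 16-bit mono PCM
coming back from a module (the function name and buffer handling are mine, not
anything in the current tree):

    /* Speed up / re-pitch synthesized samples with sonic before playback. */
    #include <sonic.h>

    static int
    adjust_rate_and_pitch(const short *in, int in_samples,
                          short *out, int out_max,
                          int sample_rate, float speed, float pitch)
    {
        sonicStream stream = sonicCreateStream(sample_rate, 1 /* mono */);
        int produced;

        sonicSetSpeed(stream, speed);   /* e.g. 1.5 = 50% faster */
        sonicSetPitch(stream, pitch);   /* e.g. 0.8 = lower pitch */

        sonicWriteShortToStream(stream, (short *)in, in_samples);
        sonicFlushStream(stream);
        produced = sonicReadShortFromStream(stream, out, out_max);

        sonicDestroyStream(stream);
        return produced;                /* samples written to 'out' */
    }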
> 
> I also have other suggestions, but that's a topic for the next mail :)

Looking forward to hearing about it.

Luke


