
Speech Dispatcher roadmap discussion.


From: Luke Yelavich
Subject: Speech Dispatcher roadmap discussion.
Date: Wed, 8 Oct 2014 18:32:09 +1100

Hey folks.
This has been a long time coming. I originally promised a roadmap shortly after 
taking up Speech Dispatcher maintainership. Unfortunately, as is often the 
case, real life and other work-related tasks got in the way, but I am now able 
to give some attention to where to take the project from here. Much of what is 
here is based on the roadmap discussions back in 2010(1) and the roadmap 
documents on the project website.(2) Since then, much has changed in the wider 
*nix ecosystem, some of the underlying system services have changed, and there 
are now additional requirements that need to be considered.

I haven't given any thought to version numbering at this point; I'd say all of 
the below is 0.9. If we find any critical bugs that need fixing, we can always 
put out another 0.8 bugfix release in the meantime.

The roadmap items, as well as my thoughts are below.

* Implement event-based main loops in the server and modules

I don't think this requires much explanation. In my opinion this is one of the 
first things to be done, as it lays important groundwork for the other 
improvements mentioned below. Since we already use GLib, my proposal is to use 
the GLib main loop system; it is very flexible and easy to work with.
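
To make the idea concrete, here is a minimal sketch of the pattern using the 
real GLib API; the pipe only stands in for a listening SSIP socket or a 
module's pipe so that the example is self-contained.

/* Minimal sketch: a GLib main loop watching a file descriptor instead of
 * blocking in read()/select(). */
#include <glib.h>
#include <unistd.h>

static gboolean on_readable(GIOChannel *chan, GIOCondition cond, gpointer data)
{
    gchar buf[256];
    gsize len = 0;

    g_io_channel_read_chars(chan, buf, sizeof buf - 1, &len, NULL);
    buf[len] = '\0';
    g_print("got: %s", buf);

    g_main_loop_quit((GMainLoop *) data);
    return FALSE;    /* remove the watch; a real server would keep it */
}

int main(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return 1;

    GMainLoop *loop = g_main_loop_new(NULL, FALSE);
    GIOChannel *chan = g_io_channel_unix_new(fds[0]);
    g_io_add_watch(chan, G_IO_IN | G_IO_HUP, on_readable, loop);

    /* Write something so the watch fires once. */
    write(fds[1], "hello from the main loop\n", 25);

    g_main_loop_run(loop);
    return 0;
}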

* Assess whether the SSIP protocol needs to be extended to better support 
available synthesizer features

Two questions that often get asked in the wider community are:
1. Can I get Speech Dispatcher to write audio to a wav file?
2. How can I use eSpeak's extra voices for various languages?

We should have a look at the SSIP protocol, as well as the features offered by 
the synthesizers we support today, and determine whether we need to extend SSIP 
to support everything that the synthesizers have to offer. This may require 
changes or additions to the client API, particularly for the wav file audio 
output that prospective clients may wish to use.
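
As a concrete starting point, here is a rough sketch using the existing 
libspeechd C API to list and select a synthesizer-specific voice (question 2). 
As far as I know there is no client-visible way to do question 1 today, which 
is exactly the kind of gap an SSIP extension would need to fill. Treat the 
details such as the header path as approximate.

/* Sketch: enumerate and select a synthesizer-specific voice with libspeechd.
 * Build against the speech-dispatcher client library. */
#include <stdio.h>
#include <speech-dispatcher/libspeechd.h>

int main(void)
{
    SPDConnection *conn = spd_open("voice-demo", NULL, NULL, SPD_MODE_SINGLE);
    if (conn == NULL)
        return 1;

    /* Enumerate the voices the active output module exposes. */
    SPDVoice **voices = spd_list_synthesis_voices(conn);
    for (int i = 0; voices != NULL && voices[i] != NULL; i++)
        printf("%s (%s)\n", voices[i]->name, voices[i]->language);

    /* Pick one by name; the placeholder stands in for a name printed above,
     * e.g. one of eSpeak's language voices. */
    spd_set_synthesis_voice(conn, "some-espeak-voice-name");
    spd_say(conn, SPD_TEXT, "Hello from a synthesizer specific voice");

    spd_close(conn);
    return 0;
}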

* Assess DBus use for IPC between client and server

Brailcom raised this back in 2010, and the website mentions that analysis is 
required; however, I have no idea what they had in mind. Nevertheless, using 
DBus as the client-server IPC is worth considering, particularly with regards 
to application confinement and the client API, see below. Work is ongoing to 
put the core part of DBus into the kernel, so once that is done, performance 
should be much improved.

It's worth noting that DBus doesn't necessarily have to be used for everything. 
DBus could be used only to spawn the server daemon and nothing else, or the 
client API library could use DBus just to initiate a connection, setting up a 
Unix socket per client. I haven't thought this through, so I may be missing the 
mark on some of these ideas, but we should look at all options.
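
For the sake of discussion, here is a sketch of what a call from the client 
library could look like using GDBus. The bus name, object path, interface and 
method are entirely hypothetical; nothing like them exists today.

/* Hypothetical sketch of a D-Bus based client call using GDBus. */
#include <gio/gio.h>

int main(void)
{
    GError *error = NULL;
    GDBusConnection *bus = g_bus_get_sync(G_BUS_TYPE_SESSION, NULL, &error);
    if (bus == NULL) {
        g_printerr("no session bus: %s\n", error->message);
        return 1;
    }

    /* If a D-Bus service file were installed, calling the name would also
     * activate the daemon on demand, covering the "spawn the server" idea. */
    GVariant *reply = g_dbus_connection_call_sync(
        bus,
        "org.freebsoft.SpeechDispatcher",   /* hypothetical bus name */
        "/org/freebsoft/SpeechDispatcher",  /* hypothetical object path */
        "org.freebsoft.SpeechDispatcher",   /* hypothetical interface */
        "Speak",                            /* hypothetical method */
        g_variant_new("(s)", "Hello world"),
        NULL, G_DBUS_CALL_FLAGS_NONE, -1, NULL, &error);

    if (reply != NULL)
        g_variant_unref(reply);
    g_object_unref(bus);
    return 0;
}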

* SystemD/LoginD integration

In many Linux distros today, SystemD is used for system boot and service 
management. Part of this is the use of LoginD for user session/login 
management, which replaces ConsoleKit. The roadmap documentation on the project 
website goes into some detail as to why this is required, but an email from 
Hynek goes into even more detail.(3) Even though he talks about ConsoleKit, it 
is the same with LoginD.

I am aware that some distros still do not use LoginD, so we may need to 
implement things such that other systems can be supported as well, e.g. if 
ConsoleKit is still being used despite its deprecation, then we should support 
it too. I don't think Gentoo uses SystemD, so if someone could enlighten me as 
to what Gentoo uses for session management, I would appreciate it.
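
For reference, the sd-login C API makes the session query side of this fairly 
small. A sketch, assuming libsystemd (libsystemd-login on older releases) is 
available:

/* Sketch: ask logind whether the session a process belongs to is active,
 * the kind of check the server needs to decide whether to speak/route audio. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <systemd/sd-login.h>

int main(void)
{
    char *session = NULL;

    /* Map our own PID to a logind session; for a real client the server
     * would use the peer PID obtained from the socket connection. */
    if (sd_pid_get_session(getpid(), &session) < 0) {
        fprintf(stderr, "not part of a logind session\n");
        return 1;
    }

    printf("session %s is %s\n", session,
           sd_session_is_active(session) > 0 ? "active" : "inactive");
    free(session);
    return 0;
}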

* Support confined application environments

Like it or not, ensuring that applications have access to only what they need 
is becoming more important, and even open source desktop environments are 
looking into implementing confinement for applications. Unfortunately no 
standard confinement framework is in use, so this will likely need to be 
modular in order to support AppArmor, whatever GNOME ends up using, and so on. 
AppArmor is what Ubuntu is using for application confinement going forward.

* Rework of the settings mechanism to use DConf/GSettings

There was another good discussion about this back in 2010; you will find it in 
the same thread linked above with regards to ConsoleKit/LoginD. GSettings has 
seen many improvements since then, which will help in creating some sort of 
configuration application/interface for users to configure Speech Dispatcher, 
should they need to configure it at all. With GSettings, a user can make a 
settings change and it can be acted on immediately, without a server or module 
restart. GSettings also solves the system/user configuration problem, in that 
the system-wide value acts as the default until the user overrides it. We could 
also extend the client API to allow clients more control over the Speech 
Dispatcher settings that affect them, and have those settings applied on a 
client-by-client basis. I think we already have something like this now, but 
the client cannot change those settings via an API.
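
A small sketch of the change-notification side, with a made-up schema id and 
key (a real schema would ship with Speech Dispatcher and be compiled with 
glib-compile-schemas); the point is that the server can react as soon as the 
key is written, with no restart:

/* Sketch: acting on a GSettings change immediately. */
#include <gio/gio.h>

static void on_rate_changed(GSettings *settings, const gchar *key, gpointer data)
{
    gint rate = g_settings_get_int(settings, key);
    g_print("default rate is now %d, reconfiguring modules\n", rate);
}

int main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);

    /* Hypothetical schema id; g_settings_new() requires it to be installed. */
    GSettings *settings = g_settings_new("org.freebsoft.speechd");

    /* Fires as soon as a config UI, dconf-editor or another process writes
     * the key; no server or module restart needed. */
    g_signal_connect(settings, "changed::default-rate",
                     G_CALLBACK(on_rate_changed), NULL);

    g_main_loop_run(loop);
    return 0;
}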

* Separate compilation and distribution of modules

As much as many of us prefer open source synthesizers, there are instances 
where users would prefer to use proprietary ones. We cannot hope to provide a 
driver for every synthesizer ourselves, so Speech Dispatcher needs an interface 
that allows synthesizer driver developers to write support for Speech 
Dispatcher, and build it, outside the Speech Dispatcher source tree.
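
Purely as a strawman, such an out-of-tree interface could be as small as a 
struct of entry points that a driver .so exports and the server dlopen()s. 
Every name below is invented for illustration; nothing like this is defined 
today.

/* Hypothetical stable module interface sketch. */
#include <stddef.h>

typedef struct {
    const char *name;                    /* e.g. "acmetts" */
    int  (*init)(void);                  /* load voices, connect to engine */
    int  (*speak)(const char *ssml, size_t len);
    int  (*stop)(void);
    void (*close)(void);
} SpdModuleOps;

/* The single symbol the server would look up with dlsym(). */
extern const SpdModuleOps *spd_module_get_ops(void);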

* Consider refactoring client API code such that we only have one client API 
codebase to maintain, i.e python bindings wrapping the C library etc

This is one that was not raised previously, but it is something I have been 
thinking about recently. At the moment we have multiple implementations of the 
API for different languages; Python and C come to mind. There are others to 
which this may not be applicable, e.g. Guile, Java, etc.

I have been pondering whether it would save us maintenance work if we only had 
one client API codebase to maintain, that being the C library. There are at 
least two ways to provide Python bindings for a C library, and there may be 
more; should we decide to go down this path, all of them should be considered. 
The two that come to mind are outlined below. I've also included some pros and 
cons, but there are likely more that I haven't thought of.

Using Cython:
Pros:
* Provides both Python 2 and 3 support
* Produces a compiled module that works with the version of Python it was built 
against, and should only require Python itself as well as the Speech Dispatcher 
client library at runtime
Cons:
* Requires knowledge of Cython and its syntax, which mixes Python and C
* Requires extra binding code to be written and maintained

Using GObject introspection:
Pros:
* Provides support for any language that has GObject introspection support, 
which immediately broadens the API's usefulness beyond Python
* Has good Python 2 and 3 support
* Little to no extra code needs to be written, but it does require that the C 
library be refactored; see below
Cons:
* Introduces more dependencies that need to be present at runtime
* Requires the C library to be refactored into a GObject-based library, and 
annotation is required to provide introspection support

My understanding of both options may be lacking, so I have likely missed 
something, please feel free to add to the above.
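
To illustrate the GObject introspection option, this is roughly what the 
annotation work looks like; the types and function below are hypothetical 
stand-ins for a refactored client library.

/* Sketch of gtk-doc style annotations that GObject introspection relies on. */
#include <glib-object.h>

/* Hypothetical stand-in types. */
typedef struct _SpdClient SpdClient;
typedef enum { SPD_PRIORITY_TEXT, SPD_PRIORITY_MESSAGE } SpdPriority;

/**
 * spd_client_say:
 * @client: a #SpdClient connection object
 * @priority: message priority
 * @text: (transfer none): UTF-8 text to speak
 * @error: return location for a #GError, or %NULL
 *
 * Queues @text for synthesis.
 *
 * Returns: the message id, or -1 on failure
 */
gint spd_client_say(SpdClient *client, SpdPriority priority,
                    const gchar *text, GError **error);

g-ir-scanner reads these comments to generate the GIR/typelib, and PyGObject 
then exposes the function to both Python 2 and 3 without any hand-written 
binding code.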

* Moving audio drivers from the modules to the server

Another one that was not raised previously, but it needs to be considered. I 
thought about this after considering various use cases for Speech Dispatcher 
and its clients, particularly Orca. It is likely to benefit PulseAudio users 
more than users of other audio drivers, but I am sure people can think of other 
reasons.

At the moment, when using PulseAudio, Speech Dispatcher connects to PulseAudio 
per synthesizer, not per client. This means that if a user has Orca configured 
to use different synthesizers for, say, the system and hyperlink voices, then 
those synthesizers have individual connections to PulseAudio. When viewing a 
list of currently connected PulseAudio clients, you see names like sd_espeak or 
sd_ibmtts, and not Orca, as you would expect. Furthermore, if you adjust the 
volume of one of these Pulse clients, the change only affects that particular 
speech synthesizer, not the entire audio output of Orca. What is more, multiple 
Speech Dispatcher clients may be using that same synthesizer, so if volume is 
changed at the PulseAudio level, an unknown number of Speech Dispatcher clients 
using that synthesizer are affected. In addition, if the user wishes to send 
Orca output to another audio device, they have to change the output device for 
multiple Pulse clients, and as a result they may also move the output of 
another Speech Dispatcher client to a device where they don't want it.

In fact, per-client choice of sound device could apply to all audio output 
drivers, not just PulseAudio. In other words, moving audio output management to 
the server would allow us to offer clients the ability to choose the sound 
device their audio is sent to.
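
To illustrate with PulseAudio's simple API: if the server owned the audio 
connection, it could open one stream per SSIP client and label it with the 
client's name (which clients already send via SSIP's CLIENT_NAME setting), so 
mixers would show "orca" rather than "sd_espeak". A sketch, with the sample 
format chosen arbitrarily:

/* Sketch: one playback stream per client, labelled with the client's name. */
#include <pulse/simple.h>
#include <pulse/error.h>
#include <stdio.h>

int main(void)
{
    pa_sample_spec ss = {
        .format   = PA_SAMPLE_S16LE,
        .rate     = 22050,
        .channels = 1,
    };
    int err = 0;

    /* "orca" stands in for the name taken from the client's SSIP settings. */
    pa_simple *s = pa_simple_new(NULL, "orca", PA_STREAM_PLAYBACK, NULL,
                                 "speech", &ss, NULL, NULL, &err);
    if (s == NULL) {
        fprintf(stderr, "pa_simple_new: %s\n", pa_strerror(err));
        return 1;
    }

    /* ... pa_simple_write() the synthesized audio here ... */
    pa_simple_free(s);
    return 0;
}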

Please feel free to respond with further discussion points about anything I 
have raised here, or if you have another suggestion for roadmap inclusion, I'd 
also love to hear it.

Luke

(1) http://lists.freebsoft.org/pipermail/speechd/2010q3/002360.html
(2) http://devel.freebsoft.org/speechd-roadmap
(3) http://lists.freebsoft.org/pipermail/speechd/2010q3/002406.html


