speechd-discuss

Re: making Linux a hospitable place for TTS engines like Voxin


From: Bill Cox
Subject: Re: making Linux a hospitable place for TTS engines like Voxin
Date: Mon, 21 Dec 2020 18:21:48 -0800

OK, I am convinced you know what you're doing, and I agree with the changes you suggest.  I get to work on a11y projects as my 20%-time project, which gives me about 10 hours per week that I can offer, if you are willing to direct my efforts.  I agree espeak is lower priority, since it just works everywhere today.  I personally use Voxin, which I have not been able to get working on the testing version of our internal Linux distro; I would like to make that the highest priority.

I see Gilles Casse submitted changes in August.  Is he active on this list?  sd_voxin returns 21 when I execute it manually, and never reads from stdin.  I haven't been able to find all the source code in his GitHub repo to rebuild it myself.  If Gilles is around and can suggest how I might get sd_voxin working, I could probably keep several blind programmers productive in the near term.  Alternatively, I can set them up with my meta-module, but I'd prefer to get sd_voxin working first and then work to make it binary portable.
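For anyone else poking at a module binary by hand, here is the kind of helper I've been using: feed some input on stdin and report the reply and exit status.  The `probe_module` name and the `INIT`/`QUIT` lines are just my sketch, not necessarily what sd_voxin actually expects.

```shell
# Hypothetical helper: run a candidate module, feed it a couple of
# protocol-style commands on stdin, and report its output and exit
# status.  INIT/QUIT are placeholders; adjust for the real protocol.
probe_module() {
  out=$(printf 'INIT\nQUIT\n' | "$@" 2>&1)
  status=$?
  printf 'exit=%s\n%s\n' "$status" "$out"
}

# Example with a stand-in program (cat just echoes the commands back):
probe_module cat
```

With a broken module like my sd_voxin, this at least shows immediately whether it reads stdin at all and what status it exits with.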

I am also interested in adding a module for MaryTTS, which I find to be a nice TTS engine, much nicer than picotts IMO.  While training the HMM requires a non-FOSS toolkit from Microsoft, those tools are free as in beer, and it is easy to add new languages and voices to MaryTTS.  I managed to get it working with Speech-Hub for NVDA several years back, and MaryTTS has improved since then.  One reason it has not yet been integrated may be the non-FOSS tools needed for training the HMM, but I'm not sure.  These binary blobs need to be installable somehow; we could make it as simple as copying a directory containing MaryTTS to the right place on the target Linux machine.

Thanks, Samuel.
Bill

On Mon, Dec 21, 2020 at 10:59 AM Samuel Thibault <samuel.thibault@ens-lyon.org> wrote:
Bill Cox, on Mon, 21 Dec 2020 08:46:29 -0800, wrote:
> I don't mean to disparage the current implementation of modules like espeak.c,
> which has much improved over the years.  However, it is simply not portable at
> the binary level between Linux distros.

Well, yes, sure, that was never meant to be, and it's not usual for
Linux binaries to be portable across distributions.

>   Just run ldd on sd_espeak:
[...]

Note that ldd also shows the subdependencies. To see the real direct
dependencies only, use

objdump -x sd_espeak | grep NEED

which on my Debian shows

  NEEDED               libespeak.so.1
  NEEDED               libsndfile.so.1
  NEEDED               libdotconf.so.0
  NEEDED               libglib-2.0.so.0
  NEEDED               libc.so.6
  NEEDED               libltdl.so.7
  NEEDED               libpthread.so.0

The rest that you see are subdependencies of those libraries, not of
sd_espeak itself.

> Binary portability appears to have been a non-goal in
> speech-dispatcher.

Just like in almost all free software projects, since when you have the
source code you can just recompile it to get things working.

>   Is there any chance I can contribute code to speech-dispatcher to
> fix this?

I started having a look at making it easy to write a speech dispatcher
module that doesn't use dotconf and glib.

Then, by extending the protocol to allow server-side audio, that would
drop libsndfile and libltdl as well, and even libpthread when the
module doesn't need it for itself. We're then left with the actual
synth (libespeak) and libc.  I have pushed what I have so far in the
main-loop branch.  I basically reimplemented the module protocol under
an MIT licence, which allowed me to separate the protocol parsing from
the dotconf parameter management etc. The idea is that proprietary
modules can link against that implementation to make it easy for them to
create a speech-dispatcher module.
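To illustrate the general shape of what such an implementation parses, here is a toy line-oriented loop.  The command names and status codes below are placeholders for illustration, not the real module protocol:

```shell
# Toy request/reply loop in the spirit of the module protocol: the
# server writes one command per line; the module answers each with a
# numeric status line.  Commands and codes here are made up.
toy_module() {
  while IFS= read -r cmd; do
    case "$cmd" in
      INIT) echo "299 OK LOADED" ;;
      QUIT) echo "210 OK QUIT"; return 0 ;;
      *)    echo "300 ERR UNKNOWN COMMAND" ;;
    esac
  done
}

printf 'INIT\nSPEAK\nQUIT\n' | toy_module
```

The point of the MIT-licenced layer is that a proprietary module only has to supply the synth-specific callbacks, while the line parsing and replies above are handled for it.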

> Also, link in libraries statically, such as espeak.a,

Most proprietary modules will not allow this with the current code,
which is GPL. That's why I started rewriting the basis of modules under
an MIT licence, which will allow such linking.

> I think my prefered approach is to start with 1), and migrate eventually to
> 2).  That way, users who need binary portability can start to benefit in the
> near term while the more complex tasks in 2) are implemented.

That can be an interim solution, yes.

> Espeak would be simpler, but some of the other engines that don't
> use the new module_utils_* code need a major rewrite.

I don't think they would need a complete rewrite; that can probably be
done progressively.

> If folks feel binary portability is a nice goal, but not worth the price (e.g.
> not being able to use glib),

Modules that are shipped with speech-dispatcher don't pose portability
problems, so we can let them use glib etc. For the modules that do want
portability, we just need to give them an easy way to achieve it; that's
what I have been working on.

> I think we could move the audio queue into the speech-dispatcher
> daemon without making things more complex.

It's not completely that obvious: one still has to transfer the audio
from the module to the server. Not impossible, but still some
additional complexity :)

> I am confused about what pitch range is for.

It is the range of pitch that the synth can use to express prosody. That
could be called "expressiveness", and it is independent of the desired
base pitch.

> I don't think that quite works right now.  For example, the BEGIN message is
> not sent to Orca until module_speak returns,

Yes, that's one of the things I plan to fix.

> > > and makes these binaries specific not only to the distro, but to
> > > the distro version. 
> >
> > That, however, is a very convincing argument. Making it simple for
> > vendors to just ship a binary to a known place, whatever the distro and
> > version, can simplify things a lot for them.
>
> I am relieved to hear you say that.

Well, I don't think I ever saw that argument raised before, which is why
it never showed up as a goal of speech-dispatcher.

> A TODO for me is to look into sandboxing these shady binaries from TTS vendors
> :)

That could be useful indeed.

> I would be interested in the task of making sd_espeak binary portable,

I don't really understand the focus on sd_espeak, which is shipped
with speech-dispatcher. I understand that it can be an interesting
test case to make sure that things work, though.

> > Anything I would have forgotten?
>
> Ha!  We will only know what we forgot when we write the code!

Sure :)

But it's better to ask for opinions before starting to write the code,
to avoid mistakes where we can :)

Samuel
