
espeak, espeak-ng, pyttsx3, and MBROLA

47 Posts
5 Users
5 Likes
14.6 K Views
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @codecage

Can you give a very abbreviated set of steps on where to start, like what to download and install and in what order, as if you were starting over and wanted to skip all the wrong turns you made on your initial journey?

I can try to do that, but it's actually quite complicated. This is why I wanted to explain it in a series of videos. It can also differ on different machines and platforms in subtle ways. Last night, after I made my previous post, I installed it on three different platforms from scratch:

  • Jetson Nano - Ubuntu OS
  • Raspberry Pi 4 - Ubuntu OS
  • Raspberry Pi 4 - Raspbian

All three of these have slight differences in precisely how things need to be installed. In fact, I'm still trying to choose which system I want to use to make the video tutorials on.

Posted by: @codecage

Not really happy with sound of the voices, so thought I'd look into the Mbrola addition, but all of the links to Mbrola in the espeak documents give me a "Bad Gateway" error.

Yes, the stock eSpeak voice is not very good unless you like the sound of Stephen Hawking's voice, and even that is a modified eSpeak voice (or something very similar).

There are many different ways to modify the eSpeak voice, and I was going to go into that in the video. However, none of them produced a voice that I was personally happy with. Hence the move to the MBROLA voices.

Installing an MBROLA voice is quite involved, but well worth it. The MBROLA voices sound MUCH BETTER! Also, MBROLA is just a voice, not a complete TTS system, so you still need eSpeak to do the actual text-to-phoneme translation and synthesis.

Some other things to consider:

There may be other TTS engines that are far easier to install and use, and that have better voices too. I'm not interested in those for the following reasons:

  • They are dependent on an Internet server.
  • They are less configurable in terms of tailoring how they pronounce individual words.
  • They don't lend themselves to easy use in AI systems based on grammar and semantics.

However, for someone who just wants a quick and easy way to get their robot to talk, they might be the better choice. Such a person might also just use commercial chatbots like IBM Watson or Alexa, so their robot basically becomes an IBM Watson or Alexa front end and they won't need to do anything at all in terms of programming the robot to understand and speak.

So what your final goal will be is a large consideration in deciding which way to go.

For my purposes of wanting to build a Linguistic AI system from the ground up, eSpeak is the perfect foundation because of its dictionary and language rules files. It's also completely open source and independent of anyone's Internet server. It's a program that is solely under my control. Plus, it's designed specifically for use with limited computing resources such as SBCs.

I'll try to make a couple of quick posts explaining the installation of eSpeak and how to modify the eSpeak voice. But this will be a very limited explanation; it really deserves an in-depth video.

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

To Install eSpeak:

This is for the original eSpeak, not eSpeak NG. I don't plan on using eSpeak NG until I can figure out why I would actually need it.

To begin with, if you are on a Linux machine it will most likely already be installed. To find out, just type the following command:

espeak --version

If it is already installed it should report something like the following:

eSpeak text-to-speech: 1.48.03

It may also include a path to where it was installed.

To make it speak, simply type the following:

espeak "The text you want it to say"

It should then speak the text. If you don't hear anything, you may need to check your audio output devices and make sure the right one is set as the default output for the OS.
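If the sound is going to the wrong device, one workaround (just a sketch, assuming a Linux system with ALSA and aplay available) is to have eSpeak write WAV data to stdout and pipe it to a specific device:

espeak --stdout "testing one two three" | aplay

espeak --stdout "testing one two three" | aplay -D plughw:1,0

The -D value is whatever card/device aplay -l reports for your speaker or HDMI output.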

Both the sound of the voice and precisely how it says each word are fully configurable, but that goes far beyond the scope of this post.

On the command line you can specify a voice like so:

espeak -v en "the text you want it to say"

Although this may not change the sound much, as it probably uses en as the default.
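To see which voices you actually have installed, espeak can list them:

espeak --voices

espeak --voices=en

The second form lists only the voices for a given language.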

But now you can change the pitch of the voice by appending one of the variants +f1 through +f5 to the voice name, with no space between the name and the +:

espeak -v en+f1 "the text you want it to say"

up to

espeak -v en+f5 "the text you want it to say"

If you go higher than 5 it defaults back to the original frequency.

You can also control the speed it speaks at by using -s n, where n is an integer. I think the default is 175, and you can't go much lower than 100 from the command line. However, there are further options to control the voice that are beyond the scope of this post:

espeak -v en+f5 -s 100 "the text you want it to say"

espeak -v en+f5 -s 175 "the text you want it to say"

espeak -v en+f5 -s 200 "the text you want it to say"

To gain more control over the eSpeak voices you should ultimately do the following:

Find where the espeak-data folder was installed and copy it over to your home folder. eSpeak looks in your home folder for an espeak-data folder first, so anything you change within this copy will change how eSpeak speaks.
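The installed location varies by distro, so the safest bet is to search for it. A sketch (the /usr/share path below is just a common example; use whatever the search reports):

find /usr -name espeak-data 2>/dev/null

cp -r /usr/share/espeak-data ~/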

Within the espeak-data folder there is a folder named voices. It contains text files that you can modify for even more control over how eSpeak speaks.

The following web page gives detailed information on the parameters you can play with to make a voice sound different:

http://espeak.sourceforge.net/voices.html
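As a taste of what's on that page, a voice file is just a short text file of attribute lines. Something like the following minimal sketch (the name and the values here are made up for illustration; see the page above for what each attribute means):

name squeaky
language en
gender female
pitch 120 180
formant 1 110 100 110

Here pitch takes a base and a range in Hz, and each formant line adjusts the frequency, strength, and width of one formant as percentages of the default.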

MBROLA voices

Unfortunately, you'll probably never be truly satisfied with the standard eSpeak voice, so you'll most likely want to move up to using MBROLA voices.

Fortunately the MBROLA voices sound MUCH BETTER!

Unfortunately, they require quite a bit of work to install and get running properly. I might make a post on how to install and use an MBROLA voice a bit later. It's well worth the effort, as the MBROLA voices sound far more human and less robotic.

Edited to add espeakedit:

In addition to how the voice sounds, you can also modify precisely how it speaks individual words.  But this requires modifying the lang_list file and recompiling it into a lang_dict file.  Far beyond the scope of this post. 

There is also a companion program for espeak called espeakedit.

I believe it can be installed with the following command:

sudo apt-get install espeakedit

Then it can be launched by simply typing espeakedit at the command line.

It's not a well-documented program, and explaining it fully would take a lot of space.

However, I'll offer a couple of tips to get you into trouble.

Choose the Text Tab

Type the word you want eSpeak to say in the top text window.

Then click on the Translate button at the bottom.

It will translate your word into phonemes and display the phonemes in the bottom window. 

Then click on Speak and it will say the word.

It will also then display a Prosody graph detailing precisely how eSpeak is speaking the word.   If  you right-click on the graph you will be given options to fiddle with it. 

To take full advantage of the espeakedit program you need to learn about the dictionary files.  And that's a whole other story.  But I thought I'd just toss in the espeakedit utility here to give you an idea of just how much control espeak has to offer.  You have total control over everything.  So it's an awesome package if this is the type of thing you are interested in.

DroneBot Workshop Robotics Engineer
James


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1037
 

@robo-pi

Waiting patiently in my sequestered status for an explanation/video on the subject. Kudos to you for delving into this. I have espeak and espeakedit both installed and have used the command-line espeak to listen to it speak common words and sentences that I type in, like "Now is the time..." or "The quick brown fox..." and even "ryryryryryry", which the TTY guys will get a kick out of. The best sounding of the canned voices at the moment seems to me to be the "en-us+m6" voice. But they all seem to pronounce my wife's name exactly like the Bluetooth client in my truck does when interfacing to my iPhone. Her name is Puddin, pronounced like "pudding" but leaving off the "g" sound. In the "en_list" file I found "pud   pUd", so I'm guessing the uppercase "U" is what makes it sound like the first syllable of "puddle".

I have created an "en_extra" file with "puddin" in it, but I'm unsure how to create the phonetic string that goes with it, and then how to compile and update the "en_dict".

Again, many thanks for opening new doors to occupy my time while this pandemic drags on!

 

SteveG


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1037
 

@robo-pi

At the moment I'm playing on my Windows machine with espeak! I will be checking to see if the RasPi I've been using for McWhorter's series on AI has espeak already installed, and if not, I'll get it installed.

Now looking forward to a post on getting the MBROLA voices installed. The main thing being where to find them, as the links in the espeak documents lead me to dead ends.

SteveG


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @codecage

At the moment I'm playing on my Windows machine with espeak! 

By the way, the voice may sound different on a Windows machine. I haven't tried it on Windows yet.

Posted by: @codecage

I have created an "en_extra" file with "puddin" in it, but I'm unsure how to create the phonetic string that goes with it, and then how to compile and update the "en_dict".

Here's what I've been doing. Open espeakedit. Type puddin in the text window, then click on Translate and have it speak the word. If you like the sound, those are the phonemes you can use in the dictionary list file.

If you don't like the sound, go back up to the top text window and try different spellings of the word "puddin". In fact, try spelling it in ways that you think should sound right. Then click on Translate to see the new phonemes, click on Speak, and check whether it sounds the way you like. If so, copy those phonemes to use in the dictionary list file.

Unfortunately, setting up to compile a new dictionary is a bit involved. Plus, you'll want to create your own language name so you aren't overwriting the original languages.

What you need to do is find the lang_rules file and the lang_list file, where "lang" is the name of the language (for example, en_rules and en_list). Then copy those to a new language name. It doesn't need to be two letters; it can be a whole word, like codecage_rules and codecage_list. Then you can modify codecage_list with your new word-to-phoneme definition, save it, and compile a new dict file like so:

espeak --compile=codecage

This will then create a new codecage_dict file based on your rules and list files.
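Putting the whole sequence together, here's a minimal sketch. It assumes you're in the directory that holds the dictionary source files (they ship with the eSpeak source package in a dictsource folder), and the puddin phoneme string is just a hypothetical example:

cd dictsource
cp en_rules codecage_rules
cp en_list codecage_list
echo "puddin   p'UdIn" >> codecage_list    # hypothetical phoneme string; get the real one from espeakedit
espeak --compile=codecage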

Then use

espeak -v codecage "Hello Pudding" 

or

espeak -v codecage "Hello Puddin"

In fact, in your dictionary you can have the words "pudding" and "puddin" pronounced differently.

Anyway, the possibilities are endless.

I'll try to explain how to install an MBROLA voice a bit later.

Edited to add:

I forgot: in order for the above to work you also need to create a codecage voice file and have that file point to the codecage language dictionary.
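A voice file is just a few lines of text in the espeak-data/voices folder. Something like this minimal sketch should do it (the attribute names are from the voices.html page linked earlier):

name codecage
language codecage

Save that as espeak-data/voices/codecage; the language line is what tells eSpeak to load codecage_dict.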

See, it's complicated and you need to have all the details. This is why I'd like to do this in a video. But as you and I both well know, that's not going to happen any time soon. Even if I were working on the video right now, which I'm not, it would most likely take a week or more to produce!

So the video isn't likely to be out anytime soon.

Anyway, I'm sure you'll figure it all out. But it can be like pulling teeth! There's just very little information out there on how to do all this stuff, and sometimes you just need to figure it out on your own.

I'll try to explain an MBROLA installation.  In the meantime I need to do some real world work around here.

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @codecage

Now looking forward to a post on getting the MBROLA voices installed. The main thing being where to find them, as the links in the espeak documents lead me to dead ends.

I'll post the following from my notes.  Hopefully this will help some.

(but this was all done on Linux)

First, install git, make, and the C compiler if they're not already installed:

sudo apt-get install git make gcc

Clone MBROLA from GitHub and build it:

git clone https://github.com/numediart/MBROLA.git

cd MBROLA

make

sudo cp Bin/mbrola /usr/bin/mbrola

sudo mkdir /usr/share/mbrola

sudo mkdir /usr/share/mbrola/us1

Then download the us1 voice (or the voice of your choice); the links are listed at:

https://github.com/espeak-ng/espeak-ng/blob/master/docs/mbrola.md

And copy that voice file into the /usr/share/mbrola/us1 directory.
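If the download links on that page are also dead, the voice data files appear to live in the companion GitHub repository (this is an assumption based on the numediart repo layout, so treat it as a sketch):

wget https://github.com/numediart/MBROLA-voices/raw/master/data/us1/us1

sudo cp us1 /usr/share/mbrola/us1/us1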

(Note: all of the above is for Linux; I'm not sure how that translates to Windows.)

On Windows you might be able to just install mbrola.exe directly without having to build it.

In any case, if you succeed in getting the MBROLA voice installed correctly, you can then access it using:

espeak -v mb-us1 "Hi there Puddin"

Hopefully that will work.

By the way, us1 is a female voice; us2 and us3 are male voices.
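If the integrated playback doesn't work, you can also drive mbrola by hand: have eSpeak emit the MBROLA phoneme data and pipe it through the mbrola binary to a WAV file. A sketch, assuming the us1 data file ended up in /usr/share/mbrola/us1/us1:

espeak -v mb-us1 -q --pho "Hi there Puddin" | mbrola -e /usr/share/mbrola/us1/us1 - puddin.wav

aplay puddin.wav

The -q keeps eSpeak quiet, --pho writes the phoneme data to stdout, and mbrola's -e tells it not to abort on unknown phonemes.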

 

Hopefully this will get you going down the correct path.

The idea is that you need to install the mbrola binary first and then download the MBROLA voice to the correct directory.

GOOD LUCK! 

DroneBot Workshop Robotics Engineer
James


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1037
 

@robo-pi

Many thanks again! Move on and get the items done that are demanding your time, and I'll just plod along, learning all the way. Sometimes we learn the most that way anyway, especially in the long run.

SteveG


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1037
 

Guess I may try it on Linux. All of the MBROLA links give me a "Bad Gateway" error message, and the only thing I seem to come across on GitHub seems to be Linux related.

Get on with your other work!

SteveG


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @codecage

All of the MBROLA links give me a "Bad Gateway" error

I had the same problem and couldn't find any decent help on the Internet! I don't even remember exactly how I cracked the case, but I finally did, and it was well worth it. The MBROLA voice sounds far more human.

I've tailored the voice to sound like a small female child, which is how I view my robot. So it was well worth the effort. I'm starting on a huge journey of Linguistic AI, so it's good to have a robot voice that I'm really pleased with from the get-go, and the MBROLA voice fit the bill.

DroneBot Workshop Robotics Engineer
James


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1037
 

@robo-pi

I've attached 3 WAV files. Number 1 is the espeak "en" voice; number 2 is me, recorded using Audacity, saying just "Puddin" the way I, and just about everyone else, pronounce it; and number 3 is me again saying "Puddin" (using my pronunciation), then "not", followed by "Puddin" (using my best mimicking of espeak's pronunciation).

 

Hope I haven't gone overboard!

SteveG


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

Yes, there are ways to change the way eSpeak says things. By using espeakedit you can figure out what phonemes to use, and then go into the dictionary list file and add that association.

I had the same problem with Alysha.

eSpeak would say it as a-lish-a,

but I wanted it pronounced a-lee-sha,

so I went into espeakedit and messed around until I got espeak to say a-lee-sha.

Then I used those phonemes in the dictionary file to define how eSpeak should say Alysha.

And now it says it the way I like.

~~~~

Some people might argue that it would be too cumbersome to go in and define precisely how we would like eSpeak to say each and every word. This is true, especially for someone who just wants a quick talking robot and doesn't want to get involved in teaching it how to correctly pronounce words.

However, for me this is not a problem at all, since I'll be having my robot learn each word individually as it grows. It will also be able to modify and update its own dictionary files. So I'll be able to teach my robot how to correctly speak words just like I would teach a human child. I would say the word to the robot (which it would be able to hear and analyze via the SRE called Pocket Sphinx), and it would repeat the word back to me, just like a human child would. Then I would either tell the robot that this is correct, in which case it will update its own dictionary (just as a human child would), or it would try again until it finally says it the way I like. Only when I give my approval will it update its dictionary, just as we do with our human children.

So this is how I envision this working as the Linguistic AI project continues.

Of course there's far more to it than this. The robot will also be asking me what these words mean and updating its semantic dictionary as well. The semantic dictionary has nothing to do with eSpeak; it's part of the Linguistic AI program.

So the robot will be able to learn much like a human child: both how to correctly pronounce words (based on parental approval) and what the words actually mean. It will be quite inquisitive about new words and concepts that it learns, just like a human child.

In fact, I'm actually modeling this after programs designed to teach young children how to speak and learn language. 

So eSpeak, while a powerful TTS, is only a very small part of my Linguistic AI program. But I needed this powerful TTS at the foundation of my Linguistic AI project.

Of course there is also the SRE (speech recognition engine). That will be Pocket Sphinx, I hope. Like eSpeak, Pocket Sphinx has many of the same features:

  • Independent of Internet servers
  • Designed for compact systems such as SBCs
  • Totally programmable with its own dictionary files too.
  • etc.

In my Linguistic AI program, eSpeak and Pocket Sphinx will work together, with their dictionaries constantly updated as the robot learns new words.

So I'm really excited about this whole Linguistic AI project.

Unfortunately, Pocket Sphinx is lacking good tutorials as well, so I'll need to struggle through learning exactly how to interface efficiently with it too. But it appears to have the same sort of flexibility as eSpeak. So once again, Pocket Sphinx looks like the perfect tool for my Linguistic AI project.

 

 

DroneBot Workshop Robotics Engineer
James


   
(@starnovice)
Member
Joined: 5 years ago
Posts: 110
 

Does espeak have any vocabulary to start with or is everything from scratch?

Pat Wicker (Portland, OR, USA)


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @starnovice

Does espeak have any vocabulary to start with or is everything from scratch?

It comes with a gazillion words already associated with specific phonemes.

However, you may not personally agree with the person(s) who defined a particular pronunciation. So it's nice to be able to go in and modify things to suit your own taste if you so desire.

I actually went into the dictionary list and basically erased all the words they had defined.  So I have a dictionary list that's starting from scratch.  But that's precisely what I wanted for my specific project.

Plus I always have the original dictionary list to go back to if I should desire to use it.

In fact, you can switch between dictionaries on the fly by simply creating different "voice files" that point to different dictionaries.  So there's extreme flexibility here.

More interestingly, you can empty the dictionary list of words entirely. You'll still need an empty dictionary list file in order to compile a new dictionary, but the point is that eSpeak can still speak pretty darn well using the dictionary rules file alone, with no individual words defined at all. Done this way it may speak some words quite weirdly, but for the most part it does amazingly well just using the rules.

So even when you empty the dictionary list it can still talk.

By the way, I didn't mention this earlier, but there are more options for the espeak command line.

For example:

espeak -x "hello world"

Will not only say the words but it will also print out the phonemes it used:

h@;'oU w'3:ld

If instead of a lowercase "x" you use an uppercase "X", it will tell you whether it found the word in the dictionary list or created it from scratch, and which rules it used.

For example:

espeak -X "hello world"

will produce the following message:

Found: 'hello' [h@;'oU] <----  meaning that it found 'hello' in the dictionary list

Translate 'world' <----- meaning that it had to translate the word 'world' from scratch.

Then it lists all the rules it used from the rules file to translate the word 'world' into phonemes.

It's an amazing piece of software I think.

I suppose I should add the following thoughts for whatever they might be worth:

It might potentially be faster when speaking long sentences or paragraphs if most of the words are FOUND in the dictionary. The reason is that it always scans the dictionary list first and uses the defined word if found; only then does it go over to the RULES to translate a word from scratch.

So there may be some benefit in having words that you want eSpeak to say already listed in the dictionary.

And of course a dictionary list will be extremely helpful if you want eSpeak to say specific technical jargon precisely as you would like to hear it.   For example, if you want it to speak a lot of technical medical or biological terms, etc.

The nice thing is that this system allows you to do this. You can even have your robot choose specific dictionaries based on the topic of a given conversation. In fact, that's definitely part of what I'll be doing with my Linguistic AI project. After all, we humans do this too: we often choose our vocabulary based on the context of the topics we are discussing. So why shouldn't a robot do the same?

So yeah, you can have your program change dictionaries on the fly.  I'll definitely be taking advantage of that as my Linguistic AI program evolves.

There's no need to be permanently tied to a single dictionary.  Or even a single voice attribute for that matter. 

The flexibility is as free as your imagination. 😊 

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

A Plea for Help!

This was supposed to be my help thread asking others to help me! 🤣 

So now that we have a few interested grad students, how about looking into the following:

How do you use the Spect tab in espeakedit?

I have no clue.  Anyone want to play around with that and see what you can find?

I haven't been able to find any tutorials on that yet, but there are these documents (which I haven't yet read):

eSpeak Edit Program

User Interface - Formant Editor

Analysis

I haven't read these documents or really experimented with the formant editor yet, so if you find anything interesting let me know.

I want to move on to the Pocket Sphinx SRE.  I've already spent too much time on eSpeak. 🤣 

But I would like to better understand how to use the eSpeak Edit program.  I'm sure I'll learn it eventually.  But if there are any ambitious "Grad Students" real or imagined, now's your chance to chip in. 😎 

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

Here are some short audio clips of an MBROLA voice.  It's not the greatest, but it's better than the standard eSpeak voice.

This is the voice raw (all default settings).  It's supposed to be a female voice.

The following is my customized MBROLA voice for Alysha. I increased the pitch and slowed down the speaking rate to make her sound like a little child.

It's not the best sounding voice in the world, but it will do for my project.

Here is what she is saying in case you can't understand the speech.

"Hello, my name is Alysha, I am the customized MBROLA voice of Robo Pi's robot."

Anyone else have any luck with the MBROLA voices?

DroneBot Workshop Robotics Engineer
James


   