espeak, espeak-ng, pyttsx3, and MBROLA

Help Wanted

Last Post by Robo Pi 4 years ago

47 Posts

5 Users

5 Likes

14.7 K Views

Robo Pi

(@robo-pi)

Robotics Engineer

Joined: 5 years ago

Posts: 1669

Topic starter 2020-04-17 6:42 am

Here's a very small program I wrote in Python to use eSpeak. I've created a class called eSpeak with a few methods that can be called. The program uses the subprocess module to run espeak commands on the terminal command line.

PLEASE NOTE: You'll need to use eSpeak voice names that make sense on your computer. This is just example code showing a few things that can be done.

Here's the main program:

import subprocess as cmdLine
# The following line just clears the terminal window
cmdLine.call("clear")
# Import the eSpeak class from eSpeak.py
from eSpeak import eSpeak

# Choose from various voices
voice1 = "mb-us1"
voice2 = "mb-us2"
voice3 = "alysha"
# Instantiate the eSpeak class with a specific voice.
eS = eSpeak(voice1)  # default = 'mb-us1' (see class __init__)
# Define the speech to be spoken
speech = "Hello, My name is Alysha, I am the customized voice of robo pies robot"

# Call the method that says the speech
eS.say(speech)

# Call the method that prints the -X phonemes
eS.phonemes(speech)

# Call the method that sends speech to filename. 
filename = 'speech.wav'
eS.wavFile(speech, filename)

And the following is the eSpeak class that is also saved as eSpeak.py

# eSpeak utility class for Linguistic AI
# by Robo Pi

# Import subprocess to execute the espeak terminal commands
import subprocess as cmdLine

class eSpeak:

    # __init__ defines the voices to be used.
    def __init__(self, voice='mb-us1'): # mb-us1 is set as default.
        self.voice = voice

    # Speak the text
    def say(self, speech):
        # Define the command line
        command = 'espeak -v ' + self.voice + " " + chr(34) + speech + chr(34)
        # Execute the command in the terminal
        errorCode = cmdLine.call(command, shell=True)
        # Print the exit code returned by espeak (0 means success)
        print(errorCode)

    # Prints phonemes using -X  (the -q means quiet no speaking)
    # You still need to assign -v here for the dictionary reference
    def phonemes(self, speech):
        command = 'espeak -X -q -v ' + self.voice + " " + chr(34) + speech + chr(34)
        errorCode = cmdLine.call(command, shell=True)
        print (errorCode)

    # Sends the speech out to a wav file named by filename
    def wavFile(self, speech, filename):
        command = 'espeak -w ' + filename + ' -v ' + self.voice + " " + chr(34) + speech + chr(34)
        errorCode = cmdLine.call(command, shell=True)
        print (errorCode)
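
One caveat with the string-built commands above: because they are passed through the shell, a double quote or a `$` inside the speech text can break the command. What follows is a minimal sketch of one way around that, assuming the same espeak flags used above (-v, -X, -q, -w). The helper names `build_espeak_args` and `run_espeak` are hypothetical and not part of eSpeak; the idea is simply to hand subprocess a list of arguments instead of one quoted string.

```python
import subprocess


def build_espeak_args(voice, speech, phonemes=False, wav_file=None):
    """Return the espeak command as a list of arguments (hypothetical helper)."""
    args = ["espeak", "-v", voice]
    if phonemes:
        # -X prints the phoneme translation, -q suppresses the audio
        args += ["-X", "-q"]
    if wav_file is not None:
        # -w writes the speech to a WAV file instead of playing it
        args += ["-w", wav_file]
    # The speech is a single argument, so quotes inside it need no escaping
    args.append(speech)
    return args


def run_espeak(args):
    """Run espeak without a shell and return its exit code."""
    return subprocess.call(args)
```

With espeak installed and a voice that exists on your system, `run_espeak(build_espeak_args("mb-us1", 'She said "hello"'))` should speak the text, quotes included, and return the exit code.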

Just thought I'd share this in case anyone is interested.

DroneBot Workshop Robotics Engineer
James

M4krD4d reacted

Robo Pi

(@robo-pi)

Robotics Engineer

Joined: 5 years ago

Posts: 1669

Topic starter 2020-04-17 10:03 pm

I'm happy with where I am with eSpeak at the moment.

I have it installed on four computers.
I have the MBROLA voice set up and customized to a place where I can work with it.
I have the dictionaries figured out to where I can use them.
I have it set up so I can access it from Python.
I'll even be able to have Python modify the dictionaries on the fly.
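
Having Python modify the dictionaries on the fly could look roughly like this. It is only a sketch under assumptions: that the eSpeak dictionary source files (such as en_list and en_extra) live in a directory you can write to, that an entry is a word followed by its phoneme string, and that running `espeak --compile=en` from that directory rebuilds the dictionary. The function names, the `dict_dir` argument, and the choice of en_extra are hypothetical, so check them against your own install.

```python
import os
import subprocess


def add_pronunciation(dict_dir, word, phonemes, extra_file="en_extra"):
    """Append 'word<TAB>phonemes' to a dictionary source file (assumed layout)."""
    path = os.path.join(dict_dir, extra_file)
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{word}\t{phonemes}\n")
    return path


def compile_dictionary(dict_dir, language="en"):
    """Recompile the dictionary; espeak reads the source files from dict_dir."""
    # Assumes espeak is installed; may need elevated permissions on some systems
    return subprocess.call(["espeak", f"--compile={language}"], cwd=dict_dir)
```

A program would call `add_pronunciation(...)` for each new word and then `compile_dictionary(...)` once, since the compile step is the slow part.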

So I have the TTS set up for my Linguistic AI project.

I'm ready to move on to Pocket Sphinx for the SRE (Speech Recognition Engine).

I'll start a new thread for Pocket Sphinx. I'll probably put that in the Help Wanted forum too, just because I'm not sure where else to put it. Right now I'm just trying to figure all this stuff out.

Interesting Information about eSpeak:

In my Internet travels I ran across an example where someone was using eSpeak to read the text on signs. Obviously they also needed to have a way to read signs. I think they were using OpenCV for that. Unfortunately I didn't save the link to that information so I don't remember where I saw it. But it was only just an example, not a full tutorial anyway.

Nonetheless, I thought I'd add this information here just so you know that this TTS system can actually be used to read text written on signs, or even from pages of books, etc. It might be interesting to have a robot that can actually read stuff.

But again, that requires OpenCV and a lot more than just eSpeak. But it's nice to know that this ability is out there to be had if someone wants to pursue it.

I'm moving on to Pocket Sphinx now which is far more complex than eSpeak. So this will most likely take a few more weeks just to get that tool squared away. But then I'll have a robot that can both understand and speak English. 😎

By the way, this can be done in many different languages too. But I'm just sticking with English.

For me both the TTS and the SRE will just be tools that I'll use in my Linguistic AI project. I could have done it with just typing in text and having the robot print out responses to the screen. But think of how much more fun it will be to have a robot that can actually hear and speak English? May as well go the fun route, right?

DroneBot Workshop Robotics Engineer
James

codecage

(@codecage)

Member Admin

Joined: 5 years ago

Posts: 1037

2020-04-18 2:41 pm

Yikes! Now you have me going off in still another direction. But first back to the McWhorter Learning AI on the Jetson Nano.

SteveG

Robo Pi

(@robo-pi)

Robotics Engineer

Joined: 5 years ago

Posts: 1669

Topic starter 2020-04-27 12:05 am

Now that I have Pocket Sphinx all squared away I'm back over here at eSpeak again. I'm preparing to start making some videos on eSpeak. However, I'm having so much fun refining what I actually know about eSpeak that I haven't even started making any videos yet. There are so many options available that I've been having fun just exploring those and trying to decide exactly how I want to create the video presentation.

Anyone care to VOTE on which platform I should use to make the videos on eSpeak?

Raspberry Pi 4 - with Raspbian OS
Raspberry Pi 4 - with Ubuntu 18.04 OS
Jetson Nano - with Ubuntu 18.04 OS

Vote now or forever hold your voice! 🥂

Videos that follow may be on the following topics, although not necessarily in the following order.

Installing eSpeak and eSpeak Edit and some simple command line examples.
Introduction to modifying voices and installing MBROLA
Moving on to using Python to control eSpeak
Understanding dictionaries and how to compile them
Putting it all together as a usable TTS system.

That might end up being 5 videos. Hopefully they'll come out consecutively relatively quickly once I actually get started making them.

If I succeed with the above, my next challenge will be to do the same thing with Pocket Sphinx SRE.

I'll be creating these videos in part for myself so I can always go back and see what the heck I did, because I always forget. 🤣

So which platform and OS do you vote for? (actually it's all pretty similar)

P.S. I'm not going to do this for Windows because Windows has Microsoft Speech Platform (MSP). If I do a video on Windows TTS and SRE it will be on MSP, not on eSpeak or Pocket Sphinx. I will also be using C# over there instead of Python. So if you have an interest in Windows MSP and C#, let me know. But for now I just want to do a video on eSpeak for Linux on SBCs.

DroneBot Workshop Robotics Engineer
James

codecage reacted

starnovice

(@starnovice)

Member

Joined: 5 years ago

Posts: 110

2020-04-27 12:10 am

@robo-pi Would there really be much difference among which board you use? If so I would like to see the Raspberry Pi 4. Thank you for thinking of us and being willing to tackle this.

Pat Wicker (Portland, OR, USA)

Robo Pi

(@robo-pi)

Robotics Engineer

Joined: 5 years ago

Posts: 1669

Topic starter 2020-04-27 12:28 am

Posted by: @starnovice

Would there really be much difference among which board you use?

It wouldn't be a lot of difference. Mostly it has to do with where files are stored so various paths may be different. Especially with concern to using the MBROLA voices. They require a bit of set up, but I think they are well worth the effort.
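
Since the paths can differ from board to board, a small check like the following can confirm where an MBROLA voice file actually lives before pointing eSpeak at it. This is a sketch, not a definitive layout: the directories listed are examples of where a package manager might put the voices, the function name `find_mbrola_voice` is hypothetical, and it assumes the eSpeak voice mb-us1 corresponds to an MBROLA voice file named us1.

```python
import os


def find_mbrola_voice(voice, search_dirs):
    """Return the first existing path to an MBROLA voice file, or None."""
    for base in search_dirs:
        # Two layouts are checked: <base>/<voice>/<voice> and <base>/<voice>
        candidates = (os.path.join(base, voice, voice), os.path.join(base, voice))
        for candidate in candidates:
            if os.path.isfile(candidate):
                return candidate
    return None


# Example directories only (assumptions); adjust for your own system.
CANDIDATE_DIRS = ["/usr/share/mbrola", "/usr/share/mbrola/voices"]
```

Calling `find_mbrola_voice("us1", CANDIDATE_DIRS)` returns the path if the file is found in one of those places, and `None` otherwise, which is a quick hint that the voice still needs installing or that the path differs on that board.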

There are also little differences in using the sound systems. Of course this will probably be different for everyone depending on what sound system you are using. I bought some inexpensive USB speakers to use for audio output. I think I had to install PulseAudio on the Raspberry Pi to get that working properly, although I don't really remember now. I'm all confused from having worked with Pocket Sphinx. 🤣

In fact, I'll probably be making a lot of mistakes in the video so I'll have to ask everyone ahead of time to forgive me for that. I'm going to try to start with a basic system card. Although I may already have my sound system installed, and I'll probably just suggest that each person is going to need to deal with their own sound system based on whatever they are using. I'm just going to focus on eSpeak. If they are having sound problems they'll have to work that out for themselves.

Just for the record, here is the inexpensive sound system I'm using: (I use these on both my Raspberry Pies and on my Jetson Nanos). If you have an HDMI monitor with built-in speakers you'll probably be using that.

Logitech USB Speakers

DroneBot Workshop Robotics Engineer
James

codecage reacted

byron

(@byron)

No Title

Joined: 5 years ago

Posts: 1122

2020-04-27 11:29 am

Posted by: @robo-pi

Anyone care to VOTE on which platform I should use to make the videos on eSpeak?

Raspberry Pi 4 - with Raspbian OS

It's number 1 for me. Raspbian is tuned for the Pi and makes it very easy to use the GPIO pins. Who knows, you may want to make some eyes sparkle with emotion on a certain tone of voice and trigger a teardrop on others. A little electronic integration will bring the voice to life.

codecage

(@codecage)

Member Admin

Joined: 5 years ago

Posts: 1037

2020-04-27 11:30 am

Posted by: @robo-pi

Vote now or forever hold your voice!

Jetson Nano!

Adding in SRE and TTS to the AI platform of my Jetson would seem like a perfect fit. IMHO 😎

SteveG

codecage

(@codecage)

Member Admin

Joined: 5 years ago

Posts: 1037

2020-04-27 11:43 am

Posted by: @robo-pi

There are also little differences in using the sound systems. Of course this will probably be different for everyone depending on what sound system you are using. I bought some inexpensive USB speakers to use for audio output.

And what are you using for the microphone? The RasPi has the audio jack with both audio in and audio out, but on the Jetson it would all have to be handled via USB, right? On the "finished" robot (will I live that long) it all has to be included, while in development stage on our benches we don't have to worry too much about size. But I doubt very seriously the Willy Nilly will have his own 4K HDMI screen on board.

I might just use a webcam with built in microphone and a single small speaker like this one.

SteveG

Robo Pi

(@robo-pi)

Robotics Engineer

Joined: 5 years ago

Posts: 1669

Topic starter 2020-04-27 3:57 pm

Posted by: @codecage

And what are you using for the microphone?

You won't need a microphone for eSpeak as it's just TTS (Text-To-Speech). So eSpeak is just the computer talking. No microphone required. But you will need a microphone for Pocket Sphinx which is an SRE (Speech Recognition Engine).

Just to answer your question this is the microphone I bought, but it has doubled in price since I bought mine! It does seem to be a really nice mic though. It also comes with a nice desktop stand.

Moukey USB Condenser Microphone

DroneBot Workshop Robotics Engineer
James

Robo Pi

(@robo-pi)

Robotics Engineer

Joined: 5 years ago

Posts: 1669

Topic starter 2020-04-27 4:02 pm

By the way, those speakers I pointed to earlier have doubled in price since I bought mine too! Fortunately I bought two sets of them while they were cheap. 😊

But now I can't bring myself to pay twice as much, so I'm ordering some of these. These are quite a bit smaller in physical size too, but according to the reviews they have BIG sound. And only $10 a set at the time of this post. I just ordered two sets of these before they double in price!

BeBomBasics USB Speakers

DroneBot Workshop Robotics Engineer
James

Robo Pi

(@robo-pi)

Robotics Engineer

Joined: 5 years ago

Posts: 1669

Topic starter 2020-04-27 4:21 pm

Posted by: @codecage

And what are you using for the microphone?

By the way, when we do move on to Pocket Sphinx, having a good-quality microphone can be paramount to getting accurate translations. I find that Pocket Sphinx gets close to 100% of what I say if I speak directly into the microphone from about 6 to 12 inches away. But if I set the microphone off to the side the accuracy drops quite a bit and it gets a lot of words wrong.

Again, this has to do with Pocket Sphinx (the SRE) not eSpeak (the TTS). eSpeak doesn't use the microphone at all. It's strictly an output application.

The other thing to note about Pocket Sphinx (the SRE) is that, while it has a huge dictionary of words, its dictionary does not contain all possible words. I've discovered (by looking through the dictionary) that the reason it gets some words wrong 100% of the time is simply because those words aren't in the dictionary.

If a word isn't in the dictionary then Pocket Sphinx will never recognize it.

Same thing for eSpeak. It can only speak words that are in its dictionary too.

And eSpeak and Pocket Sphinx have their own separate dictionaries. While they have a lot in common they are actually significantly different and not interchangeable. This can cause a lot of confusion if a person doesn't keep eSpeak and Pocket Sphinx clearly separate in their mind. I find myself trying to do things to Pocket Sphinx that can only be done in eSpeak and vice versa. So it can become confusing. They both have dictionaries, and dictionary tools. But they are totally separate and different from each other. So that's something to be aware of from the beginning.
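
Checking whether a word is in the Pocket Sphinx dictionary does not have to be done by eye. The sketch below assumes the dictionary is a CMU-style text file with one entry per line, the word first and its phonemes after it, and alternate pronunciations written like `read(2)`. The function names are hypothetical, and the dictionary lines are passed in rather than read from a fixed path because the file location varies by install.

```python
def load_dictionary_words(lines):
    """Collect the words from CMU-style dictionary lines ('word PH1 PH2 ...')."""
    words = set()
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        # Alternate pronunciations look like 'read(2)'; strip the suffix
        word = parts[0].split("(")[0].lower()
        words.add(word)
    return words


def missing_words(sentence, dictionary_words):
    """Return the words of a sentence that are not in the dictionary, in order."""
    missing = []
    for raw in sentence.lower().split():
        # Drop punctuation stuck to the word
        word = raw.strip(".,!?;:\"")
        if word and word not in dictionary_words and word not in missing:
            missing.append(word)
    return missing
```

To use it against a real dictionary, open the file and pass the file object to `load_dictionary_words`, then run `missing_words` on the phrases you expect the robot to hear. Anything it returns is a candidate for adding to the dictionary.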

The first set of videos are going to be on eSpeak (the TTS). Only after I produce those will I move on to Pocket Sphinx (the SRE).

I don't think I'll cover vosk (another SRE) at all. If I do it will be a really short video just explaining how to use it. Trying to mess with its dictionary requires quite a bit of explanation, and cannot be done on the fly. So I won't be using vosk for my applications. In my applications I want to be able to have my Linguistic AI program update its own dictionaries programmatically on the fly. Both eSpeak and Pocket Sphinx are well-suited for this purpose. So that's the story on that. I need quick and easy access to dictionary modifications. So eSpeak and Pocket Sphinx win the day! 👍

The vosk SRE may be of more interest to others who aren't concerned with being able to modify the dictionary. Vosk does decode speech a bit faster than Pocket Sphinx, and it also does a better job when it comes to accuracy of translation. But modifying the dictionary requires a Ph.D. in Kaldi (the underlying SRE that vosk is actually using). It also doesn't lend itself to being easily modified programmatically.

I could have had these videos made by now if I wasn't addicted to posting TMI. 🤣

I better get to work on making these videos.

DroneBot Workshop Robotics Engineer
James

codecage reacted

codecage

(@codecage)

Member Admin

Joined: 5 years ago

Posts: 1037

2020-04-27 4:22 pm

@robo-pi

My Logitech Webcam has a built-in mic, so may just try that first. But I did notice that the camera went from $36.00 to $140.00 or so! Price gouging big time! I know everyone is scrambling for a webcam but this price increase is beyond believable. Just ordered another of the speakers I noted in my earlier post. It hasn't gone up in price yet and looks like it will be delivered this Wednesday.

SteveG

Robo Pi

(@robo-pi)

Robotics Engineer

Joined: 5 years ago

Posts: 1669

Topic starter 2020-04-27 4:32 pm

Posted by: @codecage

But I did notice that the camera went from $36.00 to $140.00 or so! Price gouging big time!

I noticed this too! I imagine this has to do with so many people being stuck at home and wanting to upgrade their social media capabilities. A lot of people are probably turning to things like Skype (or whatever the hottest new video call media is today).

I'm thinking also that there is going to be a lot of price gouging going on. Especially on any items that are selling like hotcakes.

I want to pick up two more Raspberry Pies. That will be a total of 4 of them. I just like having systems set up at my fingertips where I can go in and just work on projects right where I left off. I have the space to set all these computers up so why not?

I've been picking up these Raspberry Pi 4's with 4GB of memory, a case with heatsinks and a fan, and they even come with a 32GB SD card. Basically ready to go save for keyboard, mouse and monitor. All for $99.

I hope they don't go up before I order them. It will be nice having 4 Raspberry Pies! I LOVE IT!

Plus I can always use them in robotic projects in the future! That's the main attraction. They serve as desktop developers for now, and as robot brains later. Can't beat that! 😎

DroneBot Workshop Robotics Engineer
James

codecage

(@codecage)

Member Admin

Joined: 5 years ago

Posts: 1037

2020-04-27 4:45 pm

@robo-pi

With all the hard work you have already put into the effort of choosing an SRE, I think I'm happy to follow in your footsteps down the Pocket Sphinx road! 👍

Now if I could just figure out the issue of not being able to see the SD cards on my Windows machines after writing the NVIDIA images to them! By the way, I did go thru the document you sent a link to, but it didn't really provide a solution in my case anyway.

I had just finished making myself a customized setup of NVIDIA Ubuntu on to a 16GB SD card to be used as a starting point for future variations, when Lesson 36 of Paul McWhorter's Learning AI on the Jetson comes out! Now I'm making a customized version of JetPack 4.3 on a 32GB SD card (actually a microSD card and I only had the one 16GB).

Do you have any insight into this: if you trained Pocket Sphinx on new words using the microphone in the position where you get 100% understanding, would it then do a better job of recognizing those same words when spoken with the microphone not in that optimal position?

SteveG
