Pocket Sphinx Speech Recognition on the Raspberry Pi 4

Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

Pocket Sphinx (Part 1): Installation

Here's my first video on Pocket Sphinx for the Raspberry Pi. This thing is a monster to install. Plus I can't even be certain that I'm doing it in the best way. All I can say is that it seems to be working.

NOTES:

This just gets the thing installed so you can test it from the command line. But running it from the command line isn't very much fun. I'm hoping to have the Python video out soon.

Also, be sure to check out the description under the video. It contains all the command line text in case you just want to copy and paste it.

 

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

I'm jumping ahead.  I have Pocket Sphinx going GUI already. 😊 

And so far it's doing a really great job at understanding what I'm saying.

The key is to speak very clearly into the mike and pronounce each word individually.  I get very close to 100% accuracy when I do that.  So I think this is going to work quite well for me to enable me to converse with Alysha in a meaningful way. 



   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

This is a quick video just to show some of the progress I'm making.  If I actually just stick to programming and quit making videos I might actually get my Linguistic AI project done before I die. 🤣 

 



   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

I stayed up all night again playing with my dictionaries. 🤣 

It's fascinating stuff. I broke the Pocket Sphinx dictionary up alphabetically by letter. There are approximately 7000 words for each letter, give or take a couple thousand depending on the popularity of the specific letter. That's about 135 thousand words already defined with phonemes.

By the way, the sounds in Pocket Sphinx are also called "phonemes", but they use a different phoneme character set. It would be nice if I could get eSpeak and Pocket Sphinx to use the same type of phonemes. This may yet be possible. I think eSpeak has the option of using different types of phonemes, so I'll need to look into whether I can get them both on the same page. If I can do that, I may be able to have them both use the exact same dictionary file, which would be really nice. Except eSpeak allows for the addition of rules, while Pocket Sphinx has no need for rules. So maybe keeping them separate will be better. I will set up a system where they both contain precisely the same words. That's more realistic. In the real world a human typically knows both how to say a word and how to recognize it when it's spoken. So Alysha should do the same thing as she learns new words.
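For anyone who wants to try the same kind of split, here is a rough sketch. It assumes the dictionary is a plain-text file with one entry per line, the word first and its phonemes after it. The helper name `split_by_letter` and the sample lines are made up for illustration; this isn't the actual code from the project.

```python
from collections import defaultdict

def split_by_letter(lines):
    """Group dictionary lines by the first letter of the word (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        first = line[0].lower()
        # Entries that don't start with a-z (digits, punctuation) share one bucket.
        key = first if first.isascii() and first.isalpha() else "#"
        groups[key].append(line)
    return dict(groups)

# Illustrative sample entries, not a real dictionary file.
sample = [
    "dictionary D IH K SH AH N EH R IY",
    "frankenstein F R AE NG K AH N S T AY N",
    "dog D AO G",
]
by_letter = split_by_letter(sample)
for letter in sorted(by_letter):
    print(letter, len(by_letter[letter]))
```

Each bucket could then be written out to its own file or shown in its own window.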

Of course there will also be her "AI Mind Dictionary" which will be quite sophisticated.  It will be constructed of several multi-dimensional numpy arrays.   I can hardly wait to get to that point.  Thus my loss of sleep trying to get the program sorted. 😊 

I started working on this last night about 9:00PM thinking that I would work on it for a couple hours and then go to bed.  But before I knew it the sun had come up.  Time flies when you're working with dictionaries! 😎 

In any case, things are coming along really well thus far.   I learned more tricks with Pocket Sphinx.  I learned how to make it respond much faster.  Pretty much instantly.   The original code I was given as an example was setting up the entire system every time it started to listen.  I broke the code up so that I only need to have it initialize once in the beginning, and then I can have it re-listen using the original configuration.  That improved the response time significantly.
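The "initialize once, listen many times" idea can be sketched roughly like this. To keep it runnable without Pocket Sphinx installed, the slow setup is represented by a hypothetical `make_decoder` callable and a stand-in `FakeDecoder` class; neither is a Pocket Sphinx API, and the real setup code would go where the fake one is.

```python
class Listener:
    """Builds the recognizer once and reuses it for every listen call."""

    def __init__(self, make_decoder):
        # make_decoder is a hypothetical zero-argument callable that does the
        # slow setup (loading models, dictionary, etc.) and returns a decoder.
        self._make_decoder = make_decoder
        self._decoder = None

    def decoder(self):
        # Create the decoder on first use only; later calls reuse it.
        if self._decoder is None:
            self._decoder = self._make_decoder()
        return self._decoder

    def listen(self, audio):
        # decode() is a placeholder method on the stand-in decoder below,
        # not a Pocket Sphinx call.
        return self.decoder().decode(audio)


class FakeDecoder:
    """Stand-in so the sketch runs without Pocket Sphinx installed."""
    instances = 0

    def __init__(self):
        FakeDecoder.instances += 1  # count how often setup runs

    def decode(self, audio):
        return audio.upper()


listener = Listener(FakeDecoder)
print(listener.listen("hello"), listener.listen("alysha"))
print("decoders built:", FakeDecoder.instances)
```

However many times `listen` is called, the expensive setup only runs on the first call.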

I also reduced the size of the dictionary to just a few words, and that improved the speed dramatically too.   A smaller list of words to look through results in finding results much faster.
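Cutting a big dictionary down to a handful of words could look something like the sketch below. It assumes the usual plain-text layout (word first, then phonemes) and that alternate pronunciations are marked with a suffix like `(2)`. The function name `filter_dictionary` and the sample entries are only for illustration.

```python
import re

def filter_dictionary(lines, keep_words):
    """Keep only entries whose word is in keep_words (hypothetical helper)."""
    keep = {w.lower() for w in keep_words}
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Strip an alternate-pronunciation suffix such as "(2)" from the word.
        word = re.sub(r"\(\d+\)$", "", parts[0]).lower()
        if word in keep:
            kept.append(line.strip())
    return kept

# Illustrative sample entries, not a real dictionary file.
full = [
    "dictionary D IH K SH AH N EH R IY",
    "frankenstein F R AE NG K AH N S T AY N",
    "frankenstein(2) F R AE NG K AH N S T IY N",
]
small = filter_dictionary(full, ["frankenstein"])
print("\n".join(small))
```

The filtered lines can be saved as a separate small dictionary file so the full one stays untouched.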

One thing to note is that Pocket Sphinx is always going to return something. In other words, if it doesn't find a match, it's going to return the closest match it can find in the dictionary. Or to put that another way, it's never going to return a "word not found" flag. It will always return something from the dictionary.

So it will be up to Alysha's AI mind to decide whether or not what she hears makes any sense to her. I'm looking forward to starting work on that. Unfortunately I still have much to do with both the eSpeak and Pocket Sphinx dictionaries. A lot more midnight oil will be burnt before I can move on comfortably to the actual AI project.



   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

Shame on me!   I'm on a roll here and I can't stop programming!   And I need to work on my car to get it ready for inspection.

Anyway, I have interesting news to report. 

Pocket Sphinx uses only 39 phonemes, all represented by capital letters with no special characters.

eSpeak has almost twice as many phonemes, and they include all manner of special characters.

However, I have found that I can make a direct correspondence between the 39 phonemes of Pocket Sphinx and 39 eSpeak phonemes, and that seems to be sufficient. The additional eSpeak phonemes are apparently intended to give more flexibility in the way eSpeak can pronounce words, but I'm discovering that they are basically overkill.

Here are a few words that I was able to directly correspond between Pocket Sphinx and eSpeak.

dictionary  D IH K SH AH N EH R IY   <- Pocket Sphinx phonemes.
dictionary d I k S V n 3 r i <- eSpeak phonemes.

good-bye G UH D B AY <- Pocket Sphinx phonemes.
goodby g U d b aI <- eSpeak phonemes.

frankenstein F R AE NG K AH N S T AY N <- Pocket Sphinx phonemes.
frankenstein f r a N k @ n s t aI n <- eSpeak phonemes.

In all the cases above, I had to change up the eSpeak phonemes that had originally been assigned to these words to better match up with the Pocket Sphinx phonemes in a directly one-to-one correspondence. 

But what I've discovered is that after I did that eSpeak was actually pronouncing the words better!

So now I can set up a system that reduces all eSpeak phonemes to only a pre-selected 39 phonemes that I will correlate with the Pocket Sphinx phonemes directly.

This should work out to be a very nice system as now Pocket Sphinx and eSpeak will essentially be "on the same page" when it comes to phonemes. 

They will still technically be using two different phoneme systems. But since there will be a direct correlation between the two, it will basically be as if they are the same system. This way I don't need to try to modify either eSpeak or Pocket Sphinx. All I'm doing is using a subset of the eSpeak phonemes, only 39 of a previous set of about 48. So I'll basically be ignoring about 9 phonemes that eSpeak has to offer. They'll still be available for eSpeak, I just won't use them. As I say, they are basically overkill anyway. The differences they capture are so subtle that they simply aren't required.

So these two systems are going to turn out to be very easy to use together. Alysha will be able to add words to the Pocket Sphinx dictionary directly from her eSpeak dictionary, including creating the necessary Pocket Sphinx phonemes.

In short, Alysha (my AI program), will only need to maintain the eSpeak dictionary and that can then be used to automatically generate the companion Pocket Sphinx dictionary.  This will make building this system so much easier. 
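To give an idea of how generating the Pocket Sphinx entry from the eSpeak one might work, here is a minimal sketch. The lookup table only contains the pairs that can be read off the "dictionary" and "frankenstein" examples above, so it is not the full 39-phoneme table, and the names `ESPEAK_TO_SPHINX`, `espeak_to_sphinx` and `sphinx_entry` are made up for illustration.

```python
# Pairs read off the "dictionary" and "frankenstein" examples in this post.
# This is NOT the full 39-phoneme table, just enough for the demo.
ESPEAK_TO_SPHINX = {
    "d": "D", "I": "IH", "k": "K", "S": "SH", "V": "AH", "n": "N",
    "3": "EH", "r": "R", "i": "IY", "f": "F", "a": "AE", "N": "NG",
    "@": "AH", "s": "S", "t": "T", "aI": "AY",
}

def espeak_to_sphinx(espeak_phonemes):
    """Translate space-separated eSpeak phonemes to Pocket Sphinx phonemes."""
    out = []
    for p in espeak_phonemes.split():
        if p not in ESPEAK_TO_SPHINX:
            # Fail loudly rather than guess at an unmapped phoneme.
            raise KeyError(f"no Pocket Sphinx phoneme mapped for {p!r}")
        out.append(ESPEAK_TO_SPHINX[p])
    return " ".join(out)

def sphinx_entry(word, espeak_phonemes):
    """Build one Pocket Sphinx dictionary line from an eSpeak entry."""
    return f"{word} {espeak_to_sphinx(espeak_phonemes)}"

print(sphinx_entry("dictionary", "d I k S V n 3 r i"))
print(sphinx_entry("frankenstein", "f r a N k @ n s t aI n"))
```

Note that in these two examples both `V` and `@` end up as `AH`, so going in this direction is straightforward, while translating back from Pocket Sphinx to eSpeak would need a rule for which one to pick.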

So, I'm THRILLED!

Now you can see why I'm not working on my cars as I should be. 🤣 

I just had to get this sorted out so I could have peace of mind. 

Now I can go back to working on cars again.

Pocket Sphinx and eSpeak will work perfectly together!  Yippee! 



   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1046
 

@robo-pi

 

Maybe it's time for a road trip for a few of us to take on some of these chores you can't find the time to get accomplished.  We want to keep you at the programming and learning grindstone!  Any car mechanics or yard gurus out there ready for a trip to PA?

SteveG


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

Why don't I fix the cars and you guys write the program? 😎 

This is supposed to be a forum of programming gurus.

Plus I don't even know what I'm doing with Python to be honest.  I'm trying to keep things in different classes as much as possible because I know that each of these classes is going to grow in size as the project continues.

So far I have the following:

  1. One Main Program to launch them all.
  2. The main Tkinter Window class
  3. The Alysha class
  4. The Dictionary class to maintain the dictionaries
  5. The eSpeak class
  6. The Pocket Sphinx class
  7. The GUI_Funcs class (although I think that can be reduced to just a file of functions instead of a full-blown class)
  8. I can also see a need for a Sorting_Funcs file.  Just a file of dictionary and phoneme sorting routines.

And all of this is just the tip of the iceberg.   These are just the tools to get Alysha speaking and listening.

No AI yet.

This is why I want to keep the Main Program as simple as possible.  I will soon be creating classes for the AI methods.  And that's going to become quite complex.

So pretty much everything I mentioned above from 2 through 8 could be a project in and of itself for a single programmer or even a small team of programmers. And that's before any AI is even started.

I can't wait to get to the AI stuff. I have been working on it in terms of flow charts, grammar, and various algorithms, but I haven't written any actual code yet.

It will all be kept track of using numpy arrays to hold all the information.  So I want to build a numpy system that's well-thought-out to begin with so it will be easily expandable in modules that can be added to or subtracted from the main system without causing major havoc with other parts of the AI program.

I probably won't get to that until fall.

As you can see I still have a lot of work to do on 2 thru 8 above.   And that's just the TTS/SRE system.

Strangely, I don't even need TTS and SRE for the actual Linguistic AI program.  But think of how much more fun it will be to be able to actually talk with the robot instead of having to type everything in and read it back on a monitor. 😊

So it's worth the effort.

I was actually hoping to find a pre-programmed system like the one described above in 2 through 8, so I could just grab that and move on with the AI system.  But unfortunately no one thought to design a nice system like that, so I had to create it myself.   Thankfully I at least have Pocket Sphinx and eSpeak to work with!  At least that much was done for me.



   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

I've been side-tracked into cleaning up my GUI program.  It's coming along pretty well.  I've decided to make it into just three windows.  The main window contains the main control buttons and also allows interaction with Alysha via both Pocket Sphinx and eSpeak.  The graphic below is a screenshot of the program. The two dictionary windows are for Pocket Sphinx and eSpeak dictionaries.  They pop up so they fill the screen nicely.  But they can still be moved around as separate windows.  I also have a "park" button on them that will snap them back to their original position.

I still have a lot of work today.  I want to add horizontal scrollbars to the dictionary windows.  I have it set up so the two windows are objects of a single class.  This way I only need to modify one piece of code and the new features are automatically updated in both windows.  This will also allow me to open many different dictionaries.  Later I'll be adding AI dictionaries.   So this is coming along pretty well. 

I still have a lot of work to do.  I want to be able to translate back and forth between eSpeak and Pocket Sphinx phonemes.   In fact, that's the code I'm currently working on.

[Image: dictionary shot]



   