
Pocket Sphinx Speech Recognition Engine

48 Posts
5 Users
12 Likes
5,991 Views
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

I'm just starting with CMU's Pocket Sphinx.  This is a Speech Recognition Engine designed for small computers with limited resources.  Like eSpeak, it's totally independent of the Internet and runs entirely on the local machine.

Just for the sake of clarity:

eSpeak

eSpeak is TTS (Text-to-Speech) software originally written by Jonathan Duddington and Reece Dunn.

While it shares a lot of characteristics with Pocket Sphinx, it's a totally separate software application and converts text to speech, not the other way around.

I created a thread about eSpeak here: espeak, espeak-ng, pyttsx3, and MBROLA

Pocket Sphinx

Pocket Sphinx is an SRE (Speech Recognition Engine) developed by Carnegie Mellon University (CMU).  It's also known as CMUSphinx, although Pocket Sphinx is really just one part of the larger CMUSphinx project.  While having nothing to do with eSpeak, it shares a lot of common features: Internet independence, a design that runs on computers with limited resources such as SBCs, and its own dictionaries that can be fully modified.

So while these different software applications have many common features, they are actually quite different things.  Working with both at the same time can lead to great confusion if you don't keep track of the fact that they really have nothing to do with each other.

Having said this, they can both interact with a larger common grammar and dictionary program to form a higher-level Linguistic AI system.  This is where I am ultimately headed.

Focus on Pocket Sphinx

To begin with, Pocket Sphinx is the small version of CMUSphinx.  The larger version, Sphinx4, would be the choice if you have a large and fast computer.  For my purposes I have chosen Pocket Sphinx for use on SBCs.

CMUSphinx Tutorial for Developers

Here is a link to the CMU tutorial. CMUSphinx Tutorial for Developers

This tutorial should contain everything required to get up and running with pocketsphinx.

The section Building an Application with PocketSphinx explains how to install Pocket Sphinx.  By the way, you do not need to install Sphinx4; that's for larger computers.

After having installed pocketsphinx correctly, you can test whether it's working with the following command (note: you may need to include the additional paths they suggest in the tutorial):

pocketsphinx_continuous -inmic yes

My Experience Thus Far: 😎 

This is where I am currently at with Pocket Sphinx.  I have it installed and running with the above terminal line command.

So far it appears to be working amazingly well in terms of being able to understand my speech and convert that to text which it prints out on the screen.   I'm using a fairly inexpensive USB microphone I picked up on Amazon for about $20.

I'm very pleased with the accuracy of the SRE.  Plus this is working from a very large dictionary.  The dictionary is where it looks up words based on the speech it hears, and you can modify it to suit your needs.  Just like with eSpeak, my plan is to basically empty this dictionary and have it start over from scratch, with the robot filling the dictionary up as it learns new words.

In any case, that's where I'm at so far.  Basically just getting started with this.

A few cons:

  • There seems to be a quite large delay between the time I speak and the time it replies with the recognized text, although this could be because the dictionary is quite large.
  • It also prints a lot of information to the screen while translating.  It's not errors, but I don't know what any of that information means yet.
  • Finally, I'm not sure what happens while it's listening.  In a robot I can't have all resources devoted to waiting for someone to say something, so I'm not sure how that is going to work just yet.

Where to go from here?

My next step will be to try to access pocketsphinx from within Python so that I can have full programmatic control over the SRE.  This should also be explained in the tutorial linked to above, except I think they are using C or C++. 

In any case, this is the road I'm heading down for the SRE.  I don't know where this will lead.  I'm thinking that the worst-case scenario would be that I would need to dedicate a Raspberry Pi 4 to serve solely as this SRE, allowing the robot to constantly listen for speech and only send incoming messages over to another computer on the robot when meaningful text is interpreted.
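Sketching that worst-case architecture (purely hypothetical at this point), the dedicated Pi would only need to push each recognized utterance over a socket to the other computer; framing each utterance as a newline-terminated line keeps the receiving side simple.  The function names here are my own invention:

```python
import socket

def send_utterance(sock, text):
    # Frame one recognized utterance as a UTF-8 line so the
    # receiving computer can split messages on b"\n".
    sock.sendall(text.encode("utf-8") + b"\n")

def recv_utterances(sock, bufsize=1024):
    # Generator: yield complete utterances as they arrive.
    # Stops when the sender closes its end of the connection.
    pending = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        pending += chunk
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            yield line.decode("utf-8")
```

On the robot, the SRE Pi would call send_utterance() only when meaningful text comes back from the decoder, and the main computer would loop over recv_utterances().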

Clearly there's lots to learn on this, and very few tutorials available. 

If anyone else is interested in using Pocket Sphinx please chime in.  We can work through this together.  The more people working on it, the faster we can all learn how to use it.

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

Update and Finally Some Success!

This was really like pulling teeth!   Information is really hard to come by, and not well-documented IMHO.

In any case, here are some of the resources I used:

I found this video for help with initially installing Pocket Sphinx

Here is a list of commands to install it.

sudo apt-get install gcc automake autoconf libtool bison swig python-dev libpulse-dev
mkdir speech_recognition
cd speech_recognition
git clone https://github.com/cmusphinx/sphinxba...
cd sphinxbase
./autogen.sh
./configure
make clean all
make check
sudo make install

export LD_LIBRARY_PATH=/usr/local/lib
sudo nano /etc/ld.so.conf
include /etc/ld.so.conf.d/*.conf
/usr/local/lib
sudo ldconfig
================================================
git clone https://github.com/cmusphinx/pocketsp...
cd pocketsphinx
./autogen.sh
./configure
make clean all
make check
sudo make install

 

OPTIONAL: Don't need to install Sphinx Train unless you want to:

===============================================
git clone https://github.com/cmusphinx/sphinxtr...
cd sphinxtrain
./autogen.sh
./configure
make clean all
make check
sudo make install

 

I didn't do this svn thing either.  I have no clue what this is for.

++++++++++++++++++++++++++++++++++++++++++++++++++++
svn checkout svn://svn.code.sf.net/p/cmusphinx/code/trunk cmusphinx-code
cd cmusphinx-code
./autogen.sh
./configure
make clean all
make check
sudo make install

~~~~~

As you can see it was quite complicated to install (though not too bad after knocking off the last two items).

And at this point I only had it working from the command line.  But at least it was working.  I'll make another post for what I did to get it working in Python.

 

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

To get it Running in Python

I discovered that I also had to install pocketsphinx-python.  It took me quite a while to figure out how to do this, but the following commands worked.  I should note that I also put this in the speech_recognition directory I had created in the previous post.

git clone --recursive  https://github.com/cmusphinx/pocketsphinx-python/ 
cd pocketsphinx-python
python setup.py install

After that I was able to run the following code to decode the contents of a file of raw audio.

 

#!/usr/bin/env python
# This program works but uses files as the raw audio input.
# Saturday April 18th.  

import subprocess as cmdLine

from os import environ, path

from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

cmdLine.call('clear')

MODELDIR = "pocketsphinx/model"
DATADIR = "pocketsphinx/test/data"

# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-lm', path.join(MODELDIR, 'en-us/en-us.lm.bin'))
config.set_string('-dict', path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))
decoder = Decoder(config)

# Decode streaming data.
decoder.start_utt()
stream = open(path.join(DATADIR, 'goforward.raw'), 'rb')
while True:
  buf = stream.read(1024)
  if buf:
    decoder.process_raw(buf, False, False)
  else:
    break
decoder.end_utt()
print ('Best hypothesis segments: ', [seg.word for seg in decoder.seg()])

I was surprised that the above code worked, but it did.

Next I had to figure out how to get microphone input:  More stuff to install.  See next post.
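To give a flavor of where this is headed, here is a rough sketch of feeding microphone audio to the decoder with PyAudio instead of a file.  This is my guess at the shape of it, not tested code from the tutorial: frames() just chunks any readable stream, and listen() assumes a decoder configured exactly as in the file example above (16 kHz, 16-bit mono).

```python
# Hypothetical sketch: live microphone input instead of a .raw file.
# Assumes pyaudio is installed and 'decoder' is configured as in the
# file-based example above.

def frames(stream, size=1024):
    # Yield fixed-size buffers from anything with a .read() method:
    # an open file, or a PyAudio input stream.
    while True:
        buf = stream.read(size)
        if not buf:
            break
        yield buf

def listen(decoder, chunk=1024):
    import pyaudio  # imported here so frames() is usable without it
    pa = pyaudio.PyAudio()
    mic = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                  input=True, frames_per_buffer=chunk)
    decoder.start_utt()
    try:
        # A real robot would break out of this loop on silence;
        # as written it listens until interrupted.
        for buf in frames(mic, chunk):
            decoder.process_raw(buf, False, False)
    finally:
        decoder.end_utt()
        mic.close()
        pa.terminate()
```

The nice thing about splitting out frames() is that the same decoding loop works for a file today and the microphone later.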

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

More Information:

First, I had to laugh because the first time it wouldn't stop running because I didn't know to use ctrl-z to stop it.  So I started cussing at it.  And the funny thing was that it printed out my cuss words! 🤣 

So apparently it has cuss words in its dictionary. 

Here's a very small excerpt of the dictionary that comes with it:

assesses AH S EH S IH Z
assessing AH S EH S IH NG
assessment AH S EH S M AH N T
assessments AH S EH S M AH N T S
assessor AH S EH S ER
assessors AH S EH S ER Z
asset AE S EH T
assets AE S EH T S
assets' AE S EH T S
asshole AE S HH OW L
assholes AE S HH OW L Z
assicurazioni AH S IY K ER AE Z IY OW N IY
assiduous AH S IH D W AH S
assiduously AH S IH D W AH S L IY
assign AH S AY N
assigned AH S AY N D
assigning AH S AY N IH NG
assignment AH S AY N M AH N T
assignment's AH S AY N M AH N T S

The words are followed by the phonemes it uses to recognize them.  And just like with eSpeak, you can modify this dictionary to your heart's content.  For my project I'm going to reduce the dictionary to only a handful of words and start building it up from there.  The eventual plan is to have the robot build its own dictionary as it learns new words.

The dictionary that comes with it contains 137,723 words.   I imagine that this is also why it might take a while to do a translation.  With a smaller dictionary it would come back with the translation much faster, but would be able to recognize far fewer words.
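As a sketch of what "reducing the dictionary" could look like: each .dict line is just "word PHONEME PHONEME ...", so a cut-down dictionary is a simple filter over the lines.  reduce_dict() is a name I made up; the "(2)"-style suffix it strips is how cmudict marks alternate pronunciations of the same word.

```python
# Hypothetical helper: keep only the dictionary entries for a small
# starting vocabulary, preserving the original line format.

def reduce_dict(lines, keep):
    # lines: iterable of "word PHONEME PHONEME ..." strings
    # keep:  set of (lowercase) words to retain
    kept = []
    for line in lines:
        if not line.strip():
            continue
        word = line.split(None, 1)[0]
        # "read(2)"-style entries are alternate pronunciations of
        # the same base word; compare on the base word.
        base = word.split("(")[0]
        if base in keep:
            kept.append(line)
    return kept
```

Writing the returned lines back out to a new .dict file would give the robot its small starting dictionary, to be appended to as it learns words.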

There are also tricks to get around this by having different dictionaries that the robot can switch to depending on the context expected in a conversation. 

The Keyword Phrase Feature:

Pocket Sphinx can also be configured to only respond to a selected few keywords before going through an entire dictionary.  For example, in my case I may have "Alysha Listen" as a keyword phrase.   Only when she hears that phrase will she respond with "I am listening".   And then be prepared to go through larger dictionaries with anything that follows. 
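From what I've seen in the CMU docs, the keyword list is a plain text file, one phrase per line with a detection threshold between slashes.  A hypothetical keyphrase.list for the example above might look like this (the threshold values are placeholders that would need tuning):

```
alysha listen /1e-20/
alysha stop /1e-30/
```

You would then launch with something like `pocketsphinx_continuous -inmic yes -kws keyphrase.list` and adjust each threshold until false triggers stop.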

Other options are available as well.   So just like eSpeak, PocketSphinx has a lot of wonderful features to play with. 

Anyway, I'm making progress!  Hallelujah for that!

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

More Gibberish!

I just wanted to mention that it may not be necessary to install everything laid out in the second post of this thread.  Installing just PocketSphinx-Python along with Pyaudio might be sufficient.   I'm going to start from scratch on another computer and see what the minimum installation actually is.   I was stabbing in the dark trying to get this installed for the first time.

 

SphinxTrain

There is a package I installed called SphinxTrain.   I think this is some sort of utility similar to what eSpeak Edit was for eSpeak: SphinxTrain seems to let you fine-tune or "train" the phonemes used for your specific speech.  If that's true, this might allow you to fine-tune PocketSphinx to your voice and precisely how you pronounce words.

I'm not sure about this yet.  As I've said before it's difficult to find information or tutorials on this stuff!  But I think that's the idea behind SphinxTrain.

In fact, yes, that's exactly what it is.  I just found it here:

Training an Acoustical Model

So apparently CMU PocketSphinx does have this training utility program available.  Apparently I've already installed it (see second post in this thread).  But I haven't tried to run it yet or see what it does.

But it sure looks like Pocket Sphinx has a lot of capability and options.  So it appears to be a good choice if you want that sort of thing.

DroneBot Workshop Robotics Engineer
James


   
robotBuilder
(@robotbuilder)
Member
Joined: 5 years ago
Posts: 2037
 
Posted by: @robo-pi

As you can see it was quite complicated to install.

Why do they make things so complicated to install?  The nightmare of trying to install a graphics library for use by a C++ program made me give up on C++, because the same graphics library could be used with a simple include statement in another language.  With Python you can usually install a library with an import statement?

 


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @casey

Why do they make things so complicated to install? 

I don't know why they do this.   Although, I think part of it has to do with the fact that I'm trying to install all this stuff on a Jetson Nano and a Raspberry Pi, both of which use ARM64 processors.   It might have been a lot easier if I were doing this on a computer with an Intel or AMD processor. 

The other thing is that apparently neither eSpeak nor PocketSphinx is very popular with the general population, in part because there are easier TTS and SRE options available for people who don't mind being tied to an Internet server and who may not be concerned with being able to configure the system from scratch.  In fact, many people would rather just use a system that has already been programmed by someone else. 

For my purposes I needed (or wanted) something that has the following features:

  • Independent from the Internet and any external server. 
  • Is totally open source and configurable.
  • Will run on limited resources and SBCs. 
  • And I can strip the dictionaries down and start from scratch. 

 

What I want to use these for actually requires that I build the dictionaries from the ground up.  I'm building a Linguistic AI system that can evolve on its own.  Note: I used to call this "Semantic AI", but have since quit using that term because it is already being used by commercial companies for things totally unrelated to what I'm doing.   So now I'm calling my system "Linguistic AI", based on language. 

But yeah, there are most likely other TTS and SREs that are far easier to use.   In fact, if a person is willing to be strapped to Windows and C# or C++, the Microsoft Speech Platform is a far better choice!  I wanted to move over to Linux and Python, as well as be compatible with SBCs based on ARM64 processors. 

So I've created my own nightmares. 🤣 

But I think in the long haul I'll be happy that I suffered through this early learning process.

I'm hoping to make videos on both eSpeak and PocketSphinx aimed specifically at SBCs, mainly the Jetson Nano and Raspberry Pi.  So that anyone following in my footsteps won't need to go through this teeth-pulling like I did. 

That's also why I'm trying to share everything here as I learn it.  It might help someone else who might be interested in going down this jungle path. 

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @casey

With Python you can usually install a library with an import statement?

Actually, that's not correct.  The library needs to be installed on the computer before it can be imported into Python.  Python does come with various libraries that are installed along with Python itself; those you can simply import into your code to use them. 

But you can't just say, "Import PocketSphinx" if you haven't first installed PocketSphinx.  I wish it was that easy!  That would have been a piece of cake. 
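One way to see the difference: importing only succeeds for libraries that are already installed on the machine.  A little sketch (installed() is just a helper name I chose):

```python
import importlib

def installed(name):
    # An import only works if the library is already installed;
    # a missing library raises ImportError at import time.
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False
```

So installed("os") is always True because os ships with Python, while installed("pocketsphinx") only becomes True after going through the install steps earlier in this thread.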

DroneBot Workshop Robotics Engineer
James


   
robotBuilder
(@robotbuilder)
Member
Joined: 5 years ago
Posts: 2037
 
Posted by: @robo-pi
Posted by: @casey

With Python you can usually install a library with an import statement?

Actually, that's not correct.  The library needs to be installed on the computer before it can be imported into Python. 

OK.  That was my assumption from looking at Python code examples.

Is it that difficult? From a quick google it seems as simple as:
pip install <some-library>

Hopefully it is not as complicated as installing libraries in a C++ project 🙁

 


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1018
 

@robo-pi

This entire thread is fascinating!  I'm staying tuned and will be following along.

So much to learn and so little time remaining.

SteveG


   
Spyder
(@spyder)
Member
Joined: 5 years ago
Posts: 846
 
Posted by: @robo-pi

If anyone else is interested in using Pocket Sphinx please chime in. 

Me !


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

@spyder, @codecage.

A Possible Simpler Installation

I'm going to try to reinstall Pocket Sphinx from scratch on a fresh SD system card.   I'm thinking that I might be able to get by with just installing pocketsphinx-python and then Pyaudio, instead of the whole shebang.  So it might turn out to be a simpler installation after all.   Hopefully that will work.

I'll let you know how that goes when completed.

The SphinxTrain Utility

I haven't found much on how to use the SphinxTrain utility yet.  But I did find the following video that looks interesting.  NOTE:  This is way more advanced than anything we need right away.  But I thought I'd share this resource for whatever it might be worth.  This would be training pocketsphinx to recognize your voice better.  Like all AI systems, the more training data you give it the better it gets.

So I'm just posting this video to show what's possible.  This is not something you need to do to make Pocketsphinx work.  In fact, this kind of training would be needed for accurate transcriptions of long narrations.   For shorter command phrases you wouldn't need to do this kind of training.

In any case, this is a pretty nice video so it's worth posting here just to keep track of these kinds of resources:

Detailed information like this is hard to find. So if you find any good tutorials on pocketsphinx please share what you find.

DroneBot Workshop Robotics Engineer
James


   
robotBuilder
(@robotbuilder)
Member
Joined: 5 years ago
Posts: 2037
 

This is getting interesting.  I will have to fire up my Raspberry Pi and Python editor and give it a go.   Still very much in the learning mode with regards to Python.

 

 


   
Spyder
(@spyder)
Member
Joined: 5 years ago
Posts: 846
 

@robo-pi

Take a look at this before you get too far with what yer doing. It's an IMG that has PocketSphinx already installed. I just unpacked it and was about to burn it

https://docs.projectalice.io/set-up/


   