Pocket Sphinx Speech Recognition Engine

48 Posts
5 Users
12 Likes
6,308 Views
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @codecage

Still need to figure out why my Win10 machines no longer recognize the SD cards after the starting image from NVIDIA is put on them.

I haven't had that problem.  But here's a page with some suggestions:

How to Fix SD Card Not Detected on Windows 10

I'm extremely frustrated trying to find information on how to modify (or even find) the vocabulary dictionary that vosk uses.

It's frustrating enough to find information that's difficult to understand.  But in this case I can't even find any information at all.

I'm about ready to go back to using pocket sphinx, although vosk does seem to decode speech into words more quickly and possibly even more accurately. But if the dictionary is not easy to modify, that's going to be a major problem for me.

Right now I'm trying to get vosk to recognize "alysha" as a wake word.  But apparently it doesn't have alysha in the dictionary.  Instead it keeps coming back with elisa.   I mean I could use elisa as the word it returns when I say alysha and deal with that in my python code.  But I'd rather just put alysha in the dictionary.
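The workaround mentioned above, accepting "elisa" and handling it in Python, could look something like this. It's only a sketch: the alias table and function names are made up for illustration, and it works on whatever text string the recognizer hands back rather than calling vosk itself.

```python
# Hypothetical alias table: what the recognizer returns -> the word we actually want.
WAKE_WORD_ALIASES = {"elisa": "alysha"}


def normalize_words(text, aliases=WAKE_WORD_ALIASES):
    """Replace known mis-recognitions with the intended word."""
    return " ".join(aliases.get(word, word) for word in text.lower().split())


def heard_wake_word(text, wake_word="alysha"):
    """Return True if the wake word (or one of its aliases) appears in the text."""
    return wake_word in normalize_words(text).split()


if __name__ == "__main__":
    print(normalize_words("Elisa turn left"))
    print(heard_wake_word("hey elisa"))
```

This keeps the rest of the program working with "alysha" only, so the alias table is the one place to change if the dictionary ever gets fixed.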

In fact, for my Linguistic AI projects I'm absolutely going to need to be able to modify the dictionary.

So as things are right now, even though I have vosk up and running and it seems to be working fairly well, without the ability to easily modify and edit the dictionary it's going to be worthless for my purposes.

So I may have no choice but to go back to pocketsphinx.  I know I can modify the dictionary there.

DroneBot Workshop Robotics Engineer
James


Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

The Final Judgement: Pocket Sphinx Wins!

After spending a lot of time pulling out my hair trying to find information on Vosk, I finally learned that it's not easy to modify its vocabulary dictionary. So I'm going back to Pocket Sphinx for my Linguistic AI project. Being able to build dictionaries from scratch and swap them out is a major part of how I plan to build my Linguistic AI system. I should mention, for anyone who might be interested, that this is also quite easy to do using Microsoft Speech Platform on Windows.

In any case, I'll be moving back to Pocket Sphinx for my project. It actually appears to be decoding my speech very well. It can also be trained for specific voices, although I haven't looked into how that feature works yet.

I should also point out that Vosk does decode speech a bit faster, and probably more accurately too.   So for someone who isn't concerned with customizing the vocabulary dictionary Vosk might be the better choice.  But since dictionary manipulation is paramount for what I want to do, I'll be going with Pocket Sphinx.

As always, different people have different criteria for what they need in their projects. So which SRE is best for your purpose may differ. There is a small delay between when I stop talking and when pocket sphinx returns the decoded speech. It's not bad, but definitely noticeable. There may be parameters that can be adjusted for that. It could be just waiting to make sure you're done talking. Or it could just take a long time to search through its huge dictionary. If the latter is the case, then when I edit my dictionaries to contain only a few words it should respond much faster as well.

In any case, I'm going to move forward from here with Pocket Sphinx.

DroneBot Workshop Robotics Engineer
James


Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

So much fun that only a geek could have!

Okay, since I've moved forward with Pocket Sphinx I've been playing with the Pocket Sphinx dictionary.

The first thing I did was break the main dictionary up into smaller text files for future reference. I copied all the words that start with each letter of the alphabet and saved them each in their own file. Later I'll go through those and create a new dictionary that contains only the words I'm interested in using for now.
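For anyone who would rather script the split than copy and paste, here is a rough sketch. It assumes the dictionary is a plain text file with one entry per line and the word first. The function names and the "other" bucket for entries that don't start with a letter are my own choices, not anything Pocket Sphinx requires.

```python
from collections import defaultdict
from pathlib import Path


def group_by_first_letter(lines):
    """Group dictionary lines by the first letter of each entry."""
    groups = defaultdict(list)
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        first = entry[0].lower()
        # Entries that start with punctuation or digits go in one catch-all bucket.
        key = first if "a" <= first <= "z" else "other"
        groups[key].append(entry)
    return dict(groups)


def write_letter_files(groups, out_dir):
    """Write each group to its own file, e.g. out_dir/a.txt, out_dir/b.txt."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for key, entries in groups.items():
        (out_path / f"{key}.txt").write_text("\n".join(entries) + "\n")
```

Reading the dictionary with `Path(dict_file).read_text().splitlines()` and passing the result to `group_by_first_letter` would produce the groups to write out.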

The second thing I did was create a dictionary that only has a few keywords and the entire alphabet of letters only.
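Building that reduced dictionary can also be scripted by filtering the full one against a keep list. This is a sketch under assumptions: one entry per line with the word first, and alternate pronunciations marked with a suffix like "(2)" on the word. The sample entries in the usage note are illustrative, not copied from the real dictionary file.

```python
import string


def base_word(entry):
    """Return the headword of a dictionary line, minus an alternate marker like '(2)'."""
    word = entry.split()[0]
    return word.split("(")[0]


def filter_dictionary(lines, keep_words):
    """Keep only the entries whose headword is in keep_words."""
    keep = {w.lower() for w in keep_words}
    return [
        line.strip()
        for line in lines
        if line.strip() and base_word(line).lower() in keep
    ]


# The letters of the alphabet plus a few keywords.
KEYWORDS = ["my", "name", "is", "james", "stop", "listening"]
KEEP = list(string.ascii_lowercase) + KEYWORDS
```

For example, `filter_dictionary(["a AH", "a(2) EY", "apple AE P AH L", "stop S T AA P"], KEEP)` drops the "apple" line and keeps the other three.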

This is where the weakness of speech recognition really shows. It has extreme trouble distinguishing similar-sounding letters. I'll need to see if I can improve on this, because part of my idea is to be able to spell words for the robot when she asks me to spell a word. So it's paramount that it gets individual letters right. Letters like b, d, e, p, and t are often reported incorrectly when only single letters are spoken. But even a human may have difficulty knowing for sure which letter was spoken in some of these cases. It does much better on longer words.

I was also right about it responding much faster with a smaller dictionary. Fewer words to look up.

In addition to the individual letters of the alphabet I also have "my name is James" in the dictionary as individual words.   As well as "stop" and "listening", again as individual words.

Because these are the only words currently in the dictionary it nails it every time when I say 'stop listening'.  And so I can use those words to have the program stop listening.   That works very well.
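Checking for the stop command can be as simple as looking for the two words side by side in the decoded text. A minimal sketch follows. The list of utterances stands in for whatever the recognizer returns on each pass, and the function names are hypothetical.

```python
def contains_phrase(decoded_text, phrase="stop listening"):
    """Return True if the words of phrase appear consecutively in decoded_text."""
    words = decoded_text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def listen_loop(utterances):
    """Handle utterances until the stop phrase is heard; return what was handled."""
    handled = []
    for text in utterances:
        if contains_phrase(text):
            break  # stop listening
        handled.append(text)
    return handled
```

Because the match is on consecutive words, saying "stop" or "listening" alone won't end the loop.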

Lots more work to do for sure.

Vosk comparison

I can't really make a fair comparison with vosk because I have no way to reduce the dictionary like I did for pocket sphinx. However, I did try reciting the alphabet to see how many letters vosk could recognize. One problem with vosk is that, because it has a large dictionary that cannot be easily reduced, it fails to recognize a lot of letters and instead responds with words that sound similar to the names of letters. And sometimes with words that aren't even close to the names of the letters. So it probably has similar problems with very short sounds.

Microsoft Speech Platform

I found this to be the case with Microsoft Speech Platform too. SREs tend to do better with more complex words, and even phrases. This is because there's a lot more information to match up. In fact, with Microsoft Speech Platform I was able to put short phrases into the dictionary, and that worked very well, as it could recognize phrases with far greater consistency than individual words.

Back to Pocket Sphinx

I haven't found any phrases in the pocketsphinx dictionary. It appears to have a format of a single word followed by a space and then the phonemes. So it's not possible to define phrases, at least not with spaces between the words. It may be possible to define a phrase as one long word, like 'stoplistening'. I'll have to look into exactly what the capabilities are there.
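To make the "one long word" idea concrete, here is a sketch that parses entries in the word-then-phonemes format described above and glues two entries together into a single line. The phoneme strings are illustrative, and whether Pocket Sphinx would actually decode a fused entry like this well is untested. It only shows how such a line could be put together.

```python
def parse_entry(line):
    """Split a dictionary line into (word, list of phonemes)."""
    word, *phonemes = line.split()
    return word, phonemes


def fuse_phrase(entries, phrase):
    """Build a single-word dictionary line for a phrase from its per-word entries."""
    words = phrase.lower().split()
    phonemes = []
    for word in words:
        phonemes.extend(entries[word])  # raises KeyError if a word is missing
    return "".join(words) + " " + " ".join(phonemes)


if __name__ == "__main__":
    # Illustrative entries, not copied from the real dictionary.
    lines = ["stop S T AA P", "listening L IH S AH N IH NG"]
    entries = dict(parse_entry(line) for line in lines)
    print(fuse_phrase(entries, "stop listening"))
```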

In any case, I'm off to the races for Linguistic AI. 😊 

I'll just try to work around any limitations for now. I don't want to get too bogged down with the SRE.

DroneBot Workshop Robotics Engineer
James

