
Pocket Sphinx Speech Recognition Engine

codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1038
 

@robo-pi

OK, next question: are both of your Jetson Nanos the version with just one camera port, does one have a single port and the other two, or do both have two camera ports?

I ask because my first Nano is a single-camera model, but the one that gets here on Thursday is a two-camera model.  So I need to know if the two-camera model will work with JetPack 4.2.1.

I guess I can just do Paul's course with the older version of JetPack and try the newer versions with the Linguistic AI installs.  I just hope whatever you find as you proceed works the same under JP 4.2.2 or JP 4.2.3.

I would guess the odds of that working are pretty slim!

SteveG


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @codecage

OK, next question: are both of your Jetson Nanos the version with just one camera port, or does one have a single port and the other two camera ports?

Both of mine have the single camera port.   However, I'm toying with the idea of buying a third one because I really do like these.  If I get a third one I'll probably get the one with two camera ports just because it's available.  Not sure if I'll ever actually need them; they might come in handy for doing stereo vision experiments at some point.  But I already have too many projects in the fire.

Posted by: @codecage

So I need to know if the two-camera model will work with JetPack 4.2.1.

I honestly don't know, but I would think it would.   I'm thinking the OS will sense the extra camera port much the same way it automatically senses additional USB ports, etc.  I think the only thing you would need to change would be in the OpenCV code, or however you are accessing the camera.

Although all of the above is just a guess.  You might want to do a search on the NVIDIA forums to be sure.
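Just to illustrate what I mean (this is a guess on my part, not something I've tested on the two-camera board): on the Nano the CSI camera is normally opened through a GStreamer pipeline, and the sensor-id property of nvarguscamerasrc should be what selects between the two ports.

import cv2

def open_csi_camera(sensor_id=0, width=1280, height=720, fps=30):
    # Build a GStreamer pipeline for the Nano's CSI camera.
    # sensor-id=0 is the first port; sensor-id=1 should be the second
    # port on the two-camera model (an assumption worth verifying).
    pipeline = (
        "nvarguscamerasrc sensor-id={0} ! "
        "video/x-raw(memory:NVMM), width={1}, height={2}, framerate={3}/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! appsink"
    ).format(sensor_id, width, height, fps)
    return cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

cam = open_csi_camera(sensor_id=0)
ok, frame = cam.read()
print("Got a frame:", ok)
cam.release()

If that pipeline is right, the only change needed for the second camera would be sensor_id=1.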

Posted by: @codecage

I just hope whatever you find as you proceed works the same under JP 4.2.2 or JP 4.2.3.

I get tired of trying to keep up with all the latest systems.   So I'll probably stick with JP 4.2.1 until I'm forced to upgrade.

It's my understanding that other operating systems can also be run on the Jetson Nano, although I haven't tried that.  I would imagine that JetPack is specifically designed by NVIDIA to take advantage of the CUDA GPU, etc.

In fact, one of the reasons I bought two Jetson Nanos close together was that I wanted two identical systems before they upgraded the hardware to where it's no longer compatible with my system SD cards.   I like being able to boot up any of my SD cards on either Nano.   As far as I know the two I have are identical, and that's what I wanted.

I also bought two identical Raspberry Pi 4 systems from the same supplier.   I might be tempted to buy some more of those too. 😊 

I like having a lot of the same systems lying around.  It makes me feel more secure: if one quits working I can just plug in the next one and keep on working.  I save all my actual data on backup drives and thumb drives, so if a computer dies I don't lose my working data files with it.

DroneBot Workshop Robotics Engineer
James


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1038
 
Posted by: @robo-pi

If I get a third one I'll probably get the one with two camera ports just because it's available.

I ordered direct from NVIDIA and the two-camera model was the only one offered.  Although I didn't really look for the one-camera model; the two-camera model was the one that came up by default.

I was thinking the newer versions of JetPack may have been introduced to support the two-camera model, but after some thought, and given what you said about the OS recognizing the additional device, I realized the older JetPack version might work just fine.  Anyway, I'm going to use JP 4.2.1 when I fire up my new Jetson Nano.  I now need to get two more cameras so I can find out for sure.
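Once the new Nano is up I should be able to check how many cameras the OS actually detected with something like the following (v4l2-ctl comes from the v4l-utils package, so it may need installing first):

ls /dev/video*
sudo apt-get install v4l-utils
v4l2-ctl --list-devices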

I really liked your idea about getting a minimal working OS image, customized to your liking, to use as a starting point for making additional OS images when building different variations to test installations.  When looking at my inventory of SD cards I found I had only one 16GB card, and it was a real 'sloooow' card!  I had a bunch of 32GB and above cards, however.  I put the JP 4.2.1 image on the 16GB card and will use it to configure my own "starter" SD card.  I guess the slowness of this 16GB card won't be too big of a deal, as it won't actually be used in a real live installation, only as a starting point to build other cards.

SteveG


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @codecage

I guess the slowness of this 16GB card won't be too big of a deal, as it won't actually be used in a real live installation, only as a starting point to build other cards.

Yes.  It won't matter if it's a slow card.  You won't be using it live anyway.

I think I'm going to stick with using eSpeak and Pocket Sphinx for my Linguistic AI project.  I haven't been able to get vosk up and running yet.  And Kaldi, while it incorporates all the new AI techniques for recognizing the audio waveforms of words, does seem to be overkill for what I need.

So I'll go back to eSpeak and Pocket Sphinx and just get those optimized.  I'll probably end up putting them on my 16GB main system card just so I always have those available.  And of course I'll set up the customized versions of them too.  This way I'll have my customized robot voice on all my system cards.

By the way.  I do the same thing on my Raspberry Pi 4 systems. One system card to rule them all. 😊 

Then you only need to deal with adding specialized software on top of that for other projects.

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

I finally got vosk working.   And it does seem to decode very well.   However, I don't know as much about vosk in terms of how to modify its dictionary, etc.   And finding information on it is like pulling teeth.

Also, Kaldi was a monster to install.  However, I'm being told that I didn't even need to install Kaldi and that all I needed was vosk, which itself installs very easily and quickly.    So now that I have vosk up and running I'll try it again on my other Nano without Kaldi and see if it really does work all alone.   If so I might go with vosk, if I can find information on how to modify its dictionary.

Still not sure whether I'll go with vosk or pocketsphinx yet.   Both are quite difficult to find good information on.  I'm guessing that vosk is the better speech recognizer, but until I can find more information on how to work with it and modify its dictionary I can't really use it very well.   I need it to do more than just recognize speech: for my AI project I need to be able to build the dictionary from the ground up.  I know I can do that with Pocket Sphinx, but I don't know if I can do that with vosk yet.

 

 

DroneBot Workshop Robotics Engineer
James


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1038
 

@robo-pi

You are a real go-getter and I can see you don't like to see obstacles in your path.  And if there is one, you keep bashing your head against it until it gives up!

SteveG


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1038
 

@robo-pi

Clicked save on that last post before finishing my train of thought!  I was actually wanting to put an emoji at the end of the "bashing your head" sentence.  🤣   So I did here!

And we haven't had any prose from you in a while!  How about something along the Linguistic AI path?

SteveG


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1038
 
Posted by: @robo-pi

This all fits on a 16 GB SD card with room to spare and I'll probably redo this later adding a few more favorites. 

In any case, this is just my new "System Image".   When I want a new system I just burn this image onto a larger SD card.

How do you actually make an img file from the 16GB SD card?

Once I have put a Jetson Nano img file onto the SD card, my Win10 machine no longer "sees" the card as an additional drive, so I cannot copy anything from the card, with Windows anyway.  Interestingly enough, Windows knows something was inserted into the card reader, as I hear the tune Windows plays when inserting or removing a device.

I've noticed that balenaEtcher sees the card as '\\.\PhysicalDrive2' (in my case), and not as a Windows drive letter once the image is written to the card.

SteveG


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @codecage

How do you actually make an img file from the 16GB SD card?

I've been doing it on Windows using Win32DiskImager.

Your SD card should show up in File Explorer as a bunch of drives, but you only need to choose the first drive letter at the top of the list.

Remember to READ the card and not write to it!  It will save it as a *.img file; you'll need to give it a name before you start.

The image file should then be 16GB.   And you can then burn that 16GB image back out to a larger SD card.

Although when you put it back into the Jetson Nano, the Nano may report the new system card as being only 16GB.  If that's the case, you'll need to use a disk tool to expand the partition to use the full capacity of the larger SD card.   It's real easy to do that.

I don't know if you can use balenaEtcher to do this or not.  I've been using Win32DiskImager and it works just fine.  
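By the way, my guess as to why Windows stops showing a drive letter: the Nano image is a set of Linux ext4 partitions, which Windows can't read, while imaging tools like Win32DiskImager and balenaEtcher work on the raw physical device, so they still see the card.  If you ever want to do the same thing from a Linux machine, the rough equivalent would be something like this (be sure to replace /dev/sdX with the card's actual device name from lsblk, or you'll clobber the wrong disk):

# READ the card into an image file
sudo dd if=/dev/sdX of=nano-system.img bs=4M status=progress

# later, WRITE that image out to a larger card
sudo dd if=nano-system.img of=/dev/sdX bs=4M status=progress

After that you'd still expand the main (APP) partition on the larger card with GParted or a similar disk tool, just like on Windows.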

 

DroneBot Workshop Robotics Engineer
James


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

Simple Vosk:

Ok, I just installed vosk on my second Nano.  This time I DID NOT install Kaldi.  Kaldi is a monster and not required to just use Vosk as a simple SRE. 

Here's the installation as it worked for me. 

pip3 install https://github.com/alphacep/vosk-api/releases/download/0.3.3/vosk-0.3.3-cp36-cp36m-linux_aarch64.whl

The above should install vosk very quickly.  It's only 2.5 MB.

Then I also needed to install the following:

sudo apt-get install libgfortran3

Again, this went very quickly.

Finally I needed to download the en-us model as the program code says to do:

print ("Please download the model from https://github.com/alphacep/kaldi-android-demo/releases and unpack as 'model-en' in the current folder.")

You need to extract this to the folder where your Python Code will reside. 

NOTE: I needed to rename the above folder to just "model-en" for this to work.

Here's the Python code:

Note: I modified this code.  If I remember correctly I got the original code from here:

https://github.com/alphacep/vosk-api/tree/master/python/example

It's called test_microphone.py

There are some other examples there as well.

Here's my slightly modified version of the code. 

 

#!/usr/bin/python3

# I added the next few lines to clear the terminal window.
import os
import time

os.system('clear')
time.sleep(1)
# END of my additions

import pyaudio
from vosk import Model, KaldiRecognizer

if not os.path.exists("model-en"):
    print("Please download the model from https://github.com/alphacep/kaldi-android-demo/releases and unpack it as 'model-en' in the current folder.")
    exit(1)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=8000)
stream.start_stream()

model = Model("model-en")
rec = KaldiRecognizer(model, 16000)

Decoded_speech = ""
print("STARTING HERE:")  # <--- I added this line.
while True:
    data = stream.read(2000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        Decoded_speech = rec.Result()  # <--- I added this line.
        # print(rec.Result())  # <--- I remarked this line out.
        break  # <-- I added this break.
    else:
        pass  # <-- I added this pass.
        # print(rec.PartialResult())  # <-- I remarked this line out.

print("My Print Statement: = {0}".format(Decoded_speech))  # <--- I added this line.
print("Done")  # <--- I added this line.

The output when I speak "Hello my name's James" into the mic is as follows:

STARTING HERE:
My Print Statement: = {
  "result" : [{
      "conf" : 1.000000,
      "end" : 3.450000,
      "start" : 3.060000,
      "word" : "hello"
    }, {
      "conf" : 1.000000,
      "end" : 3.690000,
      "start" : 3.540000,
      "word" : "my"
    }, {
      "conf" : 1.000000,
      "end" : 4.050000,
      "start" : 3.690000,
      "word" : "name's"
    }, {
      "conf" : 1.000000,
      "end" : 4.800000,
      "start" : 4.050000,
      "word" : "james"
    }],
  "text" : "hello my name's james"
}
Done
james@james-desktop:~/vosk$

It seems to work a lot faster than PocketSphinx.  

~~~

I still have a lot more work to do learning about vosk. 

I'd like to get a simpler reply from it, for one thing.  I would like it to just return "hello my name's james" instead of all the other garbage it prints out.  It also prints out a lot of other stuff that I didn't bother copying here.

In any case, I can "dig" the result out of this mess if I need to using Python string manipulation commands.  I'm also told that JSON parsing can be used to extract just what follows the "text" keyword.
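For example, something like this should work, since rec.Result() hands back a JSON string (just a sketch, I haven't folded it into the main script yet):

import json

result = json.loads(Decoded_speech)  # Decoded_speech holds the JSON string from rec.Result()
print(result["text"])                # should print: hello my name's james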

So I might end up using vosk because it does seem to be a faster decoder.   It's supposed to be more accurate too.

~~~

I still need to learn about the vosk dictionary and how I might be able to modify and manipulate that too.  That's a key feature I'll definitely need in order to use this SRE for my Linguistic AI project. 
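One lead I want to chase down: the examples in the vosk-api repository suggest that KaldiRecognizer can take a JSON list of allowed words as a third argument, which would effectively restrict the dictionary.  I haven't verified that this works in the 0.3.3 wheel I installed, so treat this as an unconfirmed guess:

# UNVERIFIED on vosk 0.3.3: restrict recognition to a fixed word list
rec = KaldiRecognizer(model, 16000, '["hello", "my", "name", "is", "james", "[unk]"]')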

I think I'm going to take a break from this stuff for a while and do some outdoor work!  It's too nice outside to be sitting in here staring at a monitor all day. 😊 

But at least I learned that you don't need to install Kaldi, which is a HUGE MONSTER.   So that's nice.  Vosk is pretty lightweight software, maybe even lighter than pocketsphinx.

It's hard to find information on, though.  I discovered everything I know from a fellow on SourceForge.   Without his help I wouldn't have had a clue how to get this up and running.

DroneBot Workshop Robotics Engineer
James


   
(@starnovice)
Member
Joined: 5 years ago
Posts: 110
 

@robo-pi Wow James, that is a lot of progress!  I suppose the next step is how to translate what you say to actions.

Pat Wicker (Portland, OR, USA)


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  
Posted by: @starnovice

@robo-pi Wow James, that is a lot of progress!  I suppose the next step is how to translate what you say to actions.

Actually, that's the easy part, unless you're going to be telling the robot to do things it doesn't already know how to do, so that it has to figure out exactly what you're telling it to do.

Once you get the spoken input into words, you should be able to have your program do whatever the words say.   Typically you start out with special commands that are easily translated into tasks the robot already knows how to do.  For example, if you say "Go to the kitchen", your robot is only going to be able to respond to that command correctly if it already knows where the kitchen is and how to get there.
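In its simplest form that's just a lookup table mapping the decoded text onto routines the robot already has.  A bare-bones sketch (the handler names are made up):

def go_to_kitchen():
    print("Driving to the kitchen...")  # placeholder for real navigation code

def stop_moving():
    print("Stopping.")

# Map decoded text straight onto canned behaviors.
commands = {
    "go to the kitchen": go_to_kitchen,
    "stop": stop_moving,
}

def handle_speech(text):
    action = commands.get(text.lower().strip())
    if action:
        action()
    else:
        print("I don't know how to do that yet.")

handle_speech("go to the kitchen")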

I'll no doubt have my robot responding to specific commands like that since I'll have that capability.  But that's what I consider to be "Canned Intelligence". 

My ultimate goal is for the robot to hear language and actually figure out what you are saying on its own.  That's what my Linguistic AI project is all about.   However, I don't plan on having it do this anytime soon.  In fact, programming it too quickly basically just ends up being sophisticated "Canned Programming".  Although that can be very useful, and impressive to other people who don't understand how it works.

My goal is quite different.  I want to get the robot program to slowly learn words, one at a time, building up its understanding of what's being said, and what it is saying.

I'm using this outline of human development as a potential model:

Your Child's Talking Timeline

If I can get my robot to the stage of a toddler in 12 months I will consider that extreme success, especially if it appears to have an understanding of what it's saying.   Exactly how I'll achieve this I'm not yet sure, but I have a lot of ideas to work with, and I'll be making use of multi-dimensional arrays to make connections between words and concepts.
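As a toy illustration of the multi-dimensional array idea (just the concept, not my actual design):

import numpy as np

words = ["ball", "red", "kitchen"]
concepts = ["object", "color", "place"]

# Rows are words, columns are concepts; each cell holds an association
# strength that gets reinforced whenever a word and concept occur together.
associations = np.zeros((len(words), len(concepts)))

def reinforce(word, concept, amount=1.0):
    associations[words.index(word), concepts.index(concept)] += amount

reinforce("ball", "object")
reinforce("red", "color")
print(associations)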

Based on this human timeline I'll have 24 months, or 2 years, to get my robot to the level of a human preschooler.

Actually, I think if I get that far my robot will know far more than just 50 to 100 words and will be able to put together sentences longer than just 2 or 3 words.

Of course my robot has the advantage of having me program its brain. 😎 

I'm going to cheat a bit and basically give it the alphabet early on.  That's going to be the foundation of the dictionary: letters A through Z.  And I'll also have it learn how to spell words.  So when I teach it a new word, my robot will ask me "How do you spell that?"  And I'll spell it for the robot, and she can then put this word into her dictionary along with definitions and meanings that will grow over time, just like a human child continually upgrades their vocabulary.
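The dictionary itself could start out as simply as this (a hypothetical structure, just to show the idea):

# Each learned word gets an entry that grows over time.
vocabulary = {}

def learn_word(word, spelling, meaning):
    entry = vocabulary.setdefault(word, {"spelling": spelling, "meanings": []})
    entry["meanings"].append(meaning)

learn_word("ball", "b-a-l-l", "a round object you can throw")
print(vocabulary)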

One great advantage for the robot is that she won't be easily distracted or ever tire of learning.   So it will be like having a student who pays attention 100% of the time and never gets tired, bored or fidgety.

It's bound to be an interesting experiment.   This is why I want to take a little time here at the beginning to get a rock-solid TTS and SRE system in place before I begin.  I think eSpeak is fine for the TTS; all it needs to do is be able to speak the words the robot learns.  The SRE needs to be fast, as I don't want to be waiting on the SRE all the time.   So vosk will probably take precedence over pocketsphinx on the speed issue alone.
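The TTS side really is that simple; driving eSpeak from Python is about one line (the voice and speed flags here are just my starting guesses):

import subprocess

def say(text):
    # -v picks the voice, -s sets the speaking rate in words per minute
    subprocess.run(["espeak", "-v", "en-us", "-s", "140", text])

say("Hello, my name is James.")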

But I need to learn how to manipulate and modify the vosk dictionary.  I haven't looked into that yet.  I hope it's as easy as pocketsphinx was.  If not, I may need to go back to pocketsphinx and settle for a longer delay in the translations.   Having full control over the dictionary is more important for my project.

 

DroneBot Workshop Robotics Engineer
James


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1038
 
Posted by: @robo-pi

I've been doing it on Windows using Win32DiskImager.

Your SD card should show up in File Explorer as a bunch of drives.

Therein lies the rub!  Before I put the image on the card, I see the card as a lettered drive.  After the image is written, I can no longer "see" the drive.  I have Win32DiskImager and have used it in the past.  Maybe I need to go back to the Etcher program if it is making the card "disappear"!

And the imaged card seems to boot just fine in the Jetson Nano, so I know I'm getting a good image copy.

SteveG


   
Robo Pi
(@robo-pi)
Robotics Engineer
Joined: 5 years ago
Posts: 1669
Topic starter  

@codecage

Computers are the most masochistic thing that humans ever invented.   I think I used to be happier before computers existed. 🤣 

What I'd really rather be doing is building a '32 Ford deuce coupe to take my girl out to a drive-in movie.  How did we ever give that up to try to teach computers how to think?

We must be insane.

Technology truly sucks
I'd rather be working on trucks
or build a deuce coupe
so a gal I could swoop
and go to the prom in a tux

We'd dance with delight
from dawn till midnight
and love would fill the air

But instead we just fight
for the next coding byte
as into the screen we stare

Will the code run?
Are we near being done?
Or will the computer just crash?

We're addicted to chips
that process our scripts
and our life is nothing but trash.

DroneBot Workshop Robotics Engineer
James


   
codecage
(@codecage)
Member Admin
Joined: 5 years ago
Posts: 1038
 

@robo-pi

Absolutely insane!

I now have my Jetson Nano connected to a 4K TV and man is 3840x2160 resolution something to behold!

Working on building that 16GB customized install.  Still need to figure out why my Win10 machines no longer recognize the SD cards after the starting image from NVIDIA is put on them; otherwise I'm not sure how I can use them to make an image for cloning my customized version in the future.

I did discover that the "nano" editor isn't on the NVIDIA image, even though several references on the Internet state that "nano" is included by default on Ubuntu Linux systems.  Not a big deal and easy to install.
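For anyone following along, it's just:

sudo apt-get install nano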

But I now find I am way past insane!  🤣 😎 

 

SteveG


   