ESP32 I2S voice memo recorder with speech to text and language translation
This is my latest. It's an ESP32 voice memo recorder with Azure-powered speech to text and language translation… looks like we can finally finish building that great big tower we started on all those centuries ago.
In this video, I give an explanation of how I made this, including a description of the code used.
The code is available from my GitHub repo:
I thought I'd share it here because I was inspired by Bill's article and YouTube video on I2S. I was waiting in the car for my daughter to finish orchestra practice (it's a long practice and a long drive, so I can't drop her off and come back), and I decided to watch the DroneBot Workshop video on I2S. I thought it was really cool, so I decided to do a project.
Looks like a neat project. I am considering adding speech-to-text functionality to a drone controller project I am working on that uses ESP32 WiFi to control a Tello Edu model drone. I will be pairing the ESP32 circuit with a Raspberry Pi that will provide the speech conversion. I am looking at the Google APIs for speech, but there are a lot of account hoops to jump through. Maybe Azure is better in that regard. I may have more questions once I watch your video all the way through.
@jjs357 Hi. It's quite simple to set up the Azure stuff, and I haven't supplied any credit card information. I'm not sure how long it will take for the free sign-up credit to run out, or whether it gets renewed. I've been a member for about six months and used it in a few projects, but the pricing for Azure still doesn't seem very transparent. You could also try Facebook's wit.ai to see if they do speech to text.
@jonnyr Thanks for the encouragement. Based on what I saw here:
I think I will give Deepgram a first shot. No-cost API key, no credit card needed, some free speech recognition credits. Plus lots of Python examples on Deepgram's blog. My use case is a Raspberry Pi providing the WiFi connection to Deepgram's speech recognition, with a USB serial connection to my ESP32 device, which is connected through its own WiFi to the drone. Spoken words are converted to text, which is then fed to the drone command by command. Text commands are already implemented, so I am hoping that speech to command text is a simple add-on. The question I have is what the real-time response is like: how long does it take from when a spoken command is uttered to when it can be sent to the drone? I don't expect joystick-level response times, of course.
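For anyone following along, here is a minimal Python sketch of the "speech to command text" step on the Raspberry Pi side. It does not call any speech service; it only assumes that a transcript string arrives from whichever recognizer is used. The phrase tables, the function names `transcript_to_command` and `relay`, and the `send` callback (which would wrap the USB serial write to the ESP32) are all hypothetical. The output strings follow the Tello SDK text commands (`takeoff`, `land`, `forward x` and so on, with x between 20 and 500 cm). `relay` also reports the elapsed time since the utterance was captured, which is one rough way to measure the response time asked about above.

```python
import re
import time

# Hypothetical phrase tables: spoken words -> Tello SDK text commands.
SIMPLE = {"take off": "takeoff", "takeoff": "takeoff", "land": "land"}
MOVES = {
    "forward": "forward", "back": "back", "backward": "back",
    "left": "left", "right": "right", "up": "up", "down": "down",
}

# Optional lead-in verb, a direction word, an optional distance, an optional unit.
PATTERN = re.compile(
    r"(?:go |move |fly )?(\w+)(?: (\d+))?(?: (?:cm|centimeters|centimetres))?"
)


def transcript_to_command(transcript, default_cm=50):
    """Turn one recognized phrase into a Tello command string, or None."""
    text = transcript.lower().strip().rstrip(".!?")
    if text in SIMPLE:
        return SIMPLE[text]
    match = PATTERN.fullmatch(text)
    if match and match.group(1) in MOVES:
        distance = int(match.group(2)) if match.group(2) else default_cm
        # The Tello SDK accepts movement distances of 20 to 500 cm.
        distance = max(20, min(500, distance))
        return f"{MOVES[match.group(1)]} {distance}"
    return None


def relay(transcript, send, heard_at):
    """Forward a command through `send` and return (command, seconds elapsed).

    `send` is an assumed callback, e.g. one that writes to the serial port.
    `heard_at` is a time.monotonic() stamp taken when the utterance was captured.
    """
    command = transcript_to_command(transcript)
    if command is None:
        return None  # unrecognized phrases are dropped, never sent to the drone
    send(command + "\n")
    return command, time.monotonic() - heard_at
```

In use, `heard_at = time.monotonic()` would be taken when recording of the utterance ends, and `send` could be something like `lambda line: port.write(line.encode())` with a pySerial port object (an assumption, not shown here). Logging the second value returned by `relay` over a few dozen commands should give a feel for the end-to-end delay.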