Simple ESP32-CAM Object Detection

15 Posts
4 Users
7 Reactions
2,609 Views
(@dronebot-workshop)
Workshop Guru Admin
Joined: 5 years ago
Posts: 1094
Topic starter  

Train an ESP32-CAM for Object Detection using a free Edge Impulse account. All without writing any code!

Object Detection is a Machine Learning (ML) technique that uses a computer with a camera to identify classes of objects it has been trained to recognize. Although this used to require massive computers and large datasets, it is now possible to perform decent object detection using microcontrollers.

While most microcontroller-based object detection projects use 64-bit chips like the powerful Kendryte K210, you can also use a 32-bit ESP32 to get the job done. And that’s exactly what we will do today - take a 9-dollar ESP32-CAM board and train it to detect objects.

I’ll be using a lantern battery and Robie, a 40-year-old Radio Shack robot, as my two “test objects.” We’ll train a model using the powerful online neural network services of Edge Impulse; you’ll need an account, but don’t worry, it’s completely free.

Once we have our model trained, we’ll export it in the form of an Arduino Library. The library even includes a sample sketch we can use to test our model; no code to write! I’ll try it on an ESP32-CAM and ESP-EYE board.

Here is the Table of Contents for today's video:

00:00 - Introduction
02:34 - Object Detection
06:27 - Edge Impulse
08:15 - Workflow
10:47 - Look at ESP32-CAM & ESP-EYE
12:48 - Capturing Images - Webcam Setup
14:34 - Edge Impulse Setup
15:39 - Image Capture
17:57 - Label Images
20:32 - Create an Impulse
24:52 - Export to an Arduino Library
25:25 - Importing Library to Arduino IDE
27:41 - Testing with ESP-EYE
29:43 - Testing with ESP32-CAM
32:04 - EloquentESP32CAM Library
34:14 - Collect Images Sketch
35:35 - Capturing Images - ESP32-CAM
39:12 - Import & Label with Edge Impulse
42:46 - Build & Deploy Impulse
46:45 - Testing with ESP32-CAM
48:28 - EloquentArduino Code & Demo
51:32 - Conclusion
53:25 - Robie the Robot!

I will actually show you two methods of capturing images of your subject(s) - using a webcam or using the ESP32-CAM board itself. I’ll also introduce you to a library that can make doing all of this a lot easier.

The results are good, good enough to consider using the ESP32-CAM as an inexpensive object detection sensor.

Hope you enjoy the video!

"Never trust a computer you can’t throw out a window." — Steve Wozniak


   
robotBuilder
(@robotbuilder)
Member
Joined: 5 years ago
Posts: 2045
 

@dronebot-workshop

So I assume there is an implementation of a neural network on the ESP32-CAM that uses weights obtained by running the image samples through a neural network at Edge Impulse?

My thought on collecting images: software combined with an automatic physical image-acquisition setup to move the camera back and forth, plus a rotating platform?

How many objects can it learn to recognize, and how well can it reject objects that it is not trained to recognize?

Critical tasks, like the consistent and accurate recognition of something like a stop sign, have not been achieved yet, meaning a self-driving car might not stop.
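On the rejection question: a classifier trained on a fixed set of classes only ever scores those classes, so "none of the above" usually has to be inferred from low confidence. A minimal sketch of that logic in Python (the class names, scores, and the 0.6 threshold are all hypothetical; real deployments tune the threshold against validation data):

```python
def classify(scores, threshold=0.6):
    """Return the best class label, or None if nothing is confident enough.

    `scores` maps class name -> confidence in [0, 1]. If even the best
    class scores below the threshold, treat the frame as an unknown
    object rather than forcing a match to a trained class.
    """
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return None
    return label

# A frame with a trained object clearly visible:
print(classify({"robie": 0.91, "lantern": 0.05}))   # robie
# A frame with an untrained object: every class scores low, so reject.
print(classify({"robie": 0.31, "lantern": 0.28}))   # None
```

The catch is that models can still be confidently wrong about objects that merely resemble a trained class, which is exactly the stop-sign worry above.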


   
Ron
(@zander)
Father of a miniature Wookie
Joined: 4 years ago
Posts: 7195
 

@dronebot-workshop What timing, the latest Raspberry Pi Weekly has an article on Facial Recognition for Grizzly Bears (a specialized version of object recognition)

First computer 1959. Retired from my own computer company 2004.
Hardware - Expert in 1401 and 360, fairly knowledgeable in PCs plus numerous MPUs and MCUs.
Major Languages - Machine language, 360 Macro Assembler, Intel Assembler, PL/I and PL1, Pascal, Basic, C, plus numerous job control and scripting languages.
My personal scorecard is now 1 PC hardware fix (circa 1982), 1 open source fix (at age 82), and 2 zero day bugs in a major OS.


   
Inq
(@inq)
Member
Joined: 2 years ago
Posts: 1900
 

Posted by: @zander

@dronebot-workshop What timing, the latest Raspberry Pi Weekly has an article on Facial Recognition for Grizzly Bears (a specialized version of object recognition)

@dronebot-workshop, @zander, @robotbuilder,

That almost answers a question that I have.  I want to study and play around with the @dronebot-workshop article on an ESP32-CAM.  Noting that most of the learning is done on more powerful machines so that a microprocessor can handle using the result, I'm curious what the practical factors are.  This Grizzly Bear piece is another data point feeding that curiosity.

Real-world example - Say the idea is to build a person-following robot with this type of system.  IOW, no Mickey Mouse radio-transmitter-in-the-pocket tricks.  That is, it not only follows a person, but a specific person.  The acid test would be to follow the person down a hall, say at a high school between classes.

In the @dronebot-workshop video, Bill says "hundreds of images" are needed for a rather simple 2-D object.  How would the problem scale for a specific person, considering that fore, side, and aft shots of the person are needed?  Do any of you have a feel for this?  Is it an N, N-squared, N-cubed type problem?

  1. Is the data-size representation of the person (provided by the learning computer to the running computer) N, N^2, N^3, etc.?
  2. How does the time to navigate through that data go up?  IOW, though the database might be N^3, the walking may be more branch-like and thus more of an N problem.
  3. The Grizzly case sounds like it couldn't be done on an ESP32-CAM but can be handled by a RasPi.  But I'm not getting whether it is a specific bear or just all Grizzly bears, or even whether it could differentiate between Grizzly, Brown, Polar, and Black bears.
  4. Or... is the state of the art saying that even a PC couldn't handle the specific-person case?  Say even a PC with a high-end NVidia GPU?

 

3 lines of code = InqPortal = Complete IoT, App, Web Server w/ GUI Admin Client, WiFi Manager, Drag & Drop File Manager, OTA, Performance Metrics, Web Socket Comms, Easy App API, All running on ESP8266...
Even usable on ESP-01S - Quickest Start Guide


   
Inq
(@inq)
Member
Joined: 2 years ago
Posts: 1900
 

@robotbuilder - This question might be more in your area.  You seem to reference vacuum bots quite a bit.  My vacuum (a $99 cheapy on sale) still homes in on its recharge station.  By observing it, I can tell it doesn't go to it by memory; it'll bounce off intervening things as it hunts for it.  But it also doesn't seem to cross some infrared line, suddenly turn, and make a beeline toward it.  It seems to recognize a 180° field in which it will start heading toward the station, and it also seems to correct for misalignment.  IOW, when it gets about a meter away and is at some acute angle, it'll suddenly turn roughly perpendicular to the direct path and move until it is directly in front of the station.  It does pretty well, too: it can be 20 meters away, down hallways, in bedrooms, etc., and still get back to the station (with lots of corrections and some missed paths into other bedrooms).

What kind of sensors is it using that allows for this?  It's a Eufy RoboVac 25C.



   
Ron
(@zander)
Father of a miniature Wookie
Joined: 4 years ago
Posts: 7195
 

@inq A couple of quick FYI points. First, the version of the ESP32-CAM software that has facial recognition is known to have major performance problems, which is why it is not supported very well. It was totally dropped at one point, but some fixes have been applied to at least keep it on ICU status. The latest version has NO facial recognition.

The second point has to do with facial recognition. My iPhone uses it for security, and it recognizes me from just a single face sample, even when masked and wearing glasses. There is provision for 'with glasses' and 'with a mask' profiles. I have 'with a mask' enabled, but not 'with glasses', as my glasses fog up when wearing a mask.

I just checked, and I do NOT see a newer-processor version of the module with a camera beyond the S2 series. Also, the most common board is the AI-THINKER, and I see no new version on their website.

I have been looking at moving to the Pico W platform using Arducam camera modules. Cheaper board, but a more expensive (and better) camera.

I have dozens of ESP32-CAM modules, and I am afraid they are rapidly becoming e-trash.



   
Inq reacted
robotBuilder
(@robotbuilder)
Member
Joined: 5 years ago
Posts: 2045
 

@inq wrote:

"You seem to reference vacuum bots quite a bit."

Because they are simple but real robots that actually work, and if you can duplicate in your own robot their ability to navigate and find their charger, you are doing very well.

Also, they are a cheap source of motorized encoder wheels, batteries, and bumpers.

"But it also doesn't seem to cross some infrared line, suddenly turn and make a B-line toward it. It seems to recognize a 180° field that it will start heading toward it."

It will be detecting a flashing IR light in the charging base. Although I can't see the flashing light using my iPhone (the iPhone must be filtering out IR light), I can see it with my old digital camera. So if you have an older digital camera, point it at the charger while it is plugged in.

Here is one example of how it is done.
https://www.researchgate.net/figure/Charging-station-localization-system-based-on-IR-sensors_fig1_350932640
There is a button [Download full-text] for a detailed explanation.
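The homing behaviour described above can be driven by surprisingly little logic once the robot has two IR receivers angled apart on its front: steer toward the stronger signal, go straight when they roughly match, and wander when neither sees the beacon. A toy sketch of that decision (the normalized signal values and deadband are made up for illustration; a real robovac's scheme, like the one in the linked paper, uses several coded IR zones):

```python
def steer_toward_beacon(left_ir, right_ir, deadband=0.1):
    """Return a steering command from two IR receiver intensities.

    left_ir / right_ir are signal strengths in [0, 1] from two receivers
    angled left and right of the robot's nose.
    """
    if left_ir == 0 and right_ir == 0:
        return "search"            # beacon not visible: rotate/wander
    diff = left_ir - right_ir
    if abs(diff) <= deadband:
        return "forward"           # roughly centred on the beacon
    return "left" if diff > 0 else "right"

print(steer_toward_beacon(0.8, 0.75))  # forward
print(steer_toward_beacon(0.9, 0.2))   # left
print(steer_toward_beacon(0.0, 0.0))   # search
```

The "turn perpendicular at a metre out" behaviour Inq observed suggests an extra rule: when a separate near-field or docking-alignment zone is detected, square up to the station before the final approach.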

This topic needs its own thread.

 


   
Inq reacted
robotBuilder
(@robotbuilder)
Member
Joined: 5 years ago
Posts: 2045
 

@inq wrote:

"That is not only follows a person, but a specific person. The acid test would be to follow the person down a hall, say at a high-school between classes."

First it would have to identify the person as THE person to follow. So it would have to see their face and have facial-recognition software (maybe using OpenCV). Then it would have to track the movement of the body with that face. It would not have to keep recognizing the face, any more than you would while following someone you had just identified. No different from following a moving target once you lock onto it.

After recognizing the face, maybe associate it with other visual cues the face is attached to.  A simple example: if they were wearing a red jumper, the robot could simply track the color.
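The track-the-red-jumper idea boils down to finding the centroid of the matching pixels each frame and steering to keep it centred. A toy sketch, using a grid of text "pixels" standing in for a real camera frame and a calibrated colour test:

```python
def track_color(frame, is_target):
    """Return the horizontal offset of a colour blob from frame centre.

    `frame` is a list of equal-length pixel rows; `is_target` decides
    whether a pixel matches the jumper colour.  Negative offset means
    steer left, positive means steer right, None means target lost.
    """
    xs = [x for row in frame for x, px in enumerate(row) if is_target(px)]
    if not xs:
        return None                      # target lost: fall back to search
    center = len(frame[0]) / 2
    return sum(xs) / len(xs) - center    # centroid relative to centre

# Tiny 3x6 "image": R = red jumper pixel, . = background
frame = ["..RR..", "..RR..", "......"]
print(track_color(frame, lambda px: px == "R"))  # -0.5 (blob left of centre)
```

A real implementation would do the same thing on an HSV-thresholded camera frame, but the control loop (offset in, turn command out) keeps this shape.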

 


   
Inq
(@inq)
Member
Joined: 2 years ago
Posts: 1900
 

Posted by: @robotbuilder

Because they are simple but real working robots that actually work and if you can duplicate in your own robot their ability to navigate and find their charger you are doing very well.

It wasn't an accusation needing defending; it was recognition of expertise.

Posted by: @robotbuilder

It will be detecting a flashing IR light in the charging base. Although I can't see the flashing light using my iPhone as the iPhone  must be filtering out IR light I can see it with my old digital camera. So if you have an older digital camera point it at the charger while it is plugged in.

My Pixel didn't see it either.  Dug up an old Samsung, it did. 

Posted by: @robotbuilder

Here is one example of how it is done.
https://www.researchgate.net/figure/Charging-station-localization-system-based-on-IR-sensors_fig1_350932640

Great reference!  Thanks

Posted by: @robotbuilder

This topic needs it own thread.

The IR-sensor post, yes, but I don't plan on furthering this line of discussion; it was an idle offshoot.  The other is directly applicable to the topic at hand.

VBR,

Inq



   
Inq
(@inq)
Member
Joined: 2 years ago
Posts: 1900
 

Posted by: @robotbuilder

First it would have to identify the person as THE person to follow. So it would have to see their face and have facial recognition software (maybe using OpenCV)...

I wasn't asking how.

The main topic has a lowly ESP32-CAM doing a useful conceptual example.  @zander gave a second instance, using a RasPi, that could do some undefined level better.  I was merely asking for an educated/experienced/gut-feel estimate of what might be needed, CPU-wise, to handle the follow-a-specific-person case using this object detection with a free Edge Impulse account technique.  I'm sure Bill did a lot more behind-the-scenes testing and research and probably has a feel for what I'm asking.  Certainly far more than me out here in the cheap seats, before I pick it up and explore for myself.

 

@zander - I'm new to the Raspberry Pi Weekly.  I hunted around a little but couldn't find the article you mentioned.  Could you provide a link, please? 🤩 



   
Ron
(@zander)
Father of a miniature Wookie
Joined: 4 years ago
Posts: 7195
 

@inq Here is the Bear link https://www.raspberrypi.com/news/bearid-face-recognition-for-brown-bears/



   
Inq reacted
Inq
(@inq)
Member
Joined: 2 years ago
Posts: 1900
 

Posted by: @zander

@inq Here is the Bear link https://www.raspberrypi.com/news/bearid-face-recognition-for-brown-bears/

Thanks.  Great article, and also directly applicable to my question about the horsepower needed.  Although the article has a GIF (video) showing frame-by-frame detection at what looks like at least 30 fps, the article implies that the RasPi 4 takes the images but analyzes them later.  IOW, it's not clear whether the video is what the RasPi did in real time, a subsequent batch process, or an earlier example using the laptops that were mentioned.

 

 



   
Ron
(@zander)
Father of a miniature Wookie
Joined: 4 years ago
Posts: 7195
 

@inq Just an afterthought FYI: for my game-camera use case the PICO is probably good enough, as is the ESP32-CAM, but for your case I'd suggest a full Raspberry Pi 4 with 8GB RAM, and maybe even the 64-bit OS, although I am not sure of that and it is a bit slower. There is even a CS lens adapter, which opens up a lot of lens choices; then, with a CS-to-C adapter, even more lenses. Those are 32 TPI, and I bet every pawn shop will have dozens if not more of old but often very good glass. Just try 'cs mount lens' and then 'c mount lens' on Amazon.



   
Inq reacted
Ron
(@zander)
Father of a miniature Wookie
Joined: 4 years ago
Posts: 7195
 

@inq I did not read the article in depth, but I can tell you the ESP32-CAM module did do real-time facial recognition; it just turned out to be a bit too much for the processor back then. Let's just say it was semi-successful. If they update to the new S3 version of the chip, it should be easy, but I have not seen any plans for that yet.



   
robotBuilder
(@robotbuilder)
Member
Joined: 5 years ago
Posts: 2045
 

@inq wrote:

"I was merely asking for an educated/experienced/gut-feel estimate of what might be needed CPU wise to handle the Follow-a-specific-person case using this Object Detection using a free Edge Impulse technique."

I haven't done anything with Edge Impulse. My gut feeling is that although it is useful as a pattern recognizer, it is far from what I would call "seeing" or "understanding" the visual world, which is what interests me most.

 


   
Inq reacted