or anyone else that might like to comment or give suggestions...
Today I decided to use AnyDesk so that I can program and send commands from my desktop laptop to the robot laptop. So now I can sit in my office/bedroom and drive it around the house like I have seen on many YouTube robots. I think it is possible to do the same between a Windows PC and a Linux RPi should I ever get the time to do an RPi version. However manual control is not the goal. I am interested in autonomous control of the motors and reading the sensors in order to carry out some spoken command.
My visually controlled robot was coded in FreeBASIC, but to make it more interesting to others I decided I had better write a Python version. Things have moved on since I first joined the forum: I know Python a bit better, and AI makes learning and using it easier.
Although the project does not rely only on the original target detection code, I decided to use AI to help write a Python version of that FreeBASIC target code. The Python version wasn't all that good until I told the AI to write an exact copy of my algorithm, although it had to use OpenCV so it wasn't a perfect copy.
At least anyone with Python installed and a camera can run it.
This is an output (laptop camera used) of the FreeBASIC version.
This is an output (laptop camera used) of the Python version.
For some reason it used yellow dots on the centroids of each found blob. I had done likewise on the target blobs so maybe I gave it a different FB version to translate.
Here is the Python code.
import cv2
import numpy as np
from dataclasses import dataclass

# ============================================================
# CONFIGURATION
# ============================================================
IMGW = 640   # 1100
IMGH = 360   # 600
CAMERA_INDEX = 0
MIN_BLOB_AREA = 10
TARGET_DISTANCE = 5
SHOW_BINARY = False
SHOW_LABELS = False

# ============================================================
# BLOB STRUCTURE
# ============================================================
@dataclass
class Blob:
    area: int
    xMin: int
    yMin: int
    xMax: int
    yMax: int
    xCentroid: int
    yCentroid: int

# ============================================================
# FAST LOCAL THRESHOLD
# ============================================================
def fast_local_threshold(gray):
    """
    Faster replacement for the original pixel-loop threshold.
    Uses OpenCV adaptive threshold internally.
    Much faster than manual nested loops.
    """
    binary = cv2.adaptiveThreshold(
        gray,
        255,
        cv2.ADAPTIVE_THRESH_MEAN_C,
        cv2.THRESH_BINARY_INV,
        7,  # neighborhood size (must be odd)
        7   # threshold offset
    )
    return binary

# ============================================================
# LABEL VIEW
# ============================================================
def labels_to_color(labels):
    """
    Colour each connected-component label for the optional "Labels" window.
    NOTE: this helper is called in the main loop but was missing from the
    posted code; this version is an assumed reconstruction.
    """
    rng = np.random.default_rng(0)  # fixed seed gives repeatable colours
    palette = rng.integers(0, 256, size=(int(labels.max()) + 1, 3), dtype=np.uint8)
    palette[0] = 0                  # label 0 is the background, keep it black
    return palette[labels]

# ============================================================
# CAMERA
# ============================================================
cap = cv2.VideoCapture(CAMERA_INDEX)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, IMGW)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, IMGH)
if not cap.isOpened():
    print("ERROR: Could not open camera.")
    raise SystemExit(1)
print("ESC = quit")
print("B = toggle binary view")
print("L = toggle label view")

# ============================================================
# MAIN LOOP
# ============================================================
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.resize(frame, (IMGW, IMGH))

    # ========================================================
    # GRAYSCALE
    # ========================================================
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # ========================================================
    # FAST THRESHOLD
    # ========================================================
    binary = fast_local_threshold(gray)
    # Optional cleanup
    binary = cv2.medianBlur(binary, 3)

    # ========================================================
    # CONNECTED COMPONENTS
    # ========================================================
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
        binary,
        connectivity=8
    )
    blobs = []
    # stats columns:
    # [x, y, width, height, area]
    # Label 0 is the background, so start at 1.
    for label in range(1, num_labels):
        # Cast NumPy integers to plain ints for the drawing calls below
        x = int(stats[label, cv2.CC_STAT_LEFT])
        y = int(stats[label, cv2.CC_STAT_TOP])
        w = int(stats[label, cv2.CC_STAT_WIDTH])
        h = int(stats[label, cv2.CC_STAT_HEIGHT])
        area = int(stats[label, cv2.CC_STAT_AREA])
        if area < MIN_BLOB_AREA:
            continue
        cx = int(centroids[label][0])
        cy = int(centroids[label][1])
        blob = Blob(
            area=area,
            xMin=x,
            yMin=y,
            xMax=x + w,
            yMax=y + h,
            xCentroid=cx,
            yCentroid=cy
        )
        blobs.append(blob)

    # ========================================================
    # FIND CLOSEST CENTROIDS
    # ========================================================
    hit1 = None
    hit2 = None
    min_distance = 999999.0
    blob_count = len(blobs)
    # Compare every pair of blobs once (O(n^2) in the number of blobs)
    for i in range(blob_count):
        b1 = blobs[i]
        for j in range(i + 1, blob_count):
            b2 = blobs[j]
            dx = b2.xCentroid - b1.xCentroid
            dy = b2.yCentroid - b1.yCentroid
            dist = np.hypot(dx, dy)
            if dist < min_distance:
                min_distance = dist
                hit1 = b1
                hit2 = b2

    # ========================================================
    # DISPLAY
    # ========================================================
    display = frame.copy()
    # Draw all blobs (optional)
    for blob in blobs:
        cv2.circle(
            display,
            (blob.xCentroid, blob.yCentroid),
            2,
            (0, 255, 255),
            -1
        )

    # ========================================================
    # TARGET DETECTION
    # ========================================================
    if hit1 is not None and hit2 is not None:
        dx = hit2.xCentroid - hit1.xCentroid
        dy = hit2.yCentroid - hit1.yCentroid
        distance = np.hypot(dx, dy)
        # Draw line between nearest blobs
        cv2.line(
            display,
            (hit1.xCentroid, hit1.yCentroid),
            (hit2.xCentroid, hit2.yCentroid),
            (0, 255, 255),
            1
        )
        if distance < TARGET_DISTANCE:
            # Draw target rectangle
            cv2.rectangle(
                display,
                (hit1.xMin, hit1.yMin),
                (hit1.xMax, hit1.yMax),
                (0, 128, 255),
                2
            )
            cv2.rectangle(
                display,
                (hit1.xMin - 2, hit1.yMin - 2),
                (hit1.xMax + 2, hit1.yMax + 2),
                (0, 0, 255),
                1
            )
            size_text = f"{hit1.xMax-hit1.xMin}, {hit1.yMax-hit1.yMin}"
            cv2.putText(
                display,
                size_text,
                (hit1.xMax + 5, hit1.yMax),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.5,
                (0, 255, 255),
                1
            )
            cv2.putText(
                display,
                "TARGET",
                (30, 40),
                cv2.FONT_HERSHEY_SIMPLEX,
                1,
                (0, 255, 0),
                2
            )
        else:
            cv2.putText(
                display,
                "NO TARGET",
                (30, 40),
                cv2.FONT_HERSHEY_SIMPLEX,
                1,
                (0, 0, 255),
                2
            )

    # ========================================================
    # BLOB COUNT DISPLAY
    # ========================================================
    cv2.putText(
        display,
        f"Blobs: {len(blobs)}",
        (30, 80),
        cv2.FONT_HERSHEY_SIMPLEX,
        0.7,
        (255, 255, 255),
        2
    )

    # ========================================================
    # SHOW WINDOWS
    # ========================================================
    cv2.imshow("OpenCV Connected Components Tracking", display)
    if SHOW_BINARY:
        cv2.imshow("Binary", binary)
    if SHOW_LABELS:
        color_labels = labels_to_color(labels)
        cv2.imshow("Labels", color_labels)

    # ========================================================
    # KEYBOARD
    # ========================================================
    key = cv2.waitKey(1) & 0xFF
    if key == 27:  # ESC
        break
    elif key == ord('b'):
        SHOW_BINARY = not SHOW_BINARY
    elif key == ord('l'):
        SHOW_LABELS = not SHOW_LABELS

# ============================================================
# CLEANUP
# ============================================================
cap.release()
cv2.destroyAllWindows()
Due to my front drive camera giving up the ghost, I had been playing with an RPi camera linked to a small RPi Zero 2 board and streaming the video over a PoE link to be examined by OpenCV running on an RPi 5 or my desktop Mac. At the moment I run motion detection within a specified central screen area, which works very well, and I intend to see what can be done for some sort of object recognition later on. Basic stuff like is it a man, van, rabbit or deer.
As this was conveniently all set up, after changing your code to read from my video stream I gave it a run. The camera setup is still in my office pending the printing of a suitable case, and my office is a rather more busy environment than your office, think more like a junk room 😀.
Your example code displayed a small window frame with a lot of centroid dots and a few brief random bounding boxes popping up from time to time. I then increased the window display W and H to 1296, 972, which matches the video stream size. That size works perfectly with the OpenCV motion detection, with loads of bounding boxes placed on any area of the screen with motion. But when I tried with your Centroid example, having the screen displayed at this resolution made the displayed window creep to a jerky slow crawl.
I'm sure the program can be tweaked to work better in my environment and I may have a further play with it when I get round to it.
What's your aim with recognising a small blob on the wall from a camera on your bot? Are you going to try to recognise other blobs on the walls to triangulate the bot's position or something?
My laptops use the Windows OS and I used its camera and/or a connected webcam. I have used that setup for decades although only to play with image processing algorithms. The first camera I used was a monochrome security camera on an MSDOS machine. Back in those days it was all machine code as the machines were slow compared with a modern PC.
This was the first webcam I used.
Lots of fun writing my own image grab code via the bidirectional 8-bit Centronics port (using C++ code) rather than using the programs that came with it. I used the same method to get color images that he demonstrates.
Had more spare time then to indulge my interest in electronics and programming. Back then the programming environment and hardware was much simpler to use and learn.
What's your aim with recognising a small blob on the wall from a camera on your bot.
I used the target as a beacon because detecting it is fast and simple. I worked the method out a long time ago to run on a slow Game Boy Camera.
In computer vision, a blob is a group of connected pixels in an image that share a common property, such as color, brightness, or texture. I binarize the image into black and white pixels, so in this case a blob is a group of connected black pixels. You can then create a description of each blob and store the descriptions in a list. The target algorithm searches that list for two blobs with the same, or close enough, centroid values. It usually works because two blobs sharing a centroid don't occur often in a scene but always occur in this target shape.
A pattern can be a pattern of patterns and that is how some robot visual systems navigate a house.
Are you going to try to recognize other blobs on the walls to triangulate the bot's position or something?
Yes. Find its position and orientation and then compute a path to another position. I know what to do in theory but real hardware doesn't work as well as theoretical hardware and software takes some tweaking.
James Bruton used the target image with his BinBot 9000 project but he used complex AI to recognize the simple target. I hard code it because I know how but the result is the same which is recognition of the pattern along with its position within the image and the dimensions of a rectangle drawn around the pattern.
So now I can sit in my office/bedroom and drive it around the house...
I think this is the right direction, at least to start. The idea is to have a mobile camera (and other sensors) that can capture data (telemetry). That data is the input for algorithms to determine position and environment.
However manual control is not the goal.
Well, you always need manual control as a fall back while working out the automated control. It's really a safety feature.
I am interested in autonomous control of the motors and reading the sensors in order to carry out some spoken command.
Taken at face value, this isn't that far a technical reach from manual control. There are numerous speech to text packages that can convert "Move left. Move forward. Rotate camera 45 degrees." speech into their corresponding manual controls. While that may not sound interesting, it's a necessary intermediate step to more complex autonomous control. Using an AI agent may allow you to bypass it, but you still need to train the AI agent how to issue commands. It's "Robbing Peter to pay Paul."
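As a rough illustration of that intermediate step, here is a minimal sketch that turns already-transcribed text into manual-control commands. It assumes some speech-to-text package has produced the text string; the command table, the action names and the `parse_transcript` function are all hypothetical and only cover the example phrases above.

```python
import re

# Hypothetical command table: spoken phrase pattern -> action name.
COMMANDS = [
    (re.compile(r"move (left|right|forward|back)"), "move"),
    (re.compile(r"rotate camera (-?\d+) degrees"), "rotate_camera"),
    (re.compile(r"stop"), "stop"),
]

def parse_transcript(text):
    """Split transcribed speech into sentences and map each one to a command tuple.

    Sentences that match no pattern are ignored.
    """
    commands = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        sentence = sentence.strip()
        if not sentence:
            continue
        for pattern, action in COMMANDS:
            match = pattern.fullmatch(sentence)
            if match:
                # Convert numeric arguments (e.g. "45") to integers
                args = tuple(
                    int(g) if g.lstrip("-").isdigit() else g
                    for g in match.groups()
                )
                commands.append((action, *args))
                break
    return commands

print(parse_transcript("Move left. Move forward. Rotate camera 45 degrees."))
```

Each tuple could then be passed to whatever routine already handles the manual controls, so the spoken path and the keyboard path share the same motor code.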
But I know you have a more complex idea in mind. You want the robot to be autonomous in the sense that it understands its surroundings without an external connection. An external connection would be used to issue complex commands: "Go to the back door.", "Locate Bob."
Don't underestimate how complex this is. It's much more complex, vastly more complex. The image processing alone is an entire subfield of computer science. Integrating that with sensor data is equally complex.
Integrating them together with AI is the current siren song that companies like NVIDIA and Qualcomm are selling.
What I'm saying is I think you want a simple mobile platform to take pictures. You can then use those pictures to experiment with OpenCV to your heart's content. And experiment with the sensor data, etc., etc.
At least, that's my goal with the SmartCar. If I can get it to deliver images and telemetry to my laptop, I'm satisfied. Processing the images, etc. is less interesting to me. I don't have enough time to invest in that.
The one who has the most fun, wins!
The image processing alone is an entire subfield of computer science. Integrating that with sensor data is equally complex. Integrating them together with AI is the current siren song that companies like NVIDIA and Qualcomm are selling.
Amazing stuff and I don't really understand how it all works in any detail. Ultimately I think it comes down to mathematical calculations repeated millions of times a second in things called neural networks which have advanced beyond the simple ANNs I do know something about. I have tried to read books on how it all works but lack the mathematical background to follow most of it.
What I'm saying is I think you want a simple mobile platform to take pictures. You can then use those pictures to experiment with OpenCV to your heart's content. And experiment with the sensor data, etc., etc. At least, that's my goal with the SmartCar. If I can get it to deliver images and telemetry to my laptop, I'm satisfied.
Which is what I have (see above) and hope to post the experiments and code as show and tell posts 🙂
Processing the images, etc. is less interesting to me. I don't have enough time to invest in that.
That has been one of my interests for decades and I have invested a lot of time in it. If your sensory data is visual then the system has to process that data to achieve some desired outcome, like navigating a house or recognizing an object to pick up. Even if you use AI vision it is still combined with hand-written code to make use of it.
The idea of a robot "understanding" anything is a philosophical question unless you can provide an operational definition for the word.
For me, the level at which I have been getting my robot to process and use sensory data is more like a simple version of an insect brain, and real insect brains are amazing. I think simple robots can do useful things; robot vacuums and mopping robots are examples.
Spent some time working on a Python version of my code, particularly the PC-to-Arduino communications and the OpenCV replacement of the target recognition code. Getting OpenCV versions of my image processing routines has been less than satisfying.
Processing the images, etc. is less interesting to me. I don't have enough time to invest in that.
Back in my day there was no OpenCV so I invested a lot of time in it. I actually enjoy programming at that level. Boring??? Maybe but I love all that low level understanding which is all there was before OpenCV.
When Windows came along it all became very boring and long-winded, using the Windows API to make use of the graphics card. You had no choice, because even at a low level you could not access the graphics card directly without a boring, long-winded set of rules called DirectX.
But when I tried with your Centroid example, having the screen displayed at this resolution made the displayed window creep to a jerky slow crawl.
I am using a PC, not an RPi, so maybe that is why? I don't see speed issues. It seemed smooth and fast enough when I increased the window display W and H to 1296, 972. I was using W = 1100 and H = 600 with the webcam, as that fits my laptop screen best.
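A possible contributor to the slowdown on a busy, high-resolution scene is the pair search, which compares every blob with every other blob in a Python loop. Here is a minimal sketch of one way to cut that down, assuming only pairs closer than TARGET_DISTANCE matter for target detection. The function name `closest_pair_within` and the grid bucketing are illustrative and not part of the posted code, and unlike the posted loop it returns nothing when no pair is that close.

```python
import math
from collections import defaultdict

def closest_pair_within(centroids, max_dist):
    """Return (i, j, dist) for the closest pair of centroids nearer than
    max_dist, or None if no such pair exists.

    centroids is a list of (x, y) tuples. Points are bucketed into square
    grid cells of side max_dist, so each point is only compared with points
    in its own cell and the 8 neighbouring cells.
    """
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(centroids):
        grid[(int(x // max_dist), int(y // max_dist))].append(idx)

    best = None
    for idx, (x, y) in enumerate(centroids):
        cx, cy = int(x // max_dist), int(y // max_dist)
        for gx in (cx - 1, cx, cx + 1):
            for gy in (cy - 1, cy, cy + 1):
                for other in grid.get((gx, gy), ()):
                    if other <= idx:
                        continue  # skip self and pairs already checked
                    ox, oy = centroids[other]
                    d = math.hypot(ox - x, oy - y)
                    if d < max_dist and (best is None or d < best[2]):
                        best = (idx, other, d)
    return best

# Two nearly concentric blobs (indices 0 and 2) among unrelated ones
points = [(10, 10), (100, 100), (12, 11), (300, 50)]
print(closest_pair_within(points, 5))
```

With a few thousand blobs this does far fewer distance checks than the all-pairs loop, at the cost of no longer drawing a line between the nearest blobs when they are further apart than the threshold.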
I am not stuck on just using the target pattern. There are other possible visual patterns to use as visual beacons. It is just convenient for the time being because it works and enables testing the use of visual beacons to navigate.
Ideally a robot needs to look around and choose natural patterns as beacons.
When it comes to actually implementing ideas in an actual robot issues arise that you might not have thought of. For example when I first tried orientating to a target the robot reacted erratically to false positives so I had to modify the code and target recognition to fix that. Instead of just looking for two blobs with similar centroid values I added shape recognition in the form of returning a value for how circular the blob is.
Blob.circularity = 4*PI*(Blob.AREA1 / Blob.perimeter ^2)
This involves tracing the outline of a blob (perimeter) and computing its area (AREA1) at the same time.
This is a screen grab where I have displayed the outlines. The yellow dot appears when the circularity value is greater than 0.6
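The circularity formula above can be tried on any closed outline. Here is a minimal sketch, assuming the traced outline is available as an ordered list of (x, y) points. The function name and the polygon approximation (shoelace area, summed edge lengths) are illustrative and are not the FreeBASIC tracing code.

```python
import math

def circularity(outline):
    """Return 4*pi*area / perimeter**2 for a closed outline.

    outline is an ordered list of (x, y) points; the last point is joined
    back to the first. The result is 1.0 for a perfect circle and falls
    toward 0 for long thin shapes.
    """
    n = len(outline)
    if n < 3:
        return 0.0
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1            # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)  # length of this edge
    if perimeter == 0:
        return 0.0
    return 4 * math.pi * (abs(twice_area) / 2) / perimeter ** 2

# A 360-sided polygon approximating a circle, a square, and a thin strip
circle = [(math.cos(math.radians(k)), math.sin(math.radians(k))) for k in range(360)]
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
strip = [(0, 0), (20, 0), (20, 1), (0, 1)]
for name, shape in (("circle", circle), ("square", square), ("strip", strip)):
    print(name, round(circularity(shape), 3))
```

An ideal square scores pi/4, about 0.785, so it would also pass a 0.6 threshold, while the thin strip falls well below it. Values from a pixel-traced perimeter will differ somewhat from these ideal polygons.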



