Raspberry Pi End-Cloud Collaboration --- Intelligent AI Glasses

Yassin
(@yassin)
Hey everyone, I’d like to share a new project I’ve been working on, and I’d really appreciate your feedback. The details are below, so please check it out!
 
Introduction
 
This project is built for use with drones: it aims to create AI glasses that can be paired with UAVs and work alongside them. It centers on a collaborative architecture that combines on-device AI acceleration with cloud-based large vision models, addressing the opportunities and challenges that the rapid spread of generative AI brings to UAV edge devices.

 
With a Raspberry Pi at its core, the project builds a lightweight, low-latency AI glasses prototype with real-time environmental perception and intelligent interaction. It combines fast real-time inference on the on-device NPU with the generation and understanding capabilities of cloud models, forming a closed loop from local real-time processing to in-depth cloud analysis and decision-making. The result is a practical end-cloud collaborative solution for scenarios such as UAV FPV assistance, situational awareness, AR-enhanced flight, and real-time image information overlay.

 
Project Features
 
☑️ On-device NPU real-time vision detection

☑️ Deep semantic analysis via cloud large models

☑️ Uploads only the objects of interest to the cloud, which saves bandwidth and keeps the rest of the frame private

☑️ Designed for pairing and collaboration with drones

 
Hardware Structure

Bill of Materials:

- Raspberry Pi 5 Motherboard
- Raspberry Pi AI HAT+ (13 TOPS NPU)
- HDMI LCD Display
- IMX219 USB Camera
- USB HID Buttons
Software Framework
 
Operating Environment:

- Python 3.11
 

Software Models:

- On-device Small Model: YOLO

- Cloud-based Large Model: Qwen3-VL-Plus
 

Collaboration Mechanism:

- HTTPS Network Communication
 
AI HAT+ Installation and Configuration
 
Introduction:
 
The AI HAT+ is built on the Hailo-8L and Hailo-8 neural network inference accelerators and comes in two models, rated at 13 and 26 TOPS. This project uses the 13 TOPS model, which suits moderate workloads and performs similarly to the Raspberry Pi AI Kit. The AI HAT+ communicates over the Raspberry Pi 5's PCIe interface. The host Raspberry Pi 5 automatically detects the on-board Hailo accelerator and uses the NPU for supported AI computing tasks.

 
Hardware Installation:
 
Connect the AI HAT+ through the PCIe interface, seat it on the Raspberry Pi's pin header, and secure it with copper standoffs, as shown in the figure below.

[Figure: AI HAT+ installed on the Raspberry Pi 5]
Software Installation:

 

Update System Software:

     sudo apt update && sudo apt full-upgrade

Check EEPROM Firmware Version:

     sudo rpi-eeprom-update
Install NPU Dependencies:
     sudo apt install hailo-all

Restart to Take Effect:

     sudo reboot
 
Verify Installation:
     hailortcli fw-control identify
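
For scripts that should fail fast when the accelerator is missing, the same check can be run from Python. This is a minimal sketch, not part of the original project: the helper name `hailo_present` is hypothetical, and it assumes that `hailortcli` is on the PATH and exits with a non-zero status when no device responds. It looks only at the exit status and does not parse the command's output.

```python
import subprocess

def hailo_present(cmd=("hailortcli", "fw-control", "identify"), timeout=10):
    """Return True if the identify command runs and exits with status 0.

    Assumption: the CLI signals a missing device through a non-zero exit status.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # The CLI is not installed, or the device did not answer in time.
        return False
    return result.returncode == 0
```

Calling `hailo_present()` at program start lets the application print a clear message instead of failing later inside the inference pipeline.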

Run the Demo:
     sudo apt update && sudo apt install rpicam-apps
     rpicam-hello -t 0 --post-process-file /usr/share/rpi-camera-assets/hailo_yolov6_inference.json
Software Development:

 

The software builds on the official detection example. To let the wearer select a target of interest, the program first draws a crosshair at the center of the screen.
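
The geometry of such a crosshair can be sketched as follows. This is an illustration rather than the project's actual drawing code: the function name `crosshair_segments` and the `arm` length are hypothetical. It only computes the two line segments, so it does not depend on any particular drawing library.

```python
def crosshair_segments(width, height, arm=20):
    """Return the horizontal and vertical segments of a centered crosshair.

    Each segment is ((x0, y0), (x1, y1)) in pixel coordinates, clamped to the frame.
    `arm` is the half-length of each line in pixels (a hypothetical default).
    """
    cx, cy = width // 2, height // 2
    horizontal = ((max(0, cx - arm), cy), (min(width - 1, cx + arm), cy))
    vertical = ((cx, max(0, cy - arm)), (cx, min(height - 1, cy + arm)))
    return horizontal, vertical

# With OpenCV, each segment could be drawn onto a frame like this
# (not tested against the author's pipeline):
#   for p0, p1 in crosshair_segments(frame.shape[1], frame.shape[0]):
#       cv2.line(frame, p0, p1, (0, 255, 0), 2)
```

Keeping the geometry separate from the drawing call makes it easy to reuse the same center point when testing which detection box the crosshair falls in.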

A text overlay layer is then added to show the cloud model's in-depth analysis results on the screen.
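
A small display leaves little room for text, so the reply may need to be wrapped and shortened before it is shown. The sketch below is one way to do that with the standard library; `format_overlay_text`, the line width, and the line limit are all hypothetical, and it assumes the overlay accepts newline-separated text. Note that `textwrap` splits on whitespace, so replies in languages written without spaces are broken at the width limit instead.

```python
import textwrap

def format_overlay_text(reply, width=24, max_lines=4):
    """Wrap `reply` to `width` characters per line and keep at most `max_lines` lines."""
    lines = textwrap.wrap(reply, width=width)
    if len(lines) > max_lines:
        lines = lines[:max_lines]
        # Mark the truncation so the wearer knows the reply continues.
        lines[-1] = lines[-1][: width - 3] + "..."
    return "\n".join(lines)
```

The formatted string could then be passed to the overlay in place of the raw reply, for example `display_text_pipeline.set_property("text", format_overlay_text(ai_vl_reply))`.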
Uploading the entire image for large-model analysis would waste network bandwidth and could leak private background information, so the program crops out only the target of interest. The key code for cropping the target is as follows:
 
# Runs once per detection: check whether the detection box contains
# the frame center (normalized coordinates).
if (bbox.xmin() <= frame_center_x <= bbox.xmax() and
        bbox.ymin() <= frame_center_y <= bbox.ymax()):
    # Box area in normalized units
    bbox_area = bbox.width() * bbox.height()

    # Keep the smallest box that contains the center
    if bbox_area < min_area_in_center:
        min_area_in_center = bbox_area
        center_detection = {
            'detection': detection,
            'bbox': bbox,
            'confidence': confidence,
            'distance': bbox_area  # holds the box area, despite the key name
        }

# After all detections are checked: crop the selected target.
if center_detection is not None:
    center_bbox = center_detection['bbox']

    # Convert normalized coordinates to pixel coordinates
    xmin = int(center_bbox.xmin() * frame.shape[1])
    ymin = int(center_bbox.ymin() * frame.shape[0])
    xmax = int(center_bbox.xmax() * frame.shape[1])
    ymax = int(center_bbox.ymax() * frame.shape[0])

    # Clamp the coordinates to the frame
    xmin = max(0, xmin)
    ymin = max(0, ymin)
    xmax = min(frame.shape[1] - 1, xmax)
    ymax = min(frame.shape[0] - 1, ymax)

    if xmax > xmin and ymax > ymin:
        # Crop the image
        cropped_frame = frame[ymin:ymax, xmin:xmax]
        # Convert RGB to BGR for OpenCV, then save
        cropped_bgr = cv2.cvtColor(cropped_frame, cv2.COLOR_RGB2BGR)
        cv2.imwrite("test.jpg", cropped_bgr)
        # Wake the cloud analysis thread
        trig_ai_vl_event.set()
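
The selection rule in the snippet above (among the boxes that contain the frame center, keep the smallest) can be tried without the camera pipeline. The sketch below is a standalone illustration: `pick_center_box` is a hypothetical name, and boxes are plain `(xmin, ymin, xmax, ymax)` tuples in normalized coordinates rather than the detection objects used in the project.

```python
def pick_center_box(boxes, center=(0.5, 0.5)):
    """Return the smallest-area box that contains `center`, or None if none does.

    `boxes` is an iterable of (xmin, ymin, xmax, ymax) in normalized coordinates.
    """
    cx, cy = center
    best, best_area = None, float("inf")
    for box in boxes:
        xmin, ymin, xmax, ymax = box
        if xmin <= cx <= xmax and ymin <= cy <= ymax:
            area = (xmax - xmin) * (ymax - ymin)
            if area < best_area:
                best, best_area = box, area
    return best

if __name__ == "__main__":
    candidates = [
        (0.0, 0.0, 1.0, 1.0),      # large background object
        (0.25, 0.25, 0.75, 0.75),  # smaller object under the crosshair
        (0.0, 0.0, 0.25, 0.25),    # object away from the center
    ]
    print(pick_center_box(candidates))
```

Preferring the smallest box means that when a small object sits in front of a large one, the crosshair selects the object the wearer is most likely aiming at.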

The cropped image of the target is then sent to the cloud-based large model for in-depth analysis. The key code is as follows:

from openai import OpenAI

def ai_vl_thread():
    # Initialize the OpenAI-compatible client (replace the placeholder key with your own)
    client = OpenAI(
        api_key="sk-xxxxxxxxxxxxxx",
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
    )
    while True:
        # Block until the detection code signals that a new crop was saved
        trig_ai_vl_event.wait()
        trig_ai_vl_event.clear()

        # Convert the local image to base64
        base64_image = encode_image("test.jpg")

        # Create the chat completion request
        completion = client.chat.completions.create(
            model="qwen3-vl-plus",
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{base64_image}"
                            },
                        },
                        # Prompt: "Explain this image in 100 characters or fewer"
                        {"type": "text", "text": "解释这张图片,100个字以内"},
                    ],
                },
            ],
            stream=True
        )

        # print("AI reply:")
        # Append each streamed chunk and refresh the on-screen text
        ai_vl_reply = ""
        for chunk in completion:
            if chunk.choices:
                delta = chunk.choices[0].delta
                if hasattr(delta, 'content') and delta.content:
                    ai_vl_reply += delta.content
                    display_text_pipeline.set_property("text", ai_vl_reply)
                    # print(delta.content, end='', flush=True)
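
The code above calls `encode_image`, which the post does not show. A minimal sketch of what such a helper might look like follows; it is an assumption, not the author's implementation. It reads the file as bytes and returns a base64 string suitable for a `data:` URL. The second function, `to_data_url`, is a hypothetical convenience wrapper.

```python
import base64

def encode_image(path):
    """Read the file at `path` and return its contents as a base64 ASCII string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def to_data_url(b64, mime="image/jpeg"):
    """Build a data URL from an already base64-encoded payload."""
    return f"data:{mime};base64,{b64}"
```

Base64 encoding increases the payload size by about one third, which is one more reason to send only the cropped target rather than the full frame.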

 
With the hardware installed and the software configured, the project now delivers the basic functions of drone-paired AI glasses. Several key challenges remain open in practical use, and I would value suggestions from the engineers here:
 
  1. How can compatibility between the AI HAT+ and the Raspberry Pi board be improved so that firmware version differences do not make the NPU unstable?
  2. In real drone flight, how can the latency of cloud model calls be reduced so that analysis results arrive in time to assist flight decisions?
  3. How should on-device inference performance be balanced against power consumption to support long outdoor flights?
  4. What more targeted measures can protect drone flight data and prevent sensitive information from leaking?
 
I look forward to your opinions and solutions to the problems above, which will help improve the project's stability and practicality and make the pairing between AI glasses and drones work better.
 
Cheers, Yassin

Yassin | Building Compact, High-Current Connections for Drones & Robots


   