Dual-Mode Vision-Guided Robotic Arm — System Visualiser

Dual-Mode Vision-Guided
Robotic Arm

A 6-DOF autonomous sorting robot with interrupt-safe human-in-the-loop control — local gesture, remote MQTT, and full AI vision pipeline.

ROS2 Jazzy YOLOv8 CUDA MediaPipe MQTT HiveMQ MoveIt2 Gazebo Harmonic ESP8266 + PCA9685

Remote Console
Installable Mobile Web App
The remote controller is now built as a PWA with joystick controls, install prompt support, and mobile-first responsiveness.
📲

PWA Install (Android + iOS)

Open the remote console in your phone browser and use Add to Home Screen / Install App. It launches like a native application in standalone mode.

Supported ✓
🎮

Precision Joystick Mode

Dual virtual joysticks + dedicated wrist/gripper hold buttons reduce accidental jumps compared to camera-only control on mobile.

Low Latency
🧰

Control Tuning Panel

Runtime sliders for speed, deadzone, smoothing, and publish rate allow quick tuning based on network conditions and operator preference.

Operator Friendly

Reset Pose Safety

A dedicated reset command immediately returns target joints to neutral values for safe recovery when remote motion becomes unstable.

Safety Focus
Controller URL: /src/robot_arm_remote/web/index.html
Install tip: Use HTTPS host, then tap "Install App" in the controller UI.
MQTT topic: natraj/robot_arm/teleop/target_state
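The console's target-state message can be sketched as a small JSON builder. The j1…j6 field names follow the payload format used on the teleop topic; the rounding and the commented paho-mqtt publish call are illustrative assumptions, not the console's confirmed code:

```python
import json

JOINT_KEYS = ("j1", "j2", "j3", "j4", "j5", "j6")

def build_teleop_payload(angles_deg):
    """Serialise six joint angles (degrees) into one teleop JSON message.

    The {j1…j6} layout matches the message published on
    natraj/robot_arm/teleop/target_state.
    """
    if len(angles_deg) != 6:
        raise ValueError("expected exactly 6 joint angles")
    # Round to 2 decimals (assumed precision) to keep payloads small.
    return json.dumps(dict(zip(JOINT_KEYS,
                               (round(float(a), 2) for a in angles_deg))))

# A paho-mqtt client would then publish it with QoS 0, e.g.:
# client.publish("natraj/robot_arm/teleop/target_state", payload, qos=0)
```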

Hardware Overview
What's Inside the System
Every physical component and its role in the pipeline.
🦾

6-DOF Robotic Arm

Metal servo arm kit with 6× MG996R servos, aluminium brackets, claw gripper.

On Hand ✓
📡

ESP8266 NodeMCU

Wi-Fi microcontroller acting as ROS2 ↔ PCA9685 bridge. Replaces Arduino Mega for wireless capability.

On Hand ✓
🔌

PCA9685 16-ch

I²C servo driver. Takes joint angle commands from ESP8266, outputs PWM to all 6 servo channels.

On Hand ✓

5V / 10A PSU

Dedicated servo power supply. Prevents brownout under peak MG996R stall current (~1.8A × 6 servos).

On Hand ✓
📷

1080p Webcam

Laptop-mounted. Used exclusively for LOCAL gesture control via MediaPipe.

On Hand ✓
📱

User's Phone

Fixed overhead via IP Webcam app. MJPEG stream → YOLOv8 object detection. Free, 1080p+.

On Hand ✓
🧭

MPU6050 IMU

6-axis accelerometer + gyroscope on wrist link. Detects cumulative servo drift. I²C address 0x68.

To Order ₹199
💻

Laptop (RTX 3060)

Main compute: ROS2, Gazebo, YOLOv8 (CUDA), MoveIt2, MediaPipe. Ubuntu 24.04 LTS.

On Hand ✓

Operating Modes
Three Modes. One Unified System.
Click each mode to see its complete data flow from sensor to servo.

Default state. The arm runs the full AI vision pipeline independently — no human input required. YOLOv8 detects objects overhead, MoveIt2 plans collision-free paths, the arm picks and sorts continuously.

📱 Phone Camera (IP Webcam MJPEG) → YOLOv8 (CUDA, RTX 3060) → Object class + 2D bbox → Homography (pixel → 3D XYZ) → MoveIt2 IK (collision-free path) → PCA9685 (PWM to servos)
Cycle: Detect → Localise → Plan → Pick → Place in bin → Return to home → Repeat
Inference: ~15–20 ms per frame at 1080p (YOLOv8n on CUDA)
ROS2 topics: /camera/image_raw → /detected_objects → /target_pose → /arm_controller/joint_trajectory
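The homography step reduces to applying a 3×3 matrix to each detected pixel. A dependency-free sketch; the H_demo matrix is a made-up calibration for a hypothetical 1920×1080 frame over a 0.4 m × 0.3 m table, not real calibration data:

```python
def apply_homography(H, u, v):
    """Map a pixel (u, v) to planar workspace coordinates with a 3x3 homography.

    H is found once at calibration time (e.g. cv2.findHomography on four
    marker points); it is applied by hand here to keep the sketch
    dependency-free.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w  # planar XY; Z is the known table height

# Hypothetical calibration: 1920x1080 frame covering a 0.4 m x 0.3 m table.
H_demo = [[0.4 / 1920, 0,          0],
          [0,          0.3 / 1080, 0],
          [0,          0,          1]]
```

With H_demo, the frame centre (960, 540) lands at the table centre (0.2 m, 0.15 m).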

Override triggered when MediaPipe detects a hand in the laptop webcam. Autonomous mode pauses immediately. The operator's 21 hand landmarks are mapped to 6 joint angles in real-time.

💻 Webcam (laptop built-in) → MediaPipe (21 landmarks) → Landmark map (→ 6 joint angles) → EMA filter (jitter removal) → MoveIt2 (safety clamping) → PCA9685 (PWM to servos)
Landmark mapping:
Wrist position → Base rotation (joint_1)
Palm orientation → Shoulder + elbow (joints 2, 3)
Finger curl angles → Wrist + end joints (joints 4, 5)
Thumb–index gap → Gripper open/close (joint_6)
Latency: ~30–50 ms (CPU-only MediaPipe, no GPU needed)
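The EMA stage above can be sketched as a small stateful filter over the six joint angles; the default alpha of 0.3 is an assumed starting point, not a tuned value:

```python
class EmaFilter:
    """Exponential moving average over the six joint angles.

    alpha near 1.0 tracks the hand quickly; near 0.0 smooths hard.
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self._state = None  # no output until the first sample arrives

    def update(self, angles):
        if self._state is None:
            self._state = list(angles)          # seed with the first frame
        else:
            self._state = [self.alpha * a + (1 - self.alpha) * s
                           for a, s in zip(angles, self._state)]
        return list(self._state)
```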

Global override. Any person anywhere in the world opens the ngrok URL in their phone browser. MediaPipe.js runs in-browser — no app install. Joint angles publish via MQTT to HiveMQ cloud, bridged into ROS2.

🌍 Remote Phone (browser only) → MediaPipe.js (in-browser) → MQTT JSON (HiveMQ cloud) → mqtt_bridge (ROS2 node) → MoveIt2 (IK + limits) → PCA9685 (PWM to servos)
End-to-End Latency Breakdown
MediaPipe.js (browser)
~30 ms
MQTT internet transit
~80 ms
ROS2 bridge + MoveIt2
~60 ms
Servo movement
~120 ms
Total: ~290 ms — fully usable for teleoperation

Camera Architecture
Three Cameras. Three Different Jobs.
Each camera is dedicated to a single purpose. There is no overlap or conflict.
📱

Camera 1 — User's Phone

Fixed overhead, mounted on a stand above the workspace. Streams MJPEG via IP Webcam app on local Wi-Fi. Always on during autonomous mode.

IP Webcam MJPEG
→ cv2.VideoCapture("http://...")
→ /camera/image_raw
→ YOLOv8 inference (CUDA)
→ /detected_objects
FOR: Autonomous Sorting
💻

Camera 2 — Laptop Webcam

Built-in webcam, always active. MediaPipe monitors continuously for a hand appearing in frame — the presence of a hand triggers LOCAL mode.

usb_cam ROS2 node
→ /gesture_camera/image_raw
→ MediaPipe HandLandmarker
→ 21 landmarks (x, y, z)
→ /gesture_node → /control_mode
FOR: Local Gesture Control
🌍

Camera 3 — Any Remote Phone

Remote operators use their own phone. Browser opens ngrok URL. No app install. MediaPipe.js runs in the browser tab itself.

Phone front camera
→ MediaPipe.js (browser)
→ JSON {j1…j6} via MQTT
→ HiveMQ cloud broker
→ mqtt_bridge_node.py → ROS2
FOR: Remote MQTT Control
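On the ROS2 side, mqtt_bridge_node.py must decode and sanity-check each {j1…j6} message before anything reaches MoveIt2. A sketch under the assumption of 0–180° limits per joint (the real limits come from the URDF):

```python
import json

# Assumed joint limits in degrees; the real values come from the URDF.
JOINT_LIMITS = {f"j{i}": (0.0, 180.0) for i in range(1, 7)}

def decode_teleop(payload):
    """Decode one MQTT teleop message and clamp it to joint limits.

    A malformed or out-of-range browser message can never reach the
    servos unclamped: bad keys/values raise, good ones are clamped.
    """
    msg = json.loads(payload)
    angles = []
    for key, (lo, hi) in JOINT_LIMITS.items():
        value = float(msg[key])  # KeyError/ValueError => message rejected
        angles.append(min(hi, max(lo, value)))
    return angles
```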
Why NOT OV7670? — Raw parallel bus (10+ GPIO lines), no JPEG compression (900KB/frame), VGA max resolution, no ROS2 driver, notoriously unstable wiring. A bare sensor module is architecturally incompatible with a ROS2 + YOLOv8 pipeline.
Why NOT ESP32-CAM? — 640×480 max, requires custom firmware flash, worse image quality than phone, costs ₹400 vs ₹0 for the phone you already own.

ROS2 Architecture
Node Graph — Who Talks to Whom
The Mode Manager is the single authority. Every control path passes through it.
Autonomous pipeline: /camera_node (phone MJPEG, overhead) → /vision_node (YOLOv8 + OpenCV, pixel → 3D XYZ) → /sorting_planner (object → bin → target XYZ pose message)
Local gesture control: /gesture_node (MediaPipe, webcam)
Remote MQTT control: /mqtt_bridge (HiveMQ → ROS2)
Mode arbitration: /mode_manager (AUTO / LOCAL / REMOTE, single control authority)
Execution: /motion_planner (MoveIt2, IK + OMPL) → /arm_controller (JointTrajectoryController) → /esp8266_bridge (serial → ESP8266) → PCA9685 + servos (physical hardware)
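The mode manager's arbitration rule, switch only between trajectories and never mid-motion, can be sketched as a tiny state machine. The deferred-request behaviour is an illustrative design choice, not the node's confirmed implementation:

```python
class ModeManager:
    """Single control authority over AUTO / LOCAL / REMOTE.

    A mode switch requested while a trajectory is executing is deferred
    until the motion completes, so there are no mid-trajectory context
    changes.
    """
    VALID = {"AUTO", "LOCAL", "REMOTE"}

    def __init__(self):
        self.mode = "AUTO"          # default state
        self.motion_active = False
        self._pending = None

    def request(self, mode):
        if mode not in self.VALID:
            return self.mode
        if self.motion_active:
            self._pending = mode    # apply after the motion completes
        else:
            self.mode = mode
        return self.mode

    def motion_done(self):
        self.motion_active = False
        if self._pending is not None:
            self.mode, self._pending = self._pending, None
        return self.mode
```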

Hardware Wiring
Power + Signal Architecture
Two completely separate power domains. One shared I²C bus. Common ground is mandatory.
⚡ Servo power domain (5V/10A): PSU V+ (red) and GND (black) → DC jack female screw terminal → PCA9685 V+/GND → CH0–CH5 PWM (50 Hz, 500–2500 µs, 20–22 AWG) → 6× MG996R servos
🔌 Logic domain (3.3V): ESP8266 NodeMCU (USB to laptop for flashing, logic only) → 3.3V/GND to PCA9685 VCC/GND; D1 (SCL) and D2 (SDA) form the I²C bus, shared with the MPU6050 (addr 0x68, wrist link)
From → To | Signal | Wire | Note
PSU V+ (barrel) → PCA9685 V+ | 5V servo power | 20–22 AWG red | Through DC jack screw terminal
PSU GND → PCA9685 GND | Power return | 20–22 AWG black | Through DC jack screw terminal
ESP8266 3.3V → PCA9685 VCC | Logic power only | Any thin wire | NOT servo power — IC logic only
ESP8266 GND → PCA9685 GND | Common ground | Any thin wire | MANDATORY — shared reference
ESP8266 D1 (GPIO5) → PCA9685 SCL | I²C clock | 24–26 AWG | Pull-ups already on PCA9685 board
ESP8266 D2 (GPIO4) → PCA9685 SDA | I²C data | 24–26 AWG | Pull-ups already on PCA9685 board
MPU6050 SCL/SDA → Same I²C bus | I²C (addr 0x68) | 24–26 AWG | No address conflict with PCA9685
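The 50 Hz / 500–2500 µs servo signal in the table maps onto the PCA9685's 12-bit (4096-step) counter as below. The endpoints are the nominal range from the wiring spec; MG996R units vary, so they should be calibrated per servo:

```python
def angle_to_pca9685_counts(angle_deg, pulse_min_us=500,
                            pulse_max_us=2500, freq_hz=50):
    """Convert a joint angle (0-180 deg) to a PCA9685 12-bit on-count.

    At 50 Hz the PWM period is 20 000 us, divided into 4096 counter
    steps, so 1500 us (centre) is roughly count 307.
    """
    angle = min(180.0, max(0.0, float(angle_deg)))   # clamp to servo range
    pulse_us = pulse_min_us + (angle / 180.0) * (pulse_max_us - pulse_min_us)
    period_us = 1_000_000 / freq_hz                  # 20 000 us at 50 Hz
    return round(pulse_us / period_us * 4096)        # 12-bit counter steps
```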

Development Progress
Phase-by-Phase Status
Simulation is complete. MoveIt2 is the immediate next step.

Phase 0 — URDF & Simulation Environment

40-link CAD → clean 12-link URDF. 9 Gazebo Harmonic bugs fixed. RViz2 TF tree — fully working. joint_state_relay.py written.

COMPLETE

Pre-Phase — MediaPipe Hand Detection

MediaPipe installed and verified. 21 landmarks extracted from webcam at 30+ FPS. Landmark→joint mapping not yet written.

VERIFIED

Phase 1 — MoveIt2 Setup (NEXT)

Install MoveIt2. Run Setup Assistant on robot_arm_description package. Configure planning group 'arm', end-effector 'gripper', SRDF, kinematics solver.

UP NEXT
🔄

Phase 2 — Gesture → Arm Live Control

Complete landmark-to-joint-angle mapping. Publish JointTrajectory → /arm_controller. Live arm mirroring in Gazebo simulation.

IN PROGRESS

Phase 3 — YOLOv8 Autonomous Sorting

IP Webcam stream → YOLOv8 inference → pixel-to-world coordinate → sorting planner → pick-sort-return cycle in Gazebo.

PENDING

Phase 4 — MQTT Remote Control

Flask web app served via ngrok. MediaPipe.js in browser. MQTT pub/sub to HiveMQ. mqtt_bridge_node.py decodes JSON → JointTrajectory.

PENDING

Phase 5 — Hardware Deployment

Flash ESP8266 firmware. Wire PCA9685. Implement ros2_control hardware interface. MPU6050 drift correction. Real servo validation.

PENDING

Safety & Reliability
Eight Independent Safety Layers
Each layer operates independently — failure of one does not compromise the others.

Remote Timeout

No MQTT command for 3 seconds → arm returns to home pose → AUTO mode resumes automatically.
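A minimal sketch of this 3-second dead-man switch, with an injectable clock so it can be tested without real time passing; the home-pose return and AUTO fallback would be triggered by the caller when expired() turns true:

```python
import time

class RemoteWatchdog:
    """MQTT dead-man switch: expires if feed() is not called in time."""

    def __init__(self, timeout_s=3.0, clock=None):
        self._clock = clock or time.monotonic  # injectable for testing
        self.timeout_s = timeout_s
        self._last = self._clock()

    def feed(self):
        """Call on every received MQTT command."""
        self._last = self._clock()

    def expired(self):
        """True once no command has arrived within the timeout window."""
        return self._clock() - self._last > self.timeout_s
```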

📐

Angle Clamping

All joint angles hard-clamped to URDF limits. A phone cannot send a value that would physically damage the arm.

〰️

EMA Jitter Filter

Exponential moving average on all gesture inputs. Prevents hand tremor and network jitter from turning into servo oscillation.

Trajectory Duration Floor

150 ms minimum time per joint command. Prevents instantaneous angular jumps that would strip gears or rip brackets.
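The floor can be sketched as a one-line guard on the time allotted to each joint move; the 60°/s default speed is an assumption for illustration, not a measured servo spec:

```python
MIN_DURATION_S = 0.150  # 150 ms floor from the safety spec above

def trajectory_duration(angle_delta_deg, speed_deg_per_s=60.0):
    """Time allotted to a joint move, never below the 150 ms floor.

    Small deltas get the floor; large deltas scale with the assumed
    speed, so a command can never demand an instantaneous jump.
    """
    return max(MIN_DURATION_S, abs(angle_delta_deg) / speed_deg_per_s)
```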

🗑

QoS 0 — No Queue

Old MQTT commands are dropped immediately. No stale command backlog that could cause sudden unexpected movements.

🔄

Safe Mode Switching

Mode manager waits for the active motion to complete before switching modes. No mid-trajectory context changes.

🏠

Home Position Recovery

Any error, disconnect, or timeout triggers a return to a predefined safe home pose before resuming.

📏

URDF Joint Limits

Physical joint limits enforced at URDF level. MoveIt2 OMPL planner respects these limits in all trajectory planning.