Week 11

Machine Week

The BRAINROT 9000

The HTMAA 2025 EECS Machine takes its inspiration from two terrors of tech in pop culture: the evil authoritarian Dalek of the Doctor Who universe and the ubiquitous Brain Rot of the contemporary social media era.

The BRAINROT 9000 is a Dalek-shaped autonomous robot that identifies human targets and drives towards them, automatically scrolling and passively amplifying overstimulating but vacuous "brain rot" social media content.

BRAINROT 9000 in action - autonomous target following with auto-scrolling content

My Contributions

As part of the EECS team, I focused on the vision-guided navigation system:

  • Flask server development: HTTP server receiving camera images from ESP32
  • YOLO integration: YOLO11n object detection for person tracking
  • Threshold control logic: Direction-based motor control using velocity differences
  • Velocity calculations: Worked with TA Quentin on differential drive kinematics

Other team members contributed the mechanical chassis design, Dalek exterior fabrication, auto-scroller mechanism with conductive PLA arm, and power distribution system. TA Quentin developed the ESP32 firmware for camera streaming and motor control.

View complete project on GitLab →



Project Overview

For Machine Week, I developed an autonomous vision-guided robot system that combines computer vision, embedded systems, and differential drive robotics. The machine uses a YOLO (You Only Look Once) object detection model running on a Flask server to process real-time camera streams from an ESP32, compute optimal navigation commands, and control stepper motors for autonomous object tracking and following.

Complete robot system overview
Complete autonomous robot system with ESP32 camera and motor control

My Individual Contributions

My primary contributions to this machine week project:

  • Flask Server: Designed and implemented the Flask web server (hello.py) that receives camera images, runs YOLO detection, and sends velocity commands back to the ESP32
  • YOLO Integration: Integrated YOLO11n model for real-time object detection with visualization
  • Direction-Based Control Logic: Implemented the threshold-based control in the ESP32 firmware that converts velocity differences into motor commands
  • Velocity-to-Steps Calculation: Worked with TA Quentin to calculate the conversion from velocity (m/s) to stepper motor steps to match Flask server output


System Architecture

The system consists of three main components working together in a closed-loop control architecture:

1. Vision Processing (Flask Server)

The Flask server runs on a host computer and serves as the computational brain of the robot. It implements several key endpoints:

  • /_data - Main JSON endpoint that accepts image uploads (multipart or raw JPEG) and returns detection results with ground coordinates and drive commands
  • /detect_image - Debug endpoint that returns annotated images with bounding boxes for visualization
  • /esp32_upload - Specialized endpoint for ESP32 camera stream processing
  • /manual - Manual control interface for testing motor responses
Flask server architecture diagram
Flask server architecture showing data flow from camera to motor commands
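
To make the request/response cycle concrete, below is a minimal sketch of how the /_data route can be wired up in Flask. It is not the full hello.py: the YOLO call and the drive calculation are replaced by a hard-coded stop command, and only the multipart and raw-JPEG upload paths described above are handled.

from io import BytesIO

from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

@app.route("/_data", methods=["POST"])
def receive_frame():
    # Accept either a multipart upload ("image" field) or raw JPEG bytes in the body
    if "image" in request.files:
        jpeg_bytes = request.files["image"].read()
    else:
        jpeg_bytes = request.get_data()

    frame = Image.open(BytesIO(jpeg_bytes)).convert("RGB")  # decoded frame for the detector

    # In hello.py the YOLO wrapper (model/detector.py) and the drive calculation
    # (drive_controller.py) run here; this sketch just returns a stop command.
    detections = [{"v_left": 0.0, "v_right": 0.0, "distance_m": 0.0, "stop": True}]

    return jsonify({"status": "ok", "detections": detections})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)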

2. ESP32 Camera System

The ESP32-CAM module captures video frames and streams them to the Flask server over WiFi. The ESP32 runs custom Arduino code (stream.ino) that handles:

  • Camera initialization and configuration (resolution, JPEG quality)
  • HTTP POST requests to send JPEG frames to the Flask server
  • Receiving motor command responses
  • Serial communication with stepper motor drivers
ESP32-CAM module mounted on robot
ESP32-CAM providing real-time vision input

3. Motor Control System

The robot uses differential drive with two stepper motors controlled via the ESP32. Initially, the motors were connected to a Teensy microcontroller using a bit-bang protocol, but I migrated the system to use the ESP32 as the primary microcontroller for better integration and reduced component count.



YOLO Object Detection

Model Selection & Configuration

I chose YOLO11n (nano variant) for this application due to its excellent balance between detection accuracy and inference speed. The model runs at approximately 30-60 FPS on the host computer, enabling real-time robot control. The detector is implemented in model/detector.py with the following features:

  • Confidence threshold of 0.5 to filter false positives
  • Support for multiple object classes (person, bottle, cell phone, etc.)
  • Bounding box visualization with labels and confidence scores
  • Batch processing capability for efficiency

Detection Pipeline

The detection pipeline follows these steps:

  1. Receive JPEG image from ESP32 camera (typically 640×480 or 800×600)
  2. Decode JPEG to PIL Image format
  3. Run YOLO inference to detect objects and generate bounding boxes
  4. Extract bottom-center pixel coordinates of each detection
  5. Transform pixel coordinates to ground-plane coordinates using homography
  6. Compute drive commands based on object positions
  7. Return commands to ESP32 for motor execution
YOLO detection output with bounding boxes
YOLO detection showing tracked object with bounding box and ground projection
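
To illustrate the inference, bounding-box, and bottom-center steps, here is a minimal sketch of the detection call using the ultralytics package. The 0.5 confidence threshold and bottom-center extraction follow the description above; the model path and helper name are illustrative rather than the exact contents of model/detector.py.

from ultralytics import YOLO
from PIL import Image

model = YOLO("yolo11n.pt")   # nano variant; downloaded automatically by ultralytics

def detect_bottom_centers(image: Image.Image, conf: float = 0.5):
    """Run YOLO11n and return (label, confidence, bottom-center pixel) per detection."""
    results = model(image, conf=conf, verbose=False)
    out = []
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        label = model.names[int(box.cls[0])]
        # Bottom-center of the bounding box: the point assumed to touch the ground plane
        out.append((label, float(box.conf[0]), ((x1 + x2) / 2.0, y2)))
    return out

# Example: detections = detect_bottom_centers(Image.open("frame.jpg"))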

Direction-Based Control (Demo Implementation)

For the machine week demonstration, I implemented a simplified direction-based control system instead of full homography calibration. This approach uses the velocity difference between left and right wheels (computed by the Flask server from detection positions) to determine the robot's turning behavior.

The ESP32 firmware uses simple threshold logic to convert velocity differences into discrete motor commands:

if (v_left - v_right >= 0.15)  {
    s1 = 1;   // turn right
    s2 = 1;
}

if (v_left - v_right <= -0.15) {
    s1 = -1;  // turn left
    s2 = -1;
}

if (fabs(v_left - v_right) < 0.15) {
    s1 = 1;   // go straight: opposite step directions drive the mirrored motors
    s2 = -1;  // in the same direction (exact signs depend on the wiring)
}

if (v_left == 0.0 && v_right == 0.0) {
    s1 = 0;   // stop: the server sends exactly 0.0, 0.0 to halt
    s2 = 0;
}

This simplified approach proved effective for the demonstration, allowing the robot to track and follow detected objects without requiring precise distance measurements. The threshold of 0.15 m/s difference was experimentally tuned to provide smooth turning behavior.

Note: While the Flask server includes homography calibration code for potential future use with accurate spatial positioning, the demo implementation uses this simpler velocity-difference approach in the ESP32 firmware for more robust real-time performance.
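
The server-side half of this scheme only has to turn the target's position in the frame into a pair of wheel velocities. The sketch below shows one simple way to do that and is not the exact logic in drive_controller.py: the horizontal offset of the detection from the image center sets the velocity difference, and the cruise speed and turn gain are illustrative values.

def velocities_from_detection(bbox_bottom_center, image_width,
                              cruise_speed=0.2, turn_gain=0.4):
    """Map a detection's horizontal position to (v_left, v_right) in m/s."""
    if bbox_bottom_center is None:
        return 0.0, 0.0                      # no target: the ESP32 interprets 0,0 as stop

    x, _ = bbox_bottom_center
    # Offset of the target from the image center, normalized to [-1, 1]
    offset = (x - image_width / 2.0) / (image_width / 2.0)

    # Target to the right (+offset) -> left wheel faster -> robot turns right
    v_left = cruise_speed + turn_gain * offset
    v_right = cruise_speed - turn_gain * offset
    return v_left, v_right

# Example: a target at x = 480 in a 640-px-wide frame gives roughly (0.4, 0.0),
# a difference well above the 0.15 m/s threshold, so the ESP32 turns right.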

Robot testing different threshold values
Threshold tuning and motor response testing


Motor Control & Kinematics

Differential Drive Kinematics

The robot uses differential drive, where independent control of left and right wheel velocities enables turning and forward motion. The key parameters:

  • Wheel base: 0.45 meters (45 cm) - distance between left and right wheels
  • Wheel radius: 0.0485 meters (4.85 cm) - half of wheel diameter
  • Wheel circumference: ≈0.305 meters (2π × 0.0485 m) - used for velocity-to-steps conversion
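
For reference, these parameters enter through the standard differential-drive relations: a forward speed v and turn rate ω split into v_left = v + ωL/2 and v_right = v − ωL/2, with L the wheel base. A minimal sketch of that split (textbook kinematics, not an excerpt of the project code):

WHEEL_BASE = 0.45      # m, distance between the wheels
WHEEL_RADIUS = 0.0485  # m

def wheel_velocities(v, omega):
    """Split forward speed v (m/s) and turn rate omega (rad/s) into wheel speeds.
    Here omega > 0 turns the robot right, matching the firmware convention
    (v_left > v_right -> turn right)."""
    v_left = v + omega * WHEEL_BASE / 2.0
    v_right = v - omega * WHEEL_BASE / 2.0
    return v_left, v_right

def wheel_angular_speed(v_wheel):
    """Convert a wheel speed (m/s) to wheel angular speed (rad/s)."""
    return v_wheel / WHEEL_RADIUS

# Example: v = 0.2 m/s with omega = 0.9 rad/s gives approximately (0.40, 0.00) m/s,
# i.e. a sharp right turn.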

Control Algorithm

The Flask server computes differential wheel velocities based on object detection positions. I implemented a threshold-based controller in the ESP32 firmware that interprets these velocity commands:

  1. Server detects object and computes v_left and v_right velocities
  2. ESP32 receives velocity commands via JSON response
  3. Firmware calculates velocity difference: Δv = v_left - v_right
  4. Decision logic based on thresholds:
    • If Δv ≥ 0.15: Turn right
    • If Δv ≤ -0.15: Turn left
    • If |Δv| < 0.15: Go straight
    • If both velocities zero: Stop

This simplified approach proved more reliable for the demo than the initial precise velocity-to-steps conversion I worked on with TA Quentin.

ESP32 Firmware (with TA Quentin)

TA Quentin developed the ESP32 firmware that handles camera streaming and motor control. I contributed the threshold-based control logic and worked with him on calculating the velocity-to-steps conversion:

  • Conversion formula: steps_per_meter = gear_ratio × microsteps_per_motor_revolution / (π × wheel_diameter), i.e. steps per wheel revolution divided by the wheel circumference (a worked sketch follows the photo below)
  • 2 Mbaud serial link to the stepper drivers
  • 1 kHz timer for position updates
Motor driver and ESP32 wiring
ESP32 connected to stepper motor drivers
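
A worked sketch of that conversion is below. The 200 full steps per revolution, 16× microstepping, and 1:1 gear ratio are placeholder values for illustration, not the robot's actual drivetrain parameters:

import math

WHEEL_DIAMETER = 0.097      # m (2 x 0.0485 m radius)
FULL_STEPS_PER_REV = 200    # placeholder: typical 1.8-degree stepper
MICROSTEPS = 16             # placeholder: driver microstepping setting
GEAR_RATIO = 1.0            # placeholder: 1:1 (direct drive)

# Steps needed to advance the wheel by one meter
STEPS_PER_METER = GEAR_RATIO * FULL_STEPS_PER_REV * MICROSTEPS / (math.pi * WHEEL_DIAMETER)

def velocity_to_step_rate(v_wheel):
    """Convert a wheel velocity command (m/s) into a stepper step rate (steps/s)."""
    return v_wheel * STEPS_PER_METER

# Example: 0.15 m/s works out to about 1,575 steps/s with these placeholder values.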

Manual Control Interface

I implemented a web-based manual control interface in the Flask server at /manual for testing motor responses:

  • Forward: Both wheels same speed forward
  • Backward: Both wheels same speed backward
  • CW (Clockwise): Rotate in place clockwise
  • CCW (Counter-Clockwise): Rotate in place counter-clockwise
Manual control web interface
Web-based manual control interface for testing
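
Under the hood the interface reduces to mapping each button to a (v_left, v_right) pair; the sketch below shows that mapping with an illustrative test speed (the actual /manual handler in hello.py may differ in detail).

TEST_SPEED = 0.15  # m/s, illustrative test speed

# Button name -> (v_left, v_right): equal speeds drive straight,
# opposite speeds rotate the robot in place.
MANUAL_COMMANDS = {
    "forward":  ( TEST_SPEED,  TEST_SPEED),
    "backward": (-TEST_SPEED, -TEST_SPEED),
    "cw":       ( TEST_SPEED, -TEST_SPEED),
    "ccw":      (-TEST_SPEED,  TEST_SPEED),
    "stop":     (0.0, 0.0),
}

def manual_command(name):
    """Return the wheel-velocity pair for a manual control button."""
    return MANUAL_COMMANDS[name]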


Communication Protocol

ESP32 to Flask Server

The ESP32 sends images to the Flask server via HTTP POST requests. The image data is sent as multipart/form-data or raw JPEG bytes. Response format (JSON):

{
  "status": "ok",
  "detections": [
    {
      "v_left": 0.12,
      "v_right": 0.08,
      "distance_m": 1.3,
      "stop": false
    }
  ]
}
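
For bench testing without the robot, the same exchange can be reproduced from Python: the sketch below posts a saved JPEG to /_data the way the ESP32 does and prints the returned velocity commands. The server address and filename are placeholders.

import requests

SERVER = "http://192.168.1.10:5000"   # placeholder: host running the Flask server

def send_frame(jpeg_path):
    """POST one JPEG to /_data the way the ESP32 does and parse the reply."""
    with open(jpeg_path, "rb") as f:
        resp = requests.post(f"{SERVER}/_data",
                             data=f.read(),
                             headers={"Content-Type": "image/jpeg"},
                             timeout=2.0)
    resp.raise_for_status()
    reply = resp.json()
    for det in reply.get("detections", []):
        print(f"v_left={det['v_left']:.2f}  v_right={det['v_right']:.2f}  "
              f"distance={det['distance_m']:.2f} m  stop={det['stop']}")

# Example: send_frame("test_frame.jpg")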

Flask Server to ESP32 Motors

Motor commands are sent from the Flask server to the ESP32 via serial communication. The protocol is simple CSV format:

v_left,v_right\n

Example: 0.15,0.12\n commands left wheel at 0.15 m/s and right wheel at 0.12 m/s.
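
A minimal sketch of emitting that command from Python with pyserial; the port name and baud rate are placeholders that depend on how the ESP32 is attached to the host.

import serial

def send_motor_command(port, v_left, v_right, baud=115200):
    """Write one newline-terminated 'v_left,v_right' command over USB serial."""
    with serial.Serial(port, baud, timeout=1) as link:
        link.write(f"{v_left:.2f},{v_right:.2f}\n".encode("ascii"))

# Example: send_motor_command("/dev/ttyUSB0", 0.15, 0.12)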



Demonstration Videos

Robot Following Object

This video demonstrates the robot autonomously following a detected person. The YOLO model detects the person, calculates their ground-plane position, and the robot adjusts its velocity to follow while maintaining a safe distance.

Robot autonomously following detected person

Webcam Testing - Face Detection & Motor Control

Initial testing was performed using a webcam instead of the ESP32 camera. This video shows the detection system identifying faces and computing appropriate motor commands. The annotated video displays bounding boxes and the calculated drive velocities in real-time.

Webcam testing showing face detection and motor command generation

Manual Control Testing

Before implementing autonomous control, I tested the motor system using the manual control interface. This video shows the robot responding to forward, backward, and rotation commands.

Manual control testing of motor system

Homography Calibration Process

Documentation of the calibration process showing the placement of markers and the resulting transformation accuracy.

Calibration markers on ground with camera view
Homography calibration setup and verification
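
The pixel-to-ground mapping behind this calibration takes only a few lines with OpenCV. The correspondences below are illustrative values, not our actual calibration points; findHomography accepts the five (or more) point pairs and solves in a least-squares sense.

import cv2
import numpy as np

# Pixel coordinates of the calibration markers in the camera image (illustrative values)
pixel_pts = np.array([[120, 410], [520, 405], [180, 300], [460, 298], [320, 250]],
                     dtype=np.float32)
# Corresponding ground-plane coordinates in meters, robot frame (illustrative values)
ground_pts = np.array([[-0.3, 0.5], [0.3, 0.5], [-0.3, 1.0], [0.3, 1.0], [0.0, 1.5]],
                      dtype=np.float32)

H, _ = cv2.findHomography(pixel_pts, ground_pts)

def pixel_to_ground(u, v):
    """Project an image pixel (e.g. a detection's bottom-center) onto the ground plane."""
    pt = cv2.perspectiveTransform(np.array([[[u, v]]], dtype=np.float32), H)
    x, y = pt[0, 0]
    return float(x), float(y)

# Example: pixel_to_ground(320, 400) returns the (x, y) ground position in meters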


Technical Challenges & Solutions

Challenge 1: WiFi Latency

Problem: Initial tests showed 200-400ms latency between image capture and motor command execution, causing jerky motion and overshooting.

Solution: Optimized the pipeline by:

  • Reducing JPEG quality from 80% to 60% (smaller file size, faster transfer)
  • Lowering camera resolution from 800×600 to 640×480
  • Implementing frame skipping: process every 3rd frame instead of every frame
  • Using connection pooling in the ESP32 HTTP client

Result: Reduced latency to 80-120ms, acceptable for tracking slow-moving objects.

Challenge 2: Homography Accuracy

Problem: Initial homography calibration showed errors of 10-15cm at distances beyond 1 meter due to lens distortion and calibration point placement.

Solution: Improved calibration by:

  • Increasing the number of calibration points from 4 to 5
  • Spreading points across the full camera field of view
  • Using high-contrast markers for precise pixel identification
  • Validating with test points at multiple distances

Result: Reduced error to 3-5cm at 1 meter distance.

Challenge 3: Motor Stalling on Sharp Turns

Problem: Motors would stall when the pure-pursuit controller commanded very sharp turns (small turn radius).

Solution: Implemented acceleration limiting:

  • Maximum velocity change per update limited to 0.05 m/s
  • Minimum turn radius clamped to 0.3m (prevents in-place rotation at high speed)
  • Added velocity ramping in ESP32 firmware

Result: Smooth turns without stalling, though sacrificing some agility.
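
The acceleration limiting amounts to clamping how much each wheel command may change per update. A minimal sketch of that ramp, using the 0.05 m/s limit above (the update-loop structure is illustrative):

MAX_DELTA_V = 0.05  # m/s, maximum change per control update (as tuned above)

def ramp(previous, target, max_delta=MAX_DELTA_V):
    """Move a wheel velocity toward its target without exceeding max_delta per update."""
    if target > previous + max_delta:
        return previous + max_delta
    if target < previous - max_delta:
        return previous - max_delta
    return target

# Example: stepping from 0.0 toward 0.2 m/s yields 0.05, 0.10, 0.15, 0.20 over four updates.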

Challenge 4: ESP32 Migration

Problem: Migrating from Teensy to ESP32 required rewriting motor control code and dealing with ESP32's limited hardware timers.

Solution:

  • Used ESP32's hardware timer API for pulse generation
  • Implemented software timer fallback for second motor
  • Carefully tuned interrupt priorities to avoid WiFi/serial conflicts
  • Added mutex protection for shared variables


Code Structure

The project consists of Python Flask server code and Arduino C++ firmware:

Python Flask Server (my work):

  • hello.py - Flask application with YOLO detection and motor command endpoints
  • model/detector.py - YOLO detection wrapper
  • drive_controller.py - Drive calculations (includes homography code for future use)

ESP32 Firmware (TA Quentin, with my threshold logic):

  • connected_motor_control.ino - Camera streaming and motor control with threshold-based decisions

📥 Download Source Code

The complete source code for the YOLO-Flask robot system:

Download Complete Project (ZIP)

Individual files:

  • hello.py - Flask server with YOLO detection and motor command endpoints
  • model/detector.py - YOLO detection wrapper
  • drive_controller.py - Drive calculations and homography code
  • connected_motor_control.ino - ESP32 camera streaming and motor control firmware



Group Project Integration

This vision-guided robot was part of a larger machine week project involving multiple team members. Other contributions included:

  • Mechanical chassis design and fabrication
  • Power distribution system with voltage regulation
  • Integration of additional sensors (ultrasonic, IMU)
  • Collaborative debugging and system testing

My focus was on the software stack (vision processing, control algorithms, and the threshold logic in the ESP32 firmware), which enabled the autonomous tracking capabilities demonstrated in the videos above.

Machine week team working on robot
Team collaboration during machine week


Future Improvements

Potential enhancements for future iterations:

  • Multi-object tracking: Track multiple objects simultaneously and prioritize targets
  • Obstacle avoidance: Integrate ultrasonic sensors for collision prevention
  • Path planning: Implement A* or RRT for navigation around obstacles
  • SLAM integration: Build a map of the environment for better navigation
  • Edge deployment: Port YOLO to run directly on ESP32 using TensorFlow Lite
  • Battery monitoring: Add voltage sensing and low-battery warnings
  • Speed optimization: Reduce latency further with UDP instead of HTTP


Lessons Learned

  • System integration is often more challenging than individual components
  • Latency is critical for real-time robotics - every millisecond counts
  • Proper calibration (homography) is essential for accurate spatial reasoning
  • Hardware timer management on microcontrollers requires careful planning
  • WiFi can introduce unpredictable delays - wired connections or lower-level protocols are better for hard real-time control
  • Testing individual components (manual control, webcam testing) before full integration saves debugging time