HeadlinesBriefing.com

Sim-to-Real Drone Control with Reinforcement Learning

Hacker News

All eight motors are commanded directly at 50 Hz over a serial link, bypassing traditional PID loops so that the RL policy has full authority to reallocate thrust during failures. This architecture is critical for handling the 90° same-type failure, in which losing two motors creates both a yaw torque imbalance and a spatial asymmetry. The surviving motors must then be balanced at a 2:1 thrust ratio, which gives a yaw-balanced thrust ceiling of 5,572 gf, enough to sustain a 2.8 kg drone (vs. its 1 kg weight).
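The 2:1 ratio and the yaw-balanced ceiling can be illustrated with a little arithmetic. The sketch below is not the author's allocation code. It assumes an alternating CW/CCW motor layout, yaw torque proportional to thrust with the same constant for every motor, and a hypothetical per-motor maximum of 1,393 gf, chosen only because it reproduces the 5,572 gf figure (the source does not state per-motor thrust). It also ignores the roll and pitch moments caused by the spatial asymmetry.

```python
def yaw_balanced_ceiling(spins, failed, t_max):
    """Return (total thrust ceiling, thrust ratio) with yaw torque cancelled.

    spins  : list of +1 (CW) or -1 (CCW), one entry per motor (assumed layout)
    failed : set of failed motor indices
    t_max  : maximum thrust of a single motor (hypothetical value, any unit)
    """
    cw = sum(1 for i, s in enumerate(spins) if s > 0 and i not in failed)
    ccw = sum(1 for i, s in enumerate(spins) if s < 0 and i not in failed)
    if cw == 0 or ccw == 0:
        return 0.0, float("inf")  # one spin direction is gone: yaw cannot be cancelled

    minority, majority = min(cw, ccw), max(cw, ccw)
    # The smaller group saturates at t_max; the larger group must produce the
    # same summed thrust (hence the same summed yaw torque), spread over more motors.
    ceiling = 2 * minority * t_max
    ratio = majority / minority  # per-motor thrust of minority vs. majority group
    return ceiling, ratio


# Octocopter with alternating spin directions; motors 0 and 2 sit 90 degrees
# apart and share a spin direction, so they model the "same-type" failure.
spins = [+1, -1] * 4
ceiling, ratio = yaw_balanced_ceiling(spins, failed={0, 2}, t_max=1393)
print(ceiling, ratio)
```

Under these assumptions the two surviving motors of the damaged spin group run at twice the thrust of the four intact ones, which is one way to read the 2:1 figure.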

The simulation uses MuJoCo for its CPU efficiency, which allows 128 parallel environments to run on a Mac. Key challenges addressed include 15–30 ms of loop latency (randomized aggressively to mimic real-world delays) and motor lag (20–50 ms response time). The model relies on empirical data: mass, inertia tensor, thrust curves, and hover throttle points. Randomizing variables such as mass (±10%), thrust constants (±15%), and sensor noise makes the policy robust to modeling error. PPO is chosen over SAC because simulation steps are effectively free, so SAC's sample efficiency matters less. The actor-critic setup is asymmetric: during training the critic is told which motors have failed, information the actor will not receive in reality.
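A minimal sketch of how the per-episode randomization and the asymmetric observations might be wired up, using the ranges quoted above. This is not the author's code. The uniform distributions, the `EpisodeParams` and `build_observations` names, the failure-count choice, and the one-hot failure mask are all illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    mass_kg: float
    thrust_scale: float      # multiplier on the nominal thrust constant
    loop_latency_s: float
    motor_lag_s: float
    failed_motors: tuple


def sample_episode_params(rng, nominal_mass_kg=1.0, n_motors=8):
    """Draw one set of randomized physics parameters (uniform sampling assumed)."""
    n_failed = rng.choice([0, 1, 2])  # hypothetical failure curriculum
    failed = tuple(sorted(rng.sample(range(n_motors), k=n_failed)))
    return EpisodeParams(
        mass_kg=nominal_mass_kg * rng.uniform(0.90, 1.10),   # mass +/-10%
        thrust_scale=rng.uniform(0.85, 1.15),                # thrust +/-15%
        loop_latency_s=rng.uniform(0.015, 0.030),            # 15-30 ms
        motor_lag_s=rng.uniform(0.020, 0.050),               # 20-50 ms
        failed_motors=failed,
    )


def build_observations(sensor_obs, params, n_motors=8):
    """Actor sees sensors only; critic also sees a privileged failure mask."""
    actor_obs = list(sensor_obs)
    failure_mask = [1.0 if i in params.failed_motors else 0.0
                    for i in range(n_motors)]
    critic_obs = actor_obs + failure_mask
    return actor_obs, critic_obs


rng = random.Random(0)
params = sample_episode_params(rng)
actor_obs, critic_obs = build_observations([0.0] * 6, params)
print(params, len(actor_obs), len(critic_obs))
```

Because only the critic consumes the failure mask, the exported actor needs nothing at flight time beyond what the onboard sensors provide.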

Deployment targets an RPi 4 running ONNX-exported policies; at 45k parameters, inference takes under a millisecond. Each iteration of the 50 Hz loop reads attitude data, runs inference, and writes motor commands. The experiment involves live testing: flying the drone, triggering motor failures from the transmitter, and assessing survival rates. This approach mirrors Mueller & D'Andrea’s quad research but extends it to octocopters, prioritizing survival over precise control. The 90° failure case remains the hardest: the policy has to sacrifice yaw control for stability, letting the drone spin slowly while it maintains level flight.
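The read, infer, write cycle can be sketched as a fixed-rate loop. The three callables here (`read_attitude`, `policy`, `write_motors`) are hypothetical stand-ins for the serial I/O and the exported model. The clamping of commands to [0, 1] and the overrun handling are assumptions, not details from the source.

```python
import time


def run_control_loop(read_attitude, policy, write_motors,
                     rate_hz=50.0, max_steps=None,
                     clock=time.monotonic, sleep=time.sleep):
    """Run read -> infer -> write at a fixed rate; return (steps, overruns)."""
    period = 1.0 / rate_hz
    next_tick = clock()
    steps = 0
    overruns = 0
    while max_steps is None or steps < max_steps:
        obs = read_attitude()                  # e.g. attitude and body rates
        cmd = policy(obs)                      # one command per motor
        cmd = [min(1.0, max(0.0, c)) for c in cmd]  # clamp to assumed [0, 1] range
        write_motors(cmd)
        steps += 1

        # Schedule against absolute deadlines so small delays do not accumulate.
        next_tick += period
        delay = next_tick - clock()
        if delay > 0:
            sleep(delay)
        else:
            overruns += 1                      # missed the deadline
            next_tick = clock()                # resynchronize instead of bursting
    return steps, overruns


# Dry run with dummy I/O in place of real hardware.
sent = []
steps, overruns = run_control_loop(
    read_attitude=lambda: [0.0] * 6,
    policy=lambda obs: [0.5] * 8,
    write_motors=sent.append,
    max_steps=5,
)
print(steps, len(sent))
```

On the real system, `policy` would presumably wrap the ONNX model (for instance through an ONNX Runtime inference session), and counting overruns gives a cheap check that the loop is actually holding 50 Hz.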