HeadlinesBriefing.com

Throttling API Gateway with Token Bucket and Sliding Window

DEV Community

An MLOps engineer designed a two-layer throttling system for an AI platform to keep malicious actors from abusing resources by spinning up expensive EC2 P5e/P5en instances. The first layer uses the Token Bucket throttling built into AWS API Gateway. It acts as a blunt, account-level guardrail, returning 429 Too Many Requests responses once the shared token pool is exhausted.
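To make the token bucket idea concrete, here is a minimal in-memory sketch in Python. It is not API Gateway's implementation and not the author's code; the class name `TokenBucket`, the helper `status_for`, and the `rate`/`burst` parameters are hypothetical names chosen for illustration. It only shows the general mechanism: a shared pool refills at a steady rate up to a burst ceiling, and a request that finds the pool empty gets a 429.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket (hypothetical; not API Gateway internals)."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.clock = clock          # injectable clock, handy for testing
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def status_for(bucket: TokenBucket) -> int:
    """Map the bucket decision to an HTTP status code."""
    return 200 if bucket.allow() else 429
```

Because every caller draws from the same bucket, one noisy client can drain the pool for everyone, which is why this layer works as a coarse guardrail rather than a per-user control.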

For more precise control, the second layer implements a Sliding Window rate limiter in ElastiCache for Valkey. Unlike the token bucket's single shared counter, it records individual request timestamps per user within a moving time frame. It uses Sorted Sets to evict expired entries and count recent activity, which allows user-specific throttling and makes abusive customers identifiable.
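The sliding window log can be sketched in plain Python as follows. This is an in-memory stand-in under stated assumptions, not the author's Valkey Lua script: a sorted list per user plays the role of a Sorted Set, and the comments name the Valkey commands (`ZREMRANGEBYSCORE`, `ZCARD`, `ZADD`) that each step would roughly correspond to. The class name `SlidingWindowLimiter` and its parameters are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List


class SlidingWindowLimiter:
    """Illustrative per-user sliding window log (in-memory stand-in)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit            # max requests allowed per window
        self.window = window_seconds  # window length in seconds
        # user_id -> ascending list of request timestamps
        self._log: DefaultDict[str, List[float]] = defaultdict(list)

    def allow(self, user_id: str, now: float) -> bool:
        log = self._log[user_id]
        cutoff = now - self.window
        # Evict timestamps at or before the cutoff
        # (roughly ZREMRANGEBYSCORE key -inf cutoff).
        del log[:bisect.bisect_right(log, cutoff)]
        # Count what remains in the window (roughly ZCARD key).
        if len(log) >= self.limit:
            return False
        # Record the accepted request (roughly ZADD key now member).
        bisect.insort(log, now)
        return True

    def recent_count(self, user_id: str) -> int:
        """Requests currently logged for this user (as of the last call)."""
        return len(self._log[user_id])
```

In this sketch rejected requests are not logged, so a throttled user recovers as soon as older entries age out; logging rejections as well would penalize sustained hammering more heavily. In a real deployment the evict, count, and add steps would need to run atomically, which is presumably why the original article uses a Lua script.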

The proposal met internal resistance, but the first layer, API Gateway throttling, was implemented. The author provides AWS CDK and Valkey Lua script examples and notes that while API Gateway is convenient, the sliding window offers surgical precision for those with trust issues. Together, the two layers are meant to balance cost, security, and user experience in a cloud-native environment.