Rate limit algorithms

14 Apr 2024

A rate limiter controls the rate at which users or services can access resources. Here are the most common rate limit algorithms:

Token bucket

Bucket contains n tokens.
When a request arrives, remove 1 token and process the request.
If the bucket is empty, drop the request.
At fixed intervals refill the bucket.

Leaky bucket

Acts like a bucket with a hole in the bottom so water drips out at a fixed rate. Requests can come at different rates but the server processes them at a constant rate like a drip:

When a request arrives, put the request in the bucket.
At a fixed interval, process a request from the bucket.

Fixed window

Simple to implement but allows bursts of traffic if a bunch of requests are made near the window expiration threshold:

On the first request, store the time stamp and the count (1).
On subsequent requests, check the time stamp: if the time window has expired, store the new time stamp and reset the count to (1).
If the window hasn’t expired, increment and store the count.

Sliding window

Ensures the rate limit is strictly followed at every point in time (no traffic bursts like the fixed window algorithm.) The algo stores the time stamps of processed requests in a list (uses much more memory and CPU than other algorithms). Assuming the window is 1 minute wide:

When a request arrives, loop through the list and count the number of requests processed within the last minute.
If the number of requests in the last minute < LIMIT, add the request to the list and process it.