Memoryless Scheduling Should Be Your Default
When someone says “run this hourly,” what do they actually mean?
Most developers reach for 0 * * * * in cron, or time.Tick(time.Hour) in Go, or setInterval(fn, 3600000) in JavaScript. These all fire at fixed intervals. But fixed intervals are rarely what you actually want. They’re just what the tools make easy.
The better default is memoryless scheduling: intervals drawn from an exponential distribution. Your task still runs about once per hour on average, but the exact timing is random. This post argues that memoryless should be your default, and fixed intervals should require justification.
What’s wrong with fixed intervals?
Three things:
1. Accidental synchronization. Two systems that both run “every hour” will eventually sync up and hit shared resources simultaneously. This causes thundering herds, lock contention, and correlated failures. The more systems you have, the worse this gets. People try to work around this with hacks like “run this every minute at the 15 second mark” and “run that at the 24 second mark.” But all it takes is one “run this every 17 seconds” and your perfect arrangement is lost.
2. Correlation with periodic behavior. Many systems have periodic patterns—daily traffic spikes, hourly batch jobs, garbage collection cycles. Fixed-interval sampling can systematically miss or over-represent states that correlate with your interval. Your “hourly” health check might always run during a quiet period and miss every traffic spike.
3. Biased sampling. There’s a theorem in queueing theory called PASTA: Poisson Arrivals See Time Averages. It says that if you sample a system at random (Poisson-distributed) times, your samples are an unbiased representation of the system’s time-averaged state. Fixed-interval sampling doesn’t have this property.
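To make the bias concrete, here is a small simulation (my sketch, not from the original post): a toy service that is degraded for the first 10 seconds of every minute, i.e. one-sixth of the time. Sampling every 60 seconds sees either 0% or 100% degradation depending on the sampler's start offset, while exponentially spaced samples at the same average rate see roughly the true one-sixth fraction.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// degraded reports whether the toy service is slow at time t (in seconds).
// It is slow for the first 10 seconds of every minute: 1/6 of the time.
func degraded(t float64) bool {
	return math.Mod(t, 60) < 10
}

// fixedFraction samples every 60s starting at offset and returns the
// fraction of samples that saw the service degraded.
func fixedFraction(offset float64, n int) float64 {
	hits := 0
	for i := 0; i < n; i++ {
		if degraded(offset + 60*float64(i)) {
			hits++
		}
	}
	return float64(hits) / float64(n)
}

// poissonFraction samples at exponentially distributed intervals
// (mean 60s) and returns the fraction of degraded samples.
func poissonFraction(seed int64, n int) float64 {
	r := rand.New(rand.NewSource(seed))
	t, hits := 0.0, 0
	for i := 0; i < n; i++ {
		t += r.ExpFloat64() * 60
		if degraded(t) {
			hits++
		}
	}
	return float64(hits) / float64(n)
}

func main() {
	fmt.Printf("fixed, offset 30s: %.1f%% degraded seen\n", 100*fixedFraction(30, 100000))
	fmt.Printf("fixed, offset 5s:  %.1f%% degraded seen\n", 100*fixedFraction(5, 100000))
	fmt.Printf("poisson:           %.1f%% degraded seen\n", 100*poissonFraction(1, 100000))
}
```

The fixed sampler reports 0% or 100% depending only on its start offset; the Poisson sampler lands near the true 16.7%, as PASTA predicts.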
What is memoryless scheduling?
What we want is a way to guarantee that events happen at an average of once per hour, but where you cannot predict exactly when the next event will happen even if you know when all the previous events happened. This is called the “memoryless” property. The exponential distribution is the only continuous distribution that has it, and the event stream it generates is the Poisson process.
A Poisson process generates events where the inter-arrival times follow an exponential distribution. In Python, it’s one line:
import random
import time
# Sleep for a random duration averaging 1 hour
time.sleep(random.expovariate(1.0 / 3600.0))
The expovariate function returns exponentially-distributed random numbers. The parameter is the rate (events per second), so 1.0 / 3600.0 gives you an average of one event per hour.
In Go, the equivalent is:
import (
    "math/rand"
    "time"
)

// Sleep for a random duration averaging 1 hour
duration := time.Duration(rand.ExpFloat64() * float64(time.Hour))
time.Sleep(duration)
That’s it. That’s the core of memoryless scheduling.
A production-ready implementation
The one-liner works, but production code needs bounds. True exponential distributions are unbounded—you might get an interval of 10 hours or 10 milliseconds. Operationally, you often want guarantees.
Here’s the implementation we use at Triple Pat:
package memoryless

import (
    "context"
    "math/rand"
    "time"
)

type Ticker struct {
    Expected time.Duration
    Min      time.Duration
    Max      time.Duration
}

func (t Ticker) randomWaitTime() time.Duration {
    var wt time.Duration = -1
    // Resample until within bounds
    for wt < t.Min || (t.Max != 0 && wt > t.Max) {
        wt = time.Duration(rand.ExpFloat64() * float64(t.Expected))
    }
    return wt
}

func (t Ticker) Tick(ctx context.Context) <-chan time.Time {
    c := make(chan time.Time) // Unbuffered is important
    go func() {
        defer close(c)
        for ctx.Err() == nil {
            duration := t.randomWaitTime()
            select {
            case <-time.After(duration):
                select {
                case c <- time.Now():
                default: // Don't block if receiver is busy
                }
            case <-ctx.Done():
                return
            }
        }
    }()
    return c
}
A few design choices worth noting:
Resampling vs. clamping. When we get a value outside bounds, we resample rather than clamp. Clamping (forcing out-of-bounds values to Min or Max) introduces spikes at the boundaries. Resampling preserves the distribution shape within the allowed range.
Unbuffered channel. If the receiver is busy when a tick arrives, we drop it rather than queueing. This prevents ticks from “bunching up” if the receiver falls behind.
Min and Max are optional. Set Max to 0 to disable the upper bound. This lets you use the same code for bounded and unbounded cases.
When do you actually need fixed intervals?
Fixed intervals aren’t always wrong. Here are legitimate reasons to use them:
External coordination. If you need to sync with an external system that expects requests at specific times, you need fixed intervals.
Human expectations. Daily reports that arrive “around 9am, give or take a few hours” will confuse people. Some things genuinely need to happen at predictable times.
Rate limiting. If you’re allowed exactly N requests per hour by an external API, you might need fixed intervals to stay within limits.
Debugging. Fixed intervals are easier to reason about when tracking down timing issues.
But notice these are all about external constraints. For internal operations—garbage collection, cache invalidation, health checks, metric generation, background sync—memoryless is almost always better.
Real-world applications
I’ve used this pattern for years across different projects:
Internet speed tests. When measuring broadband quality, you want samples that represent actual usage patterns. Fixed hourly tests might always miss peak congestion. Memoryless sampling gives unbiased measurements. (See mlab-test-runner for an early implementation.)
Synthetic alert generation. At Triple Pat, we generate fake alerts on a memoryless schedule to test that our alerting pipeline works. Because Prometheus scrapes at fixed intervals, making our signal generation memoryless ensures the scraper sees our test alerts with unbiased probability.
Mirror synchronization. Our distributed check-in service has multiple servers that need to sync with each other. Memoryless sync intervals prevent them from all trying to sync simultaneously.
Database cleanup. Expired records need to be deleted, but it doesn’t matter exactly when. Memoryless scheduling spreads the load and avoids coordinated spikes.
The industry is getting this wrong
Every major observability tool—Prometheus, OpenTelemetry, Grafana Alloy—scrapes metrics at fixed intervals. This means the entire industry is collecting potentially biased samples by default.
If your system has any periodic behavior that correlates with your scrape interval, your metrics may be systematically misleading. A service that’s slow for 10 seconds every minute might look perfectly healthy or constantly broken, depending on when your collector happened to start.
The PASTA theorem tells us this is fixable: use Poisson-distributed scrape intervals. But none of the major collectors support this. It’s a gap in the ecosystem.
Summary
Memoryless scheduling is the right default for periodic tasks. Fixed intervals should require justification.
Use memoryless when:
- The task is internal (no external coordination needed)
- The timing doesn’t need to be predictable to humans
- You want unbiased sampling of system state
- You have multiple instances that might accidentally synchronize
Use fixed intervals when:
- External systems expect requests at specific times
- Humans need predictable schedules
- You’re working around rate limits
- You’re debugging timing issues
The implementation is simple—a few lines of code. The hard part is remembering to use it instead of reaching for cron or time.Tick out of habit.
Use it!
The full implementation is available as a gist. A similar implementation is also available in the M-Lab Go library. Both are designed to be drop-in replacements for time.Tick().
All code in this blog post is CC0, which means you can freely use it however you want.
Happy (randomly-timed) coding!