A one-line fix for a 1,500-user bug
The best bug fixes are the boring ones: a single line, reviewed in a minute, that quietly un-breaks something for a lot of people. This is the story of one of those — a missing ContinueAsNewAsync() that left around 1,500 users stuck in the wrong referral tier.
The symptom
Support flagged that some users weren't being moved between referral tiers when they should have been. Not all of them — a subset. The kind of "some but not all" report that immediately tells you the cause has a boundary somewhere: a date, a flag, a code path that only some records pass through.
The shape of it
Tier transitions were handled by a recurring background job — SyncReferralCampaignsJob — that processed users in batches. The job looked healthy. It ran, it logged, it finished. But it was processing the first batch and then stopping, instead of continuing through the rest.
The boundary turned out to be a date. The enqueue logic had been added on 11 May 2026, so anyone downgraded before that date was never affected — which is exactly why it looked like "some users."
The cause
The job was written as a continue-as-new style workflow: process a slice of work, then re-enqueue itself to pick up the next slice. The re-enqueue call was missing. So it did exactly one pass and considered itself done.
// before — runs once, never advances
await ProcessBatchAsync(batch, ct);
// ...and then nothing. The workflow completes.
// after — hand off to a fresh run with the next cursor
await ProcessBatchAsync(batch, ct);
await ContinueAsNewAsync(nextCursor);
One line. The whole fix was telling the job to keep going.
Why it's worth writing down
The fix was trivial. Finding it wasn't — and that's the actual lesson. A job that "completes successfully" is not the same as a job that did all its work. Green logs can hide a workflow that quietly stopped early. The signal I was missing wasn't an error; it was the absence of the batches that should have followed.
Two things I took away from it:
- Instrument progress, not just success. "Processed N of M" beats "done."
- When a bug hits "some but not all," find the boundary first. The date, flag, or version that splits the two groups usually points straight at the change that introduced it.