Reading a cost anomaly alert like a FinOps engineer
An anomaly alert tells you spend went up. It doesn't tell you why, or whether you should care. Here's the triage process I run on every alert before escalating.
Cost Management's anomaly detection is genuinely useful: it catches spend spikes you'd otherwise discover three weeks later on an invoice. But the alert itself is just a flag: "spend on subscription X is higher than the model expected." It can't distinguish a runaway resource from a planned scale-up or plain noise, so that part is on you.
Step 1: Is this expected?
Before anything else, check whether someone meant for spend to go up. A new environment stood up for a launch, a load test, a Black Friday scaling event — these all look identical to a cost anomaly from the outside. A two-minute Slack message ("anyone scale anything up this week?") saves an hour of investigation.
Step 2: Find the resource group, not just the subscription
Anomaly alerts are usually scoped at the subscription level, which is too coarse to act on. Drill into Cost Analysis, group by resource group, and sort by day-over-day change. In my experience, 90% of anomalies trace back to one of three things:
- A forgotten resource: a VM, App Service plan, or SQL database spun up for testing and never torn down.
- A scaling event that didn't scale back down: autoscale rules with a low scale-out threshold and no matching scale-in rule.
- A usage-based service crossing a pricing tier: Azure OpenAI, Azure AI Search (formerly Cognitive Search), or Cosmos DB suddenly processing meaningfully more volume.
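To make the "group by resource group, sort by day-over-day change" step concrete, here is a minimal Python sketch. It assumes you have already exported daily cost rows from Cost Analysis into a list of dicts; the field names (`date`, `resource_group`, `cost`) and the sample resource groups are hypothetical, not the export's actual schema.

```python
from collections import defaultdict


def day_over_day_by_group(rows, previous_day, current_day):
    """Sum cost per resource group for two days and rank groups by the increase.

    Returns a list of (resource_group, previous_cost, current_cost, delta)
    tuples, largest increase first.
    """
    totals = defaultdict(lambda: {previous_day: 0.0, current_day: 0.0})
    for row in rows:
        if row["date"] in (previous_day, current_day):
            totals[row["resource_group"]][row["date"]] += row["cost"]

    changes = [
        (group, costs[previous_day], costs[current_day],
         costs[current_day] - costs[previous_day])
        for group, costs in totals.items()
    ]
    return sorted(changes, key=lambda item: item[3], reverse=True)


# Hypothetical sample data: field names and values are illustrative only.
sample_rows = [
    {"date": "2024-05-01", "resource_group": "rg-web", "cost": 40.0},
    {"date": "2024-05-02", "resource_group": "rg-web", "cost": 42.0},
    {"date": "2024-05-01", "resource_group": "rg-loadtest", "cost": 5.0},
    {"date": "2024-05-02", "resource_group": "rg-loadtest", "cost": 95.0},
]

for group, before, after, delta in day_over_day_by_group(
    sample_rows, "2024-05-01", "2024-05-02"
):
    print(f"{group}: {before:.2f} -> {after:.2f} ({delta:+.2f})")
```

The group at the top of that list is where I'd start looking. Ranking by absolute change rather than percentage keeps tiny resource groups with large relative swings from crowding out the one that actually moved the bill.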
Step 3: Check the meter, not just the total
Once you've found the resource group, look at which meter is driving the cost — compute hours, data egress, API calls, storage transactions. This matters because the fix is completely different depending on the answer. A spike in data egress usually means something is pulling more data across a region boundary than it should; a spike in compute hours usually means something is running that shouldn't be.
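A rough sketch of the meter breakdown, again over exported cost rows. The field names (`resource_group`, `meter`, `cost`) and the meter labels in the usage note are assumptions for illustration; real exports name these columns differently.

```python
from collections import defaultdict


def meter_breakdown(rows, resource_group):
    """Return (meter, cost, share_of_total) for one resource group, costliest first."""
    totals = defaultdict(float)
    for row in rows:
        if row["resource_group"] == resource_group:
            totals[row["meter"]] += row["cost"]

    grand_total = sum(totals.values())
    breakdown = [
        (meter, cost, cost / grand_total if grand_total else 0.0)
        for meter, cost in totals.items()
    ]
    return sorted(breakdown, key=lambda item: item[1], reverse=True)
```

If a hypothetical "Data Transfer Out" meter comes back with most of the total, that points at the egress investigation; if a compute meter dominates, go looking for something that is running when it shouldn't be.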
Step 4: Decide — fix, accept, or watch
Every anomaly investigation should end in one of three outcomes, written down somewhere your team can see it later:
- Fix it: delete the forgotten resource (or deallocate it if you may need it again, remembering that its disks still bill), correct the autoscale rule, add a budget alert at the new baseline.
- Accept it: the spend is real and intentional; update your forecast and budgets to the new baseline so the next person doesn't treat it as a surprise.
- Watch it: you're not sure yet; set a follow-up reminder for a week out, with a specific threshold that would tip you toward "fix."
The failure mode I see most often is teams investigating an anomaly, deciding it's "probably fine," and not writing that decision down — so the same alert gets re-investigated from scratch by someone else next month.
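One way to make "written down" hard to skip is to give the decision a fixed shape. The sketch below is a hypothetical record format, not part of any Azure tooling; every class, field, and helper name is my own invention. It enforces one rule from the list above: a "watch" decision must carry a review date and a threshold.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Outcome(Enum):
    FIX = "fix"
    ACCEPT = "accept"
    WATCH = "watch"


@dataclass(frozen=True)
class TriageDecision:
    alert_date: date
    resource_group: str
    meter: str
    outcome: Outcome
    rationale: str
    review_on: Optional[date] = None
    fix_threshold_daily_cost: Optional[float] = None

    def __post_init__(self):
        # "Watch" without a date and a threshold is just "probably fine".
        if self.outcome is Outcome.WATCH and (
            self.review_on is None or self.fix_threshold_daily_cost is None
        ):
            raise ValueError("a 'watch' decision needs a review date and a fix threshold")


def watch(alert_date, resource_group, meter, rationale,
          fix_threshold_daily_cost, days=7):
    """Build a 'watch' decision with a follow-up a week out by default."""
    return TriageDecision(
        alert_date=alert_date,
        resource_group=resource_group,
        meter=meter,
        outcome=Outcome.WATCH,
        rationale=rationale,
        review_on=alert_date + timedelta(days=days),
        fix_threshold_daily_cost=fix_threshold_daily_cost,
    )
```

Where the record lives matters less than the fact that it exists and is searchable: a wiki page, a ticket, or a row in a shared sheet all work.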
Set tighter anomaly thresholds as you mature
Cost Management's anomaly detection gets more reliable as it learns your baseline. Once you've got 60+ days of stable spend data, tighten the alert sensitivity where your tooling exposes one, and tighten the budget alerts or custom checks you layer on top where it doesn't. Catching a 15% deviation early is far cheaper than catching a 60% deviation after a billing cycle closes.
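For a feel of what a 15% versus a 60% threshold means in practice, here is a deliberately simple check against a mean baseline. This is not how Cost Management's detection model works; it is a stand-in you could run on your own exported daily totals, and the function names and default threshold are assumptions.

```python
from statistics import mean


def deviation_from_baseline(baseline_daily_costs, todays_cost):
    """Fractional deviation of today's cost from the mean of the baseline days."""
    baseline = mean(baseline_daily_costs)
    if baseline == 0:
        raise ValueError("baseline is zero; a relative deviation is undefined")
    return (todays_cost - baseline) / baseline


def exceeds_threshold(baseline_daily_costs, todays_cost, threshold=0.15):
    """True when today's cost is more than `threshold` above the baseline mean."""
    return deviation_from_baseline(baseline_daily_costs, todays_cost) > threshold
```

With 60 days at a steady 100 per day, a day at 120 trips the 15% threshold but would pass unnoticed at 60%. A plain mean ignores weekly seasonality, so treat this as a starting point rather than a replacement for the built-in detection.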
Closing thought
A cost anomaly alert is a prompt to investigate, not a verdict. The teams that get real value from FinOps tooling are the ones that treat every alert as a five-minute structured triage — resource group, meter, decision, written down — rather than either ignoring it or panicking over it.