Why most enterprise AI initiatives die after the pilot, and what I’ve been learning about making them stick.
Where this is coming from
For the last year I’ve been doing two things at once. Running strategy, planning, and operations work for ScaleCadence clients and on engagements. And rebuilding that work with AI: forecast review, board prep, variance commentary, renewal-risk scoring, the schema diagnostic step that has to happen before any analysis gets near a CEO.
Some experiments stuck. Most didn’t. The pattern in what stuck taught me more about AI’s organizational potential than any vendor briefing. This post is what I’ve learned about making AI transformation last, not just launch.
Why AI matters, narrowly
The wide claim is that AI replaces strategy, finance, and operations work. I don’t believe it, and most people doing this work seriously don’t either. The dynamics are wrong. Operator decisions are messy, context-heavy, political. None of that is what AI is closing in on.
The narrower claim is the interesting one. Operators have always had three constraints: time, fluency in adjacent domains (SQL, contracts, customer research), and the ability to interrogate data at scale. Modern AI relaxes all three at once. Not perfectly. But materially. A planning analyst with Claude open as a working tool writes SQL she couldn’t have written, drafts commentary in a third the time, and asks twice as many questions per quarter. That’s not a 10% productivity story. It redefines what a competent analyst’s output looks like.
The cliché “AI won’t replace you, but a person using AI will” understates the bar. The new baseline isn’t “uses AI.” The new baseline is “uses AI deliberately, with judgment, on workflows that survive the year.” That last part is where the difficulty lives.
Why most AI initiatives don’t survive Year 2
The empirical picture as of mid-2026:
- About 80% of enterprise applications shipped or updated this year embed at least one AI agent.
- Only 31% of enterprises have at least one AI agent running in production.
- Among those that do, 99% report operational efficiency gains.
- But only 14% of CFOs report measurable ROI.
The gap between “efficiency reported” and “ROI measured” is where most AI initiatives quietly die. They don’t get killed in a budget meeting. They drift. The pilot owner moves roles. A new model version ships and nobody re-evaluates. The integration breaks after an upstream schema change. Nine months later, the workflow still exists but nobody trusts the output, so nobody uses it, so the project closes.
Three patterns, in roughly the order they show up:
The pilot was designed for the demo, not the workflow. Vendor demos run on idealized data. Production data needs a cleanup step nobody scoped, a failure-mode review for plausibly-wrong outputs, and a human review that fits the existing operating cadence rather than adding a new ritual. Pilots that don’t budget for these die the moment the demo wears off.
No named owner, no eval discipline. In 2024, 11% of enterprises had a dedicated “AI agent owner.” In 2026 it’s 56%. The companies whose workflows survived Year 2 are almost entirely in that 56%. Without someone whose job is the workflow’s performance, quality drifts and nobody notices. The eval, if it ever existed, was a one-time scoring exercise during the pilot, not a discipline.
Data preparation was treated as a one-time fix, not as ongoing work. The pilot included a manual data cleanup by a smart analyst. That cleanup never got operationalized. Three months later the upstream system adds a field, retires another, or changes a definition. The model still runs. The output still gets read. But the inputs are subtly different from what the prompts were designed for. Nobody notices until a CFO asks “why is this number off?”
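The added-field and retired-field cases above are cheap to check for mechanically. Here is a minimal sketch, assuming you can describe each extract as a mapping of column name to type name. `SchemaDrift`, `detect_schema_drift`, and the sample columns are hypothetical names for illustration, not part of any tool mentioned in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaDrift:
    added: set      # columns present today that the prompts never saw
    removed: set    # columns the prompts expect that have disappeared
    retyped: dict   # column -> (expected type, observed type)

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def detect_schema_drift(expected: dict, observed: dict) -> SchemaDrift:
    """Compare the column->type mapping the workflow was built on with today's extract."""
    shared = set(expected) & set(observed)
    return SchemaDrift(
        added=set(observed) - set(expected),
        removed=set(expected) - set(observed),
        retyped={
            col: (expected[col], observed[col])
            for col in shared
            if expected[col] != observed[col]
        },
    )


# Illustrative data only: the schema the pilot was built on vs. this week's extract.
expected_schema = {"account_id": "str", "arr": "float", "renewal_date": "date"}
observed_schema = {"account_id": "str", "arr": "int", "segment": "str"}

drift = detect_schema_drift(expected_schema, observed_schema)
if drift.has_drift:
    print("Hold the run and review:", drift)
```

A check like this catches structural changes only. A changed definition behind an unchanged column name will pass it, which is one reason the recurring eval set still matters.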
What sustainability looks like
What I’ve seen survive, both in my own work and in deployments I’ve had a hand in:
Embed before you scale. Pick one workflow. Run it end-to-end. Iterate weekly. Don’t start an “AI program”; start an AI workflow. Most of what determines durability isn’t the model or the prompt; it’s the operational rhythm. You only learn the rhythm by running one workflow long enough to debug it under real conditions.
Build the eval harness in week 2, not month 6. The single highest-leverage move that’s also the most skipped. Build it as a Google Sheet if you have to: 30 to 50 representative inputs, the expected output (or a rubric for evaluating it), a column for the most recent score. Re-run weekly. Any model swap, prompt change, or upstream data change re-runs the full set before going live. Without evals, you don’t have a workflow. You have a hope.
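If the Google Sheet outgrows itself, the same structure fits in a few lines of Python. This is a sketch under stated assumptions: `EvalCase`, `run_evals`, `safe_to_ship`, and the stub workflow are hypothetical names, the three cases stand in for the 30 to 50 you would really keep, and the exact-match scorer is a placeholder for whatever rubric you actually grade against.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EvalCase:
    case_id: str
    input_text: str
    expected: str
    last_score: Optional[float] = None  # the "most recent score" column


def exact_match(output: str, expected: str) -> float:
    """Placeholder scorer: 1.0 on a case-insensitive match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_evals(
    cases: List[EvalCase],
    workflow: Callable[[str], str],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Run every case through the workflow, record its score, return the mean."""
    if not cases:
        raise ValueError("eval set is empty")
    for case in cases:
        case.last_score = scorer(workflow(case.input_text), case.expected)
    return sum(case.last_score for case in cases) / len(cases)


def safe_to_ship(candidate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Gate a model swap, prompt change, or data change on the full eval set."""
    return candidate >= baseline - tolerance


# Illustrative cases and a stand-in workflow; a real one would call the model.
cases = [
    EvalCase("c1", "Revenue beat plan by 4%", "favorable"),
    EvalCase("c2", "Opex over plan by 9%", "unfavorable"),
    EvalCase("c3", "Churn above forecast", "unfavorable"),
]


def stub_workflow(text: str) -> str:
    return "favorable" if ("beat" in text or "above" in text) else "unfavorable"


score = run_evals(cases, stub_workflow)
print(f"mean score: {score:.2f}")
```

The stub gets the churn case wrong on purpose: the per-case score tells you which input regressed, which the average alone would hide.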
Own the orchestration layer. Most agent failures originate there, not in the model. Owning the orchestration (even as a thin Python wrapper around direct API calls) means you can debug the failure modes you actually have. The same logic applies to vendor strategy: pick one model vendor, one disclosure, one set of failure modes to learn deeply. I’ve picked Anthropic for production. The point isn’t which vendor; the point is to pick one deliberately and be able to explain why.
Budget cost economics honestly. Uber’s CTO disclosed that the company spent its entire 2026 AI budget in four months, with engineers averaging $500 to $2,000 per month in API costs. AI is cheap per call and expensive per quarter. Budget for usage growth at the start, not as a mid-year surprise. Allocate cost back to the workflow owner. Workflows whose value is real but whose cost is invisible eventually get cut by someone who only sees the cost.
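The arithmetic behind “cheap per call, expensive per quarter” is worth running before the budget is set. The sketch below uses made-up figures; the budget, growth rate, owners, and function names are all assumptions for illustration, not numbers from any company mentioned here.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def month_budget_runs_out(
    annual_budget: float, first_month_spend: float, monthly_growth: float
) -> Optional[int]:
    """Return the month in which cumulative spend first exceeds the budget.

    Returns None if the budget lasts all twelve months.
    """
    cumulative = 0.0
    spend = first_month_spend
    for month in range(1, 13):
        cumulative += spend
        if cumulative > annual_budget:
            return month
        spend *= 1 + monthly_growth
    return None


def allocate_by_owner(usage_rows: List[dict]) -> Dict[str, float]:
    """Sum cost per workflow owner so each owner sees their own bill."""
    totals: Dict[str, float] = defaultdict(float)
    for row in usage_rows:
        totals[row["owner"]] += row["cost_usd"]
    return dict(totals)


# Illustrative: a flat plan lasts the year; 50% monthly usage growth does not.
flat_plan = month_budget_runs_out(120_000, 10_000, 0.0)
growth_plan = month_budget_runs_out(120_000, 10_000, 0.5)

# Illustrative usage rows, e.g. exported from your API billing data.
usage = [
    {"owner": "fpna", "cost_usd": 120.0},
    {"owner": "revops", "cost_usd": 80.0},
    {"owner": "fpna", "cost_usd": 30.0},
]
by_owner = allocate_by_owner(usage)
print(flat_plan, growth_plan, by_owner)
```

With the same starting spend, the flat plan survives twelve months and the growth plan is exhausted in month five. That gap is the mid-year surprise.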
Design for the human-in-the-loop, not around it. Most vendor pitches want to remove the human. Most surviving deployments do the opposite: make the human review step fast, structured, and trusted. A clear interface for output, inputs, and the model’s reasoning. A short set of accept / edit / reject options. A log of every override, used to update the prompt or data prep step. The human-in-the-loop isn’t friction. It’s the mechanism by which the workflow earns trust, and trust is what lets it still be in use a year later.
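The override log does not need to be elaborate to be useful. A minimal sketch, assuming each review is recorded as one of the three decisions plus a free-text note; `Decision`, `Review`, `override_rate`, and the sample entries are hypothetical names and data for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Decision(Enum):
    ACCEPT = "accept"
    EDIT = "edit"
    REJECT = "reject"


@dataclass(frozen=True)
class Review:
    item_id: str
    decision: Decision
    note: str = ""  # why the reviewer edited or rejected the output


def override_rate(reviews: List[Review]) -> float:
    """Share of outputs the reviewer edited or rejected."""
    if not reviews:
        return 0.0
    overrides = sum(1 for r in reviews if r.decision is not Decision.ACCEPT)
    return overrides / len(reviews)


def override_notes(reviews: List[Review]) -> List[str]:
    """Collect reviewer notes to feed back into the prompt or data prep step."""
    return [r.note for r in reviews if r.decision is not Decision.ACCEPT and r.note]


# Illustrative week of reviews for one workflow.
week = [
    Review("acct-101", Decision.ACCEPT),
    Review("acct-102", Decision.EDIT, "used bookings, should be ARR"),
    Review("acct-103", Decision.ACCEPT),
    Review("acct-104", Decision.REJECT, "renewal date missing from input"),
]

rate = override_rate(week)
notes = override_notes(week)
print(f"override rate: {rate:.0%}", notes)
```

Tracked week over week, a falling override rate is evidence the workflow is earning trust, and the notes tell you what to fix when it rises.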
The frame
The mistake in calling this “AI transformation” is that it sounds like a one-time event. It isn’t. Sustainable AI in StratOps work looks more like a new operating discipline than a project. Closer to how a serious FP&A team treats forecast accuracy than how an IT team rolls out a platform.
Named owners. Recurring evals. Cost visibility. Data hygiene as ongoing maintenance. Vendor strategy chosen, not drifted into. None of it glamorous. All of it what determines whether the AI is still useful next year.
