Quick Summary
- On April 23, 2026, Anthropic published a public postmortem identifying three product-layer changes that degraded Claude Code performance between March and April 2026.
- The bugs were not model weight changes — they were a configuration decision, a caching regression, and a misguided system prompt edit.
- All three fixes were shipped by April 20 in CLI release v2.1.116. Anthropic reset usage limits for all affected subscribers as compensation.
- This is a transparency win — and a case study in why AI system reliability is harder than traditional software reliability.
For several weeks in early 2026, developers and power users of Claude Code were quietly losing their minds. Something felt off. The model seemed less capable of sustained reasoning. It appeared to “forget” context mid-session. Complex coding tasks that previously sailed through now required repeated nudging. Across GitHub issues, X, and Reddit’s r/claude, users were describing it as “AI shrinkflation” — implying Anthropic was secretly dialling back capability to save money.
Anthropic wasn’t. But something was absolutely wrong — and on April 23, 2026, the company did something rare in the AI industry: it published a detailed public postmortem. No spin, no vague language. Three specific bugs, named, dated, and explained. This post breaks all of them down — technically for engineers, and in plain terms for everyone else.
What is a Postmortem?
A postmortem is a formal document that engineering teams write after something goes wrong. It describes what happened, why it happened, how it was detected, how it was fixed, and what changes are being made to prevent recurrence. In simple terms: Anthropic is saying “here’s exactly what broke, why, and how we fixed it” — publicly.
The Background: What Users Were Actually Experiencing
Between early March and late April 2026, users were reporting symptoms that didn’t fit any single obvious explanation. For non-technical users, the experience was that Claude Code seemed to have gotten “dumber” — shorter, less nuanced responses, and lost track of earlier parts of a conversation. For engineers: multi-turn agentic sessions were degrading, Claude was ignoring established context after idle periods, and reasoning quality on complex tasks had noticeably dropped.
What the postmortem reveals is that it wasn’t the model that broke. It was the scaffolding around it — three separate layers of configuration and product infrastructure that overlapped in a six-week window and compounded each other’s effects.
Bug #1: The Reasoning Effort Downgrade (March 4, 2026)
What Anthropic changed
On March 4, Anthropic changed the default reasoning effort setting in Claude Code from high to medium. Reasoning effort is a configuration parameter — a dial that controls how much internal “thinking” the model does before generating a response.
Why they made the change
The intent was reasonable: at the high setting, Claude Code’s interface could appear to “freeze” for several seconds while the model worked through complex reasoning chains. Anthropic believed most users would prefer a faster interface — and that for simpler tasks, the medium setting would be indistinguishable.
What actually happened
For simple tasks, they were right. For complex, multi-step development tasks — the kind professional developers were using Claude Code for — the difference was material. Tasks requiring sustained reasoning across multiple context windows, architectural decisions with many interdependent variables, or debugging of complex systems all benefit significantly from higher reasoning effort. According to Anthropic’s postmortem, the change “resulted in a noticeable drop in intelligence for complex tasks.”
The plain-English version
Imagine asking a thoughtful consultant to answer your questions in half the time to reduce waiting. For simple questions, the answers are the same. For complex strategic questions, you get a shallower response — but faster, so you might not immediately notice the depth is missing.
The fix
Anthropic reverted this change on April 7, 2026 — five weeks after deployment — after user feedback made clear that developers preferred defaulting to higher intelligence and opting into lower effort for simple tasks.
Bug #2: The Caching Logic Regression (March 26, 2026 — CLI v2.1.101)
This is the most technically interesting of the three bugs, and the one that caused the most confusion among developers trying to diagnose it themselves.
What caching is in Claude’s context
Claude Code maintains a “thinking history” — a cache of reasoning context that accumulates during a session. Think of it as the model’s working memory. When working on a coding task across multiple turns, Claude uses this cache to remember what decisions were made, what approaches were tried, and the current state of the codebase. Without it, each response has to reconstruct context from scratch.
What the bug did
A caching optimisation shipped in CLI v2.1.101 on March 26 was intended to prune old thinking history from sessions inactive for more than an hour. The intent made sense. The bug was in the implementation: instead of clearing the thinking history once when a session exceeded the idle threshold, the code cleared it on every subsequent turn after the threshold was crossed. So if you stepped away for more than an hour and continued working, the cache was wiped on every single message. The model was effectively starting from scratch on every response for the rest of the session.
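The shape of this bug can be sketched in a few lines. The function and field names below are hypothetical — the real implementation is not public — but the logic mirrors what the postmortem describes: the buggy path keys off a sticky "this session went idle" flag, while the correct path re-measures the gap since the previous turn, so the prune fires once per idle gap rather than forever after.

```javascript
const IDLE_THRESHOLD_MS = 60 * 60 * 1000; // 1 hour idle threshold

// Buggy shape: once the idle flag is set, it never clears, so the
// thinking history is wiped on every subsequent turn.
function onTurnBuggy(session, now) {
  if (now - session.lastActiveAt > IDLE_THRESHOLD_MS) session.wasIdle = true;
  if (session.wasIdle) session.thinkingHistory = []; // wipes every turn
  session.lastActiveAt = now;
}

// Fixed shape: the gap is re-measured against the previous turn, so the
// prune happens exactly once per idle gap and normal turns keep the cache.
function onTurnFixed(session, now) {
  if (now - session.lastActiveAt > IDLE_THRESHOLD_MS) {
    session.thinkingHistory = []; // prune once, for this gap only
  }
  session.lastActiveAt = now;
}
```

Note that both versions behave identically on the first post-idle turn — which is exactly why a test that checks "history pruned after one hour" would pass against the buggy code.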
Why this was so hard to diagnose
This class of bug is insidious because it doesn’t produce a hard error. No crash, no error message, no 500 response code. The model continues functioning — it just becomes progressively more amnesiac. A developer could spend hours assuming their own prompts were the problem before realising the system state was broken. The postmortem notes this “bypasses standard test suites” because most quality evaluations use short, contained multi-turn demos — not long, idle-interrupted real work sessions.
The engineering lesson
Traditional software metrics — latency, error rates, uptime — will show nothing wrong. The signal is in subtle proxy metrics: token counts per turn (a spike in input tokens suggests cache is being re-derived), reasoning consistency, and response length drift. Logging cache_read_input_tokens from the API usage object on every turn is a practical way to detect this kind of regression before users complain.
```javascript
// Log on every API call to detect cache invalidation spikes
logger.info("turn_completed", {
  turn: turnIndex,
  inputTokens: usage.input_tokens,
  cacheHit: usage.cache_read_input_tokens ?? 0,
  // A sustained drop in cacheHit signals a Bug #2-style regression
});
```
The plain-English version
Think of it like a colleague who, after any break longer than an hour, completely forgets every decision made before the break — and keeps forgetting even after you remind them. The underlying capability is unchanged; the memory access is broken.
Bug #3: The System Prompt Verbosity Restriction (April 16, 2026)
What Anthropic changed
On April 16, Anthropic added an instruction to Claude Code’s internal system prompt restricting responses between tool calls to under 25 words. The goal was verbosity control — reducing explanatory text between tool invocations, which in some workflows was producing lengthy commentary users didn’t need.
What actually happened
The constraint was too aggressive. Forcing responses to under 25 words between tool calls prevented Claude from expressing reasoning longer than a sentence. According to Anthropic’s postmortem, this caused a measurable 3% drop on one coding quality evaluation benchmark — significant for a change that was supposed to be cosmetic.
The mechanism: when a model is constrained to very short outputs, it cannot fully articulate its reasoning chain before making decisions. This affects the quality of those decisions in complex multi-step tasks. The verbosity restriction was asking the model to think less out loud — and it responded by thinking less.
The broader principle this reveals
System prompt engineering is not just about what you tell the model to do — it’s also about what you implicitly prevent it from doing. Constraints on output format, length, and structure have direct downstream effects on reasoning quality. In traditional software, a length constraint on a function’s output has no effect on the function’s internal logic. In large language models, it does. For teams writing custom system prompts, the lesson is that adding brevity constraints without empirically testing them against your specific task types carries real risk.
The plain-English version
If you told a surgeon they could only give a three-sentence briefing before any procedure, some briefings would be fine. For complex surgeries, the forced brevity would mean skipping steps that matter.
The Three Bugs Side by Side
| Bug | Date | Type | Impact | Fixed |
|---|---|---|---|---|
| Reasoning effort downgrade (high → medium) | March 4 | Config change | Intelligence drop on complex tasks | April 7 |
| Caching regression (v2.1.101) | March 26 | Code bug | Session amnesia after 1hr idle | April 20 (v2.1.116) |
| System prompt verbosity restriction | April 16 | Prompt change | 3% coding benchmark drop | April 20 (v2.1.116) |
What Anthropic Did About It
All three fixes were shipped by April 20, 2026, in Claude Code CLI release v2.1.116. Anthropic also reset usage limits for all affected subscribers as of April 23 — acknowledging that users had been paying for a degraded experience. This reset applied automatically. For enterprise and professional users, this sets a precedent for how Anthropic interprets its obligations when product-layer decisions cause unintended degradation.
The Transparency Question
Anthropic’s decision to publish this postmortem in detail is worth naming explicitly, because it’s not the norm. Most AI labs respond to user complaints with statements like “we’re constantly working to improve our models” — language that doesn’t confirm or deny specific problems. Publishing a dated, versioned, technically specific postmortem is a different standard.
This matters practically: when an AI lab publishes this level of detail, developers can check their own systems against the timeline, identify whether their complaints match a known bug window, and update integrations with confidence. For teams evaluating AI tools professionally: transparency of this kind is a meaningful vendor assessment criterion. A lab that documents its failures in detail is one whose claims about fixes can be verified.
Anthropic set this precedent with its September 2025 postmortem documenting three infrastructure-layer bugs: a routing error, a TPU configuration issue causing Chinese-language responses to English prompts, and an XLA compiler regression. The April 2026 postmortem continues that pattern.
What to Do Right Now
Update to v2.1.116 or later. All three bug fixes are in this release. Check with claude --version and update via your package manager.
Set reasoning effort explicitly. Rather than relying on defaults, explicitly configure reasoning effort to high for complex workflows in your Claude Code configuration. This insulates you from future default changes.
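For illustration, pinning the default in a Claude Code settings file might look like the fragment below. The key name `reasoningEffort` is hypothetical — consult your CLI version’s settings reference for the actual key, since the point is only that an explicit value survives future default changes.

```json
{
  "reasoningEffort": "high"
}
```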
Add token-level monitoring to long sessions. Log input_tokens, output_tokens, and cache_read_input_tokens on every API turn. A sustained spike in input_tokens without a corresponding increase in cache_read_input_tokens signals cache invalidation — the signature of a Bug #2-style regression.
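A simple heuristic over those logged values is enough to turn this from a vibe into an alert. The sketch below assumes each turn record carries the Anthropic API usage fields `input_tokens` and `cache_read_input_tokens`; the thresholds (1.5× input spike, 50% cache collapse) are illustrative starting points, not tuned values.

```javascript
// Flag turns where input tokens spike while cache reads collapse —
// the signature of a Bug #2-style cache invalidation regression.
function flagCacheRegressions(turns) {
  const flagged = [];
  for (let i = 1; i < turns.length; i++) {
    const prev = turns[i - 1];
    const cur = turns[i];
    const inputSpike = cur.input_tokens > 1.5 * prev.input_tokens;
    const cacheCollapse =
      (cur.cache_read_input_tokens ?? 0) <
      0.5 * (prev.cache_read_input_tokens ?? 0);
    if (inputSpike && cacheCollapse) flagged.push(i); // turn index to inspect
  }
  return flagged;
}
```

Run this over a session’s turn log; a run of consecutive flagged turns, rather than a single blip, is what distinguishes a systemic regression from an ordinary context change.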
Audit system prompts for implicit brevity constraints. Any instruction restricting output length or format should be tested against your specific task types before deployment.
Build evaluations that match real usage patterns. Short multi-turn demos do not expose time-based or long-context regressions. If your workflow involves sessions longer than 30 minutes, your eval suite needs to match that.
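One pattern that would have caught Bug #2 is an eval that injects a simulated idle gap longer than the pruning threshold, then checks that context set before the gap survives several post-gap turns. The sketch below is a harness shape, not a real Claude Code API: `runTurn` and `clock` are stand-ins you would wire to your own client and a controllable time source.

```javascript
// Multi-turn eval with a simulated >1hr idle gap. A once-only prune after
// the gap is tolerable; losing context on EVERY post-gap turn is the
// Bug #2 failure mode this is designed to expose.
function idleGapEval({ runTurn, clock }) {
  runTurn("Remember: the deploy target is cluster-7.", clock.now());
  clock.advance(90 * 60 * 1000); // simulate 90 minutes idle
  runTurn("What is the deploy target?", clock.now());
  clock.advance(30 * 1000);      // quick follow-up turn after the gap
  const reply = runTurn("Confirm the deploy target again.", clock.now());
  return /cluster-7/.test(reply); // pass iff pre-gap context survived
}
```

The essential trick is controlling the clock rather than the conversation: time-dependent regressions are invisible to any eval whose turns all happen within seconds of each other.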
Frequently Asked Questions
Did Anthropic secretly make Claude worse to save money on compute?
No. According to Anthropic’s April 23, 2026 postmortem, the degradation was caused by three specific bugs and product decisions: a reasoning effort configuration change, a caching regression, and an overly aggressive system prompt constraint. None were intentional capability reductions.
What is Claude Code CLI release v2.1.116?
v2.1.116 is the Claude Code CLI release shipped by April 20, 2026, containing all three bug fixes. Check your version with claude --version and update via your package manager if needed.
What are “thinking blocks” and why do they matter?
Thinking blocks are the internal representation of reasoning Claude has done during a session — the model’s working memory. They allow Claude to maintain coherent context across multiple turns without re-deriving earlier conclusions on every response. When Bug #2 invalidated them on every turn after idle time, Claude effectively started from zero on each message, which manifested as forgetfulness and repetition.
Were all Claude models affected, or just Claude Code?
The April 2026 postmortem specifically covers Claude Code (the agentic CLI). The reasoning effort change and caching bug were specific to Claude Code’s infrastructure and client. The September 2025 postmortem covered infrastructure-layer bugs affecting Claude’s API responses more broadly.
How do I know if my usage was affected?
If you used Claude Code between March 4 and April 20, 2026, you were potentially affected. The reasoning effort downgrade affected all users March 4–April 7. The caching bug affected sessions in CLI v2.1.101 that exceeded one hour idle time. The verbosity restriction affected all Claude Code from April 16 to April 20.
Did Anthropic compensate users for the degraded experience?
Yes. Anthropic reset usage limits for all subscribers as of April 23, 2026 in acknowledgement that users experienced degraded service during the affected period. This applied automatically.
What does this mean for teams evaluating AI tools professionally?
An AI vendor that publishes detailed, version-dated postmortems creates accountability you can audit. When evaluating Claude versus competitors, the willingness to publish this level of technical detail about failures is a meaningful signal about how the vendor handles reliability incidents.
The Bottom Line
The Anthropic Claude Code postmortem is one of the most technically transparent documents an AI lab has published about its own product failures. For engineers, it’s a case study in how configuration changes, caching regressions, and system prompt constraints interact in ways that evade standard testing and compound into confusing user-facing degradation. For non-technical teams, it’s a reminder that AI tool reliability has failure modes that don’t look like traditional software failures — and that choosing vendors who document them openly is a reasonable way to manage that risk.
The three bugs are fixed. The version is v2.1.116. The lessons — about how AI systems fail silently, why short evals miss long-session regressions, and why output constraints have unexpected effects on reasoning quality — are the part that stays relevant long after the patch notes are forgotten.
For more on how AI infrastructure shifts are reshaping digital marketing, see our overview of how AI is changing SEO in 2026, our Google algorithm updates log, and our deep-dive on agentic AI in marketing. For context on the September 2025 infrastructure postmortem that preceded this one, see our coverage of the Claude reliability and rate limits story.