Critical Service Block — Engine Overloaded + Compaction Failure in KimiClaw (Allegretto subscriber)

Hello Kimi Support Team,

I am writing to report a critical service disruption that has completely blocked my workflow for the past 3–4 days.

Account Information:

  • Subscription: Allegretto

  • Account paid/active until: May 5, 2026

  • Client: KimiClaw (OpenClaw 2026.3.13, build 61d171a) via browser

  • Model used: kimi-coding/k2p5

Issue Timeline:

  • Until recently: Kimi worked perfectly with version 2.5 (k2p5). No issues.

  • 3–4 days ago: Severe degradation began — response start times increased to minutes, constant session hangs and disconnections.

  • This morning: Complete block. Every request returns “The engine is currently overloaded, please try again later”. I cannot get any response at all.

Specific Errors (see attached screenshots):

  1. “The engine is currently overloaded, please try again later” — repeating on every single request, making the service unusable.

  2. “Compaction failed: Compaction cancelled • Context ?/131k” — context compression broke entirely. The session became a “zombie” that could not be recovered.

  3. Context overload state: Before the total failure, the client showed:

    • Context: 119k/131k (91%)

    • Cache: 99% hit • 119k cached, 0 new

    • Compactions: 20

    • Session: agent:main:main

    • Think: high • Reasoning: stream • Queue: collect (depth 0)

My Situation:
I am a solo developer relying entirely on Kimi for a single project. My entire workflow runs through Kimi: planning, backend code, frontend code, CI/CD deployment management, and documentation. This outage has completely stopped my development.

Attempted Solutions:

  • Sending /new — helped only for a very short dialogue, then the same errors returned immediately.

  • Trying to clear/compact context — failed due to the compaction error above.

  • The issue appears only in KimiClaw (both browser and mobile clients). I have not tested the official web interface extensively, because KimiClaw is my primary IDE-integrated workflow.

Additional Context:
I use a heavily customized agents.md file. KimiClaw previously confirmed it is valid/normal, but I can provide it for analysis if needed.

Request:
Please investigate:

  1. Why the kimi-coding/k2p5 model is returning persistent “engine overloaded” errors for a paid Allegretto account.

  2. Why context compaction fails at high context usage (119k/131k), leaving sessions unrecoverable.

  3. Whether there is a rate limit, context limit, or model-specific issue affecting KimiClaw browser users.

I have attached screenshots showing the error messages and the client state. I am happy to provide my agents.md, session logs, or any other diagnostic data you need.

Thank you for your urgent attention. My development is fully blocked until this is resolved.

Best regards,

+UPDATE I ran /new, sent one prompt, and got the following status:

:lobster: OpenClaw 2026.3.13 (61d171a)
:brain: Model: kimi-coding/k2p5 · :key: api-key (env: KIMI_API_KEY)
:abacus: Tokens: 101k in / 1 out · :dollar_banknote: Cost: $0.0000
:books: Context: 101k/131k (77%) · :broom: Compactions: 25

:thread: Session: agent:main:main • updated just now
:gear: Runtime: direct · Think: high · Reasoning: stream
:knot: Queue: collect (depth 0)

So my KimiClaw is stuck again. Next I ran /compact, and now it shows:

Compacted (100k → 17k) • Context 17k/131k (13%)

I don't understand this behavior from Kimi, and I don't know what to do next.

+UPDATE Now I am getting: “Request timed out before a response was generated. Please try again, or increase `agents.defaults.timeoutSeconds` in your config.”

+UPDATE One hour has passed since /compact, and the current /status output is:

:lobster: OpenClaw 2026.3.13 (61d171a)
:brain: Model: kimi-coding/k2p5 · :key: api-key (env: KIMI_API_KEY)
:abacus: Tokens: 108k in / 1 out · :dollar_banknote: Cost: $0.0000
:books: Context: 108k/131k (83%) · :broom: Compactions: 26
:thread: Session: agent:main:main • updated just now
:gear: Runtime: direct · Think: high · Reasoning: stream
:knot: Queue: collect (depth 0)

and Kimi again keeps returning “The engine is currently overloaded, please try again later” on every request.

Please help me stabilize KimiClaw!

My Bot ID: 19cf6e73-0362-8d1f-8000-0000684bfa7b

Hi Kimi team! Could you please fix the “engine overloaded” error ASAP?! I have just wasted an entire week and got nothing done. Oh sorry, I did accomplish one thing: I paid for my Kimi subscription, and that task completed successfully! I am absolutely angry and completely disappointed.

Hi,

I completely understand your frustration. Having your development workflow severely disrupted like this is unacceptable, and I sincerely apologize for the impact on your project.

To give you some context: two days ago, our engineers deployed a fix for a bias in our inference routing rules. Since overall system metrics stabilized and our internal tests passed, we believed the core issue was resolved. We assumed any remaining occasional “overloaded” errors were just peak-hour spikes that would naturally recover via automatic retries.

To explain these spikes: many automated setups (like OpenClaw) schedule cron jobs exactly on the hour (e.g., 6:00, 17:00). When everyone’s requests hit at the exact same minute, it creates a massive synchronized burst akin to a DDoS attack. (As a quick tip, adding a slight random delay or staggering these cron schedules can significantly help bypass these exact-minute traffic jams).
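If it helps, here is a minimal sketch of that staggering idea, written in Python purely for illustration. The wrapper function and the 0–300 second jitter window are assumptions, not part of OpenClaw:

```python
import random
import time

def run_with_jitter(job, max_jitter_seconds: float = 300) -> None:
    """Delay a scheduled job by a random amount so that jobs fired at the
    same wall-clock minute do not all hit the API simultaneously."""
    time.sleep(random.uniform(0, max_jitter_seconds))
    job()

# Example: a cron entry firing at 6:00 would call run_with_jitter(sync_agent)
# instead of sync_agent() directly. (sync_agent is a hypothetical job here.)
```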

Seeing your latest message with your Bot ID (19cf6e73...), it seems your specific context compaction and timeout issues are still persisting.

We are not ignoring this. Our engineering team is actively using your Bot ID to investigate your exception events now.

We will update you as soon as we identify the root cause. Thank you for your patience, the detailed logs, and your continued support. We are working hard to get this fully fixed for you.

Hi Team,

Thank you for the detailed explanation and for actively investigating this. I appreciate that you are looking into my specific case using the Bot ID.

However, I need to clarify something important: my issue is not caused by peak-hour traffic spikes or cron-job synchronization. Here is why:

1. The “engine overloaded” error was persistent 24/7
The error occurred continuously for 3–4 days, including off-peak hours. It was not occasional — it was a total block of the service. If it were just peak-hour overload, the system would recover between spikes, but it never did in KimiClaw.

2. Compaction fails at low context (33%) — unrelated to traffic
I observed:
Compaction failed: Compaction cancelled • Context 44k/131k (33%)
This proves the problem is not context overflow or load-related. The compaction mechanism itself breaks, even with plenty of headroom. This points to a client-side or model-routing bug, not infrastructure overload.

3. Gateway reset and /new do not help
If the issue were transient server load, a gateway reset or new chat would fix it temporarily. In my case, even a full gateway reset failed, and /new only worked for a few messages before the same errors returned. This suggests corrupted or incompatible session/model state rather than temporary overload.

4. The problem is isolated to KimiClaw
The official Kimi web interface works perfectly with the same account. Other services using the same API key also work. This confirms the issue is specific to the OpenClaw 2026.3.13 (build 61d171a) client.

5. Model version correlation — k2p6 fallback
I noticed that KimiClaw now shows:
Model: kimi/k2p6 · Fallback: kimi-coding/k2p6 (selected model unavailable)

Previously, I used k2p5 without any issues. The problems began exactly when k2p6 started being enforced as a fallback in the client. This timing strongly suggests that OpenClaw 2026.3.13 is not fully compatible with the k2p6 routing/fallback logic, or that the k2p6 deployment introduced a regression specifically for KimiClaw sessions.

My hypothesis for your engineers:
The inference routing fix deployed 2 days ago may have changed how k2p6 is routed or how session state is handled. OpenClaw 2026.3.13 was likely built and tested against k2p5. When k2p6 became the forced fallback, the client’s compaction, context handling, or queue logic (Queue: collect (depth 0), Reasoning: stream, Think: high) may have started failing due to an unexpected model response format or routing behavior.

Additional info for debugging:

  • Client: OpenClaw 2026.3.13 (61d171a), browser-based

  • Previous stable model: kimi-coding/k2p5

  • Current forced model: kimi/k2p6 → fallback kimi-coding/k2p6

  • Custom agents.md: Present, but KimiClaw previously validated it as normal. I can provide it if needed, but since the web UI works fine with the same logic, I doubt it is the root cause.

  • Account: Allegretto, active until 2026-05-05

My requests:

  1. Could you confirm whether k2p6 is fully supported in OpenClaw 2026.3.13, or if a client update is required?

  2. Is there a way to force k2p5 as a temporary workaround while the k2p6 compatibility issue is fixed?

  3. Do you have an ETA for a fix or a patched KimiClaw build?

I am happy to test a beta client version or provide any additional logs (browser console, network requests, session storage dump) if that helps your engineers.

Thank you again for your attention. I am standing by to help with diagnostics.

Best regards,
Andrey

Hi Andrey,

Thank you for the incredibly detailed breakdown. This level of technical debugging from your side is extremely helpful.

I want to address your hypotheses directly and share what our backend logs are actually showing for your Bot ID. The data paints a different picture than what the client UI is displaying locally.

1. The “Engine Overloaded” Discrepancy

Your observation: Persistent 24/7 “engine overloaded” errors.

Your conclusion: The service is totally blocked and failing to process requests.

Actual behavior: Our backend inference logs for your Bot ID show very few “engine overloaded” errors today. Instead, the server is logging a continuous stream of successful user and agent activity for your account.

This indicates a significant gap between the severe blockage you are experiencing and the actual backend data. While even a few “overloaded” messages are enough to completely disrupt a developer’s workflow, the logs confirm that the core engine itself has not been persistently down and is successfully processing requests.

To help our KimiClaw team investigate why you are seeing these errors while the engine is reporting success, could you provide specific timestamps for the “engine overloaded” occurrences? This will allow us to pinpoint exactly what is happening at the gateway level during those moments.

2. Compaction Fails at Low Context (33%)

Your observation: Compaction failed: Compaction cancelled • Context 44k/131k (33%)

Your conclusion: The compaction mechanism is broken / there is a client-side bug.

Actual behavior: The compaction mechanism is working exactly as designed. This is not a code bug; it is a UX copy issue.

Here is the logic:

  • At 33% context utilization, your conversation history is still very short.
  • The system has a safeguard that aborts compaction when the context headroom is plentiful. Summarizing a short history would waste tokens and API calls for no meaningful gain.
  • The safeguard therefore cancels the job intentionally.

The problem is the message text. Compaction failed: Compaction cancelled reads like a system error. If the copy instead read:

Compaction skipped: Sufficient context available (33%) — no summary needed.

…it would immediately clarify that the mechanism is healthy and simply declined to do unnecessary work.

You were right that this is unrelated to traffic or context overflow. You were right that 33% is nowhere near a limit. But the mechanism is not broken; it is deliberately idle.
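To make that logic concrete, here is a minimal Python sketch of the decision the safeguard makes. The 70% threshold and the function name are assumptions chosen for illustration; the real cutoff is internal:

```python
CONTEXT_LIMIT = 131_000   # context window shown in your /status output
MIN_UTILIZATION = 0.70    # assumed cutoff, not OpenClaw's actual value

def maybe_compact(context_tokens: int) -> str:
    """Decline to compact while there is still plenty of headroom."""
    utilization = context_tokens / CONTEXT_LIMIT
    if utilization < MIN_UTILIZATION:
        # Deliberately idle: summarizing a short history wastes tokens.
        return f"Compaction skipped: sufficient context available ({utilization:.0%})"
    return "Compaction started"

print(maybe_compact(44_000))   # skipped -- the 33% case you observed
print(maybe_compact(119_000))  # started -- near the limit, compaction pays off
```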

3. Why /new and Gateway Reset Only “Worked for a Few Messages”

Your observation: After /new, the error returned after only a few messages.

Your conclusion: This indicates corrupted or incompatible session/model state.

Actual behavior: This is the same safeguard from Point 2, now operating on an even smaller corpus.

When you run /new:

  1. The context is wiped. Only the base system prompt remains.
  2. You exchange only “a few messages.”
  3. The auto-compaction trigger eventually fires.
  4. The compactor inspects the history and finds insufficient conversational content to summarize (only a handful of turns plus system prompt).
  5. The safeguard cancels the job again.

A truly corrupted session state would manifest as 400-series API errors, malformed token streams, or persistent transport failures—not as a clean, structured Compaction cancelled log line. What you are seeing is the vacuum cleaner shutting off because the floor is already clean, not because the motor burned out.

Where a real bug might exist: If you did not manually invoke /compact and the client auto-triggered it, the trigger threshold may be miscalculating. It could be counting the system prompt (or embedded tool schemas) toward a faux “utilization” figure, causing the trigger to fire when it should not. That would be a trigger-logic bug, not a compaction or model-state bug.
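As a purely hypothetical illustration of that trigger-logic bug: your own /status right after /new showed Context 101k/131k (77%) with only a single prompt sent, which would be consistent with a large fixed overhead being counted toward utilization. All numbers below are illustrative, not measured:

```python
CONTEXT_LIMIT = 131_000
fixed_overhead = 90_000    # illustrative: system prompt + agents.md + tool schemas
conversation = 11_000      # illustrative: a handful of turns after /new

# Buggy trigger: fixed overhead counts toward "utilization", so auto-compaction
# fires almost immediately even in a fresh session ...
faux_utilization = (fixed_overhead + conversation) / CONTEXT_LIMIT

# ... but only the conversation history is compressible, so the compactor
# finds almost nothing to summarize and the safeguard cancels the job.
real_utilization = conversation / CONTEXT_LIMIT

print(f"faux: {faux_utilization:.0%}, real: {real_utilization:.0%}")
# -> faux: 77%, real: 8%
```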

4. Model Version Correlation (k2p6 Fallback)

Your observation: Issues began when kimi/k2p6 appeared as a fallback from k2p5.

Clarification on routing: For sessions routed through the Kimi Code Gateway (the path used by KimiClaw for code-plan / “vibe coding” features), the model ID you select or see in the client (k2p5, k2p6, etc.) has no effect on which model actually serves the request. On this gateway path, the backend ignores the version ID sent by the client and routes every request to the same underlying model: the “latest model version allocated to the code-plan pipeline.”

Therefore, when you see k2p6 replacing k2p5, you are purely seeing a UI routing-label update. The backend did not swap you to a different, untested, or incompatible binary. The client build 2026.3.13 is still talking to the exact same compatible pipeline it always has.