Error code 429: We're receiving too many requests at the moment

Hey @vimmoos,

Thanks for the detailed report. Looking at the cluster metrics during that specific window, things appeared stable without obvious incidents, so this likely points to an account-level quota limit rather than a platform-side issue.

What’s happening
The rate_limit_reached_error indicates you’ve hit a quota ceiling. This covers both concurrency limits (how many requests can run at once) and token throughput limits, which reset on a rolling basis.

About the 1-hour gap
Assuming your connection was stable during that time, the pattern you described—complete inability to make requests for roughly an hour, followed by a burst of successes—appears more consistent with token exhaustion within your 5-hour rolling window than with concurrency limits.

If this were purely a concurrency issue (hitting the 30-task cap), you would typically see sporadic successes as individual subagents complete and free up slots. Instead, the observed “all-or-nothing” pattern over that hour-long window suggests your token quota was fully depleted by 16:19, blocking all requests until your 5-hour window reset around 17:20.
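On the client side, the standard mitigation for intermittent 429s is exponential backoff with jitter. A minimal sketch (the RateLimited exception, the 300-second cap, and the attempt count are illustrative assumptions, not part of the Kimi API):

```python
import random
import time

class RateLimited(Exception):
    """Hypothetical marker raised when the server returns HTTP 429."""

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Full-jitter backoff: uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retry(send, max_attempts: int = 6, base: float = 1.0):
    """Retry a zero-arg callable that raises RateLimited on 429."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            time.sleep(backoff_delay(attempt, base=base))
    raise RuntimeError("still rate-limited after retries")
```

Note that backoff only helps with transient limits; if your token quota for the window is fully depleted, as suspected above, no amount of retrying will succeed until the window resets.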

Checking your usage
I chatted with the Kimi Code team about this, and there’s an experimental endpoint you can use to check your current limits in real-time:

curl -H "Authorization: Bearer $KIMI_API_KEY" \
  https://api.kimi.com/coding/v1/usages

Example response:

{
  "usage": {
    "limit": "100",
    "remaining": "100",
    "resetTime": "2026-03-09T11:16:04.416717Z"
  },
  "limits": [
    {
      "window": {
        "duration": 300,
        "timeUnit": "TIME_UNIT_MINUTE"
      },
      "detail": {
        "limit": "100",
        "remaining": "100",
        "resetTime": "2026-03-03T11:16:04.416717Z"
      }
    }
  ]
}
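To turn that response into something actionable, you can parse out the remaining quota and compute how long until the window resets. A minimal sketch against the example payload above (field names as shown; note the numeric fields arrive as strings, and the 300-minute window matches the 5-hour rolling quota):

```python
import json
from datetime import datetime, timezone

# Example payload from the endpoint above (numeric fields arrive as strings).
payload = json.loads("""
{
  "usage": {"limit": "100", "remaining": "100",
            "resetTime": "2026-03-09T11:16:04.416717Z"},
  "limits": [{"window": {"duration": 300, "timeUnit": "TIME_UNIT_MINUTE"},
              "detail": {"limit": "100", "remaining": "100",
                         "resetTime": "2026-03-03T11:16:04.416717Z"}}]
}
""")

def seconds_until_reset(entry: dict, now: datetime) -> float:
    """Seconds to wait until the quota window resets (0 if already past)."""
    reset = datetime.fromisoformat(entry["resetTime"].replace("Z", "+00:00"))
    return max(0.0, (reset - now).total_seconds())

remaining = int(payload["usage"]["remaining"])               # string -> int
window_hours = payload["limits"][0]["window"]["duration"] / 60  # 300 min = 5 h
```

During an incident, sleeping for seconds_until_reset(...) before retrying avoids hammering the endpoint while the quota is exhausted.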

The resetTime is personalized based on your subscription start time, which might explain why you recovered when you did—it’s possible your window reset fell within that gap.

Please note this is an experimental API that doesn’t yet expose granular token burn rates or real-time concurrency counts; we’re working on improving this for better transparency. If you check it during the next incident and see remaining: 0, that would help confirm the token-exhaustion theory.

Hope this helps.