Error code: 429: We're receiving too many requests at the moment

Yesterday you updated the limits (500 requests before, a percentage now). The problem is that now I can't use my $40 subscription because of the 429 error…

I guess you have trouble with the rate-limit logic… or the UI shows an incorrect percentage.

LLM provider error: Error code: 429 - {'error': {'message': "We're receiving too many requests at the moment. Please wait a moment and try again.", 'type': 'rate_limit_reached_error'}}

Thanks for flagging this — I understand the frustration of hitting a 429 error right after the billing model change.

From what I understand, this relates to a recent update on the Kimi Code side (Membership Benefits logic), where they shifted from request-count-based packages to token-based packages. The intention behind this change is actually to provide better value for users, particularly in scenarios where context caching is involved, as you can get significantly more usage out of the same spend versus the old request-based model.

However, you’re right that the error messaging here is problematic. The generic 429 response ("We're receiving too many requests...") appears to be conflating different types of limits. In your case, this is likely hitting a concurrency limit rather than the token quota being exhausted — the error type distinction isn’t being surfaced properly in the current implementation.
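Until the error types are surfaced distinctly, the usual client-side mitigation for an ambiguous 429 is exponential backoff with jitter. Here is a minimal, generic sketch (none of this is the Kimi SDK; `RateLimitError`, `with_backoff`, and `flaky` are all hypothetical names):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the provider's 429 error (hypothetical name)."""

def with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Call fn, retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the 429
            # delays: base, 2*base, 4*base, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)

# Demo: a call that fails with a 429 twice, then succeeds on the third try
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
print(result)  # → ok
```

Backoff helps with transient concurrency spikes, but note it will not help if the quota window is actually exhausted; in that case the only fix is waiting for the reset.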

The Kimi Code team is aware of the confusion and has committed to updating the documentation at Kimi Code Membership Benefits | Kimi Code Docs to clarify the new token-based billing strategy and the specific rate limiting behaviors (including concurrency vs. quota limits). I’d recommend keeping an eye on that page for the elaborated policy docs, which should go live shortly.


Just to add a bit of context: that limit is essentially a safety brake to prevent accidental spikes from exhausting your token package prematurely. It’s not meant to throttle your work, and I know the team is already evaluating usage patterns to set a more reasonable threshold.


Hi there! :waving_hand:

Following up on my previous replies — wanted to give you a concrete update on the limit adjustments.

The documentation at Kimi Code Membership Benefits has been updated with specific figures that address the concurrency bottlenecks you encountered:

Key Advantages

  • Seamless Integration: Full compatibility with Kimi Code CLI, Claude Code, and Roo Code, fitting perfectly into your existing CI/CD or local workflows.
  • Elite Performance: Experience blistering output speeds of up to 100 Tokens/s with high stability.
  • Throughput Capacity: A 5-hour token quota supports approximately 300–1,200 API calls, with a maximum concurrency of 30, ensuring uninterrupted operation for complex workloads.

What this means for your subscription:

The maximum concurrency of 30 is the key fix here — this replaces the previous stricter throttling that was causing those 429 errors even when you had plenty of token quota remaining. You should now be able to run intensive workflows without hitting that “safety brake” wall we discussed earlier.

The token-based billing (vs. the old 500-request model) remains the same, but the concurrency ceiling has been raised significantly to better accommodate real-world usage patterns.
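For anyone batching requests client-side, one way to stay under a concurrency ceiling like this is to gate parallel calls with a semaphore. A minimal sketch (the 30 comes from the docs above; `call_api` is a placeholder, not the real client):

```python
import asyncio

MAX_CONCURRENCY = 30  # ceiling from the updated docs; adjust if yours differs

async def call_api(sem, stats, i):
    """Placeholder for one HTTP inference request (hypothetical)."""
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulate network latency
        stats["in_flight"] -= 1
    return i

async def main(n_calls=100):
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    stats = {"in_flight": 0, "peak": 0}
    # Fire all tasks at once; the semaphore keeps at most 30 in flight
    results = await asyncio.gather(*(call_api(sem, stats, i) for i in range(n_calls)))
    return results, stats

results, stats = asyncio.run(main())
print(f"{len(results)} calls done, peak concurrency: {stats['peak']}")
```

This only bounds what your own process sends, of course; if multiple sessions or subagents share one account, the server-side counter sees them all combined.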

Could you test again? If you’re still seeing 429s with the specific error type rate_limit_reached_error, let me know — we want to ensure this new 30-request limit aligns with your actual batch operation needs.

I am having the same issue.

LLM provider error: Error code: 429 - {'error': {'message': "We're receiving too many requests at the moment. Please wait a moment and try again.", 'type': 'rate_limit_reached_error'}}

I simply run /init in the project and I get:
/init
…..(Thinking)………
• Used Task (Explore project structure)
• 6 more tool calls …
• Used Task (Explore root directory)
• Used Task
• Used Task
• Used Task
• Used Task (Analyze config files)
Failed to run subagent
• Used Task (Examine CI/CD setup)
Failed to run subagent
• Used Task (Analyze test setup)
Failed to run subagent

When I then look at the session in the Kimi web UI, I get this error for each subagent failure: LLM provider error: Error code: 429 - {'error': {'message': "We're receiving too many requests at the moment. Please wait a moment and try again.", 'type': 'rate_limit_reached_error'}}

I have the Allegretto subscription and my quotas are:
╭────────────────────────────── API Usage ──────────────────────────────╮
│ Weekly limit ━━━╺━━━━━━━━━━━━━━━━ 84% left (resets in 5d 16h 43m) │
│ 5h limit ━━━━━━━━━━━━━━━━━━━━ 100% left (resets in 3h 43m) │
╰────────────────────────────────────────────────────────────────────────╯

Where can I at least see this rate limit? I do not think I go over the 30 concurrent tasks, but in any case it would be nice to at least get this info so I can work around it.
This has already happened twice today. Here is the usage history I get from the Kimi Code console UI:

(I removed the request IDs.)

Login Device KimiCLI/1.16.0 2026/03/02 17:25:14 Success
Login Device KimiCLI/1.16.0 2026/03/02 17:25:08 Success
Login Device KimiCLI/1.16.0 2026/03/02 17:24:49 Success
Login Device KimiCLI/1.16.0 2026/03/02 17:24:29 Success
Login Device KimiCLI/1.16.0 2026/03/02 17:23:20 Success
Login Device KimiCLI/1.16.0 2026/03/02 17:23:14 Success
Login Device KimiCLI/1.16.0 2026/03/02 17:22:55 Success
Login Device KimiCLI/1.16.0 2026/03/02 17:22:28 Success
Login Device KimiCLI/1.16.0 2026/03/02 16:18:57 Success

As you can see, at ~16:19 I got those rate_limit errors and nothing worked for the next hour; then it started working again at ~17:20, and I was able to make 8 requests before hitting the rate limit again. This seems weird.

Hey @vimmoos,

Thanks for the detailed report. Looking at the cluster metrics during that specific window, things appeared stable without obvious incidents, so this likely points to an account-level quota limit rather than a platform-side issue.

What’s happening
The rate_limit_reached_error indicates you’ve hit a quota ceiling, which includes both concurrency limits and token throughput limits that reset on a rolling basis.

About the 1-hour gap
Assuming your connection was stable during that time, the pattern you described—complete inability to make requests for roughly an hour, followed by a burst of successes—appears more consistent with token exhaustion within your 5-hour rolling window than concurrency limits.

If this were purely a concurrency issue (hitting the 30-task cap), you would typically see sporadic successes as individual subagents complete and free up slots. Instead, the observed “all-or-nothing” pattern over that hour-long window suggests your token quota was fully depleted by 16:19, blocking all requests until your 5-hour window reset around 17:20.

Checking your usage
I chatted with the Kimi Code team about this, and there’s an experimental endpoint you can use to check your current limits in real-time:

curl -H "Authorization: Bearer $KIMI_API_KEY" \
  https://api.kimi.com/coding/v1/usages

Example response:

{
  "usage": {
    "limit": "100",
    "remaining": "100",
    "resetTime": "2026-03-09T11:16:04.416717Z"
  },
  "limits": [
    {
      "window": {
        "duration": 300,
        "timeUnit": "TIME_UNIT_MINUTE"
      },
      "detail": {
        "limit": "100",
        "remaining": "100",
        "resetTime": "2026-03-03T11:16:04.416717Z"
      }
    }
  ]
}

The resetTime is personalized based on your subscription start time, which might explain why you recovered when you did—it’s possible your window reset fell within that gap.

Please note this is a legacy/experimental API that doesn’t yet expose granular token burn rates or real-time concurrency counts; we’re working on improving this for better transparency. If you check this during the next incident and see remaining: 0, that would help confirm the token exhaustion theory.
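For anyone who wants to script that check, a small sketch that parses the example payload above and flags exhaustion (field names are taken from the example response; `quota_exhausted` is a hypothetical helper, not an official client):

```python
import json

# Example payload shape copied from the /usages response above
payload = """
{
  "usage": {
    "limit": "100",
    "remaining": "100",
    "resetTime": "2026-03-09T11:16:04.416717Z"
  },
  "limits": [
    {
      "window": {"duration": 300, "timeUnit": "TIME_UNIT_MINUTE"},
      "detail": {"limit": "100", "remaining": "100",
                 "resetTime": "2026-03-03T11:16:04.416717Z"}
    }
  ]
}
"""

def quota_exhausted(resp):
    """True when the top-level usage window has no remaining quota."""
    # values arrive as strings, so convert before comparing
    return int(resp["usage"]["remaining"]) == 0

resp = json.loads(payload)
print(quota_exhausted(resp))  # → False (remaining is "100")
```

Run the equivalent against the live endpoint during an incident: `remaining` at 0 would support the token-exhaustion theory, while a non-zero value alongside 429s would point elsewhere.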

Hope it helps.

Hello @yuikns, thanks a lot for the reply! I have now hit the same issue again:

LLM provider error: Error code: 429 - {'error': {'message': "We're receiving too many requests at the moment. Please wait a moment and try again.", 'type': 'rate_limit_reached_error'}}

I used the curl command against that route and I get:

{
  "user": {...},
  "usage": {
    "limit": "100",
    "used": "33",
    "remaining": "67",
    "resetTime": "2026-03-08T09:20:45.248979Z"
  },
  "limits": [
    {
      "window": {
        "duration": 300,
        "timeUnit": "TIME_UNIT_MINUTE"
      },
      "detail": {
        "limit": "100",
        "used": "2",
        "remaining": "98",
        "resetTime": "2026-03-07T15:20:45.248979Z"
      }
    }
  ]
}

So I am not hitting any limit. If it helps, I noticed this happens only when I give Kimi the CreateSubagent "tool". It did create a couple of them, but from the output it seems at most 6/7, nothing close to the 30-concurrency limit mentioned in the docs. Maybe there is something in this CreateSubagent? For now I will avoid using it, because so far I have never had any other rate-limiting issue when it is disabled.

For reference, I have only 1 Kimi session open and the output is:

• Used Task (...)
• 8 more tool calls …
• Used CreateSubagent
• Used Task (...)
• Used Task (...)
• Used Task (...)
• Used Task (...)
• 3 more tool calls …
• Used Task (...)
• Used Task (...)
• Used Task (...)
• Used Task (...)
• Used Task (...)
• Used Task (...)
• Used Task (...)
• Used Task (...)
Failed to run subagent
• Used Task (...)
• 5 more tool calls …
• Used ReadFile
• Used ReadFile (...)
• Used ReadFile (...)
• Used ReadFile

So maybe there is something I do not understand about this concurrency limit; let me know if that's the case.

For your reference, here is the stack trace i see after aborting the session. I hope this can help:

Interrupted by user

Unhandled exception in event loop: Exception None
Press ENTER to continue…
/exit
Bye!
ERROR:asyncio:unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-16248' coro=<KimiSoul.run() done, defined at […]/site-packages/kimi_cli/soul/kimisoul.py:231> exception=APIStatusError('Error code: 429 - {'error': {'message': "We're receiving too many requests at the moment. Please wait a moment and try again.", 'type': 'rate_limit_reached_error'}}')>
Traceback (most recent call last):
  File "[…]/site-packages/kosong/chat_provider/kimi.py", line 165, in generate
    response = await self.client.chat.completions.create(
        …<6 lines>…
    )
  File "[…]/site-packages/openai/resources/chat/completions/completions.py", line 2678, in create
    return await self._post(
        …<49 lines>…
    )
  File "[…]/site-packages/openai/_base_client.py", line 1797, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "[…]/site-packages/openai/_base_client.py", line 1597, in request
    raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': "We're receiving too many requests at the moment. Please wait a moment and try again.", 'type': 'rate_limit_reached_error'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "[…]/site-packages/kimi_cli/soul/kimisoul.py", line 255, in run
    await self._turn(user_message)
  File "[…]/site-packages/kimi_cli/soul/kimisoul.py", line 269, in _turn
    return await self._agent_loop()
  File "[…]/site-packages/kimi_cli/soul/kimisoul.py", line 403, in _agent_loop
    step_outcome = await self._step()
  File "[…]/site-packages/kimi_cli/soul/kimisoul.py", line 473, in _step
    result = await _kosong_step_with_retry()
  File "[…]/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
  File "[…]/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
  File "[…]/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "[…]/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "[…]/site-packages/tenacity/__init__.py", line 420, in exc_check
    raise retry_exc.reraise()
  File "[…]/site-packages/tenacity/__init__.py", line 187, in reraise
    raise self.last_attempt.result()
  File "/usr/local/lib/python3.13/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
  File "/usr/local/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "[…]/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
  File "[…]/site-packages/kimi_cli/soul/kimisoul.py", line 467, in _kosong_step_with_retry
    return await self._run_with_connection_recovery(
        …<3 lines>…
    )
  File "[…]/site-packages/kimi_cli/soul/kimisoul.py", line 609, in _run_with_connection_recovery
    return await operation()
  File "[…]/site-packages/kimi_cli/soul/kimisoul.py", line 450, in _run_step_once
    return await kosong.step(
        …<6 lines>…
    )
  File "[…]/site-packages/kosong/__init__.py", line 158, in step
    result = await generate(
        …<6 lines>…
    )
  File "[…]/site-packages/kosong/_generate.py", line 53, in generate
    stream = await chat_provider.generate(system_prompt, tools, history)
  File "[…]/site-packages/kosong/chat_provider/kimi.py", line 175, in generate
    raise convert_error(e) from e
kosong.chat_provider.APIStatusError: Error code: 429 - {'error': {'message': "We're receiving too many requests at the moment. Please wait a moment and try again.", 'type': 'rate_limit_reached_error'}}

Hi @vimmoos, thank you so much for providing the detailed logs! You make a fair point, and the usage output is indeed a bit confusing right now.

To clarify a few things from my end:

  • Usage API: The /usages endpoint currently only reflects token limits (the “100” and “33” actually represent token usage percentages). The concurrency limit tracking is still under development.
  • Concurrency vs. Subagents: The concurrency limit applies to HTTP-level API inference requests. From my understanding, this doesn’t strictly map 1:1 with the Task and Subagent counts you see in the CLI.

Still, I absolutely agree with you: if your token limits are fine, a single session shouldn’t be throwing this many rate-limit errors.

I’m not entirely certain how CreateSubagent and parallel Tasks might be triggering these HTTP request spikes under the hood, so I need to check with my colleagues to figure out exactly what’s happening here. Since it’s the weekend, it might be a little while before I can get back to you with a solid update.

Thanks for bearing with us, and I’ll update you as soon as I know more!

Hi @yuikns, thanks for the fast reply!
Okay perfect, looking forward to hearing back from you.

In the meantime, I can confirm that after 1–2 hrs of being blocked by the rate_limit error, everything is working again by simply commenting out the CreateSubagent tool like so:

version: 1
agent:
  name: default-plus
  extend: default
  description: Default agent with subagent creation capabilities
  tools:
    # - "kimi_cli.tools.multiagent:CreateSubagent"
    - "kimi_cli.tools.multiagent:Task"

(I use this instead of the default agent)
