TITLE: Allegretto Quota System Has Hidden Dual-Pool Architecture + Potential "Login Device" Classification Bug

Hi all, I wanted to share something I just discovered the hard way so others don’t get caught off guard.

What I Learned

Allegretto (and likely other tiers) has a dual-pool quota system:

  1. Weekly Fast Pool — refreshes every 7 days, gives instant responses

  2. Monthly Reserve — backup pool used once the weekly pool runs dry

The dashboard shows both percentages separately, but here’s the catch: the API backend can hit 0% before the dashboard catches up. My dashboard was showing 15% weekly / 10% monthly remaining while the API was already returning 429 exceeded_current_quota_error.
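To make the failure mode concrete, here is a toy model of the two pools and a dashboard that stops refreshing. Everything in it is an assumption for illustration: the class name, the pool sizes, the "weekly first, then monthly" order, and the one-unit-per-call cost are made up, and this is not Moonshot's actual enforcement logic.

```python
from dataclasses import dataclass


@dataclass
class DualPoolQuota:
    """Toy model of the two pools (hypothetical, not the real backend)."""
    weekly_left: int    # units left in the Weekly Fast Pool
    monthly_left: int   # units left in the Monthly Reserve

    def consume(self, units: int = 1) -> int:
        """Return an HTTP-style status: 200 if a pool covered the call, else 429."""
        if self.weekly_left >= units:
            self.weekly_left -= units
            return 200
        if self.monthly_left >= units:
            self.monthly_left -= units
            return 200
        return 429  # corresponds to exceeded_current_quota_error


def percent(left: int, total: int) -> int:
    """Remaining quota as a whole-number percentage."""
    return round(100 * left / total)


WEEKLY_TOTAL, MONTHLY_TOTAL = 100, 100  # made-up pool sizes
quota = DualPoolQuota(weekly_left=15, monthly_left=10)

# The dashboard takes a snapshot here and then does not refresh.
dashboard = (percent(quota.weekly_left, WEEKLY_TOTAL),
             percent(quota.monthly_left, MONTHLY_TOTAL))

# 25 calls drain both pools; the 26th is rejected.
statuses = [quota.consume() for _ in range(26)]

print("dashboard still shows:", dashboard)
print("last status:", statuses[-1])
```

In this sketch the stale snapshot keeps reading 15% / 10% even though the backend counters are at zero, which matches the symptom I saw.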

The Bug

My usage history shows 100+ “Login Device” entries all classified as “Model Inference” calls. These are auth handshakes, not actual coding work. They consumed my monthly reserve. I have 3 CLI devices registered (Fedora, Ubuntu, Windows) — if the CLI is doing keep-alive loops, each ping is eating quota.

Evidence: I have a Kimi usage export available. It contains 100 records, nearly all “Login Device” → “Model Inference.”
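If you want to run the same check on your own export, a tally like the one below is enough. This is a minimal sketch that assumes the export is a CSV with columns called Name and Type; I am not certain of the real export format or column names, so adjust them to match your file. The sample rows are made up.

```python
import csv
import io
from collections import Counter


def tally(export_csv: str) -> Counter:
    """Count usage records by (Name, Type) pair.

    Assumes a CSV with "Name" and "Type" columns (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(export_csv))
    return Counter((row["Name"], row["Type"]) for row in reader)


# Tiny made-up sample; replace with the text of a real export.
sample = """Name,Type
Login Device,Model Inference
Login Device,Model Inference
my-api-key,Model Inference
"""

counts = tally(sample)
for (name, kind), n in counts.most_common():
    print(f"{name} -> {kind}: {n}")
```

For a real file, read it with open(path, newline="") and pass its contents to tally().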

What the Plan Page Doesn’t Tell You

  • “Monthly subscription” ≠ “monthly usage budget”

  • There are two independent counters that can each hit zero

  • The dashboard lags behind real API state

  • Login/auth calls do burn quota (not disclosed in plan docs)

My Ask to Moonshot

  1. Fix the “Login Device” → “Model Inference” classification bug

  2. Make the plan description transparent about dual-pool limits

  3. Sync dashboard percentages with real-time API enforcement

For Other Users

Check your usage history: https://www.kimi.com/code/console → Usage History. If you see “Login Device” eating your quota, you’re affected.

Account: Clinton M, Allegretto tier,

Hi Morty,

Thanks for taking the time to share your observations. I want to clarify the technical mechanism behind this to put your mind at ease—there is no bug causing background auth handshakes to drain your quota.

The core of the misunderstanding stems from our current naming convention in the “Name” column.

Here is what is actually happening:
The entries labeled “Login Device” are not keep-alive pings or authentication handshakes. They are your actual, real Model Inference calls (your coding work).

The “Name” column simply indicates the authentication method used for the request:

  • When you authenticate KimiCLI directly via OAuth (logging into the device), the backend hardcodes the source as Login Device.
  • If you were to generate a standard API Key and use that instead, the column would display the specific name you gave to that API key.

Because third-party tools currently rely on API keys and lack direct OAuth login capabilities, Login Device is used as the static identifier for native OAuth sessions. We completely understand how this nomenclature is misleading and makes it look like an auth-related event. If we support third-party login flows in the future, this identifier will be updated to reflect the specific third-party origin or landing page.
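The labeling rule described above can be sketched as follows. This is an illustration only: the function name, the "oauth" / "api_key" strings, and the error handling are hypothetical and are not taken from our backend code.

```python
from typing import Optional


def usage_name(auth_method: str, api_key_name: Optional[str] = None) -> str:
    """Return the label shown in the "Name" column for one request (sketch)."""
    if auth_method == "oauth":
        # Native OAuth sessions always get the same fixed label.
        return "Login Device"
    if auth_method == "api_key":
        if not api_key_name:
            raise ValueError("an API-key request needs the key's name")
        # API-key requests show the name the user gave the key.
        return api_key_name
    raise ValueError(f"unknown auth method: {auth_method!r}")


print(usage_name("oauth"))                 # label for a KimiCLI OAuth login
print(usage_name("api_key", "my-laptop"))  # label for a key named "my-laptop"
```

In both cases the request itself is an ordinary Model Inference call; only the label differs.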

Regarding the Dashboard and Quota:
Your feedback on the dashboard synchronization lag and the transparency of the dual-pool quota system is completely valid. I am passing this feedback directly to the product team so we can improve the real-time accuracy of the usage dashboard and make the plan limits more explicit in the documentation.

Your quota is being consumed by actual inference work, not a rogue keep-alive loop. I hope this clears up the mystery!