I’m trying to access Kimi through LangGraph’s ChatOpenAI. No matter how I set the max_tokens parameter, Kimi K2’s output never exceeds 1024 tokens.
How can I configure this in LangGraph so the model outputs more than the 1024-token maximum?
For technical reasons, the default value of max_tokens is 1024. You can override it by setting the max_tokens field explicitly in the request body. Please refer to this link for details: https://platform.moonshot.ai/docs/api/chat#request-body.
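For reference, here is a minimal sketch using the official openai Python SDK. The model name and base URL are assumptions based on the docs linked above; substitute the values for your account:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",  # assumed endpoint, per the platform docs
)

completion = client.chat.completions.create(
    model="kimi-k2-0711-preview",  # assumed Kimi K2 model ID; use the one you deploy
    messages=[{"role": "user", "content": "Write a long essay about tokenizers."}],
    max_tokens=8192,  # overrides the 1024 default
)
print(completion.choices[0].message.content)
print(completion.choices[0].finish_reason)  # "length" means the output was truncated
```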
The documentation only covers the official OpenAI client. When I set max_tokens through the LangChain client, Kimi K2 does not seem to honor it, so the agent built with LangGraph keeps hitting the 1024-token limit. How can this issue be resolved?
My code is:
```python
from langchain_openai import ChatOpenAI

self.BASE_LLM = ChatOpenAI(
    model=model,            # LLM service configuration
    temperature=0.3,
    openai_api_key=key,
    base_url=base_url,
    # Raise the token limit to allow longer outputs
    max_completion_tokens=128000,
    max_tokens=128000,
)
```
I am not very familiar with the LangChain SDK, but at the HTTP level, setting the max_tokens parameter should be sufficient. If the parameter is not being applied, you will see the response end prematurely with finish_reason set to length; in that case we would need to investigate further.
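On the LangChain side, a quick way to check for this, sketched under the assumption that you are using langchain_openai’s ChatOpenAI (which exposes the raw finish_reason in response_metadata):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="kimi-k2-0711-preview",  # assumed model ID
    openai_api_key="YOUR_MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",  # assumed endpoint
    max_tokens=8192,
)

msg = llm.invoke("Write a long essay about tokenizers.")
# "length" here means the reply was cut off by max_tokens rather than
# finishing naturally with "stop".
print(msg.response_metadata.get("finish_reason"))
print(msg.response_metadata.get("token_usage"))
```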
Additionally, note that max_tokens means “the maximum number of tokens to generate for the chat completion,” so setting max_tokens=128000 may not be good practice.
It does not throw an error only because the model’s context length is 128 * 1024 tokens, slightly larger than that value.
However, since the prompt and the completion share the context window, even a moderately long prompt could cause the request to be rejected with a 400 error. (This admittedly feels unfriendly, and we may update the implementation soon to make it more developer-friendly.)
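To make the arithmetic concrete:

```python
context_window = 128 * 1024   # 131072 tokens in total
max_tokens = 128000           # requested completion budget
prompt_budget = context_window - max_tokens
print(prompt_budget)          # 3072 tokens left for the entire prompt
```

Any prompt longer than that remaining budget would trigger the 400 error described above.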
My program is built entirely on LangGraph, so it’s hard to make direct HTTP calls in the middle of the pipeline; everything relies on LangGraph’s upstream and downstream features. Does Kimi have any way to resolve this, specifically to pick up the max_tokens set in the LangGraph context and return the full result correctly?
We cannot provide specific suggestions for external frameworks. My inference is that LangChain ultimately maps its parameter to the max_tokens field in the HTTP request, but for details please refer to their official documentation.
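If you want to confirm what LangChain actually sends without leaving Python, one option is to attach an httpx event hook and log the outgoing request body. This is only a debugging sketch, not official guidance; ChatOpenAI accepts a custom http_client that it passes through to the underlying OpenAI client:

```python
import httpx
from langchain_openai import ChatOpenAI

def log_request(request: httpx.Request) -> None:
    # Print the outgoing JSON body so you can confirm max_tokens is present.
    print(request.method, request.url)
    print(request.content.decode())

llm = ChatOpenAI(
    model="kimi-k2-0711-preview",  # assumed model ID
    openai_api_key="YOUR_MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",  # assumed endpoint
    max_tokens=8192,
    http_client=httpx.Client(event_hooks={"request": [log_request]}),
)
print(llm.invoke("Hello").content)
```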
For debugging HTTP requests, you might also consider using our tool, moonpalace, to capture the complete HTTP request and response.
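For example, once moonpalace is running as a local proxy, you can route the LangChain client through it simply by swapping base_url. The port below is hypothetical; see the moonpalace README for the actual startup command and address:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="kimi-k2-0711-preview",         # assumed model ID
    openai_api_key="YOUR_MOONSHOT_API_KEY",
    base_url="http://localhost:9988/v1",  # hypothetical local moonpalace address
    max_tokens=8192,
)
# moonpalace records the full request/response pair for inspection.
print(llm.invoke("Hello").content)
```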