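Under tool_choice=required, the response contains only tool_calls; content is empty, and reasoning_content is empty as well. To get reasoning_content back under tool_choice=required, reasoning_parser must be enabled; without it, reasoning_content is never returned. The parser is not just a matching or post-processing step: it also changes when constrained decoding takes effect during decoding. content is empty under required because the serving layer deliberately clears the message body once it has parsed out the tool calls; this is by design.

| Term | Meaning | Why it matters |
|---|---|---|
| tool_choice=required | Forces the model to produce a tool call | Triggers JSON schema constrained decoding |
| tool_call_parser | Parses the model output into OpenAI tool_calls | Determines the format used to parse tool calls |
| reasoning_parser | Splits the output into reasoning_content and content | Affects both decode-time gating and response-time splitting |
| constrained decoding | Masks logits at sampling time | Once the constraint is active, non-JSON tokens are forbidden |
| json_schema | Strictly constrains the output structure with a JSON schema | required uses it by default |
| ReasonerGrammarBackend | Two-stage grammar wrapper | Lets the thinking segment through until think_end_id |
| think_end_id | Token id that ends the thinking segment | Determines when JSON enforcement begins |
| require_reasoning | Per-request boolean indicating that a thinking segment is needed | Determines the initial gating state |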
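The interaction between constrained decoding, think_end_id, and require_reasoning in the table can be sketched with a toy gate. This is a minimal illustration, not SGLang's ReasonerGrammarBackend: the class ToyReasonerGate, the helper apply_mask, and the token ids are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set

THINK_END_ID = 2         # hypothetical id of the end-of-thinking token
JSON_TOKENS = {3, 4, 5}  # hypothetical ids that the JSON grammar would allow


@dataclass
class ToyReasonerGate:
    """Toy two-stage gate: unconstrained until think_end_id, constrained after."""

    think_end_id: int
    require_reasoning: bool
    in_reasoning: bool = field(init=False, default=False)

    def __post_init__(self) -> None:
        # require_reasoning decides the initial state of the gate.
        self.in_reasoning = self.require_reasoning

    def allowed(self, vocab_size: int, grammar_allowed: Set[int]) -> List[bool]:
        # While thinking, nothing is masked; afterwards only grammar tokens pass.
        if self.in_reasoning:
            return [True] * vocab_size
        return [tok in grammar_allowed for tok in range(vocab_size)]

    def accept_token(self, token_id: int) -> None:
        # Seeing think_end_id closes the thinking segment.
        if self.in_reasoning and token_id == self.think_end_id:
            self.in_reasoning = False


def apply_mask(logits: List[float], mask: List[bool]) -> List[float]:
    # Masked-out tokens get -inf so they can never be sampled.
    return [x if ok else float("-inf") for x, ok in zip(logits, mask)]


gate = ToyReasonerGate(think_end_id=THINK_END_ID, require_reasoning=True)
logits = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
before = apply_mask(logits, gate.allowed(len(logits), JSON_TOKENS))
gate.accept_token(THINK_END_ID)
after = apply_mask(logits, gate.allowed(len(logits), JSON_TOKENS))
```

With require_reasoning=False the same gate would start in the constrained state, so the mask would apply from the very first token and no thinking text could be emitted.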
The flow is discussed here in two phases:

```
# required or a specific function uses json_schema
if tool_choice is required or is specific function:
    tool_call_constraint = ('json_schema', schema)
# require_reasoning is a boolean
require_reasoning = decide_from_request(enable_thinking or thinking)
```
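The two request-time decisions above can be written as a small runnable sketch. Everything here is an assumption for illustration: plan_request, is_specific_function, and build_schema are hypothetical helpers, and the schema shape is a simplified stand-in rather than the schema SGLang actually generates.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

ToolChoice = Union[str, Dict[str, Any]]
Constraint = Optional[Tuple[str, Dict[str, Any]]]


def is_specific_function(tool_choice: ToolChoice) -> bool:
    # OpenAI-style named choice: {"type": "function", "function": {"name": ...}}
    return isinstance(tool_choice, dict) and tool_choice.get("type") == "function"


def build_schema(tools: List[Dict[str, Any]], tool_choice: ToolChoice) -> Dict[str, Any]:
    # Illustrative schema only: a non-empty array of {name, parameters} calls.
    if is_specific_function(tool_choice):
        wanted = tool_choice["function"]["name"]
        tools = [t for t in tools if t["function"]["name"] == wanted]
    variants = [
        {
            "properties": {
                "name": {"enum": [t["function"]["name"]]},
                "parameters": t["function"].get("parameters", {}),
            },
            "required": ["name", "parameters"],
        }
        for t in tools
    ]
    return {"type": "array", "minItems": 1, "items": {"type": "object", "anyOf": variants}}


def plan_request(
    tool_choice: ToolChoice,
    tools: List[Dict[str, Any]],
    enable_thinking: bool = False,
    thinking: bool = False,
) -> Tuple[Constraint, bool]:
    # Decision 1: required or a named function switches on json_schema.
    constraint: Constraint = None
    if tool_choice == "required" or is_specific_function(tool_choice):
        constraint = ("json_schema", build_schema(tools, tool_choice))
    # Decision 2: either thinking flag makes the request require reasoning.
    require_reasoning = bool(enable_thinking or thinking)
    return constraint, require_reasoning


tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object"}}}]
constraint, require_reasoning = plan_request("required", tools, enable_thinking=True)
```

The point of the sketch is that the two decisions are independent: a request can carry a json_schema constraint with or without require_reasoning, and it is the combination that determines how the gate behaves at decode time.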
TokenizerManager sends the tokenized request to the Scheduler. SchedulerOutputProcessorMixin then processes next_token_id:

```python
if req.grammar is not None:
    # FIXME: this try-except block is for handling unexpected xgrammar issue.
    try:
        if batch.spec_algorithm.is_none():
            # Normal decode: single token
            req.grammar.accept_token(next_token_id)
        elif batch.is_spec_v2:
            # Speculative decode: next_token_id is a list of accepted tokens
            for token_id in next_token_id:
                req.grammar.accept_token(token_id)
    except ValueError as e:
        # Grammar accept_token can raise ValueError if the token is not in the grammar.
        # This can happen if the grammar is not set correctly or the token is invalid.
        logger.error(
            f"Grammar accept_token failed for req {req.rid} with token {next_token_id}: {e}"
        )
        self.abort_request(AbortReq(rid=req.rid))
```
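To see what the accept_token path does when a sampled token violates the grammar, here is a toy reproduction of the same control flow. ToyGrammar, advance, and the aborted list are hypothetical stand-ins for the real grammar object, the scheduler method, and abort_request; they only mirror the shape of the excerpt above.

```python
import logging
from typing import List, Sequence, Set, Union

logger = logging.getLogger("toy_scheduler")


class ToyGrammar:
    """Accepts only token ids from a fixed set; anything else is a violation."""

    def __init__(self, allowed: Set[int]) -> None:
        self.allowed = allowed
        self.accepted: List[int] = []

    def accept_token(self, token_id: int) -> None:
        if token_id not in self.allowed:
            raise ValueError(f"token {token_id} is not allowed by the grammar")
        self.accepted.append(token_id)


def advance(
    grammar: ToyGrammar,
    next_token_id: Union[int, Sequence[int]],
    rid: str,
    aborted: List[str],
) -> bool:
    # Normalize the single-token and speculative (list) cases into one loop.
    tokens = [next_token_id] if isinstance(next_token_id, int) else list(next_token_id)
    try:
        for token_id in tokens:
            grammar.accept_token(token_id)
    except ValueError as e:
        logger.error("Grammar accept_token failed for req %s: %s", rid, e)
        aborted.append(rid)  # stand-in for abort_request
        return False
    return True


aborted: List[str] = []
grammar = ToyGrammar(allowed={1, 2})
ok_single = advance(grammar, 1, "req-1", aborted)
ok_spec = advance(grammar, [2, 9], "req-1", aborted)
```

Note that in the list case the tokens accepted before the failing one remain applied to the grammar state; the request is aborted as a whole rather than rolled back.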
<aside> ⚠️
When reasoning_parser is enabled, the grammar backend is wrapped in a two-stage gate.
</aside>

python/sglang/srt/constrained/base_grammar_backend.py