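1. Background

Symptom: with tool_choice=required, the response contains only tool_calls; content is empty, and reasoning_content is empty as well.
Conclusion: to get reasoning_content under tool_choice=required, reasoning_parser must be enabled; without it, reasoning_content is always empty. The parser is not only a matching or post-processing step: it also changes when constrained decoding takes effect during decoding. content is empty under required because the Serving layer deliberately clears the body text after it has parsed the tool calls. This is by design.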

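2. Glossary

| Term | Meaning | Why it matters |
| --- | --- | --- |
| `tool_choice=required` | Forces the model to produce a tool call | Triggers JSON schema constrained decoding |
| `tool_call_parser` | Parses model output into OpenAI `tool_calls` | Determines the format used to parse tool calls |
| `reasoning_parser` | Splits output into `reasoning_content` and `content` | Affects both decode-phase gating and return-phase splitting |
| `constrained decoding` | Masks logits at sampling time | Once the constraint is active, non-JSON tokens are forbidden |
| `json_schema` | Strictly constrains the output structure with a JSON schema | required uses it by default |
| `ReasonerGrammarBackend` | Two-stage grammar wrapper | Lets the reasoning segment through before `think_end_id` |
| `think_end_id` | Token id that ends the reasoning segment | Determines when JSON starts being enforced |
| `require_reasoning` | Per-request boolean indicating that a reasoning segment is needed | Determines the initial gating state |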
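
To make the "masks logits at sampling time" entry concrete, here is a minimal sketch of a token mask. The toy vocabulary, the helper names `apply_token_mask` and `greedy_pick`, and the plain-list logits are all assumptions made for illustration; a real grammar backend builds the mask from the grammar state and applies it on tensors.

```python
import math

def apply_token_mask(logits, allowed_token_ids):
    """Return a copy of logits with every disallowed token set to -inf."""
    allowed = set(allowed_token_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def greedy_pick(logits):
    """Return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical toy vocabulary: 0="<think>", 1="hello", 2="{", 3="}"
logits = [2.5, 3.0, 1.0, 0.5]

# Suppose the grammar only allows JSON punctuation at this step.
masked = apply_token_mask(logits, allowed_token_ids=[2, 3])

unconstrained_choice = greedy_pick(logits)  # highest raw logit
constrained_choice = greedy_pick(masked)    # highest logit among allowed tokens
```

With the mask applied, the reasoning-style tokens can no longer be selected no matter how high their raw logits are, which is why a constraint that is active from the first token leaves no room for a reasoning segment.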

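3. Call stack analysis

The sequence is split into two phases for discussion:

Decode phase: decides whether the model can emit reasoning first. The actual code does include this capability.
Return phase: decides how the response fields are split and cleared. The earlier understanding of the parser was wrong in assuming it only does this splitting.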
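
The return phase can be sketched roughly as follows. This is an illustration under stated assumptions, not the Serving layer's actual code: the marker strings `<think>` / `</think>`, the helper names `split_reasoning` and `build_message`, the JSON-list shape of the tool calls, and the use of an empty string for the cleared content are all hypothetical.

```python
import json

THINK_START, THINK_END = "<think>", "</think>"  # assumed marker strings

def split_reasoning(text):
    """Split raw text into (reasoning_content, content) at the end-of-thinking marker."""
    if THINK_END not in text:
        return "", text
    reasoning, _, rest = text.partition(THINK_END)
    return reasoning.replace(THINK_START, "").strip(), rest.strip()

def build_message(raw_text, use_reasoning_parser):
    """Assemble response fields; content is cleared once tool calls are parsed."""
    if use_reasoning_parser:
        reasoning, body = split_reasoning(raw_text)
    else:
        reasoning, body = "", raw_text
    tool_calls = json.loads(body)  # assumed: the body is a JSON list of calls
    # The body text is dropped on purpose after the tool calls are extracted.
    return {"reasoning_content": reasoning, "content": "", "tool_calls": tool_calls}

raw_with_thinking = (
    '<think>need the weather</think>'
    '[{"name": "get_weather", "parameters": {"city": "Paris"}}]'
)
with_parser = build_message(raw_with_thinking, use_reasoning_parser=True)

raw_json_only = '[{"name": "get_weather", "parameters": {"city": "Paris"}}]'
without_parser = build_message(raw_json_only, use_reasoning_parser=False)
```

In both cases `content` ends up empty, which matches the symptom described in the background; only the first case carries any `reasoning_content`.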

Decode-phase call stack

The HTTP entry point hands the request to Serving
ServingChat sees required and selects the json_schema constraint

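# "required" or a specific named function both use json_schema
# (pseudocode; is_specific_function is a placeholder, not a real helper)
if tool_choice == "required" or is_specific_function(tool_choice):
    tool_call_constraint = ("json_schema", schema)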

ServingChat also computes whether a reasoning segment is needed

# require_reasoning is a boolean
require_reasoning = decide_from_request(enable_thinking or thinking)
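
One way to read the pseudocode above, as a minimal sketch: the function name `decide_require_reasoning` and the idea that both flags arrive in a dict called `chat_template_kwargs` are assumptions for illustration; only the flag names `enable_thinking` and `thinking` come from the snippet.

```python
def decide_require_reasoning(chat_template_kwargs):
    """Return True when either thinking flag is set to a truthy value."""
    kwargs = chat_template_kwargs or {}  # tolerate a missing dict
    return bool(kwargs.get("enable_thinking") or kwargs.get("thinking"))

flag_on = decide_require_reasoning({"enable_thinking": True})
alt_flag_on = decide_require_reasoning({"thinking": True})
flag_off = decide_require_reasoning({"enable_thinking": False})
flag_missing = decide_require_reasoning(None)
```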

TokenizerManager sends the tokenized request to the Scheduler
SchedulerOutputProcessorMixin then processes next_token_id:

if req.grammar is not None:
    # FIXME: this try-except block is for handling unexpected xgrammar issue.
    try:
        if batch.spec_algorithm.is_none():
            # Normal decode: single token
            req.grammar.accept_token(next_token_id)
        elif batch.is_spec_v2:
            # Speculative decode: next_token_id is a list of accepted tokens
            for token_id in next_token_id:
                req.grammar.accept_token(token_id)
    except ValueError as e:
        # Grammar accept_token can raise ValueError if the token is not in the grammar.
        # This can happen if the grammar is not set correctly or the token is invalid.
        logger.error(
            f"Grammar accept_token failed for req {req.rid} with token {next_token_id}: {e}"
        )
        self.abort_request(AbortReq(rid=req.rid))

<aside> ⚠️

When reasoning_parser is enabled, the grammar backend is wrapped in a two-stage gate. </aside>

Location: python/sglang/srt/constrained/base_grammar_backend.py
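
The two-stage gating idea can be sketched as below. This is not the code from that file: `JsonOnlyGrammar` and `TwoStageGrammar` are hypothetical stand-ins, and the token ids are made up. The sketch only shows the behaviour described in the glossary, where tokens pass freely until `think_end_id` is seen and the inner grammar is enforced afterwards, with `require_reasoning` setting the initial state.

```python
class JsonOnlyGrammar:
    """Stand-in grammar that accepts only token ids from a fixed allowed set."""

    def __init__(self, allowed_token_ids):
        self.allowed = set(allowed_token_ids)
        self.accepted = []

    def accept_token(self, token_id):
        if token_id not in self.allowed:
            raise ValueError(f"token {token_id} is not allowed by the grammar")
        self.accepted.append(token_id)


class TwoStageGrammar:
    """Pass every token through until think_end_id, then enforce the inner grammar."""

    def __init__(self, grammar, think_end_id, require_reasoning):
        self.grammar = grammar
        self.think_end_id = think_end_id
        # Without a reasoning segment, the constraint applies from the first token.
        self.in_reasoning = require_reasoning

    def accept_token(self, token_id):
        if self.in_reasoning:
            if token_id == self.think_end_id:
                self.in_reasoning = False  # later tokens go to the inner grammar
            return
        self.grammar.accept_token(token_id)


THINK_END = 99  # hypothetical token id for the end-of-thinking marker

inner = JsonOnlyGrammar({2, 3})
gated = TwoStageGrammar(inner, think_end_id=THINK_END, require_reasoning=True)
for token_id in [10, 11, THINK_END, 2, 3]:
    gated.accept_token(token_id)  # 10 and 11 are free-form reasoning tokens

ungated = TwoStageGrammar(JsonOnlyGrammar({2, 3}), THINK_END, require_reasoning=False)
try:
    ungated.accept_token(10)
    rejected = False
except ValueError:
    rejected = True  # the same reasoning token is refused when gating starts closed
```

The second case mirrors the `accept_token` failure path shown in the scheduler snippet: with no reasoning window, a non-JSON token raises `ValueError`.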