
feat: generic chat_template_kwargs (model config + per-request metadata) #10359

Merged: mudler merged 8 commits into master from feat/chat-template-kwargs on Jun 16, 2026

Conversation

@localai-bot

Collaborator

What

Closes #10329.

Adds a generic way to pass arbitrary jinja chat-template variables (e.g. Qwen3's preserve_thinking) to backends, so a new template lever no longer needs a hardcoded block in grpc-server.cpp (as enable_thinking in #8973 and reasoning_effort in #10184 each did).

Two sources, no new API surface:

  • Model YAML - a new chat_template_kwargs: map (typed):
    name: qwen3
    chat_template_kwargs:
      preserve_thinking: true
  • Per-request - the existing (previously unwired) OpenAI metadata field. String values; "true"/"false" are coerced to booleans, anything else stays a string:
    {
      "model": "qwen3",
      "messages": [{"role": "user", "content": "hi"}],
      "metadata": { "preserve_thinking": "true", "enable_thinking": "false" }
    }
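The coercion rule for per-request metadata can be sketched in Go as follows. This is a minimal illustration, not the code from this PR: `coerceMetadataValue` is a hypothetical name, and exact, case-sensitive matching of "true"/"false" is an assumption.

```go
package main

import "fmt"

// coerceMetadataValue is a hypothetical helper (not the PR's actual function).
// It turns the strings "true" and "false" into booleans and leaves every
// other value as a string. Exact, case-sensitive matching is an assumption.
func coerceMetadataValue(v string) any {
	switch v {
	case "true":
		return true
	case "false":
		return false
	default:
		return v
	}
}

func main() {
	meta := map[string]string{
		"preserve_thinking": "true",
		"enable_thinking":   "false",
		"reasoning_effort":  "high",
	}
	// Iterate in a fixed order so the output is stable.
	for _, k := range []string{"preserve_thinking", "enable_thinking", "reasoning_effort"} {
		fmt.Printf("%s => %#v\n", k, coerceMetadataValue(meta[k]))
	}
}
```

Because OpenAI metadata values are strings, typed values other than booleans can only come from the model YAML.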

How

Precedence (low -> high): config map < server reasoning levers (enable_thinking/reasoning_effort) < per-request metadata.
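A rough Go sketch of this layering, under stated assumptions: `layerKwargs`, `coerce` and `reservedKey` are illustrative names, and the three layers are merged in one function for brevity, whereas the PR splits the work between ResolveChatTemplateKwargs and gRPCPredictOpts.

```go
package main

import "fmt"

// reservedKey is skipped when reading client metadata, mirroring the
// reserved chat_template_kwargs key described in the PR.
const reservedKey = "chat_template_kwargs"

// coerce maps "true"/"false" to booleans; other strings pass through.
func coerce(v string) any {
	switch v {
	case "true":
		return true
	case "false":
		return false
	}
	return v
}

// layerKwargs is a hypothetical helper that applies the three layers from
// lowest to highest precedence, so later writes overwrite earlier ones.
func layerKwargs(config, levers map[string]any, meta map[string]string) map[string]any {
	out := make(map[string]any, len(config)+len(levers)+len(meta))
	for k, v := range config { // lowest: model YAML map
		out[k] = v
	}
	for k, v := range levers { // middle: server reasoning levers
		out[k] = v
	}
	for k, v := range meta { // highest: per-request metadata
		if k == reservedKey {
			continue
		}
		out[k] = coerce(v)
	}
	return out
}

func main() {
	config := map[string]any{"preserve_thinking": true, "enable_thinking": true}
	levers := map[string]any{"reasoning_effort": "medium"}
	meta := map[string]string{"enable_thinking": "false", reservedKey: "{}"}

	out := layerKwargs(config, levers, meta)
	// prints: true false medium
	fmt.Println(out["preserve_thinking"], out["enable_thinking"], out["reasoning_effort"])
}
```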

  • core/config - ModelConfig.ChatTemplateKwargs (YAML) + RequestMetadata (request-scoped carrier) + ResolveChatTemplateKwargs(meta) which layers the config map under the coerced metadata and skips the reserved chat_template_kwargs key.
  • core/backend/options.go (gRPCPredictOpts) - merges client RequestMetadata over the server-derived levers (so a per-request enable_thinking/reasoning_effort override reaches every backend via the standalone metadata keys), then serialises the resolved map into a single metadata["chat_template_kwargs"] JSON blob, written last so a client cannot clobber it.
  • core/http/middleware/request.go - stamps the request metadata onto the per-request config.
  • backend/cpp/llama-cpp/grpc-server.cpp - replaces the two per-key enable_thinking/reasoning_effort blocks (streaming + non-streaming) with one generic block that parses the blob and merges every key into body_json["chat_template_kwargs"]. New template levers now need no C++ change.
  • Docs + tests.
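The "written last" step in gRPCPredictOpts might look roughly like the Go sketch below. `buildMetadata` and `blobKey` are hypothetical names. Dropping a client-supplied chat_template_kwargs value and omitting the blob when nothing resolves are assumptions based on the "anti-clobber" and "omit" test names, not confirmed behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const blobKey = "chat_template_kwargs"

// buildMetadata is a hypothetical helper. It copies the client metadata as
// standalone keys, then writes the JSON blob last so that a client-supplied
// value under blobKey can never replace the server-resolved one.
func buildMetadata(clientMeta map[string]string, resolved map[string]any) (map[string]string, error) {
	md := make(map[string]string, len(clientMeta)+1)
	for k, v := range clientMeta {
		if k == blobKey {
			continue // assumption: the reserved key is never taken from the client
		}
		md[k] = v
	}
	if len(resolved) == 0 {
		return md, nil // assumption: no blob when no kwarg is active
	}
	blob, err := json.Marshal(resolved)
	if err != nil {
		return nil, err
	}
	md[blobKey] = string(blob) // written last
	return md, nil
}

func main() {
	client := map[string]string{"enable_thinking": "false", blobKey: "client-supplied"}
	resolved := map[string]any{"enable_thinking": false, "preserve_thinking": true}

	md, err := buildMetadata(client, resolved)
	if err != nil {
		panic(err)
	}
	fmt.Println(md[blobKey])
}
```

In the blob, enable_thinking is a JSON boolean, while the standalone metadata key keeps its string form for the backends that read it directly.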

The standalone metadata["enable_thinking"]/["reasoning_effort"] keys are still emitted (sglang, mlx-vlm, mlx-distributed, vllm-omni read them); other backends receive the new chat_template_kwargs metadata key and harmlessly ignore it. enable_thinking reaches llama.cpp as a real JSON bool (preserving the old == "true" behaviour); reasoning_effort stays a string.

Notes / by design

  • A model-YAML chat_template_kwargs value is folded only into the llama.cpp blob, while per-request metadata keys also become standalone gRPC metadata keys (so they reach the Python backends). This asymmetry is intentional: the feature is llama.cpp/jinja-centric, and typed (non-boolean) values are YAML-only.
  • This moves enable_thinking/reasoning_effort chat_template_kwargs construction from C++ into Go; llama.cpp output is unchanged, but the gRPC metadata now carries a chat_template_kwargs blob whenever a reasoning lever or kwarg is active.
  • The C++ change was reviewed by inspection only; the CI llama.cpp backend build verifies that it compiles.

Test plan

  • go test ./core/config/ ./core/backend/ ./core/http/middleware/ - green (resolver precedence/coercion/reserved-key, gRPCPredictOpts blob + client-override + anti-clobber + omit, middleware metadata wiring).
  • golangci-lint (new-from-merge-base) - 0 issues; gofmt clean.
  • llama.cpp backend build (CI) - confirms the generic C++ merge compiles.

Assisted-by: Claude:claude-opus-4-8

mudler added 8 commits June 16, 2026 07:49
Adds the ChatTemplateKwargs model-config map and RequestMetadata carrier,
plus ResolveChatTemplateKwargs which layers the config map under coerced
request metadata. Foundation for generic jinja chat-template kwargs (issue #10329).

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
gRPCPredictOpts now merges per-request client metadata over the server-derived
enable_thinking/reasoning_effort (reaching all backends via the standalone keys)
and serialises the resolved chat_template_kwargs map into a JSON blob for
llama.cpp, written last so a client cannot clobber it. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The OpenAI request metadata field was parsed but unused; stamp it onto the
per-request ModelConfig so gRPCPredictOpts forwards it as chat_template_kwargs
overrides. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…cks)

Replace the per-key enable_thinking/reasoning_effort handling in both the
streaming and non-streaming chat paths with a single block that parses the
chat_template_kwargs JSON blob resolved by the Go layer and merges every key
into body_json. New jinja template levers (e.g. preserve_thinking) now need
no C++ change. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…kwargs blob

Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Adds an ECHO_PREDICT_METADATA marker to the mock-backend that echoes the
received PredictOptions.Metadata, and an app_test.go spec that drives a real
/v1/chat/completions request (model chat_template_kwargs + per-request metadata
override) and asserts the exact metadata + chat_template_kwargs blob the REST
layer forwards to gRPC. Locks the REST->gRPC contract against regressions. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
chat_template_kwargs is a free-form map[string]any (like engine_args, already
on the list), not a scalar the config UI registry can surface, so it is exempt
from the registry-entry requirement. Fixes the TestAllFieldsHaveRegistryEntries
failure introduced by the new field. Issue #10329.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler merged commit 1ab61a0 into master Jun 16, 2026
103 of 104 checks passed
@mudler mudler deleted the feat/chat-template-kwargs branch June 16, 2026 10:16
@localai-bot localai-bot added the enhancement New feature or request label Jun 23, 2026
