feat: generic chat_template_kwargs (model config + per-request metadata) by localai-bot · Pull Request #10359 · mudler/LocalAI

localai-bot · 2026-06-16T08:28:41Z

What

Adds a generic way to pass arbitrary Jinja chat-template variables (e.g. Qwen3's preserve_thinking) to backends, so that a new template lever no longer needs its own hardcoded block in grpc-server.cpp, as enable_thinking (#8973) and reasoning_effort (#10184) each did.

Two sources, no new API surface:

Model YAML - a new chat_template_kwargs: map (typed):

name: qwen3
chat_template_kwargs:
  preserve_thinking: true

Per-request - the existing (previously unwired) OpenAI metadata field. String values; "true"/"false" are coerced to booleans, anything else stays a string:

{
  "model": "qwen3",
  "messages": [{"role": "user", "content": "hi"}],
  "metadata": { "preserve_thinking": "true", "enable_thinking": "false" }
}
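The coercion rule for per-request metadata can be sketched as follows. This is a minimal illustration, not the code from this PR: coerceMetadata is a hypothetical helper name, and exact, case-sensitive matching on "true"/"false" is an assumption.

```go
package main

import "fmt"

// coerceMetadata is a hypothetical helper (not the PR's actual function).
// It turns the string-only OpenAI metadata map into template kwargs:
// "true"/"false" become booleans, every other value stays a string.
// Exact, case-sensitive matching is an assumption of this sketch.
func coerceMetadata(meta map[string]string) map[string]any {
	out := make(map[string]any, len(meta))
	for k, v := range meta {
		switch v {
		case "true":
			out[k] = true
		case "false":
			out[k] = false
		default:
			out[k] = v
		}
	}
	return out
}

func main() {
	kwargs := coerceMetadata(map[string]string{
		"preserve_thinking": "true",
		"enable_thinking":   "false",
		"reasoning_effort":  "high",
	})
	// Print in a fixed key order so the output is stable.
	for _, k := range []string{"preserve_thinking", "enable_thinking", "reasoning_effort"} {
		fmt.Printf("%s=%v (%T)\n", k, kwargs[k], kwargs[k])
	}
}
```

Because metadata values are strings on the wire, typed values other than booleans (numbers, lists, nested maps) would have to come from the model YAML in this scheme.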

How

Precedence (low -> high): config map < server reasoning levers (enable_thinking/reasoning_effort) < per-request metadata.
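To make the precedence order concrete, here is a small sketch of a layered merge in which later layers win. The function name layer and the sample values are illustrative assumptions; the PR's real logic lives in ResolveChatTemplateKwargs and gRPCPredictOpts.

```go
package main

import "fmt"

// layer is a hypothetical helper: it copies each map into a fresh result
// in argument order, so a key set by a later layer overrides earlier ones.
func layer(layers ...map[string]any) map[string]any {
	out := map[string]any{}
	for _, l := range layers {
		for k, v := range l {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Lowest precedence: chat_template_kwargs from the model YAML.
	configKwargs := map[string]any{"preserve_thinking": true, "enable_thinking": true}
	// Middle: server-derived reasoning levers.
	serverLevers := map[string]any{"reasoning_effort": "medium"}
	// Highest: per-request metadata, already coerced.
	requestMeta := map[string]any{"enable_thinking": false}

	resolved := layer(configKwargs, serverLevers, requestMeta)
	fmt.Println(resolved)
	// prints: map[enable_thinking:false preserve_thinking:true reasoning_effort:medium]
}
```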

core/config - ModelConfig.ChatTemplateKwargs (YAML) + RequestMetadata (request-scoped carrier) + ResolveChatTemplateKwargs(meta) which layers the config map under the coerced metadata and skips the reserved chat_template_kwargs key.
core/backend/options.go (gRPCPredictOpts) - merges client RequestMetadata over the server-derived levers (so a per-request enable_thinking/reasoning_effort override reaches every backend via the standalone metadata keys), then serialises the resolved map into a single metadata["chat_template_kwargs"] JSON blob, written last so a client cannot clobber it.
core/http/middleware/request.go - stamps the request metadata onto the per-request config.
backend/cpp/llama-cpp/grpc-server.cpp - replaces the two per-key enable_thinking/reasoning_effort blocks (streaming + non-streaming) with one generic block that parses the blob and merges every key into body_json["chat_template_kwargs"]. New template levers now need no C++ change.
Docs + tests.
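The "written last so a client cannot clobber it" step can be sketched like this. It is an approximation under stated assumptions: buildMetadata is a hypothetical name, and both dropping a client-supplied chat_template_kwargs entry and omitting the blob when the resolved map is empty are guesses at the behaviour the tests call "anti-clobber" and "omit".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reservedKey is the metadata key that carries the serialised kwargs blob.
const reservedKey = "chat_template_kwargs"

// buildMetadata is a hypothetical stand-in for the relevant part of
// gRPCPredictOpts. Client metadata is copied first as standalone keys,
// then the resolved kwargs are serialised under the reserved key last,
// so a client-supplied value under that key never survives.
func buildMetadata(clientMeta map[string]string, resolved map[string]any) (map[string]string, error) {
	out := make(map[string]string, len(clientMeta)+1)
	for k, v := range clientMeta {
		out[k] = v
	}
	// Assumption: a client value under the reserved key is always dropped.
	delete(out, reservedKey)
	if len(resolved) == 0 {
		// Assumption: no blob is emitted when nothing is active.
		return out, nil
	}
	blob, err := json.Marshal(resolved) // map keys are emitted in sorted order
	if err != nil {
		return nil, err
	}
	out[reservedKey] = string(blob)
	return out, nil
}

func main() {
	md, err := buildMetadata(
		map[string]string{"enable_thinking": "false", reservedKey: `{"injected":1}`},
		map[string]any{"enable_thinking": false, "preserve_thinking": true},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(md[reservedKey])
	// prints: {"enable_thinking":false,"preserve_thinking":true}
}
```

Keeping the standalone keys alongside the blob is what lets backends that never parse the blob keep reading enable_thinking and reasoning_effort directly.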

The standalone metadata["enable_thinking"]/["reasoning_effort"] keys are still emitted (sglang, mlx-vlm, mlx-distributed, vllm-omni read them); other backends receive the new chat_template_kwargs metadata key and harmlessly ignore it. enable_thinking reaches llama.cpp as a real JSON bool (preserving the old == "true" behaviour); reasoning_effort stays a string.
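The type guarantees described above can be illustrated from the receiving side. The sketch below is a Go analogue of the generic merge, not the actual C++ in grpc-server.cpp; mergeBlob and the plain map standing in for body_json are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeBlob is a hypothetical Go analogue of the generic C++ block: it
// decodes the JSON blob and merges every key into
// body["chat_template_kwargs"], keeping the JSON types intact.
func mergeBlob(body map[string]any, blob string) error {
	if blob == "" {
		return nil // nothing to merge
	}
	var kwargs map[string]any
	if err := json.Unmarshal([]byte(blob), &kwargs); err != nil {
		return err
	}
	target, ok := body["chat_template_kwargs"].(map[string]any)
	if !ok {
		target = map[string]any{}
		body["chat_template_kwargs"] = target
	}
	for k, v := range kwargs {
		target[k] = v
	}
	return nil
}

func main() {
	body := map[string]any{}
	blob := `{"enable_thinking":false,"reasoning_effort":"high","preserve_thinking":true}`
	if err := mergeBlob(body, blob); err != nil {
		panic(err)
	}
	kwargs := body["chat_template_kwargs"].(map[string]any)
	for _, k := range []string{"enable_thinking", "reasoning_effort", "preserve_thinking"} {
		fmt.Printf("%s: %T\n", k, kwargs[k])
	}
	// prints:
	// enable_thinking: bool
	// reasoning_effort: string
	// preserve_thinking: bool
}
```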

Notes / by design

A model-YAML chat_template_kwargs value is folded only into the llama.cpp blob, while per-request metadata keys also become standalone gRPC metadata keys (so they reach the Python backends). This asymmetry is intentional: the feature is llama.cpp/jinja-centric, and typed (non-boolean) values are YAML-only.
This moves construction of the enable_thinking/reasoning_effort entries in chat_template_kwargs from C++ into Go; llama.cpp output is unchanged, but the gRPC metadata now carries a chat_template_kwargs blob whenever a reasoning lever or kwarg is active.
The C++ change was reviewed by inspection only and was not built locally; the CI llama.cpp backend build verifies that it compiles.

Test plan

go test ./core/config/ ./core/backend/ ./core/http/middleware/ - green (resolver precedence/coercion/reserved-key, gRPCPredictOpts blob + client-override + anti-clobber + omit, middleware metadata wiring).
golangci-lint (new-from-merge-base) - 0 issues; gofmt clean.
llama.cpp backend build (CI) - confirms the generic C++ merge compiles.

Assisted-by: Claude:claude-opus-4-8

Adds the ChatTemplateKwargs model-config map and RequestMetadata carrier, plus ResolveChatTemplateKwargs which layers the config map under coerced request metadata. Foundation for generic jinja chat-template kwargs (issue #10329). Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

gRPCPredictOpts now merges per-request client metadata over the server-derived enable_thinking/reasoning_effort (reaching all backends via the standalone keys) and serialises the resolved chat_template_kwargs map into a JSON blob for llama.cpp, written last so a client cannot clobber it. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

The OpenAI request metadata field was parsed but unused; stamp it onto the per-request ModelConfig so gRPCPredictOpts forwards it as chat_template_kwargs overrides. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

…cks) Replace the per-key enable_thinking/reasoning_effort handling in both the streaming and non-streaming chat paths with a single block that parses the chat_template_kwargs JSON blob resolved by the Go layer and merges every key into body_json. New jinja template levers (e.g. preserve_thinking) now need no C++ change. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Adds an ECHO_PREDICT_METADATA marker to the mock-backend that echoes the received PredictOptions.Metadata, and an app_test.go spec that drives a real /v1/chat/completions request (model chat_template_kwargs + per-request metadata override) and asserts the exact metadata + chat_template_kwargs blob the REST layer forwards to gRPC. Locks the REST->gRPC contract against regressions. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

chat_template_kwargs is a free-form map[string]any (like engine_args, already on the list), not a scalar the config UI registry can surface, so it is exempt from the registry-entry requirement. Fixes the TestAllFieldsHaveRegistryEntries failure introduced by the new field. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler added 8 commits June 16, 2026 07:49

docs: document custom chat_template_kwargs (model + per-request)

ed29a87

Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

test(backend): pin reasoning_effort as a string in the chat_template_…

0f36353

…kwargs blob Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler merged commit 1ab61a0 into master Jun 16, 2026
103 of 104 checks passed

mudler deleted the feat/chat-template-kwargs branch June 16, 2026 10:16

localai-bot added the enhancement label (New feature or request) on Jun 23, 2026

mudler merged 8 commits into master from feat/chat-template-kwargs
