refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer by mudler · Pull Request #9380 · mudler/LocalAI

mudler · 2026-04-16T19:58:30Z

Drop the 295-line vendor/llama.py fork in favor of tinygrad.apps.llm, which now provides the Transformer blocks, GGUF loader (incl. Q4/Q6/Q8 quantization), KV-cache and generate loop we were maintaining ourselves.

What changed:

New vendor/appsllm_adapter.py (~90 LOC) — HF -> GGUF-native state-dict keymap, Transformer kwargs builder, _embed_hidden helper, and a hard rejection of qkv_bias models (Qwen2 / 2.5 are no longer supported; the apps.llm Transformer ties bias=False on Q/K/V projections).
backend.py routes both safetensors and GGUF paths through apps.llm.Transformer. Generation now delegates to its (greedy-only) generate(); Temperature / TopK / TopP / RepetitionPenalty are still accepted on the wire but ignored — documented in the module docstring.
Jinja chat render now passes enable_thinking=False so Qwen3's reasoning preamble doesn't eat the tool-call token budget on small models.
Embedding path uses _embed_hidden (block stack + output_norm) rather than the custom embed() method we were carrying on the vendored Transformer.
test.py gains TestAppsLLMAdapter covering the keymap rename, tied embedding fallback, unknown-key skipping, and qkv_bias rejection.
Makefile fixtures move from Qwen/Qwen2.5-0.5B-Instruct to Qwen/Qwen3-0.6B (apps.llm-compatible) and tool_parser from qwen3_xml to hermes (the HF chat template emits hermes-style JSON tool calls).

Verified with the docker-backed targets:
test-extra-backend-tinygrad 5/5 PASS
test-extra-backend-tinygrad-embeddings 3/3 PASS
test-extra-backend-tinygrad-whisper 4/4 PASS
test-extra-backend-tinygrad-sd 3/3 PASS

Description

This PR fixes #

Notes for Reviewers

Signed commits

Yes, I signed my commits.

…former Drop the 295-line vendor/llama.py fork in favor of `tinygrad.apps.llm`, which now provides the Transformer blocks, GGUF loader (incl. Q4/Q6/Q8 quantization), KV-cache and generate loop we were maintaining ourselves. What changed: - New vendor/appsllm_adapter.py (~90 LOC) — HF -> GGUF-native state-dict keymap, Transformer kwargs builder, `_embed_hidden` helper, and a hard rejection of qkv_bias models (Qwen2 / 2.5 are no longer supported; the apps.llm Transformer ties `bias=False` on Q/K/V projections). - backend.py routes both safetensors and GGUF paths through apps.llm.Transformer. Generation now delegates to its (greedy-only) `generate()`; Temperature / TopK / TopP / RepetitionPenalty are still accepted on the wire but ignored — documented in the module docstring. - Jinja chat render now passes `enable_thinking=False` so Qwen3's reasoning preamble doesn't eat the tool-call token budget on small models. - Embedding path uses `_embed_hidden` (block stack + output_norm) rather than the custom `embed()` method we were carrying on the vendored Transformer. - test.py gains TestAppsLLMAdapter covering the keymap rename, tied embedding fallback, unknown-key skipping, and qkv_bias rejection. - Makefile fixtures move from Qwen/Qwen2.5-0.5B-Instruct to Qwen/Qwen3-0.6B (apps.llm-compatible) and tool_parser from qwen3_xml to hermes (the HF chat template emits hermes-style JSON tool calls). Verified with the docker-backed targets: test-extra-backend-tinygrad 5/5 PASS test-extra-backend-tinygrad-embeddings 3/3 PASS test-extra-backend-tinygrad-whisper 4/4 PASS test-extra-backend-tinygrad-sd 3/3 PASS

mudler merged commit a0cbc46 into master Apr 16, 2026
38 of 40 checks passed

mudler deleted the chore/tinygrad-upstream branch April 16, 2026 20:41

localai-bot added the enhancement New feature or request label May 9, 2026

BrewTestBot mentioned this pull request May 11, 2026

localai 4.2.0 Homebrew/homebrew-core#282016

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer#9380

refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer#9380
mudler merged 1 commit into
masterfrom
chore/tinygrad-upstream

mudler commented Apr 16, 2026

Uh oh!

Labels

2 participants

Uh oh!

Conversation

mudler commented Apr 16, 2026

Uh oh!

Labels

2 participants