test: numerical regression harness (frozen golden vs ggml/VAD/CIF/CTC output) #3003
Adds `tests/`, which runs each runtime tool on a fixed 6 s clip and diffs the output against frozen golden output, catching regressions in the ggml graphs, the FSMN-VAD state machine, the CIF predictor and CTC decode.

- `tests/run_regression.sh`: auto-detects which tools are built. The VAD model (1.7 MB) is auto-fetched; ASR GGUFs are tested when present, or with `RUN_FULL=1` (downloads from HF). Exits non-zero on any mismatch. `BIN_DIR`/`MODELS_DIR` are overridable.
- `tests/sample.wav` (~6 s) + `tests/golden/*.txt`: golden captured on Linux x86-64 with the f16 GGUFs from `FunAudioLLM/*-GGUF`.

Verified locally: all present tools PASS; default mode fetches the VAD model and skips absent models.
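The model-availability rule described above (VAD model always fetched, ASR GGUFs only when present locally or with `RUN_FULL=1`) could look roughly like this. It is a minimal sketch, not the PR's script: only the name and call shape of `ensure_models "$key" "${models[@]}"` come from the diff, while its body, `MODELS`, `fetch_model` and `FETCH_CMD` are hypothetical, and the downloader is a placeholder rather than a real Hugging Face URL.

```bash
#!/usr/bin/env bash
# Minimal sketch of the model-availability rule (not the PR's code):
# the small VAD model is always fetched, ASR GGUFs only with RUN_FULL=1.
# MODELS, fetch_model and FETCH_CMD are hypothetical.
MODELS="${MODELS_DIR:-./models}"

# Placeholder fetcher; the real script downloads from FunAudioLLM/*-GGUF on HF.
fetch_model() { # key file
  echo "would download $2 for $1" >&2
  return 1
}
FETCH_CMD="${FETCH_CMD:-fetch_model}"

ensure_models() { # key file...
  local key="$1" f
  local -a missing=()
  shift
  for f in "$@"; do [ -f "$MODELS/$f" ] || missing+=("$f"); done
  [ "${#missing[@]}" -eq 0 ] && return 0        # everything is already local
  if [ "$key" != "fsmn-vad" ] && [ "${RUN_FULL:-0}" != 1 ]; then
    return 1                                     # large ASR model: opt-in only
  fi
  mkdir -p "$MODELS"
  for f in "${missing[@]}"; do "$FETCH_CMD" "$key" "$f" || return 1; done
  # Confirm the fetcher really produced every requested file.
  for f in "$@"; do [ -f "$MODELS/$f" ] || return 1; done
}
```

With `RUN_FULL` unset, a missing `paraformer-f16.gguf` makes this `ensure_models` return 1, which `run_tool` then reports as "model missing (set RUN_FULL=1)".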
Code Review
This pull request introduces numerical regression tests for the FunASR llama.cpp runtime, including a test runner script, documentation, and frozen golden outputs for various models. Feedback on the test script suggests simplifying the tool execution logic by removing redundant binary lookups and fragile chaining, letting the runner function handle the binary path resolution directly.
```bash
run_tool(){ # name binary golden key models... -- run...
  local name="$1" b key gold; b=$(bin "$2"); gold="$DIR/golden/$3"; key="$4"; shift 4
  local models=(); while [ "$1" != "--" ]; do models+=("$1"); shift; done; shift
  [ -n "$b" ] || { skipper "$name" "no binary"; return; }
  [ -f "$gold" ] || { skipper "$name" "no golden"; return; }
  ensure_models "$key" "${models[@]}" || { skipper "$name" "model missing (set RUN_FULL=1)"; return; }
  check "$name" "$gold" "$("$@" 2>/dev/null)"
}

echo "== FunASR llama.cpp regression (sample.wav) =="
B=$(bin llama-funasr-vad) && run_tool vad llama-funasr-vad vad.txt fsmn-vad fsmn-vad.gguf -- "$B" -m "$MODELS/fsmn-vad.gguf" -a "$SAMPLE"
B=$(bin llama-funasr-sensevoice) && run_tool sensevoice llama-funasr-sensevoice sensevoice.txt sensevoice sensevoice-small-f16.gguf -- "$B" -m "$MODELS/sensevoice-small-f16.gguf" -a "$SAMPLE"
B=$(bin llama-funasr-paraformer) && run_tool paraformer llama-funasr-paraformer paraformer.txt paraformer paraformer-f16.gguf -- "$B" -m "$MODELS/paraformer-f16.gguf" -a "$SAMPLE"
B=$(bin llama-funasr-cli) && run_tool nano llama-funasr-cli nano.txt nano funasr-encoder-f16.gguf qwen3-0.6b-q8_0.gguf -- "$B" --enc "$MODELS/funasr-encoder-f16.gguf" -m "$MODELS/qwen3-0.6b-q8_0.gguf" -a "$SAMPLE"
```
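The `models... -- command...` convention that `run_tool` relies on can be isolated into a small helper for illustration. `split_args`, `SPLIT_MODELS` and `SPLIT_CMD` are hypothetical names. Unlike the loop in the excerpt, this version also stops when the arguments run out: with `while [ "$1" != "--" ]`, a call that forgets the `--` keeps appending empty strings and never terminates (or aborts on an unbound variable if the script runs under `set -u`).

```bash
#!/usr/bin/env bash
# Hypothetical helper isolating run_tool's "models... -- command..." parsing.
# The "$#" guard ends the loop even when no "--" separator is passed.
split_args() {
  SPLIT_MODELS=()
  SPLIT_CMD=()
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    SPLIT_MODELS+=("$1")
    shift
  done
  if [ "$#" -gt 0 ]; then shift; fi   # drop the "--" separator itself
  SPLIT_CMD=("$@")                     # quoting keeps paths with spaces intact
}

split_args funasr-encoder-f16.gguf qwen3-0.6b-q8_0.gguf -- --enc "models/enc.gguf" -a "my clip.wav"
echo "${#SPLIT_MODELS[@]} model(s), ${#SPLIT_CMD[@]} command arg(s)"   # prints: 2 model(s), 4 command arg(s)
```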
The current design of prefixing each tool run with `B=$(bin ...)` and chaining with `&&` is redundant and fragile.

- Redundancy: the binary path is looked up twice, once via `B=$(bin ...)` in the caller and once via `b=$(bin "$2")` inside `run_tool`.
- Fragility: the `bin` function currently returns `0` even when a binary is not found (in bash, an `if`/`elif` with no `else` exits `0` when no branch matches). If `bin` is ever refactored to return `1` on failure, which is the standard convention for lookup functions, the `&&` chain will short-circuit: `run_tool` never executes, `skipper` is never called, and the tool drops out of the report without any skip message.

We can simplify this by letting `run_tool` prepend the resolved binary `$b` to the command arguments, removing the need for the `B` variable and the `&&` chain entirely.
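The exit-status behaviour behind the fragility point can be checked directly. `bin_loose` and `bin_strict` are hypothetical stand-ins, because the script's actual `bin` body is not shown in this review; the `BIN_DIR`-then-`PATH` lookup order is likewise an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for the script's `bin` lookup (its body is not shown
# here); the BIN_DIR default and the PATH fallback are assumptions.
BIN_DIR="${BIN_DIR:-./build/bin}"

# if/elif with no else: when nothing matches, the if statement exits 0
# and the function prints nothing, so callers see "success" plus "".
bin_loose() {
  if [ -x "$BIN_DIR/$1" ]; then echo "$BIN_DIR/$1"
  elif command -v "$1" >/dev/null 2>&1; then command -v "$1"
  fi
}

# The conventional refactor: report a miss with exit status 1.
bin_strict() {
  local p
  p=$(bin_loose "$1")
  [ -n "$p" ] || return 1
  echo "$p"
}

# With the strict lookup, the caller's && chain short-circuits on a miss,
# so whatever follows it (run_tool, and therefore skipper) never runs.
ran=no
B=$(bin_strict llama-funasr-missing-demo) && ran=yes
echo "run_tool reached: $ran"
```

An assignment such as `B=$(...)` takes the exit status of its command substitution, which is why the `&&` chain reacts to how `bin` reports a miss.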
```diff
 run_tool(){ # name binary golden key models... -- run...
   local name="$1" b key gold; b=$(bin "$2"); gold="$DIR/golden/$3"; key="$4"; shift 4
   local models=(); while [ "$1" != "--" ]; do models+=("$1"); shift; done; shift
   [ -n "$b" ] || { skipper "$name" "no binary"; return; }
   [ -f "$gold" ] || { skipper "$name" "no golden"; return; }
   ensure_models "$key" "${models[@]}" || { skipper "$name" "model missing (set RUN_FULL=1)"; return; }
-  check "$name" "$gold" "$("$@" 2>/dev/null)"
+  check "$name" "$gold" "$("$b" "$@" 2>/dev/null)"
 }
 echo "== FunASR llama.cpp regression (sample.wav) =="
-B=$(bin llama-funasr-vad) && run_tool vad llama-funasr-vad vad.txt fsmn-vad fsmn-vad.gguf -- "$B" -m "$MODELS/fsmn-vad.gguf" -a "$SAMPLE"
-B=$(bin llama-funasr-sensevoice) && run_tool sensevoice llama-funasr-sensevoice sensevoice.txt sensevoice sensevoice-small-f16.gguf -- "$B" -m "$MODELS/sensevoice-small-f16.gguf" -a "$SAMPLE"
-B=$(bin llama-funasr-paraformer) && run_tool paraformer llama-funasr-paraformer paraformer.txt paraformer paraformer-f16.gguf -- "$B" -m "$MODELS/paraformer-f16.gguf" -a "$SAMPLE"
-B=$(bin llama-funasr-cli) && run_tool nano llama-funasr-cli nano.txt nano funasr-encoder-f16.gguf qwen3-0.6b-q8_0.gguf -- "$B" --enc "$MODELS/funasr-encoder-f16.gguf" -m "$MODELS/qwen3-0.6b-q8_0.gguf" -a "$SAMPLE"
+run_tool vad llama-funasr-vad vad.txt fsmn-vad fsmn-vad.gguf -- -m "$MODELS/fsmn-vad.gguf" -a "$SAMPLE"
+run_tool sensevoice llama-funasr-sensevoice sensevoice.txt sensevoice sensevoice-small-f16.gguf -- -m "$MODELS/sensevoice-small-f16.gguf" -a "$SAMPLE"
+run_tool paraformer llama-funasr-paraformer paraformer.txt paraformer paraformer-f16.gguf -- -m "$MODELS/paraformer-f16.gguf" -a "$SAMPLE"
+run_tool nano llama-funasr-cli nano.txt nano funasr-encoder-f16.gguf qwen3-0.6b-q8_0.gguf -- --enc "$MODELS/funasr-encoder-f16.gguf" -m "$MODELS/qwen3-0.6b-q8_0.gguf" -a "$SAMPLE"
```
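To exercise the suggested flow end to end, the sketch pairs the revised `run_tool` (copied from the suggestion, including its original `--` loop) with minimal stubs. `bin`, `skipper`, `ensure_models` and `check` here are hypothetical stand-ins for helpers not shown in this PR view, the failure counter is an assumption about how the non-zero exit is produced, and `echo` plays the role of a FunASR binary so the sketch runs anywhere.

```bash
#!/usr/bin/env bash
# Runnable sketch of the suggested run_tool flow with hypothetical stubs.
DIR="$(mktemp -d)"
mkdir -p "$DIR/golden"
fail=0

bin() { command -v "$1" || true; }          # stub: empty output when missing
skipper() { echo "SKIP $1: $2"; }
ensure_models() { return 0; }                # stub: pretend models are present
check() { # name golden actual (exact match, trailing newlines ignored)
  if [ "$(cat "$2")" = "$3" ]; then echo "PASS $1"
  else echo "FAIL $1"; fail=$((fail + 1)); fi
}

run_tool(){ # name binary golden key models... -- run...
  local name="$1" b key gold; b=$(bin "$2"); gold="$DIR/golden/$3"; key="$4"; shift 4
  local models=(); while [ "$1" != "--" ]; do models+=("$1"); shift; done; shift
  [ -n "$b" ] || { skipper "$name" "no binary"; return; }
  [ -f "$gold" ] || { skipper "$name" "no golden"; return; }
  ensure_models "$key" "${models[@]}" || { skipper "$name" "model missing (set RUN_FULL=1)"; return; }
  check "$name" "$gold" "$("$b" "$@" 2>/dev/null)"
}

echo "hello world" > "$DIR/golden/demo.txt"
run_tool demo echo demo.txt demo-key demo.gguf -- hello world        # PASS demo
run_tool drift echo demo.txt demo-key demo.gguf -- hello there      # FAIL drift
run_tool absent no-such-binary-xyz demo.txt k m.gguf -- x            # SKIP absent: no binary
echo "failures: $fail"   # a real driver would exit non-zero when this is > 0
```

Because `run_tool` now resolves the binary itself, a missing tool always reaches `skipper`, whatever exit status `bin` uses.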
C2 (roadmap P3): a numerical regression harness so that future changes can't silently break the runtime.
What
- `tests/` runs each built tool on a fixed ~6 s clip and diffs the output against frozen golden, catching regressions in the ggml graphs, the FSMN-VAD state machine, the CIF predictor and CTC decode.
- `tests/run_regression.sh` auto-detects which tools are built. The tiny VAD model (1.7 MB) is auto-fetched; ASR GGUFs are tested when present locally or with `RUN_FULL=1` (downloads from `FunAudioLLM/*-GGUF`). Exits non-zero on any mismatch. `BIN_DIR`/`MODELS_DIR` are overridable, so it drops straight into a CI step.
- `tests/sample.wav` (~6 s, 192 KB) + `tests/golden/*.txt`: golden captured on Linux x86-64 with the published f16 GGUFs.

Verified (Linux)

All present tools PASS (`vad`, `sensevoice`, `paraformer`, `nano`: 4/4).

Additive: touches `runtime/llama.cpp/tests/` only. Golden is exact-match on the reference platform; update it only on a deliberate, reviewed output change.
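A deliberate, reviewed golden update could go through a helper that prints the diff before anything is overwritten. `refresh_golden` and the `UPDATE` variable are hypothetical; the PR does not describe an update mechanism, so this is only one way to keep refreshes explicit.

```bash
#!/usr/bin/env bash
# Hypothetical helper for a deliberate golden update: rerun one tool, print a
# unified diff for review, and overwrite the golden only when UPDATE=1.
refresh_golden() { # golden_file command...
  local gold="$1" new old=/dev/null
  shift
  new=$("$@" 2>/dev/null)
  if [ -f "$gold" ]; then
    old="$gold"
    if [ "$(cat "$gold")" = "$new" ]; then echo "unchanged: $gold"; return 0; fi
  fi
  diff -u "$old" <(printf '%s\n' "$new")     # the change a reviewer should see
  if [ "${UPDATE:-0}" = 1 ]; then
    printf '%s\n' "$new" > "$gold"
    echo "updated: $gold"
  else
    echo "not written (rerun with UPDATE=1 to accept)"
    return 1
  fi
}
```

For example, from `runtime/llama.cpp/`: `UPDATE=1 refresh_golden tests/golden/vad.txt "$BIN_DIR/llama-funasr-vad" -m "$MODELS/fsmn-vad.gguf" -a tests/sample.wav`, assuming `BIN_DIR` and `MODELS` point at the built tools and the downloaded GGUFs. Running it first without `UPDATE=1` shows the diff and leaves the golden untouched.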