test: numerical regression harness (frozen golden vs ggml/VAD/CIF/CTC output) #3003

Merged: LauraGPT merged 3 commits into main from feat/c2-regression-tests on Jun 20, 2026

@LauraGPT (Collaborator)

C2 (roadmap P3) — a numerical regression harness so future changes can't silently break the runtime.

What

tests/ runs each built tool on a fixed ~6 s clip and diffs the output against frozen golden, catching regressions in the ggml graphs, the FSMN-VAD state machine, the CIF predictor and CTC decode.

  • tests/run_regression.sh — auto-detects which tools are built; the tiny VAD model (1.7 MB) is auto-fetched, ASR GGUFs are tested when present locally or with RUN_FULL=1 (downloads from FunAudioLLM/*-GGUF). Non-zero exit on any mismatch. BIN_DIR / MODELS_DIR overridable — drops straight into a CI step.
  • tests/sample.wav (~6 s, 192 KB) + tests/golden/*.txt — golden captured on Linux x86-64 with the published f16 GGUFs.
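
The compare step described above (capture a tool's output, match it against a frozen golden file, count mismatches so the run can exit non-zero) might look roughly like the sketch below. It is an illustration only, not the contents of tests/run_regression.sh: `compare_golden`, `fails`, the temp file and the placeholder text are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: compare_golden and fails are hypothetical names, not the
# helpers actually defined in tests/run_regression.sh.
fails=0

compare_golden() {  # usage: compare_golden <name> <golden-file> <actual-output>
  local name="$1" gold="$2" actual="$3"
  if [ "$actual" = "$(cat "$gold")" ]; then
    echo "PASS $name"
  else
    echo "FAIL $name"
    # Print the difference; diff exits 1 when the inputs differ, hence || true.
    printf '%s\n' "$actual" | diff - "$gold" || true
    fails=$((fails + 1))
  fi
}

# Throwaway file standing in for tests/golden/*.txt (placeholder content).
tmp=$(mktemp)
printf 'expected text\n' > "$tmp"
compare_golden demo-match "$tmp" "expected text"
compare_golden demo-drift "$tmp" "unexpected text"
rm -f "$tmp"
echo "failures: $fails"
```

A runner built this way would typically finish with a check such as `[ "$fails" -eq 0 ]`, so that a CI step sees a non-zero exit on any mismatch.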

Verified (Linux)

  • All present tools PASS (vad, sensevoice, paraformer, nano — 4/4).
  • Default mode (no local models) fetches VAD, PASS, and cleanly SKIPs absent ASR models (exit 0).

Additive — runtime/llama.cpp/tests/ only. Golden is exact-match on the reference platform; update only on a deliberate, reviewed output change.

… output)

Adds tests/ — runs each runtime tool on a fixed 6 s clip and diffs against frozen
golden output, catching regressions in the ggml graphs, the FSMN-VAD state machine,
the CIF predictor and CTC decode.

- tests/run_regression.sh: auto-detects which tools are built; VAD model is auto-fetched
  (1.7 MB), ASR GGUFs tested when present or with RUN_FULL=1 (downloads from HF).
  Non-zero exit on any mismatch. BIN_DIR/MODELS_DIR overridable.
- tests/sample.wav (~6 s) + tests/golden/*.txt: golden captured on Linux x86-64 with the
  f16 GGUFs from FunAudioLLM/*-GGUF.

Verified locally: all present tools PASS; default mode fetches VAD + skips absent models.

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces numerical regression tests for the FunASR llama.cpp runtime, including a test runner script, documentation, and frozen golden outputs for various models. Feedback on the test script suggests simplifying the tool execution logic by removing redundant binary lookups and fragile chaining, letting the runner function handle the binary path resolution directly.

Comment on lines +36 to +49
run_tool(){ # name binary golden key models... -- run...
local name="$1" b key gold; b=$(bin "$2"); gold="$DIR/golden/$3"; key="$4"; shift 4
local models=(); while [ "$1" != "--" ]; do models+=("$1"); shift; done; shift
[ -n "$b" ] || { skipper "$name" "no binary"; return; }
[ -f "$gold" ] || { skipper "$name" "no golden"; return; }
ensure_models "$key" "${models[@]}" || { skipper "$name" "model missing (set RUN_FULL=1)"; return; }
check "$name" "$gold" "$("$@" 2>/dev/null)"
}

echo "== FunASR llama.cpp regression (sample.wav) =="
B=$(bin llama-funasr-vad) && run_tool vad llama-funasr-vad vad.txt fsmn-vad fsmn-vad.gguf -- "$B" -m "$MODELS/fsmn-vad.gguf" -a "$SAMPLE"
B=$(bin llama-funasr-sensevoice) && run_tool sensevoice llama-funasr-sensevoice sensevoice.txt sensevoice sensevoice-small-f16.gguf -- "$B" -m "$MODELS/sensevoice-small-f16.gguf" -a "$SAMPLE"
B=$(bin llama-funasr-paraformer) && run_tool paraformer llama-funasr-paraformer paraformer.txt paraformer paraformer-f16.gguf -- "$B" -m "$MODELS/paraformer-f16.gguf" -a "$SAMPLE"
B=$(bin llama-funasr-cli) && run_tool nano llama-funasr-cli nano.txt nano funasr-encoder-f16.gguf qwen3-0.6b-q8_0.gguf -- "$B" --enc "$MODELS/funasr-encoder-f16.gguf" -m "$MODELS/qwen3-0.6b-q8_0.gguf" -a "$SAMPLE"

Priority: medium

The current design of prefixing each tool run with B=$(bin ...) and chaining with && is redundant and fragile.

  1. Redundancy: The binary path is looked up twice—once via B=$(bin ...) in the caller and once via b=$(bin "$2") inside run_tool.
  2. Fragility: The bin function currently returns 0 even when a binary is not found (due to bash's default behavior when no if/elif branch matches). If bin is ever refactored to return 1 on failure (which is standard for lookup functions), the && chain will short-circuit, preventing run_tool from executing and silently skipping the skip reporting (skipper won't be called).
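
Both behaviors can be reproduced in isolation. The sketch below uses two hypothetical lookup functions, `lookup_lenient` and `lookup_strict`, as stand-ins, since the body of the PR's `bin` is not shown in this thread; `report` is likewise a placeholder for `run_tool`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the PR's `bin` helper; the real body is not shown here.

# Variant 1: no branch matches, so the `if` statement (and the function) returns 0.
lookup_lenient() {
  if [ -x "/nonexistent/$1" ]; then
    echo "/nonexistent/$1"
  fi
}

# Variant 2: a refactor that returns 1 when nothing is found.
lookup_strict() {
  if [ -x "/nonexistent/$1" ]; then
    echo "/nonexistent/$1"
  else
    return 1
  fi
}

# Placeholder for run_tool, which is where the skip would be reported.
report() { echo "reported: $1"; }

# The assignment's exit status is that of the command substitution.
B=$(lookup_lenient tool) && report "lenient"   # report runs, with B empty
lenient_status=$?
B=$(lookup_strict tool) && report "strict"     # short-circuits; report never runs
strict_status=$?
echo "lenient=$lenient_status strict=$strict_status"
```

With the lenient variant the chain proceeds with an empty `B`; with the strict variant nothing after `&&` runs, so no skip would be reported.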

We can simplify this by letting run_tool prepend the resolved binary $b to the command arguments, removing the need for the B variable and the && chain entirely.

Suggested change
run_tool(){ # name binary golden key models... -- run...
local name="$1" b key gold; b=$(bin "$2"); gold="$DIR/golden/$3"; key="$4"; shift 4
local models=(); while [ "$1" != "--" ]; do models+=("$1"); shift; done; shift
[ -n "$b" ] || { skipper "$name" "no binary"; return; }
[ -f "$gold" ] || { skipper "$name" "no golden"; return; }
ensure_models "$key" "${models[@]}" || { skipper "$name" "model missing (set RUN_FULL=1)"; return; }
check "$name" "$gold" "$("$b" "$@" 2>/dev/null)"
}
echo "== FunASR llama.cpp regression (sample.wav) =="
run_tool vad llama-funasr-vad vad.txt fsmn-vad fsmn-vad.gguf -- -m "$MODELS/fsmn-vad.gguf" -a "$SAMPLE"
run_tool sensevoice llama-funasr-sensevoice sensevoice.txt sensevoice sensevoice-small-f16.gguf -- -m "$MODELS/sensevoice-small-f16.gguf" -a "$SAMPLE"
run_tool paraformer llama-funasr-paraformer paraformer.txt paraformer paraformer-f16.gguf -- -m "$MODELS/paraformer-f16.gguf" -a "$SAMPLE"
run_tool nano llama-funasr-cli nano.txt nano funasr-encoder-f16.gguf qwen3-0.6b-q8_0.gguf -- --enc "$MODELS/funasr-encoder-f16.gguf" -m "$MODELS/qwen3-0.6b-q8_0.gguf" -a "$SAMPLE"
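
As a rough check of the suggested shape, here is a trimmed, self-contained sketch. The golden, key and model arguments are dropped for brevity, and `bin` and `skipper` are hypothetical stubs (here `bin` resolves through `command -v` and returns non-zero on a miss, which is an assumption, not the PR's implementation). The point is that the skip is still reported even when the lookup fails with a non-zero status.

```shell
#!/usr/bin/env bash
# Hypothetical stubs; the PR's real bin/skipper are not shown in this thread.
bin() { command -v "$1" 2>/dev/null; }      # non-zero status when not found
skipper() { echo "SKIP $1 ($2)"; }

# Trimmed run_tool: name binary -- args...
run_tool() {
  local name="$1" b
  b=$(bin "$2")                             # a failed lookup just leaves b empty
  shift 2
  if [ "$1" = "--" ]; then shift; fi        # drop the separator
  [ -n "$b" ] || { skipper "$name" "no binary"; return; }
  # The resolved binary is prepended here, so callers pass only the arguments.
  echo "RUN $name: $("$b" "$@" 2>/dev/null)"
}

# basename stands in for a built tool; the second name is assumed not to exist.
run_tool present basename -- /tmp/sample.wav
run_tool absent no-such-tool-xyz -- /tmp/sample.wav
```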

@LauraGPT merged commit 3610b75 into main on Jun 20, 2026