feat: add LocalVQE backend and audio transformations UI by richiejp · Pull Request #9640 · mudler/LocalAI

richiejp · 2026-05-02T11:42:50Z

Adds LocalVQE and a new REST API plus UI.

Introduce a generic "audio transform" capability for any audio-in / audio-out
operation (echo cancellation, noise suppression, dereverberation, voice
conversion, etc.) and ship LocalVQE as the first backend implementation.

Backend protocol:

Two new gRPC RPCs in backend.proto: unary AudioTransform for batch and
bidirectional AudioTransformStream for low-latency frame-by-frame use.
This is the first bidi stream in the proto; per-frame unary at LocalVQE's
16 ms hop would be RTT-bound. Wire it through pkg/grpc/{client,server,
embed,interface,base} with paired-channel ergonomics.

LocalVQE backend (backend/go/localvqe/):

Go-Purego wrapper around upstream liblocalvqe.so. CMake builds the upstream
shared lib + its libggml-cpu-*.so runtime variants directly — no MODULE
wrapper needed because LocalVQE handles CPU feature selection internally
via GGML_BACKEND_DL.
Sets GGML_NTHREADS from opts.Threads (or runtime.NumCPU()-1) — without it
LocalVQE runs single-threaded at ~1× realtime instead of the documented
~9.6×.
Reference-length policy: zero-pad short refs, truncate long ones (the
trailing portion can't have leaked into a mic that wasn't recording).
Ginkgo test suite (9 always-on specs + 2 model-gated).

HTTP layer:

POST /v1/audio/transformations (alias /audio/transform): multipart batch
endpoint, accepts audio + optional reference + params[*]=v form fields.
Persists inputs alongside the output in GeneratedContentDir/audio so the
React UI history can replay past (audio, reference, output) triples.
GET /v1/audio/transformations/stream: WebSocket bidi, 16 ms PCM frames
(interleaved stereo mic+ref in, mono out). JSON session.update envelope
for config; constants hoisted in core/schema/audio_transform.go.
ffmpeg-based input normalisation to 16 kHz mono s16 WAV via the existing
utils.AudioToWav (with passthrough fast-path), so the user can upload any
format / rate without seeing the model's strict 16 kHz constraint.
BackendTraceAudioTransform integration so /api/backend-traces and the
Traces UI light up with audio_snippet base64 and timing.

Auth + capability + importer:

FLAG_AUDIO_TRANSFORM (model_config.go), FeatureAudioTransform (default-on,
in APIFeatures), three RouteFeatureRegistry rows.
localvqe added to knownPrefOnlyBackends with modality "audio-transform".
Gallery entry localvqe-v1-1.3m (sha256-pinned, hosted on
huggingface.co/LocalAI-io/LocalVQE).

React UI:

New Studio "Transform" tab and /app/transform page with two AudioInput
components (Upload + Record tabs, drag-drop, mic capture).
Echo-test button: records mic while playing the loaded reference through
the speakers — the mic naturally picks up speaker bleed, giving a real
(mic, ref) pair for AEC testing without leaving the UI.
Reusable WaveformPlayer (canvas peaks + click-to-seek + audio controls)
and useAudioPeaks hook (shared module-scoped AudioContext to avoid
hitting browser context limits with three players on one page); migrated
TTS, Sound, Traces audio blocks to use it.
Past runs saved in localStorage via useMediaHistory('audio-transform') —
the history entry stores all three URLs so clicking re-renders the full
triple, not just the output.

Build + e2e:

11 matrix entries removed from .github/workflows/backend.yml (CUDA, ROCm,
SYCL, Metal, L4T): upstream supports only CPU + Vulkan, so we ship those
two and let GPU-class hardware route through Vulkan in the gallery
capabilities map.
tests-localvqe-grpc-transform job in test-extra.yml (gated on
detect-changes.outputs.localvqe).
New audio_transform capability + 4 specs in tests/e2e-backends.
Playwright spec suite in core/http/react-ui/e2e/audio-transform.spec.js
(8 specs covering tabs, file upload, multipart shape, history, errors).

Docs:

New docs/content/features/audio-transform.md covering the (audio,
reference) mental model, batch + WebSocket wire formats, LocalVQE param
keys, and a YAML config example. Cross-links from text-to-audio and
audio-to-text feature pages.

Introduce a generic "audio transform" capability for any audio-in / audio-out operation (echo cancellation, noise suppression, dereverberation, voice conversion, etc.) and ship LocalVQE as the first backend implementation. Backend protocol: - Two new gRPC RPCs in backend.proto: unary AudioTransform for batch and bidirectional AudioTransformStream for low-latency frame-by-frame use. This is the first bidi stream in the proto; per-frame unary at LocalVQE's 16 ms hop would be RTT-bound. Wire it through pkg/grpc/{client,server, embed,interface,base} with paired-channel ergonomics. LocalVQE backend (backend/go/localvqe/): - Go-Purego wrapper around upstream liblocalvqe.so. CMake builds the upstream shared lib + its libggml-cpu-*.so runtime variants directly — no MODULE wrapper needed because LocalVQE handles CPU feature selection internally via GGML_BACKEND_DL. - Sets GGML_NTHREADS from opts.Threads (or runtime.NumCPU()-1) — without it LocalVQE runs single-threaded at ~1× realtime instead of the documented ~9.6×. - Reference-length policy: zero-pad short refs, truncate long ones (the trailing portion can't have leaked into a mic that wasn't recording). - Ginkgo test suite (9 always-on specs + 2 model-gated). HTTP layer: - POST /audio/transformations (alias /audio/transform): multipart batch endpoint, accepts audio + optional reference + params[*]=v form fields. Persists inputs alongside the output in GeneratedContentDir/audio so the React UI history can replay past (audio, reference, output) triples. - GET /audio/transformations/stream: WebSocket bidi, 16 ms PCM frames (interleaved stereo mic+ref in, mono out). JSON session.update envelope for config; constants hoisted in core/schema/audio_transform.go. - ffmpeg-based input normalisation to 16 kHz mono s16 WAV via the existing utils.AudioToWav (with passthrough fast-path), so the user can upload any format / rate without seeing the model's strict 16 kHz constraint. - BackendTraceAudioTransform integration so /api/backend-traces and the Traces UI light up with audio_snippet base64 and timing. - Routes registered under routes/localai.go (LocalAI extension; OpenAI has no /audio/transformations endpoint), traced via TraceMiddleware. Auth + capability + importer: - FLAG_AUDIO_TRANSFORM (model_config.go), FeatureAudioTransform (default-on, in APIFeatures), three RouteFeatureRegistry rows. - localvqe added to knownPrefOnlyBackends with modality "audio-transform". - Gallery entry localvqe-v1-1.3m (sha256-pinned, hosted on huggingface.co/LocalAI-io/LocalVQE). React UI: - New /app/transform page surfaced via a dedicated "Enhance" sidebar section (sibling of Tools / Biometrics) — the page is enhancement, not generation, so it lives outside Studio. Two AudioInput components (Upload + Record tabs, drag-drop, mic capture). - Echo-test button: records mic while playing the loaded reference through the speakers — the mic naturally picks up speaker bleed, giving a real (mic, ref) pair for AEC testing without leaving the UI. - Reusable WaveformPlayer (canvas peaks + click-to-seek + audio controls) and useAudioPeaks hook (shared module-scoped AudioContext to avoid hitting browser context limits with three players on one page); migrated TTS, Sound, Traces audio blocks to use it. - Past runs saved in localStorage via useMediaHistory('audio-transform') — the history entry stores all three URLs so clicking re-renders the full triple, not just the output. Build + e2e: - 11 matrix entries removed from .github/workflows/backend.yml (CUDA, ROCm, SYCL, Metal, L4T): upstream supports only CPU + Vulkan, so we ship those two and let GPU-class hardware route through Vulkan in the gallery capabilities map. - tests-localvqe-grpc-transform job in test-extra.yml (gated on detect-changes.outputs.localvqe). - New audio_transform capability + 4 specs in tests/e2e-backends. - Playwright spec suite in core/http/react-ui/e2e/audio-transform.spec.js (8 specs covering tabs, file upload, multipart shape, history, errors). Docs: - New docs/content/features/audio-transform.md covering the (audio, reference) mental model, batch + WebSocket wire formats, LocalVQE param keys, and a YAML config example. Cross-links from text-to-audio and audio-to-text feature pages. Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit Write Agent TaskCreate] Signed-off-by: Richard Palethorpe <io@richiejp.com>

richiejp force-pushed the feat/localvqe branch 2 times, most recently from ff7ff39 to 31a2460 Compare May 3, 2026 12:30

richiejp marked this pull request as ready for review May 3, 2026 13:06

richiejp force-pushed the feat/localvqe branch 5 times, most recently from 6620618 to 6930b22 Compare May 4, 2026 11:21

richiejp force-pushed the feat/localvqe branch from 6930b22 to 5237e30 Compare May 4, 2026 13:31

mudler approved these changes May 4, 2026

View reviewed changes

mudler merged commit bb033b1 into mudler:master May 4, 2026
50 checks passed

localai-bot added the enhancement New feature or request label May 9, 2026

BrewTestBot mentioned this pull request May 11, 2026

localai 4.2.0 Homebrew/homebrew-core#282016

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add LocalVQE backend and audio transformations UI#9640

feat: add LocalVQE backend and audio transformations UI#9640
mudler merged 1 commit into
mudler:masterfrom
richiejp:feat/localvqe

richiejp commented May 2, 2026 •

edited

Loading

Uh oh!

Labels

3 participants

Uh oh!

Conversation

richiejp commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Labels

3 participants

richiejp commented May 2, 2026 •

edited

Loading