docs: fix README streaming example (runnable + actually streams)#2987

Merged: LauraGPT merged 1 commit into main from fix/readme-streaming-example on Jun 18, 2026

@LauraGPT (Collaborator)

Problem

The Usage section's streaming example was broken and misleading:

# Streaming real-time
model = AutoModel(model="paraformer-zh-streaming", device="cuda")
result = model.generate(input="chunk.wav", cache={}, chunk_size=[0, 10, 5])
  1. Not runnable: chunk.wav is a placeholder file that doesn't exist.
  2. Doesn't actually stream: it is a single one-shot generate() call that is missing is_final, encoder_chunk_look_back/decoder_chunk_look_back, and the chunk loop, so a user can't learn streaming from it.

Fix

Replace it with the real chunk-by-chunk loop, matching the repo's own example (examples/industrial_data_pretraining/paraformer_streaming/demo.py): read the audio, iterate fixed-stride chunks, pass cache + is_final + look-back, and print partial text per chunk. Applied to both README.md and README_zh.md.
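
The loop described above can be sketched without the model itself. This is a minimal illustration, not the README snippet: `iter_chunks`, `stream`, and `fake_recognize` are hypothetical names, and the 960-samples-per-unit stride (60 ms at 16 kHz, so `chunk_size=[0, 10, 5]` gives 600 ms) is an assumption about how the chunk size maps to audio samples.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Assumption: one chunk_size unit covers 960 samples (60 ms at 16 kHz),
# so chunk_size = [0, 10, 5] gives a 9600-sample (600 ms) stride.
SAMPLES_PER_UNIT = 960


def iter_chunks(
    samples: Sequence[float], stride: int
) -> Iterator[Tuple[Sequence[float], bool]]:
    """Yield (chunk, is_final) pairs; the last chunk may be shorter than stride."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    # Ceiling division; always emit at least one (possibly empty) final chunk.
    total = max(1, (len(samples) + stride - 1) // stride)
    for i in range(total):
        yield samples[i * stride:(i + 1) * stride], i == total - 1


def stream(
    samples: Sequence[float],
    chunk_size: Sequence[int],
    recognize: Callable[[Sequence[float], Dict, bool], str],
) -> List[str]:
    """Feed fixed-stride chunks to `recognize`, sharing one cache across calls."""
    cache: Dict = {}  # must persist between chunks, unlike cache={} per call
    stride = chunk_size[1] * SAMPLES_PER_UNIT
    return [recognize(chunk, cache, is_final)
            for chunk, is_final in iter_chunks(samples, stride)]


def fake_recognize(chunk: Sequence[float], cache: Dict, is_final: bool) -> str:
    """Hypothetical stand-in for the model call; reports what it was given."""
    cache["calls"] = cache.get("calls", 0) + 1
    return f"chunk {cache['calls']}: {len(chunk)} samples, final={is_final}"


if __name__ == "__main__":
    one_second = [0.0] * 16000  # 1 s of silence at 16 kHz
    for line in stream(one_second, [0, 10, 5], fake_recognize):
        print(line)
```

In real use, the `recognize` callback would presumably wrap `model.generate(...)` and forward `cache`, `is_final`, `chunk_size`, and the two look-back arguments named in this PR. The key difference from the old example is that the same `cache` object is reused across every chunk.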

Verification

Ran the new snippet on GPU with paraformer-zh-streaming. It emits incremental text chunk by chunk and reconstructs the full sentence:

欢迎大 / 家来 / 体验达 / 摩院推 / 出的语 / 音识 / 别模型
-> 欢迎大家来体验达摩院推出的语音识别模型 ("Welcome everyone to try the speech recognition model released by DAMO Academy")
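
A quick way to confirm that the per-chunk partials add up to the full sentence is to join them and compare. The sketch below only re-checks the strings reported in this PR; it does not run the model.

```python
# Partial texts as reported in the verification output above.
partials = "欢迎大 / 家来 / 体验达 / 摩院推 / 出的语 / 音识 / 别模型".split(" / ")

# Streaming partials are plain text fragments, so concatenation
# without a separator should rebuild the full transcript.
full = "".join(partials)

assert full == "欢迎大家来体验达摩院推出的语音识别模型"
print(len(partials), "chunks ->", full)
```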

Only the streaming code example changes; there are no header/structure edits.

The Usage streaming snippet used input="chunk.wav" (a file that does not
exist) and a single one-shot generate() call missing is_final /
encoder_chunk_look_back / the chunk loop, so it neither ran nor
demonstrated streaming.

Replace it with the real chunk-by-chunk loop (matching the repo example
examples/industrial_data_pretraining/paraformer_streaming/demo.py):
read audio, iterate fixed-stride chunks, pass cache + is_final +
look-back, print partial text per chunk.

Verified on GPU (paraformer-zh-streaming): emits incremental text per
chunk and reconstructs the full sentence.

LauraGPT merged commit 7e01e94 into main on Jun 18, 2026
LauraGPT deleted the fix/readme-streaming-example branch on June 18, 2026 10:47

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request updates both the English and Chinese README files to provide a complete, realistic example of streaming real-time audio chunk-by-chunk using the soundfile library. The feedback suggests converting the loaded audio to mono if it has multiple channels to prevent potential shape mismatch errors during model inference.

Comment thread: README.md
+ import soundfile as sf
  model = AutoModel(model="paraformer-zh-streaming", device="cuda")
- result = model.generate(input="chunk.wav", cache={}, chunk_size=[0, 10, 5])
+ audio, sr = sf.read("speech.wav", dtype="float32") # 16 kHz mono

medium

If the user's speech.wav is a stereo audio file, sf.read will return a 2D array, which can cause shape mismatch errors during feature extraction or model inference. Converting the audio to mono if it has multiple channels makes the example more robust.

Suggested change
- audio, sr = sf.read("speech.wav", dtype="float32") # 16 kHz mono
+ audio, sr = sf.read("speech.wav", dtype="float32")
+ if audio.ndim > 1:
+     audio = audio[:, 0] # Convert to mono if stereo
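
The suggestion keeps only the first channel. A slightly more general helper might look like the sketch below, assuming the `(frames, channels)` layout that `sf.read` returns for multi-channel files. `to_mono` and its `average` flag are hypothetical names for illustration and are not part of the README or of soundfile.

```python
import numpy as np


def to_mono(audio: np.ndarray, average: bool = False) -> np.ndarray:
    """Return a 1-D signal from (frames,) or (frames, channels) input.

    average=False keeps the first channel, as in the suggested change;
    average=True downmixes by taking the mean across channels.
    """
    if audio.ndim == 1:
        return audio  # already mono, return unchanged
    if audio.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got {audio.ndim}-D")
    if average:
        # Keep the input dtype (e.g. float32) after averaging.
        return audio.mean(axis=1).astype(audio.dtype)
    return audio[:, 0]


if __name__ == "__main__":
    stereo = np.array([[1.0, 3.0], [2.0, 4.0]], dtype=np.float32)
    print(to_mono(stereo))                # first channel only
    print(to_mono(stereo, average=True))  # mean of both channels
```

Taking the first channel is the cheaper option and matches the review suggestion. Averaging may be preferable when the speech is not equally present in both channels.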
Comment thread: README_zh.md
+ import soundfile as sf
  model = AutoModel(model="paraformer-zh-streaming", device="cuda")
- result = model.generate(input="chunk.wav", cache={}, chunk_size=[0, 10, 5])
+ audio, sr = sf.read("speech.wav", dtype="float32") # 16 kHz 单声道

medium

If the user's speech.wav is a stereo audio file, sf.read will return a 2D array, which can cause shape mismatch errors during feature extraction or model inference. Converting the audio to mono if it has multiple channels makes the example more robust.

Suggested change
- audio, sr = sf.read("speech.wav", dtype="float32") # 16 kHz 单声道
+ audio, sr = sf.read("speech.wav", dtype="float32")
+ if audio.ndim > 1:
+     audio = audio[:, 0] # Convert to mono if stereo