fix(cli): valid SRT timestamps + clear json duration fields #2982
- SRT: when the model returns no per-sentence timestamps, the fallback used a bogus 99:59:59,999 end time. The cue now spans the real audio duration, so the SRT is valid.
- json: duration_s was actually the processing time (a 60s file showed duration_s: 2.3). It is now split into audio_duration_s (real audio length) and processing_s (elapsed). This is a safe rename, since the new CLI is unreleased.

Found by smoke-testing the funasr CLI as a new user.
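For illustration, the SRT fallback described above could look roughly like the sketch below. The helper names `srt_timestamp` and `fallback_srt` are hypothetical and are not taken from funasr/cli.py; the sketch only assumes that the audio duration in seconds is already known.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a duration in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = max(0, round(seconds * 1000))  # clamp negatives to zero
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def fallback_srt(text: str, audio_duration_s: float) -> str:
    """Hypothetical helper: one cue covering the whole file.

    Intended for the case where the model returned no per-sentence
    timestamps, so the only honest span is 0 .. audio duration.
    """
    start = srt_timestamp(0.0)
    end = srt_timestamp(audio_duration_s)
    return f"1\n{start} --> {end}\n{text}\n"
```

With a 60s file, the single cue would end at 00:01:00,000 instead of the placeholder 99:59:59,999.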
Code Review
This pull request updates the output formatting in funasr/cli.py to dynamically retrieve and use the actual audio duration (via soundfile) for JSON and SRT outputs when per-sentence timestamps are missing, rather than using hardcoded placeholder values. The reviewer suggested applying a similar fix to the TSV output format to ensure consistency and correct metadata.
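The JSON side of the change can be sketched as follows. This is a minimal illustration, not the code in funasr/cli.py: the function `build_json_output` and the `text` key are assumptions, and only the field names `audio_duration_s` and `processing_s` come from the PR description.

```python
import json
import time


def build_json_output(text: str, audio_duration_s: float, started_at: float) -> str:
    """Hypothetical helper: report audio length and elapsed time separately.

    `started_at` is a time.perf_counter() reading taken before inference.
    """
    payload = {
        "text": text,  # assumed key, for illustration only
        "audio_duration_s": round(audio_duration_s, 3),  # real audio length
        "processing_s": round(time.perf_counter() - started_at, 3),  # elapsed
    }
    return json.dumps(payload, ensure_ascii=False)


started = time.perf_counter()
# ... inference would run here ...
output = build_json_output("hello", 60.0, started)
```

Keeping the two values under separate names means a reader of the output cannot mistake elapsed time for the length of the recording.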
    elif fmt == "tsv":
        return format_tsv(segments) if segments else f"start\tend\ttext\n0.000\t0.000\t{text}"
The TSV output format has the same issue as the SRT format: when there are no per-sentence timestamps (segments is empty), it returns a fallback string with a hardcoded 0.000 end time (0.000\t0.000\t{text}). To be consistent with the SRT fix and ensure correct metadata, we should retrieve the real audio duration and use it as the end timestamp in the TSV fallback.
    elif fmt == "tsv":
        if segments:
            return format_tsv(segments)
        try:
            import soundfile as sf
            dur_s = round(sf.info(audio_path).duration, 3)
        except Exception:
            dur_s = 0.0
        return f"start\tend\ttext\n0.000\t{dur_s:.3f}\t{text}"

User-friendly funasr CLI (funasr audio.wav -f json/srt/text), clearer errors, and accumulated fixes since 1.3.9:
- New funasr CLI replacing the Hydra entrypoint (old one -> funasr-hydra)
- fix: clear FileNotFoundError for missing audio paths (#2981)
- fix: valid SRT timestamps + clear json duration fields (#2982, #2983)
- fix: correct fun-asr-nano CLI model id -2512 (#2984)
- feat: batched VAD-segment decoding for Fun-ASR-Nano vLLM (#2979)
- fix: warn on vLLM dtype=fp16 degraded output (#2980)
- fix: bf16/fp16 inference (#2978), repetition_penalty CUDA crash (#2974)
Found by smoke-testing the new `funasr` CLI as a first-time user.

SRT bug: `funasr audio.wav -f srt` produced an invalid cue when the model returns no per-sentence timestamps, with a bogus `00:00:00,000 --> 99:59:59,999` end time. Now spans the real audio duration.

json bug: `duration_s` was actually the processing time. A 60s file showed `"duration_s": 2.3`, misleading users into thinking the audio is 2.3s. Split into `audio_duration_s` (real length) + `processing_s` (elapsed). Safe rename (the new CLI is unreleased).

Tested: