Model Evaluation

Run Evaluation

Limited to 50 samples. Sign in for more.
Samples longer than the maximum duration are skipped. The default is 30 s (the Whisper/Moonshine limit); increase it for models that support longer audio.
Auto-detect works for Whisper, Qwen, VibeVoice, Voxtral, and Router models. OmniASR requires an explicit language. Showing 20 common languages.
Controls text normalization for WER/CER computation. The Auto setting selects a normalizer based on the language.
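
To illustrate how text normalization feeds into WER/CER, here is a minimal Python sketch. It is not this tool's implementation: the `normalize` function is a hypothetical, English-oriented normalizer (lowercasing, punctuation stripping, whitespace collapsing), and the names `wer`, `cer`, and `edit_distance` are chosen only for this example. It assumes WER is the word-level edit distance divided by the number of reference words, and CER is the same ratio at character level with spaces counted as characters (conventions differ on that last point).

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical basic normalizer: NFKC, lowercase, drop punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep word characters, whitespace, apostrophes
    return " ".join(text.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate after normalization."""
    ref_words = normalize(reference).split()
    hyp_words = normalize(hypothesis).split()
    if not ref_words:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate after normalization (spaces count as characters here)."""
    ref_chars = list(normalize(reference))
    hyp_chars = list(normalize(hypothesis))
    if not ref_chars:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(wer("Hello, world!", "hello world"))
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("abc", "abd"))
```

Without the normalization step, the first pair would be scored as two word errors purely because of casing and punctuation, which is why the choice of normalizer can change reported WER/CER noticeably.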

Required for private datasets or pushing results.
Saves the predictions and WER for each sample (requires a token with write access).
Current Job
No job

No evaluation running. Submit a job to see progress here.

Evaluation History
Time | Model | Dataset | Samples | WER | Base WER | CER | Base CER | Results | Status
No evaluations yet