Model Training

Fine-tune Whisper
Multiple datasets are sampled proportionally by weight during training.
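The page does not spell out the sampling scheme. A minimal sketch, assuming each training example is drawn from a dataset with probability equal to that dataset's weight divided by the sum of all weights. The dataset names and weights below are hypothetical.

```python
import random


def sample_dataset_names(weights, n_draws, seed=0):
    """Draw dataset names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=n_draws)


# Hypothetical datasets: dataset_a should be picked about three times as often.
weights = {"dataset_a": 3.0, "dataset_b": 1.0}
draws = sample_dataset_names(weights, n_draws=10_000)
share_a = draws.count("dataset_a") / len(draws)
print(f"share of draws from dataset_a: {share_a:.2f}")
```

With weights of 3 and 1, roughly three quarters of the draws should come from the first dataset.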
Dataset Requirements
  • Columns: Must have audio and text columns
  • Audio: Max 30 seconds per sample (Whisper limit)
  • Text: Max ~1500 characters per sample (~444 tokens, Whisper limit)
  • Rows: Limited to 100 training / 50 validation rows. Sign in for higher limits.
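The column, audio, and text limits above can be checked before submitting a job. The sketch below is an illustration only: it assumes each row is a plain dict whose "audio" entry holds an "array" of samples and a "sampling_rate", which may not match how your dataset is actually stored. The function name is hypothetical.

```python
MAX_AUDIO_SECONDS = 30.0  # Whisper limit quoted above
MAX_TEXT_CHARS = 1500     # approximate character limit quoted above


def check_row(row):
    """Return a list of problems found in one row (empty list means OK)."""
    if "audio" not in row or "text" not in row:
        return ["missing audio or text column"]

    problems = []
    audio = row["audio"]
    # Assumed layout: raw samples plus the sampling rate in Hz.
    duration = len(audio["array"]) / audio["sampling_rate"]
    if duration > MAX_AUDIO_SECONDS:
        problems.append(f"audio is {duration:.1f}s, over {MAX_AUDIO_SECONDS:.0f}s")
    if len(row["text"]) > MAX_TEXT_CHARS:
        problems.append(f"text is {len(row['text'])} chars, over {MAX_TEXT_CHARS}")
    return problems
```

Rows that return a non-empty list would need to be trimmed, split, or dropped before training.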

Example dataset: Trelis/llm-lingo

Validation dataset: if left empty, 5% of the training data (max 50 rows) is used for validation.
Language of the training audio (99 languages supported). Select "multilingual" to use the per-sample language from the dataset.
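The default split size follows from simple arithmetic: 5% of the training rows, capped at 50. A minimal sketch, assuming the count is rounded down and the held-out rows are chosen at random (neither detail is stated on the page). The function names are hypothetical.

```python
import random


def validation_size(n_train_rows, percent=5, max_rows=50):
    """Rows held out for validation: percent of the data, capped at max_rows."""
    return min(max_rows, n_train_rows * percent // 100)


def split_indices(n_rows, seed=0):
    """Shuffle row indices and return (train_indices, validation_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = validation_size(n_rows)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(100)
print(len(train_idx), len(val_idx))
```

Under these assumptions the cap takes effect once the training set exceeds 1000 rows.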

Hugging Face Hub Settings (model + eval results)
Uploads the trained model and evaluation results to the Hub.
Hugging Face token: required for private base models or to push the model to the Hub.
W&B API key: tracks training progress with Weights & Biases (W&B).

Gradient clipping threshold (0 = disabled, 1.0 = standard)
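The page does not say which form of clipping is applied. A minimal sketch, assuming the threshold is a cap on the global L2 norm of the gradients (the behavior offered by PyTorch's torch.nn.utils.clip_grad_norm_), written in plain Python so it runs without dependencies. The function name is hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm is at most max_norm.

    A max_norm of 0 disables clipping, matching the setting described above.
    """
    if max_norm <= 0:
        return list(grads)
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)  # already within the threshold
    scale = max_norm / total_norm
    return [g * scale for g in grads]


clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(clipped)
```

Here the input has norm 5, so every component is scaled by 1/5 and the direction of the update is preserved.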
Current Job
No job

No training running. Submit a job to see progress here.

Training History
Time | Base Model | Dataset | Output | Baseline WER | Final WER | Cost | Status
No training jobs yet
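The Baseline WER and Final WER columns refer to word error rate, presumably measured before and after fine-tuning. A minimal sketch of the standard definition: word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. Splitting on whitespace with no text normalization is an assumption here, and the tool may normalize text differently.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat", "the bat sat"))
```

A lower value is better, and a value above 1.0 is possible when the hypothesis contains many inserted words.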