Model Training

Fine-tune Whisper
When multiple datasets are provided, they are sampled in proportion to their weights during training.
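Weight-proportional sampling can be pictured with the short sketch below. It is only an illustration under assumptions: the function name `sample_mixture`, the dataset names, and the weights are hypothetical, and the sampler this tool actually uses may work differently.

```python
import random
from collections import Counter

def sample_mixture(datasets, weights, n_samples, seed=0):
    """Draw n_samples items, picking the source dataset in proportion to its weight."""
    if set(datasets) != set(weights):
        raise ValueError("datasets and weights must have the same keys")
    rng = random.Random(seed)
    names = list(datasets)
    # Choose a source dataset for every draw, weighted by the configured weights.
    picks = rng.choices(names, weights=[weights[n] for n in names], k=n_samples)
    # Then choose a random sample from within the selected dataset.
    return [(name, rng.choice(datasets[name])) for name in picks]

# Hypothetical datasets: "domain" has weight 3, "general" has weight 1.
datasets = {
    "general": [f"g{i}" for i in range(100)],
    "domain": [f"d{i}" for i in range(20)],
}
weights = {"general": 1.0, "domain": 3.0}

batch = sample_mixture(datasets, weights, n_samples=4000)
counts = Counter(name for name, _ in batch)
```

With these weights, roughly three quarters of the draws should come from "domain", even though it holds fewer rows than "general".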
Dataset Requirements
  • Columns: Must have audio and text (or transcription) columns
  • Audio: Max 30 seconds per sample (Whisper limit)
  • Text: Max ~1500 characters per sample (~444 tokens, Whisper limit)
  • Rows: Limited to 100 training / 50 validation rows. Sign in for higher limits.
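The column, audio, and text requirements above can be checked before uploading with a sketch like the following. It assumes each row stores audio as a dict with `array` and `sampling_rate` keys (a layout the Hugging Face `datasets` library has commonly used for decoded audio); `check_row` and the constant names are hypothetical helpers, not part of this tool.

```python
MAX_AUDIO_SECONDS = 30.0   # per-sample audio limit stated above
MAX_TEXT_CHARS = 1500      # approximate per-sample text limit stated above
TEXT_COLUMNS = ("text", "transcription")

def check_row(row):
    """Return a list of problems for one dataset row (an empty list means OK)."""
    problems = []
    audio = row.get("audio")
    if audio is None:
        problems.append("missing 'audio' column")
    else:
        # Duration in seconds = number of samples / samples per second.
        duration = len(audio["array"]) / audio["sampling_rate"]
        if duration > MAX_AUDIO_SECONDS:
            problems.append(f"audio is {duration:.1f} s (limit {MAX_AUDIO_SECONDS:.0f} s)")
    # Accept either of the two permitted text column names.
    text_column = next((c for c in TEXT_COLUMNS if c in row), None)
    if text_column is None:
        problems.append("missing 'text' or 'transcription' column")
    elif len(row[text_column]) > MAX_TEXT_CHARS:
        problems.append(f"text is {len(row[text_column])} characters (limit {MAX_TEXT_CHARS})")
    return problems

# Illustrative rows: 5 s of silence with short text, and 31 s with overlong text.
good_row = {"audio": {"array": [0.0] * (16000 * 5), "sampling_rate": 16000}, "text": "hello world"}
long_row = {"audio": {"array": [0.0] * (16000 * 31), "sampling_rate": 16000}, "transcription": "x" * 2000}
```

Running `check_row` over every row and collecting the non-empty results gives a quick list of samples to trim or split before training.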

Example: Trelis/llm-lingo

If the validation dataset is left empty, 5% of the training data is used for validation.
Validation rows are limited to 50. Sign in for more.
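As a rough sketch of the fallback described above, the snippet below holds out 5% of the training rows for validation and applies a 50-row cap. The function name, the shuffling step, the seed, and the rounding are all assumptions made for illustration; the tool's own split logic is not documented here.

```python
import random

def split_train_validation(rows, validation_fraction=0.05, max_validation_rows=50, seed=42):
    """Shuffle rows and hold out a fraction for validation, capped at max_validation_rows."""
    rows = list(rows)
    if len(rows) < 2:
        raise ValueError("need at least 2 rows to create a validation split")
    rng = random.Random(seed)
    rng.shuffle(rows)
    # At least one validation row, at most the configured cap.
    n_val = max(1, round(len(rows) * validation_fraction))
    n_val = min(n_val, max_validation_rows, len(rows) - 1)
    return rows[n_val:], rows[:n_val]

# Hypothetical row ids standing in for dataset rows.
train_rows, val_rows = split_train_validation(range(100))
big_train, big_val = split_train_validation(range(2000))
```

Here 100 rows yield 5 validation rows, while 2000 rows would yield 100 and are therefore capped at 50.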
Language of the training audio (99 languages supported). Select "multilingual" to use the per-sample language from the dataset.

HuggingFace Hub Settings (model + eval results)
Uploads the trained model and evaluation results to the Hub.
A Hugging Face token is required for private base models or to push the model to the Hub.
Track training progress with Weights & Biases (W&B) using your API key.

Hyperparameters will be automatically optimized for your dataset size and model.
Gradient clipping threshold (0 = disabled, 1.0 = standard)
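To illustrate what a clipping threshold does, here is a minimal sketch that assumes norm-based clipping (the same idea as PyTorch's `torch.nn.utils.clip_grad_norm_`). Whether this tool clips by global norm is an assumption, and `clip_gradients` is a hypothetical helper operating on a flat list of gradient values.

```python
import math

def clip_gradients(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm; 0 disables clipping."""
    if max_norm <= 0:
        return list(grads)  # threshold of 0 means clipping is off
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)  # already within the threshold
    scale = max_norm / total_norm
    return [g * scale for g in grads]

# A gradient with L2 norm 5 is scaled down to norm 1 when the threshold is 1.0.
clipped = clip_gradients([3.0, 4.0], max_norm=1.0)
unclipped = clip_gradients([3.0, 4.0], max_norm=0)
```

Clipping preserves the gradient's direction and only shrinks its length, which is why a threshold of 1.0 is a common default for stabilizing training.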
Limit to N random samples (blank = use all)
Controls text normalization for WER/CER during training evaluation
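The effect of text normalization on WER and CER can be seen in the sketch below. The normalization rule here (lowercasing, removing punctuation, collapsing whitespace) is one simple choice made for illustration and is an assumption; the normalizers this tool offers may differ. All function names are hypothetical.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis, normalized=True):
    """Word error rate: word-level edit distance / number of reference words."""
    if normalized:
        reference, hypothesis = normalize(reference), normalize(hypothesis)
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis, normalized=True):
    """Character error rate: character-level edit distance / reference length."""
    if normalized:
        reference, hypothesis = normalize(reference), normalize(hypothesis)
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

reference = "Hello, world! This is Whisper."
hypothesis = "hello world this is whisper"
wer_normalized = wer(reference, hypothesis, normalized=True)
wer_raw = wer(reference, hypothesis, normalized=False)
```

In this example the transcript differs from the reference only in casing and punctuation, so the normalized WER is zero while the raw WER counts four of the five words as errors.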
