Model Training

Fine-tune Orpheus TTS
Pretrained: best for training a single custom voice. Fine-tuned: adds your voice alongside the 8 existing voices.
Hugging Face dataset with text and audio columns.
Dataset Requirements
  • Columns: Must include audio and text columns
  • Audio: Any sample rate (resampled to 24 kHz automatically)
  • Duration: About 8-10 s maximum per sample at the default sequence length (2048)
  • Single speaker: All audio should come from one speaker
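
The requirements above can be checked locally before submitting a job. The following is a minimal sketch, not part of this tool: it assumes each row is a plain dict whose audio value holds a decoded "array" and a "sampling_rate" (the shape the Hugging Face datasets library has commonly returned for audio columns), and the names check_row and MAX_SECONDS are hypothetical.

```python
from typing import Any

# Assumption: 10 s is the upper end of the ~8-10 s guideline at sequence length 2048.
MAX_SECONDS = 10.0


def check_row(row: dict[str, Any], max_seconds: float = MAX_SECONDS) -> list[str]:
    """Return a list of problems found in one dataset row (empty if none)."""
    problems = [f"missing column: {name}" for name in ("audio", "text") if name not in row]
    if problems:
        return problems

    if not str(row["text"]).strip():
        problems.append("empty text")

    # Duration in seconds = number of samples / samples per second.
    audio = row["audio"]
    duration = len(audio["array"]) / audio["sampling_rate"]
    if duration > max_seconds:
        problems.append(f"audio too long: {duration:.1f}s > {max_seconds:.1f}s")
    return problems
```

Running check_row over every row and collecting the non-empty results gives a quick list of samples to trim or drop. The single-speaker requirement cannot be verified this way and still needs a manual listen.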

Example: canopylabs/zac-sample-dataset

If left empty, 5% of the training data is used for validation.
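
To illustrate what a 5% hold-out means in practice, here is a small sketch of a seeded split. How this tool actually selects the validation samples is not stated on this page, so the shuffling, the seed, and the function name split_train_validation are assumptions made for illustration.

```python
import random


def split_train_validation(rows, validation_fraction=0.05, seed=42):
    """Shuffle a copy of rows and hold out a fraction of them for validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible

    # Hold out at least one row whenever there is more than one row to split.
    n_val = max(1, round(len(shuffled) * validation_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]
```

With 200 samples this leaves 190 for training and 10 for validation. For datasets loaded with the Hugging Face datasets library, Dataset.train_test_split(test_size=0.05) performs a comparable split.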

Uploads the trained model to the Hugging Face Hub.
A Hugging Face token is required for private datasets or to push the model to the Hub.
Track training progress with Weights & Biases (W&B) using your API key.

A sequence length of 2048 covers roughly 8-10 s of audio. Increase it for longer samples; this uses more memory.
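
As a rough guide for choosing a larger value, the rule of thumb above can be scaled linearly. This sketch assumes that token count grows in proportion to audio duration and uses the conservative end of the range (2048 tokens for 8 s, i.e. 256 tokens per second). The function name and the rounding to a multiple of 256 are illustrative choices, not settings of this tool.

```python
import math

DEFAULT_SEQ_LENGTH = 2048
SECONDS_AT_DEFAULT = 8.0  # assumption: conservative end of the 8-10 s guideline


def estimate_seq_length(max_audio_seconds: float, multiple_of: int = 256) -> int:
    """Estimate a sequence length for the longest sample, never below the default."""
    tokens_per_second = DEFAULT_SEQ_LENGTH / SECONDS_AT_DEFAULT
    needed = max_audio_seconds * tokens_per_second
    rounded = math.ceil(needed / multiple_of) * multiple_of
    return max(DEFAULT_SEQ_LENGTH, rounded)
```

Under these assumptions a dataset whose longest clip is 16 s would need a sequence length of about 4096. Treat the result as a starting point and confirm that it fits in memory.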
Voice prefix used during inference
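
For context, a voice prefix is typically prepended to the text being synthesized. The sketch below assumes a "name: text" prompt layout, which this page does not confirm; the helper build_prompt and the voice name "zac" are hypothetical.

```python
def build_prompt(voice: str, text: str) -> str:
    """Prepend the voice prefix to the text to synthesize (assumed 'name: text' layout)."""
    voice = voice.strip()
    if not voice:
        raise ValueError("voice prefix must not be empty")
    return f"{voice}: {text.strip()}"


# Hypothetical voice name, used only for illustration.
prompt = build_prompt("zac", "Thanks for calling, how can I help?")
```

Whatever prefix is entered here should match exactly at inference time, since the model learns to associate the voice with that string.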
Current Job
No job

No training running. Submit a job to see progress here.

Training History
Time | Base Model | Dataset | Output | Train Loss | Val Loss | Cost | Status
No training jobs yet