Model Evaluation

Generate Audio Samples
Hugging Face model ID. For Orpheus: a fine-tuned model repo. For Chatterbox: your fine-tuned repo (with t3_finetuned.pt) or ResembleAI/chatterbox for the base model. For Router TTS: the model ID is filled in automatically (requires a Router API key). For Kokoro: enter kokoro.
Pre-trained voices from the Trelis Piper collection. Select "Custom" to enter your own model ID (e.g. a fine-tuned model).
79 voices across 9 languages. See VOICES.md for samples.

One prompt per line. If left empty, prompts come from the dataset or the default test prompts.
Dataset with a text (or transcription) column. If the Prompts field is filled, it takes priority.
Limited to 50 samples. Sign in for more.
Used for TTS generation (Chatterbox/Piper) and for ASR round-trip normalization.
Orpheus only. Must match the speaker name used during training.

Router ASR model for round-trip evaluation. Transcribes generated audio back to text and computes WER/CER. Leave empty to skip.
Dataset column containing the ASR transcription of the ground-truth voice recording. CER/WER is computed against this instead of the 'text' column — useful when the speaker didn't read the prompt verbatim. Auto-detected if a reference_asr column exists.
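WER and CER are standard edit-distance metrics over words and characters respectively. A minimal sketch of how they are typically computed (the app's actual round-trip evaluation likely normalizes text first, so exact scores may differ):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance over two token sequences (one-row DP).
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # prev holds the old dp[j-1] (substitution cell).
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    # Word error rate: edits over word tokens, divided by reference length.
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: same distance, over characters.
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

With a reference_asr column present, the reference string is that column's transcription rather than the original prompt text.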
ElevenLabs only. Controls whether numbers, units, and abbreviations are expanded to spoken words before synthesis.

Controls maximum audio length (~84 tokens per second): 2560 tokens ≈ 30s, 5040 ≈ 60s.
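The ~84 tokens/sec figure implies a simple conversion between token budget and audio duration. A quick sketch (the rate is approximate and model-dependent):

```python
TOKENS_PER_SECOND = 84  # approximate audio-token rate from the note above

def max_seconds(max_tokens: int) -> float:
    # Approximate audio duration a given token budget allows.
    return max_tokens / TOKENS_PER_SECOND

def tokens_for(seconds: float) -> int:
    # Token budget needed for a target duration.
    return round(seconds * TOKENS_PER_SECOND)
```

For example, tokens_for(60) gives 5040, matching the 60s preset.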

Required for private models or pushing results. Get token
Saves generated audio samples as a playable HF dataset
Current Job
No job

No evaluation running. Submit a job to see progress here.

Evaluation History
Time | Model | Samples | WER / CER | Output | Cost | Status
No evaluations yet