Data Preparation
Create Speech Dataset
Drop audio + transcript files here
or click to browse (individual files or .zip)
Supported: .wav, .mp3, .m4a, .sph + .srt, .vtt, .txt (or .zip)
Max: 10GB per file, 10GB total (up to 10 ZIPs per session)
For more than 10 audio files, upload a ZIP archive.
Matching: Files are paired by name (e.g., audio1.wav ↔ audio1.vtt)
Note: .txt files (no timestamps) require audio under 5 minutes. Use .srt/.vtt for longer audio.
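To check a file set before uploading, name-based pairing can be approximated locally. The sketch below is illustrative only: the function name `pair_by_name` is hypothetical, and it assumes that "paired by name" means an identical base name with the extension compared case-insensitively. The page does not say how the service handles case or multiple transcripts for one audio file.

```python
from pathlib import Path

# Extensions taken from the "Supported" line on this page.
AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".sph"}
TRANSCRIPT_EXTS = {".srt", ".vtt", ".txt"}

def pair_by_name(filenames):
    """Pair audio and transcript files that share a base name.

    Assumption: a pair is an audio file and a transcript file whose
    names match exactly once the extension is removed.
    """
    audio, transcripts = {}, {}
    for name in filenames:
        p = Path(name)
        ext = p.suffix.lower()
        if ext in AUDIO_EXTS:
            audio[p.stem] = name
        elif ext in TRANSCRIPT_EXTS:
            transcripts[p.stem] = name
    matched = sorted(audio.keys() & transcripts.keys())
    pairs = {stem: (audio[stem], transcripts[stem]) for stem in matched}
    # Base names that have only an audio file or only a transcript.
    unmatched = sorted(audio.keys() ^ transcripts.keys())
    return pairs, unmatched
```

Calling `pair_by_name(["audio1.wav", "audio1.vtt", "audio2.mp3"])` returns the `audio1` pair and lists `audio2` as unmatched, which is a quick way to spot a missing transcript before uploading.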
How to create a ZIP ▾
Place your audio + transcript files together and zip them:
- Mac: Select all files → right-click → Compress
- Windows: Select all files → right-click → Send to → Compressed (zipped) folder
- Linux/CLI:
zip dataset.zip *.wav *.vtt
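On systems without a `zip` command, the same archive can be built with Python's standard `zipfile` module. This is a minimal sketch: the function name `build_dataset_zip` and its defaults are illustrative, and it assumes the files sit at the top level of the archive, as in the CLI command above.

```python
import zipfile
from pathlib import Path

def build_dataset_zip(src_dir, zip_path="dataset.zip", exts=(".wav", ".vtt")):
    """Zip every file in src_dir whose extension is in exts.

    Files are stored flat (no sub-folders), mirroring
    `zip dataset.zip *.wav *.vtt` run inside src_dir.
    """
    src = Path(src_dir)
    files = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.suffix.lower() in exts
    )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.name)  # arcname drops the directory prefix
    return [p.name for p in files]
```

Pass a different `exts` tuple (for example `(".mp3", ".srt")`) to match the formats in your own dataset.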
How it works ▾
- Upload paired audio + transcript files (.srt/.vtt/.txt)
- Process: forced alignment + smart chunking (≤30s)
- Output: HuggingFace dataset ready for training
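The page does not describe how the "smart chunking" step works. As a rough illustration of the ≤30s constraint only, the sketch below greedily merges timestamped transcript cues into chunks no longer than 30 seconds. The function `chunk_cues` and the `(start, end, text)` cue format are assumptions made for this example, not the service's actual algorithm.

```python
def chunk_cues(cues, max_len=30.0):
    """Greedily merge (start, end, text) cues into chunks of at most max_len seconds.

    Cues are assumed to be sorted by start time, with times in seconds.
    A single cue longer than max_len is kept as its own chunk; this
    sketch does not split inside a cue.
    """
    chunks = []
    current = None
    for start, end, text in cues:
        if current is not None and end - current[0] <= max_len:
            # Extending the current chunk keeps it within the limit.
            current = (current[0], end, current[2] + " " + text)
        else:
            if current is not None:
                chunks.append(current)
            current = (start, end, text)
    if current is not None:
        chunks.append(current)
    return chunks
```

For cues ending at 10, 25, 40 and 50 seconds, this yields two chunks covering 0 to 25s and 25 to 50s, each under the 30-second limit.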
Current Job
No job
No processing running. Upload files and click Process to start.
Processing History
| Time | Files | Samples | Audio Prepared | Cost | Output | Status |
|---|---|---|---|---|---|---|
| No data preparation jobs yet | | | | | | |