KaniTTS Zero-Shot Voice Cloning
This is a web interface for KaniTTS zero-shot voice cloning. Upload reference audio and generate speech in any voice!
Upload reference audio, provide its transcript, and enter text to generate speech in the reference voice.
About KaniTTS
KaniTTS is a conversational text-to-speech model that can perform zero-shot voice cloning.
How to use:
- Upload a reference audio file (WAV or MP3, max 15 seconds)
- Enter the transcript manually, or click "Transcribe" to generate one automatically
- Edit the transcript if needed to ensure accuracy
- Enter the text you want to generate in that voice
- Adjust generation parameters if needed
- Click "Generate Speech"
The model uses the reference audio together with the transcript you provide to capture the reference voice, then generates the target text in that voice.
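The 15-second limit on reference audio can be checked before uploading. The following is a minimal sketch using only Python's standard library; it is not part of KaniTTS or this interface. The function names and the `MAX_REFERENCE_SECONDS` constant are hypothetical, and the check covers PCM WAV files only, since the standard `wave` module does not read MP3.

```python
import io
import wave

# Limit taken from the usage notes above; the constant name is hypothetical.
MAX_REFERENCE_SECONDS = 15.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(source, max_seconds: float = MAX_REFERENCE_SECONDS) -> bool:
    """Hypothetical pre-upload check: clip is non-empty and within the limit."""
    duration = wav_duration_seconds(source)
    return 0.0 < duration <= max_seconds


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    for seconds in (10.0, 20.0):
        ok = is_valid_reference(make_silent_wav(seconds))
        print(f"{seconds:.0f}s clip accepted: {ok}")
```

In practice, `is_valid_reference("my_clip.wav")` would be called with a path to a real recording; the silent clips here only exercise the duration check.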
Tips:
- Use clear, high-quality reference audio
- Keep reference audio under 15 seconds
- The model works best with conversational speech
- Try different temperature settings: lower values typically give more consistent output, higher values more varied output
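To illustrate why temperature changes the results, here is a minimal sketch of standard temperature scaling as commonly used when sampling from generative models. It assumes the temperature setting in this interface works in that conventional way; the page does not describe KaniTTS's actual sampler, and the function name and example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
    for temperature in (0.5, 1.0, 1.5):
        probs = softmax_with_temperature(scores, temperature)
        print(temperature, [round(p, 3) for p in probs])
```

At lower temperatures the probability mass concentrates on the highest-scoring candidate, so repeated generations sound more alike; at higher temperatures the distribution flattens and the output varies more between runs.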
Credits:
- KaniTTS model by the KaniTTS team
- Nemo codec by NVIDIA
- Interface adapted from Orpheus TTS demo