API documentation & playground

Choose an API below for endpoint details, parameters, and live testing with your API key.

Text to Speech (HTTP)
REST synthesis with your voice model ID and engine options.
Text to Speech (HTTP v2)
Synthesize speech with a voice ID and optional engine settings.
TTS WebSocket
Streaming speech over WebSocket for realtime use cases.
TTS WebSocket v2
Updated WebSocket protocol for TTS.
Speech to Text
Transcribe audio from a public URL.
Voice clone — create model
Upload reference audio to create a voice model.
Voice clone — delete model
Remove a voice model by ID.
Voice clone — list models
List public and personal voice models.
Lip sync — create task
Create a lip-sync video generation task.
Lip sync — query task
Poll task status and results by ID.
Lip sync — list tasks
List lip-sync tasks and statistics.
User profile (API)
Remaining API quota and basic account info.
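
The lip-sync entries above describe a create-then-poll workflow: a task is created, then its status is queried by ID until results are available. The sketch below shows one way to structure that polling loop in Python. It is a minimal illustration under stated assumptions, not the service's client: `fetch_status` is a hypothetical callable that you would implement around the query-task endpoint, and the `status` field with its `"done"` and `"failed"` values is assumed for illustration. Check the endpoint page for the actual response shape.

```python
import time
from typing import Any, Callable, Dict


def poll_task(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(task_id) until the task reaches a terminal state.

    fetch_status is a hypothetical wrapper around the "query task" endpoint;
    it is expected to return the task as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status(task_id)
        # Assumption: the response carries a "status" field, and
        # "done" / "failed" are its terminal values.
        if task.get("status") in ("done", "failed"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still pending after {timeout_s}s"
            )
        # Wait between requests so the API is not hammered.
        sleep(interval_s)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub before pointing it at the real endpoint.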