Kitta AI

API documentation & playground

Choose an API below for endpoint details, parameters, and live testing with your API key.

  • Text to Speech (HTTP)

    REST synthesis with your voice model ID and engine options.

  • Text to Speech (HTTP v2)

    Synthesize speech with a voice ID and optional engine settings.

  • TTS WebSocket

    Streaming speech over WebSocket for realtime use cases.

  • TTS WebSocket v2

    Updated WebSocket protocol for TTS.

  • Speech to Text

    Transcribe audio from a public URL.

  • Voice clone — create model

    Upload reference audio to create a voice model.

  • Voice clone — delete model

    Remove a voice model by ID.

  • Voice clone — list models

    List public and personal voice models.

  • Lip sync — create task

    Create a lip-sync video generation task.

  • Lip sync — query task

    Poll task status and results by ID.

  • Lip sync — list tasks

    List lip-sync tasks and statistics.

  • User profile (API)

    Remaining API quota and basic account info.
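
The lip-sync entries above describe a create-then-poll workflow: one call creates a task, and another polls its status and results by ID. A minimal sketch of the polling half follows. This page does not give endpoint URLs, response fields, or status values, so everything named here is an assumption: `fetch_status` is a hypothetical callable standing in for the real "query task" request, and the `status` field and the `"succeeded"` / `"failed"` values are placeholders to replace with whatever the endpoint details specify.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; the real API may use different names.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_task(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status(task_id) until the task reaches a terminal state.

    fetch_status is a hypothetical wrapper around the "query task" request;
    it is expected to return a dict with a "status" key (an assumption).
    Raises TimeoutError if no terminal state is seen within timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        task = fetch_status(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        # Wait between polls so the API is not hit in a tight loop.
        sleep(interval_s)
```

Passing the request function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub before pointing it at the real endpoint.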