Uberduck API Overview

Welcome to the Uberduck API documentation. Our API provides a unified interface to access a wide range of AI models for speech synthesis, voice cloning, and music generation. This document outlines the core concepts, endpoints, and usage patterns.

Base URL

All API requests should be made to the following base URL:

https://api.uberduck.ai/v1
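As a minimal sketch, endpoint paths can be joined onto this base URL like so (the helper function below is illustrative, not part of any official client):

```python
# Base URL for all Uberduck API requests, as documented above.
BASE_URL = "https://api.uberduck.ai/v1"

def endpoint(path: str) -> str:
    """Join an endpoint path onto the base URL, tolerating a leading slash."""
    return f"{BASE_URL}/{path.lstrip('/')}"

print(endpoint("/voices"))  # https://api.uberduck.ai/v1/voices
```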

Core Concepts

Capabilities

The API is organized around capabilities, the high-level functions that models can perform:

  • text-to-speech - Convert text to spoken audio
  • voice-cloning - Create voice models from audio samples
  • music-generation - Create music from text descriptions

Models and Providers

Models are implementations of capabilities, typically from a specific provider:

  • Provider - The source of the model (e.g., AWS, Google, Azure)
  • Model - The specific implementation (e.g., Polly, WaveNet, Neural)

Voices

Voices are used with speech models and can be:

  • Provider Voices - Pre-built voices from providers (e.g., AWS Polly's "Joanna")
  • Zero-Shot Voices - Custom voices created from user audio samples
  • Fine-Tuned Voices - Extensively trained custom voice models

Available Endpoints

Endpoint                       Description
GET /v1/models                 List available TTS models
GET /v1/voices                 List available voices with filtering options
POST /v1/text-to-speech        Generate speech from text using specified voice and model
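The requests to these endpoints can be sketched as follows using only the Python standard library. The `Authorization: Bearer` header scheme and the request-body field names (`text`, `voice`, `model`) are assumptions for illustration; check the endpoint reference for the actual schema.

```python
import json
import urllib.request

BASE_URL = "https://api.uberduck.ai/v1"
API_KEY = "YOUR_API_KEY"  # placeholder; obtain a real key from your account

def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for an Uberduck endpoint."""
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )

# GET /v1/models and GET /v1/voices take no request body.
models_req = build_request("GET", "/models")

# POST /v1/text-to-speech takes the text plus voice/model selectors;
# the field names and values below are illustrative only.
tts_req = build_request(
    "POST", "/text-to-speech",
    {"text": "Hello!", "voice": "Joanna", "model": "polly"},
)
```

Sending a built request is then a matter of passing it to `urllib.request.urlopen` (or any HTTP client) once a valid API key is in place.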

Getting Started

To start using the Uberduck API:

  1. Obtain your API key
  2. Make your first request
  3. Explore available voices and models
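The exploration step might look like the sketch below. The response schema (a JSON array of voice objects with a `name` field) is an assumption for illustration, as is the commented-out fetch; consult the `/v1/voices` reference for the actual shape.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; step 1 above

def voice_names(raw_json: bytes) -> list:
    """Parse an assumed /v1/voices response body into a list of voice names."""
    return [v["name"] for v in json.loads(raw_json)]

# With a real key, step 2 would fetch the body like so:
#   req = urllib.request.Request(
#       "https://api.uberduck.ai/v1/voices",
#       headers={"Authorization": f"Bearer {API_KEY}"})
#   body = urllib.request.urlopen(req).read()
# Here we use a hypothetical sample body instead of a live call:
sample_body = b'[{"name": "Joanna"}, {"name": "Matthew"}]'
print(voice_names(sample_body))  # ['Joanna', 'Matthew']
```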

API Versioning

The current version is v1. We follow semantic versioning principles for all API changes:

  • Major versions (v1, v2, etc.) may include breaking changes
  • Minor updates are backward-compatible and introduce new functionality
  • Patch updates are backward-compatible bug fixes

Major version changes will be announced with a minimum of 6 months' notice before the previous version is deprecated.