# Uberduck API Overview
Welcome to the Uberduck API documentation. Our API provides a unified interface to access a wide range of AI models for speech synthesis, voice cloning, and music generation. This document outlines the core concepts, endpoints, and usage patterns.
## Base URL
All API requests should be made to the following base URL:
`https://api.uberduck.ai/v1`
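As a minimal sketch, endpoint URLs can be built by joining paths onto this base. Note that `urljoin` needs a trailing slash on the base for relative paths to resolve under `/v1/`:

```python
from urllib.parse import urljoin

# Base URL from the documentation above (trailing slash added for urljoin).
BASE_URL = "https://api.uberduck.ai/v1/"

# Endpoint paths resolve relative to the base.
models_url = urljoin(BASE_URL, "models")
voices_url = urljoin(BASE_URL, "voices")

print(models_url)  # https://api.uberduck.ai/v1/models
print(voices_url)  # https://api.uberduck.ai/v1/voices
```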
## Core Concepts

### Capabilities

The API is organized around *capabilities*: high-level functions that models can perform:
- `text-to-speech` - Convert text to spoken audio
- `voice-cloning` - Create voice models from audio samples
- `music-generation` - Create music from text descriptions
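The capability identifiers above are plain strings, so a client can model them as a string-valued enum. This is an illustrative sketch, not part of any official SDK:

```python
from enum import Enum

class Capability(str, Enum):
    """Capability identifiers as listed in this document."""
    TEXT_TO_SPEECH = "text-to-speech"
    VOICE_CLONING = "voice-cloning"
    MUSIC_GENERATION = "music-generation"

# Because the enum mixes in str, members compare equal to their wire strings.
print(Capability.TEXT_TO_SPEECH == "text-to-speech")  # True
```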
### Models and Providers
Models are implementations of capabilities, typically from a specific provider:
- Provider - The source of the model (e.g., AWS, Google, Azure)
- Model - The specific implementation (e.g., Polly, WaveNet, Neural)
### Voices
Voices are used with speech models and can be:
- Provider Voices - Pre-built voices from providers (e.g., AWS Polly's "Joanna")
- Zero-Shot Voices - Custom voices created from user audio samples
- Fine-Tuned Voices - Extensively trained custom voice models
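Since the voices endpoint supports filtering (see the endpoint table below), a client might filter by voice type with a query string. The `mode` parameter name here is an assumption; this document does not name the endpoint's query parameters:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.uberduck.ai/v1"

# NOTE: "mode" is a hypothetical parameter name -- the voices endpoint
# supports filtering, but its query parameters are not documented here.
params = {"mode": "zero-shot"}
voices_url = f"{BASE_URL}/voices?{urlencode(params)}"

print(voices_url)  # https://api.uberduck.ai/v1/voices?mode=zero-shot
```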
## Available Endpoints
| Endpoint | Description |
|---|---|
| `GET /v1/models` | List available TTS models |
| `GET /v1/voices` | List available voices with filtering options |
| `POST /v1/text-to-speech` | Generate speech from text using the specified voice and model |
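A request body for `POST /v1/text-to-speech` might look like the sketch below. The field names (`text`, `voice`, `model`) are assumptions for illustration; only the endpoint itself is confirmed by the table above:

```python
import json

# Hypothetical request body for POST /v1/text-to-speech. The field names
# are assumptions, not confirmed by this document.
payload = {
    "text": "Hello from the Uberduck API.",
    "voice": "Joanna",   # a provider voice mentioned above
    "model": "polly",    # hypothetical model identifier
}

body = json.dumps(payload)
print(body)
```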
## Getting Started

To start using the Uberduck API, make requests against the base URL above using the endpoints listed in this document.
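As a quick-start sketch, the snippet below builds a text-to-speech request using only the standard library. The bearer-token header and the request fields are assumptions, since this document does not describe the authentication scheme or request schema:

```python
import json
import urllib.request

BASE_URL = "https://api.uberduck.ai/v1"

# Hypothetical bearer-token auth: the authentication scheme is an assumption.
API_KEY = "YOUR_API_KEY"

payload = json.dumps({"text": "Hello, world.", "voice": "Joanna"}).encode()
req = urllib.request.Request(
    f"{BASE_URL}/text-to-speech",
    data=payload,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# The request object is built but not sent here; calling
# urllib.request.urlopen(req) would perform the actual API call.
print(req.full_url, req.method)
```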
## API Versioning
The current version is v1. We follow semantic versioning principles for all API changes:
- Major versions (v1, v2, etc.) may include breaking changes
- Minor updates are backward-compatible and introduce new functionality
- Patch updates are backward-compatible bug fixes
Major version changes will be announced with a minimum of 6 months' notice before the previous version is deprecated.