Uberduck API Overview

Welcome to the Uberduck API documentation. Our API provides a unified interface to access a wide range of AI models for speech synthesis, voice cloning, and music generation. This document outlines the core concepts, endpoints, and usage patterns.

Base URL

All API requests should be made to the following base URL:

https://api.uberduck.ai/v1
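As a minimal sketch, endpoint paths can be joined onto this base URL like so (the helper function below is illustrative, not part of any official client):

```python
# Base URL for all Uberduck API requests, as documented above.
BASE_URL = "https://api.uberduck.ai/v1"

def endpoint(path: str) -> str:
    """Join an endpoint path onto the base URL, tolerating a leading slash."""
    return f"{BASE_URL}/{path.lstrip('/')}"

print(endpoint("/voices"))  # https://api.uberduck.ai/v1/voices
```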

Core Concepts

Capabilities

The API is organized around capabilities, the high-level functions that models can perform:

  • text-to-speech - Convert text to spoken audio
  • voice-cloning - Create voice models from audio samples
  • music-generation - Create music from text descriptions

Models and Providers

Models are implementations of capabilities, typically from a specific provider:

  • Provider - The source of the model (e.g., AWS, Google, Azure)
  • Model - The specific implementation (e.g., Polly, WaveNet, Neural)

Voices

Voices are used with speech models and can be:

  • Provider Voices - Pre-built voices from providers (e.g., AWS Polly's "Joanna")
  • Zero-Shot Voices - Custom voices created from user audio samples
  • Fine-Tuned Voices - Extensively trained custom voice models

Available Endpoints

Endpoint                       Description
GET /v1/models                 List available TTS models
GET /v1/voices                 List available voices with filtering options
POST /v1/text-to-speech        Generate speech from text using specified voice and model
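The requests to these endpoints can be sketched as follows using only the Python standard library. The `Authorization: Bearer` header scheme and the request-body field names (`text`, `voice`, `model`) are assumptions for illustration; check the endpoint reference for the actual schema.

```python
import json
import urllib.request

BASE_URL = "https://api.uberduck.ai/v1"
API_KEY = "YOUR_API_KEY"  # placeholder; obtain a real key from your account

def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for an Uberduck endpoint."""
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )

# GET /v1/models and GET /v1/voices take no request body.
models_req = build_request("GET", "/models")

# POST /v1/text-to-speech takes the text plus voice/model selectors;
# the field names and values below are illustrative only.
tts_req = build_request(
    "POST", "/text-to-speech",
    {"text": "Hello!", "voice": "Joanna", "model": "polly"},
)
```

Sending a built request is then a matter of passing it to `urllib.request.urlopen` (or any HTTP client) once a valid API key is in place.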

Getting Started

To start using the Uberduck API:

  1. Obtain your API key
  2. Make your first request
  3. Explore available voices and models
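The exploration step might look like the sketch below. The response schema (a JSON array of voice objects with a `name` field) is an assumption for illustration, as is the commented-out fetch; consult the `/v1/voices` reference for the actual shape.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; step 1 above

def voice_names(raw_json: bytes) -> list:
    """Parse an assumed /v1/voices response body into a list of voice names."""
    return [v["name"] for v in json.loads(raw_json)]

# With a real key, step 2 would fetch the body like so:
#   req = urllib.request.Request(
#       "https://api.uberduck.ai/v1/voices",
#       headers={"Authorization": f"Bearer {API_KEY}"})
#   body = urllib.request.urlopen(req).read()
# Here we use a hypothetical sample body instead of a live call:
sample_body = b'[{"name": "Joanna"}, {"name": "Matthew"}]'
print(voice_names(sample_body))  # ['Joanna', 'Matthew']
```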

API Versioning

The current version is v1. We follow semantic versioning principles for all API changes:

  • Major versions (v1, v2, etc.) may include breaking changes
  • Minor updates are backward-compatible and introduce new functionality
  • Patch updates are backward-compatible bug fixes

Major version changes will be announced with a minimum of 6 months' notice before the previous version is deprecated.