Text-to-Speech Endpoint

Generate speech from text using a specified voice and model.

Request

POST /v1/text-to-speech

Request Body Parameters

Parameter	Type	Required	Description
`text`	string	Yes	The text to convert to speech. Maximum length is 10,000 characters.
`voice`	string	Yes	The voice ID to use (e.g., `polly_joanna`, `google_wavenet_a`, `azure_guy`).
`model`	string	No	The model ID to use (e.g., `polly_neural`, `google_wavenet`). If not specified, a default model compatible with the voice will be used.
`extended`	object	No	Common parameters supported by many models. See Extended Parameters.
`model_specific`	object	No	Parameters specific to the chosen model. See Model-Specific Parameters.
`output_format`	string	No	Desired output format. Options: `mp3`, `wav`, `ogg`. Default: `mp3`.

Extended Parameters

These parameters are common across many models:

Parameter	Type	Description
`emotion`	string	Emotion to apply to the voice (e.g., `happy`, `sad`, `neutral`).
`speed`	float	Playback speed multiplier. Range: 0.5 to 2.0. Default: 1.0.
`pitch`	float	Voice pitch adjustment. Range: -10.0 to 10.0. Default: 0.0.
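
A client can check these ranges before sending a request. The sketch below is a minimal, hypothetical helper (`build_extended` is not part of the API); it only encodes the ranges listed in the table above and assumes the API accepts any subset of the three fields.

```python
# Ranges copied from the Extended Parameters table above.
EXTENDED_RANGES = {
    "speed": (0.5, 2.0),
    "pitch": (-10.0, 10.0),
}


def build_extended(emotion=None, speed=None, pitch=None):
    """Return an `extended` object, raising ValueError for out-of-range values.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    extended = {}
    if emotion is not None:
        extended["emotion"] = emotion
    for name, value in (("speed", speed), ("pitch", pitch)):
        if value is None:
            continue  # omitted fields fall back to the documented defaults
        low, high = EXTENDED_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        extended[name] = float(value)
    return extended
```

Calling `build_extended(speed=1.2)` yields the same `extended` object used in the example request in this document.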

Model-Specific Parameters

Parameters vary by model and are passed in the `model_specific` object. Some examples:

AWS Polly

Parameter	Type	Description
`engine`	string	The engine to use. Options: `standard`, `neural`. Default depends on voice compatibility.

Google Cloud TTS

Parameter	Type	Description
`speaking_rate`	float	Speaking rate. Range: 0.25 to 4.0. Default: 1.0.
`pitch`	float	Voice pitch. Range: -20.0 to 20.0. Default: 0.0.

Azure Speech Service

Parameter	Type	Description
`pitch`	string	Voice pitch adjustment. Options: `x-low`, `low`, `medium`, `high`, `x-high`. Default: `medium`.
`rate`	string	Speaking rate. Options: `x-slow`, `slow`, `medium`, `fast`, `x-fast`. Default: `medium`.
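
Note that `pitch` has a different type and range depending on where it appears: a float from -10.0 to 10.0 in `extended`, a float from -20.0 to 20.0 for Google Cloud TTS, and a named level for Azure Speech Service. The following sketch shows one way to build a `model_specific` object for the Google and Azure tables above. The helper names are hypothetical, and it assumes the parameters sit directly inside `model_specific` as flat keys, which this page does not state explicitly.

```python
# Allowed values copied from the model-specific tables above.
GOOGLE_RANGES = {"speaking_rate": (0.25, 4.0), "pitch": (-20.0, 20.0)}
AZURE_LEVELS = {
    "pitch": {"x-low", "low", "medium", "high", "x-high"},
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast"},
}


def google_model_specific(speaking_rate=1.0, pitch=0.0):
    """Validate numeric Google Cloud TTS parameters (hypothetical helper)."""
    params = {"speaking_rate": speaking_rate, "pitch": pitch}
    for name, value in params.items():
        low, high = GOOGLE_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return params


def azure_model_specific(pitch="medium", rate="medium"):
    """Validate named-level Azure Speech Service parameters (hypothetical helper)."""
    params = {"pitch": pitch, "rate": rate}
    for name, value in params.items():
        if value not in AZURE_LEVELS[name]:
            raise ValueError(f"{name} must be one of {sorted(AZURE_LEVELS[name])}")
    return params


# Assumed layout: model-specific keys nested flat under "model_specific".
payload = {
    "text": "Hello, world!",
    "voice": "google_wavenet_a",
    "model": "google_wavenet",
    "model_specific": google_model_specific(speaking_rate=1.25, pitch=-2.0),
}
```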

Example Request

curl -X POST https://api.uberduck.ai/v1/text-to-speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "voice": "polly_joanna",
    "model": "polly_neural",
    "extended": {
      "speed": 1.2
    },
    "output_format": "mp3"
  }'
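
The same request can be built from Python with only the standard library. This is a sketch rather than an official client: `build_request` is a hypothetical helper, and the block constructs the request without sending it so that it runs without network access or a real API key.

```python
import json
import urllib.request

API_URL = "https://api.uberduck.ai/v1/text-to-speech"


def build_request(api_key, payload):
    """Build a POST request equivalent to the curl example above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "text": "Hello, world!",
    "voice": "polly_joanna",
    "model": "polly_neural",
    "extended": {"speed": 1.2},
    "output_format": "mp3",
}
request = build_request("YOUR_API_KEY", payload)

# To actually send it (requires a valid key):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```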

Response

Response Body

Field	Type	Description
`audio_url`	string	URL to the generated audio file. This URL is valid for 24 hours.
`duration_seconds`	float	Duration of the generated audio in seconds.
`model`	string	The model used for generation.
`voice`	string	The voice used for generation.
`created_at`	string	ISO 8601 timestamp when the audio was created.

Example Response

{
  "audio_url": "https://static.uberduck.ai/output/fc9a66eb-1a0f-4cab-a139-13e8dc306786.mp3",
  "duration_seconds": 1.5,
  "model": "polly_neural",
  "voice": "polly_joanna",
  "created_at": "2025-05-27T17:30:00Z"
}
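
Because `audio_url` expires, it can help to record when the link stops working. The sketch below parses the example response and computes an expiry time. It assumes the 24-hour window starts at `created_at`, which this page does not state explicitly; `url_expires_at` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta

URL_LIFETIME = timedelta(hours=24)  # from the `audio_url` description above


def parse_created_at(value):
    """Parse the ISO 8601 timestamp into a timezone-aware datetime."""
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def url_expires_at(response):
    """Assumption: the 24-hour validity is measured from `created_at`."""
    return parse_created_at(response["created_at"]) + URL_LIFETIME


response = json.loads("""
{
  "audio_url": "https://static.uberduck.ai/output/fc9a66eb-1a0f-4cab-a139-13e8dc306786.mp3",
  "duration_seconds": 1.5,
  "model": "polly_neural",
  "voice": "polly_joanna",
  "created_at": "2025-05-27T17:30:00Z"
}
""")
expires = url_expires_at(response)
print(expires.isoformat())
```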

Error Codes

Status Code	Error Code	Description
400	`invalid_request`	Invalid request parameters.
400	`text_too_long`	Text exceeds maximum length.
400	`invalid_voice`	The specified voice does not exist.
401	`unauthorized`	Invalid or missing API key.
404	`voice_not_found`	The specified voice was not found.
404	`model_not_found`	The specified model was not found.
422	`voice_model_incompatible`	The specified voice is not compatible with the specified model.
429	`rate_limit_exceeded`	Rate limit exceeded.
500	`internal_error`	Internal server error.
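
A client typically needs to decide which of these failures are worth retrying. The sketch below is one possible policy, not documented API behavior: it treats 429 and 500 as transient and every other status in the table as a problem with the request itself. It works from the HTTP status code alone because the shape of the error response body is not described on this page.

```python
# Assumption: only rate limiting and server errors are transient.
RETRYABLE_STATUS = {429, 500}


def should_retry(status_code):
    """Return True when repeating the same request might succeed later."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(max_attempts=4, base_seconds=1.0):
    """Exponential delays (in seconds) to wait between successive attempts."""
    return [base_seconds * 2**i for i in range(max_attempts - 1)]


for status in (400, 401, 404, 422, 429, 500):
    action = "retry with backoff" if should_retry(status) else "fix the request"
    print(status, action)
```

Errors such as `text_too_long` or `voice_model_incompatible` will fail again unchanged, so retrying them only consumes rate limit.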

Usage Notes

The maximum text length is 10,000 characters. Longer text is rejected with a 400 `text_too_long` error.
Audio URLs are valid for 24 hours. Download and store the audio if you need it for longer.
When using provider voices (AWS Polly, Google Cloud TTS, Azure Speech Service), the `voice` value must include the provider prefix (e.g., `polly_joanna`, `google_wavenet_a`, `azure_guy`).
Supported output formats: mp3, wav, ogg.
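
For input longer than the 10,000-character limit, one option is to split the text client-side and send one request per chunk. The sketch below is a hypothetical helper, not an API feature. It assumes the limit counts characters the way Python's `len` does, and it breaks at spaces where possible.

```python
MAX_CHARS = 10_000  # limit stated in the usage notes above


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Prefers to break at the last space within the limit and falls back
    to a hard cut when a run of text has no spaces at all.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: hard cut at the limit
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


long_text = "word " * 5000  # 25,000 characters
chunks = split_text(long_text)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Each chunk produces its own audio file, so the caller would still need to concatenate the results in order.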
