Text-to-Speech Endpoint
Generate speech from text using a specified voice and model.
Request
POST /v1/text-to-speech
Request Body Parameters
Parameter | Type | Required | Description |
---|---|---|---|
text | string | Yes | The text to convert to speech. Maximum length is 10,000 characters. |
voice | string | Yes | The voice ID to use (e.g., polly_joanna, google_wavenet_a, azure_guy). |
model | string | No | The model ID to use (e.g., polly_neural, google_wavenet). If not specified, a default model compatible with the voice is used. |
extended | object | No | Common parameters supported by many models. |
model_specific | object | No | Parameters specific to the chosen model. |
output_format | string | No | Desired output format (default: mp3). Options include mp3, wav, ogg. |
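The rules in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_payload` is a hypothetical helper name, the format set contains only the three formats the table lists, and it assumes the 10,000-character limit is counted the same way as Python's `len`.

```python
from typing import Any, Optional

MAX_TEXT_LENGTH = 10_000                 # limit stated in the parameter table
OUTPUT_FORMATS = {"mp3", "wav", "ogg"}   # only the formats the table lists


def build_payload(
    text: str,
    voice: str,
    model: Optional[str] = None,
    extended: Optional[dict[str, Any]] = None,
    model_specific: Optional[dict[str, Any]] = None,
    output_format: str = "mp3",
) -> dict[str, Any]:
    """Assemble a request body, leaving out optional fields that were not set."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if not voice:
        raise ValueError("voice is required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")

    payload: dict[str, Any] = {
        "text": text,
        "voice": voice,
        "output_format": output_format,
    }
    if model is not None:
        payload["model"] = model
    if extended:
        payload["extended"] = extended
    if model_specific:
        payload["model_specific"] = model_specific
    return payload
```

Omitting `model` leaves the choice of a compatible default to the server, as described in the table.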
Extended Parameters
These parameters are common across many models:
Parameter | Type | Description |
---|---|---|
emotion | string | Emotion to apply to the voice (e.g., happy, sad, neutral). |
speed | float | Playback speed multiplier. Range: 0.5 to 2.0. Default: 1.0. |
pitch | float | Voice pitch adjustment. Range: -10.0 to 10.0. Default: 0.0. |
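As an illustration, the documented ranges for speed and pitch can be enforced before sending. This sketch assumes the range endpoints are inclusive, which the table does not state; `validate_extended` is a hypothetical helper name.

```python
# Ranges copied from the extended-parameter table; inclusive bounds are an assumption.
EXTENDED_RANGES = {
    "speed": (0.5, 2.0),
    "pitch": (-10.0, 10.0),
}


def validate_extended(extended: dict) -> dict:
    """Return `extended` unchanged, or raise ValueError if a value is out of range."""
    for name, (low, high) in EXTENDED_RANGES.items():
        if name in extended:
            value = float(extended[name])
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")
    return extended
```

Keys without a documented range, such as `emotion`, pass through unchecked.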
Model-Specific Parameters
Parameters vary by model. Some examples:
AWS Polly
Parameter | Type | Description |
---|---|---|
engine | string | The engine to use. Options: standard, neural. Default depends on voice compatibility. |
Google Cloud TTS
Parameter | Type | Description |
---|---|---|
speaking_rate | float | Speaking rate. Range: 0.25 to 4.0. Default: 1.0. |
pitch | float | Voice pitch. Range: -20.0 to 20.0. Default: 0.0. |
Azure Speech Service
Parameter | Type | Description |
---|---|---|
pitch | string | Voice pitch adjustment. Options: x-low, low, medium, high, x-high. Default: medium. |
rate | string | Speaking rate. Options: x-slow, slow, medium, fast, x-fast. Default: medium. |
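Note that `pitch` is a float for Google Cloud TTS but a string keyword for Azure Speech Service, so the same key needs different checks per provider. The sketch below shows one way to catch mismatches on the client. It assumes `model_specific` is a flat object of the parameters listed above; the provider labels `polly`, `google`, and `azure` and the function name are hypothetical and exist only in this example.

```python
# Allowed values copied from the tables above.
# The provider labels used as keys are hypothetical, chosen for this sketch.
ENUMS = {
    "polly": {"engine": {"standard", "neural"}},
    "azure": {
        "pitch": {"x-low", "low", "medium", "high", "x-high"},
        "rate": {"x-slow", "slow", "medium", "fast", "x-fast"},
    },
}
RANGES = {
    "google": {"speaking_rate": (0.25, 4.0), "pitch": (-20.0, 20.0)},
}


def check_model_specific(provider: str, params: dict) -> dict:
    """Raise ValueError if a documented parameter carries an undocumented value."""
    for name, value in params.items():
        allowed = ENUMS.get(provider, {}).get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
        bounds = RANGES.get(provider, {}).get(name)
        if bounds is not None and not bounds[0] <= float(value) <= bounds[1]:
            raise ValueError(f"{name}={value!r} is outside {bounds[0]} to {bounds[1]}")
    return params
```

For example, `check_model_specific("azure", {"pitch": "high"})` passes, while the same call with a numeric pitch raises `ValueError`.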
Example Request
curl -X POST https://api.uberduck.ai/v1/text-to-speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "voice": "polly_joanna",
    "model": "polly_neural",
    "extended": {
      "speed": 1.2
    },
    "output_format": "mp3"
  }'
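The same request can be issued from Python using only the standard library. This is a sketch rather than an official client: `make_request` and `synthesize` are hypothetical names, and the URL and headers are taken from the curl example.

```python
import json
import urllib.request

API_URL = "https://api.uberduck.ai/v1/text-to-speech"  # from the curl example


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(make_request(api_key, payload), timeout=timeout) as resp:
        return json.load(resp)
```

`urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so callers should expect that exception on any of the documented error statuses.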
Response
Response Body
Field | Type | Description |
---|---|---|
audio_url | string | URL to the generated audio file. This URL is valid for 24 hours. |
duration_seconds | float | Duration of the generated audio in seconds. |
model | string | The model used for generation. |
voice | string | The voice used for generation. |
created_at | string | ISO 8601 timestamp when the audio was created. |
Example Response
{
  "audio_url": "https://static.uberduck.ai/output/fc9a66eb-1a0f-4cab-a139-13e8dc306786.mp3",
  "duration_seconds": 1.5,
  "model": "polly_neural",
  "voice": "polly_joanna",
  "created_at": "2025-05-27T17:30:00Z"
}
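Because the audio URL expires, it can be useful to parse the response into a typed object and compute when the link stops working. The sketch below assumes the 24-hour window is measured from `created_at`, which the documentation does not state explicitly; `SpeechResult` and `parse_response` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SpeechResult:
    audio_url: str
    duration_seconds: float
    model: str
    voice: str
    created_at: datetime

    @property
    def expires_at(self) -> datetime:
        # Assumption: the 24-hour validity window starts at created_at.
        return self.created_at + timedelta(hours=24)


def parse_response(body: dict) -> SpeechResult:
    """Convert the decoded JSON response body into a SpeechResult."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    created = datetime.fromisoformat(body["created_at"].replace("Z", "+00:00"))
    return SpeechResult(
        audio_url=body["audio_url"],
        duration_seconds=float(body["duration_seconds"]),
        model=body["model"],
        voice=body["voice"],
        created_at=created,
    )
```

Comparing `expires_at` with the current UTC time tells a client whether a stored URL is likely still usable or the audio needs to be regenerated.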
Error Codes
Status Code | Error Code | Description |
---|---|---|
400 | invalid_request | Invalid request parameters. |
400 | text_too_long | Text exceeds maximum length. |
400 | invalid_voice | The specified voice does not exist. |
401 | unauthorized | Invalid or missing API key. |
404 | voice_not_found | The specified voice was not found. |
404 | model_not_found | The specified model was not found. |
422 | voice_model_incompatible | The specified voice is not compatible with the specified model. |
429 | rate_limit_exceeded | Rate limit exceeded. |
500 | internal_error | Internal server error. |
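A client typically wants to separate errors worth retrying from errors that need a corrected request. The sketch below encodes the table as data. Treating `rate_limit_exceeded` and `internal_error` as retryable, and the exponential backoff schedule, are assumptions of this example, not documented behavior; how the error code appears in the response body is also not specified here.

```python
# Error code -> HTTP status, copied from the table above.
DOCUMENTED_ERRORS = {
    "invalid_request": 400,
    "text_too_long": 400,
    "invalid_voice": 400,
    "unauthorized": 401,
    "voice_not_found": 404,
    "model_not_found": 404,
    "voice_model_incompatible": 422,
    "rate_limit_exceeded": 429,
    "internal_error": 500,
}

# Assumption: only these two are transient; the rest need a changed request.
RETRYABLE = {"rate_limit_exceeded", "internal_error"}


def is_retryable(error_code: str) -> bool:
    """Return True if retrying the identical request might succeed."""
    return error_code in RETRYABLE


def retry_delays(max_attempts: int, base_seconds: float = 1.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base_seconds * 2 ** attempt for attempt in range(max_attempts)]
```

With the defaults, `retry_delays(3)` yields waits of 1, 2, and 4 seconds between attempts.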
Usage Notes
- The maximum text length is 10,000 characters. Longer text is rejected with a 400 text_too_long error.
- Audio URLs are valid for 24 hours. Download and store the audio if you need it for longer.
- When using provider voices (AWS Polly, Google, Azure), the voice parameter should include the provider prefix (e.g., polly_joanna, google_wavenet_a).
- Supported output formats: mp3, wav, ogg.
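Since text over 10,000 characters is rejected, longer input has to be split into several requests on the client. The following sketch shows one possible approach and is not an API feature: it breaks at sentence-ending punctuation where it can and falls back to a hard cut for any single sentence longer than the limit. `split_text` is a hypothetical helper, and the sentence pattern is deliberately simple.

```python
import re

MAX_TEXT_LENGTH = 10_000  # limit stated in the usage notes


def split_text(text: str, limit: int = MAX_TEXT_LENGTH) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence breaks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any single sentence that is itself over the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # The candidate is too long, so close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, with the resulting audio files joined or played in order by the caller.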