Text-to-Speech Endpoint

Generate speech from text using a specified voice and model.

Request

POST /v1/text-to-speech

Request Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | The text to convert to speech. Maximum length is 10,000 characters. |
| voice | string | Yes | The voice ID to use (e.g., polly_joanna, google_wavenet_a, azure_guy). |
| model | string | No | The model ID to use (e.g., polly_neural, google_wavenet). If not specified, a default model compatible with the voice will be used. |
| extended | object | No | Common parameters supported by many models. |
| model_specific | object | No | Parameters specific to the chosen model. |
| output_format | string | No | Desired output format (default: mp3). Options include mp3, wav, ogg. |
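
As a minimal sketch, the request body could be assembled and checked on the client before it is sent. The helper name `build_request_body` is hypothetical, and the length check assumes the API counts characters the same way Python's `len()` does, which the documentation does not confirm.

```python
from typing import Any, Dict, Optional

MAX_TEXT_LENGTH = 10_000                # documented maximum for `text`
OUTPUT_FORMATS = {"mp3", "wav", "ogg"}  # documented output formats


def build_request_body(
    text: str,
    voice: str,
    model: Optional[str] = None,
    extended: Optional[Dict[str, Any]] = None,
    model_specific: Optional[Dict[str, Any]] = None,
    output_format: str = "mp3",
) -> Dict[str, Any]:
    """Assemble the JSON body, checking the documented constraints locally."""
    if not text:
        raise ValueError("text is required")
    # Assumption: the API counts characters the way len() does.
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if not voice:
        raise ValueError("voice is required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")

    body: Dict[str, Any] = {
        "text": text,
        "voice": voice,
        "output_format": output_format,
    }
    # Optional fields are left out entirely so the server applies its defaults.
    if model is not None:
        body["model"] = model
    if extended:
        body["extended"] = extended
    if model_specific:
        body["model_specific"] = model_specific
    return body
```

Omitting `model` rather than sending an empty value lets the service pick a default model compatible with the voice, as described in the table.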

Extended Parameters

These parameters are common across many models:

| Parameter | Type | Description |
| --- | --- | --- |
| emotion | string | Emotion to apply to the voice (e.g., happy, sad, neutral). |
| speed | float | Playback speed multiplier. Range: 0.5 to 2.0. Default: 1.0. |
| pitch | float | Voice pitch adjustment. Range: -10.0 to 10.0. Default: 0.0. |
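
One way to keep these values in range is a small container that validates on construction. This is only a sketch: the class name `ExtendedParams` is hypothetical, and it assumes the documented ranges are inclusive at both ends.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ExtendedParams:
    """Hypothetical container for the documented `extended` fields."""

    emotion: Optional[str] = None  # e.g. "happy", "sad", "neutral"
    speed: Optional[float] = None  # documented range: 0.5 to 2.0
    pitch: Optional[float] = None  # documented range: -10.0 to 10.0

    def __post_init__(self) -> None:
        # Assumption: both range bounds are inclusive.
        if self.speed is not None and not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        if self.pitch is not None and not -10.0 <= self.pitch <= 10.0:
            raise ValueError("pitch must be between -10.0 and 10.0")

    def to_dict(self) -> Dict[str, Any]:
        """Return only the fields that were set, so defaults apply to the rest."""
        fields = {"emotion": self.emotion, "speed": self.speed, "pitch": self.pitch}
        return {name: value for name, value in fields.items() if value is not None}
```

The result of `to_dict()` can be passed as the `extended` object of the request body.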

Model-Specific Parameters

Parameters vary by model. Some examples:

AWS Polly

| Parameter | Type | Description |
| --- | --- | --- |
| engine | string | The engine to use. Options: standard, neural. Default depends on voice compatibility. |

Google Cloud TTS

| Parameter | Type | Description |
| --- | --- | --- |
| speaking_rate | float | Speaking rate. Range: 0.25 to 4.0. Default: 1.0. |
| pitch | float | Voice pitch. Range: -20.0 to 20.0. Default: 0.0. |

Azure Speech Service

| Parameter | Type | Description |
| --- | --- | --- |
| pitch | string | Voice pitch adjustment. Options: x-low, low, medium, high, x-high. Default: medium. |
| rate | string | Speaking rate. Options: x-slow, slow, medium, fast, x-fast. Default: medium. |
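
Because the same name can mean different things per provider (pitch is a float for Google Cloud TTS but a string for Azure Speech Service), a table-driven check can catch mismatches early. The sketch below is illustrative only: the provider keys `polly`, `google`, and `azure` are hypothetical labels, not API values, and parameters that are not listed above are passed through unchecked since the tables are described as examples.

```python
from typing import Any, Dict

# Transcribed from the tables above. A tuple is an inclusive numeric range
# (inclusiveness is an assumption); a set is the list of allowed options.
MODEL_SPECIFIC_RULES = {
    "polly": {"engine": {"standard", "neural"}},
    "google": {"speaking_rate": (0.25, 4.0), "pitch": (-20.0, 20.0)},
    "azure": {
        "pitch": {"x-low", "low", "medium", "high", "x-high"},
        "rate": {"x-slow", "slow", "medium", "fast", "x-fast"},
    },
}


def validate_model_specific(provider: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Check known parameters against the documented ranges and options."""
    rules = MODEL_SPECIFIC_RULES.get(provider, {})
    for name, value in params.items():
        rule = rules.get(name)
        if rule is None:
            continue  # not documented here, so leave it for the server to judge
        if isinstance(rule, tuple):
            low, high = rule
            if not isinstance(value, (int, float)) or not low <= value <= high:
                raise ValueError(f"{name} must be a number from {low} to {high}")
        elif not isinstance(value, str) or value not in rule:
            raise ValueError(f"{name} must be one of {sorted(rule)}")
    return params
```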

Example Request

curl -X POST https://api.uberduck.ai/v1/text-to-speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "voice": "polly_joanna",
    "model": "polly_neural",
    "extended": {
      "speed": 1.2
    },
    "output_format": "mp3"
  }'
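
The same request can be sketched in Python using only the standard library. The function names `build_http_request` and `text_to_speech` and the 30-second timeout are illustrative choices, not part of the API; the URL and headers are taken from the curl example.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.uberduck.ai/v1/text-to-speech"  # from the curl example


def build_http_request(api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def text_to_speech(api_key: str, body: Dict[str, Any], timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response body."""
    request = build_http_request(api_key, body)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the first half easy to inspect or test without network access.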

Response

Response Body

| Field | Type | Description |
| --- | --- | --- |
| audio_url | string | URL to the generated audio file. This URL is valid for 24 hours. |
| duration_seconds | float | Duration of the generated audio in seconds. |
| model | string | The model used for generation. |
| voice | string | The voice used for generation. |
| created_at | string | ISO 8601 timestamp when the audio was created. |

Example Response

{
  "audio_url": "https://static.uberduck.ai/output/fc9a66eb-1a0f-4cab-a139-13e8dc306786.mp3",
  "duration_seconds": 1.5,
  "model": "polly_neural",
  "voice": "polly_joanna",
  "created_at": "2025-05-27T17:30:00Z"
}
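
A response like this could be parsed into a typed object, as in the sketch below. The names `SpeechResult` and `parse_response` are hypothetical, and `expires_at` assumes the 24-hour validity window starts at `created_at`, which the documentation does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Dict


@dataclass
class SpeechResult:
    audio_url: str
    duration_seconds: float
    model: str
    voice: str
    created_at: datetime

    @property
    def expires_at(self) -> datetime:
        # Assumption: the 24-hour validity window starts at created_at.
        return self.created_at + timedelta(hours=24)


def parse_response(payload: Dict[str, Any]) -> SpeechResult:
    """Convert the decoded JSON response into a SpeechResult."""
    # fromisoformat() rejects a trailing "Z" on older Python versions,
    # so replace it with an explicit UTC offset first.
    created = datetime.fromisoformat(payload["created_at"].replace("Z", "+00:00"))
    return SpeechResult(
        audio_url=payload["audio_url"],
        duration_seconds=float(payload["duration_seconds"]),
        model=payload["model"],
        voice=payload["voice"],
        created_at=created,
    )
```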

Error Codes

| Status Code | Error Code | Description |
| --- | --- | --- |
| 400 | invalid_request | Invalid request parameters. |
| 400 | text_too_long | Text exceeds maximum length. |
| 400 | invalid_voice | The specified voice does not exist. |
| 401 | unauthorized | Invalid or missing API key. |
| 404 | voice_not_found | The specified voice was not found. |
| 404 | model_not_found | The specified model was not found. |
| 422 | voice_model_incompatible | The specified voice is not compatible with the specified model. |
| 429 | rate_limit_exceeded | Rate limit exceeded. |
| 500 | internal_error | Internal server error. |
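
A client might use the status code to decide whether a failed request is worth retrying. The sketch below makes two assumptions the documentation does not confirm: that only 429 and 500 are transient, and that exponential backoff with a 30-second cap is an acceptable retry schedule. The format of the error response body is not documented, so it is not parsed here.

```python
# (status code, description) keyed by error code, transcribed from the table above.
ERROR_CODES = {
    "invalid_request": (400, "Invalid request parameters."),
    "text_too_long": (400, "Text exceeds maximum length."),
    "invalid_voice": (400, "The specified voice does not exist."),
    "unauthorized": (401, "Invalid or missing API key."),
    "voice_not_found": (404, "The specified voice was not found."),
    "model_not_found": (404, "The specified model was not found."),
    "voice_model_incompatible": (
        422,
        "The specified voice is not compatible with the specified model.",
    ),
    "rate_limit_exceeded": (429, "Rate limit exceeded."),
    "internal_error": (500, "Internal server error."),
}

# Assumption: only rate limiting and server errors are transient. The other
# errors describe a request that would fail the same way if sent again.
RETRYABLE_STATUS_CODES = {429, 500}


def is_retryable(status_code: int) -> bool:
    """Return True if repeating the same request might succeed."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```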

Usage Notes

  • The maximum text length is 10,000 characters. Longer text will be rejected.
  • Audio URLs are valid for 24 hours. Download and store the audio if you need it for longer.
  • When using provider voices (AWS Polly, Google, Azure), the voice parameter should include the provider prefix (e.g., polly_joanna, google_wavenet_a).
  • Supported output formats: mp3, wav, ogg.
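
Since text over 10,000 characters is rejected, longer input has to be split into several requests on the client side. The following is a minimal sketch of one approach, greedy packing on whitespace boundaries; the function name `split_text` is hypothetical, and the splitting strategy is a suggestion rather than anything the API prescribes.

```python
def split_text(text: str, limit: int = 10_000) -> list[str]:
    """Split text into chunks of at most `limit` characters on whitespace.

    Runs of whitespace (including newlines) are collapsed to single spaces.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit has to be cut mid-word.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio files concatenated by the caller. Splitting on sentence boundaries instead would likely give more natural pauses between chunks.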