
Voice Selection Guide

The voice you choose has a large effect on how engaging and effective your text-to-speech content is. This guide describes the voice types in Uberduck's library and shows how to search, filter, and test voices so you can choose with confidence.

Types of Voices

Uberduck offers several types of voices:

Provider Voices

These are pre-built voices from major cloud providers:

  • Amazon Polly (AWS): Over 60 voices across 20+ languages
  • Google Cloud TTS: Includes Standard, WaveNet, and Neural2 voices
  • Azure Speech Service: Microsoft's neural voices

Provider voices are reliable, well-tested, and available for immediate use.

Zero-Shot Voices

These are custom voices created on-demand from audio samples:

  • Generated quickly (usually within minutes)
  • Require only a short audio sample
  • Good for prototyping or one-off projects

Fine-Tuned Voices

These are extensively trained custom voice models:

  • Higher quality and more consistent than zero-shot voices
  • Require more training data
  • Better for production applications
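
The trade-offs between the three voice types can be written down as a small decision helper. This is only a sketch: the function name `chooseVoiceType` and its three boolean inputs are hypothetical and are not part of the Uberduck API. It simply encodes the bullets above.

```javascript
// Hypothetical helper (not part of the Uberduck API): suggests a voice type
// from the trade-offs described in this section.
function chooseVoiceType({ needsCustomVoice, forProduction, hasTrainingData }) {
  // Provider voices are available immediately and need no audio samples.
  if (!needsCustomVoice) return 'provider';
  // Fine-tuned voices need more training data but are more consistent.
  if (forProduction && hasTrainingData) return 'fine-tuned';
  // Otherwise a zero-shot voice from a short sample is the quickest option.
  return 'zero-shot';
}

console.log(chooseVoiceType({ needsCustomVoice: true, forProduction: false, hasTrainingData: false }));
// prints "zero-shot"
```

Treat the result as a starting point. A production project without enough training data still ends up with a zero-shot suggestion here, which you may want to revisit once more audio is available.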

Finding the Right Voice

Using the Voices API Endpoint

The /v1/voices endpoint allows you to search and filter voices based on various criteria:

# Basic request to get all voices
curl -X GET "https://api.uberduck.ai/v1/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Filtered request for female English voices
curl -X GET "https://api.uberduck.ai/v1/voices?gender=female&language=english" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Search for voices by keyword
curl -X GET "https://api.uberduck.ai/v1/voices?search_term=professional" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Filter by provider model type
curl -X GET "https://api.uberduck.ai/v1/voices?model=polly_neural" \
  -H "Authorization: Bearer YOUR_API_KEY"

Voice Selection Parameters

The most useful parameters for voice selection:

Parameter   | Description                                 | Example Values
gender      | Gender of the voice                         | male, female
language    | Language of the voice                       | english, spanish, french, etc.
accent      | Accent of the voice                         | american, british, australian, etc.
age         | Age category of the voice                   | child, young, middle_aged, senior
model       | Model type for provider voices              | polly_neural, google_wavenet, azure_neural
search_term | Keyword search across name and display name | Any search term
tag         | Filter by voice category tag                | narrative, conversational, character, etc.
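
When filters come from user input or configuration, it can help to build the query string programmatically rather than by hand. The sketch below uses the standard `URLSearchParams` class and the parameter names listed above. The helper name `buildVoicesUrl` is hypothetical, and the sketch assumes the endpoint accepts any combination of these parameters, as the gender-plus-language curl example suggests.

```javascript
// Base URL taken from the curl examples in this guide.
const VOICES_ENDPOINT = 'https://api.uberduck.ai/v1/voices';

// Parameter names from the voice selection parameters listed in this guide.
const KNOWN_PARAMS = ['gender', 'language', 'accent', 'age', 'model', 'search_term', 'tag'];

// Hypothetical helper: builds a /v1/voices URL from a plain filters object.
function buildVoicesUrl(filters = {}) {
  const params = new URLSearchParams();
  for (const name of KNOWN_PARAMS) {
    const value = filters[name];
    // Skip unset or empty filters so they do not narrow the search.
    if (value !== undefined && value !== null && value !== '') {
      params.append(name, String(value));
    }
  }
  const query = params.toString();
  return query ? `${VOICES_ENDPOINT}?${query}` : VOICES_ENDPOINT;
}

console.log(buildVoicesUrl({ gender: 'female', language: 'english', accent: '' }));
// prints "https://api.uberduck.ai/v1/voices?gender=female&language=english"
```

Keys that are not in `KNOWN_PARAMS` are ignored, so a typo in a filter name drops that filter silently instead of sending an unexpected parameter. Pass the resulting URL to `fetch` with the same `Authorization` header used in the curl examples.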

Voice Categories

Uberduck organizes voices into categories with tags:

  • Narrative & Story: Great for audiobooks, storytelling
  • Conversational: Natural-sounding voices for dialogues and chatbots
  • Characters & Animation: Stylized voices for characters
  • Social Media: Voices optimized for social content
  • Entertainment & TV: Celebrity and entertainment-style voices
  • Advertisement: Professional voices for marketing content
  • Educational: Clear, instructional voices

Voice Selection by Use Case

Business Applications

For professional business applications:

  • Choose clear, neutral voices
  • Prefer provider voices from AWS, Google, or Azure
  • Consider regional accents based on your target audience
  • Look for voices tagged with "professional" or "business"

Example:

// Professional American female voice for business content
const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Welcome to our quarterly business review.',
    voice: 'polly_joanna', // Professional American female voice
    model: 'polly_neural'
  })
});
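
The request above does not check whether the call succeeded. In a business application you will probably want to surface failures such as an invalid API key or an unknown voice. The following is a minimal sketch of a wrapper that does so. The name `synthesize` and the injectable `fetchImpl` parameter are hypothetical, and the format of any error body returned by the API is not assumed, so it is read as plain text.

```javascript
// Hypothetical helper: posts a text-to-speech request and throws on HTTP errors.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function synthesize(body, apiKey, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl('https://api.uberduck.ai/v1/text-to-speech', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(body)
  });

  if (!response.ok) {
    // Include the status and the raw error text to make failures easier to diagnose.
    const detail = await response.text();
    throw new Error(`Text-to-speech request failed (${response.status}): ${detail}`);
  }

  // Parsed JSON response from the API.
  return response.json();
}
```

A call such as `synthesize({ text, voice: 'polly_joanna', model: 'polly_neural' }, apiKey)` then either resolves with the parsed response or rejects with an error that carries the HTTP status.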

Gaming and Entertainment

For gaming and entertainment:

  • Look for character voices with distinctive styles
  • Consider zero-shot voices for unique characters
  • Experiment with emotional and expressive voices
  • Try voices tagged with "character" or "entertainment"

Example:

// Stylized character voice for a game
const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Behold, adventurer! The treasure lies beyond the dragon\'s lair!',
    voice: 'character_voice_id', // Placeholder: a character voice from your search
    model: 'compatible_model', // Placeholder: the model that matches that voice
    extended: {
      emotion: 'excited', // If the model supports emotions
      pitch: 0.8 // Slightly deeper voice
    }
  })
});

Educational Content

For educational applications:

  • Select clear, articulate voices
  • Choose appropriate regional accents based on your audience
  • Prefer slower speaking rates for complex content
  • Look for voices tagged with "education" or "informative"

Example:

// Clear, educational voice with slower pace
const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Mitochondria are the powerhouses of the cell, responsible for cellular respiration.',
    voice: 'polly_matthew', // Clear, articulate voice
    model: 'polly_neural',
    extended: {
      speed: 0.9 // Slightly slower for educational content
    }
  })
});

Multilingual Applications

For multilingual applications:

  • Search for voices in specific languages
  • Consider Google's voices for the broadest language support
  • Test pronunciation of foreign words and phrases
  • Use native-speaking voices for each language

Example:

// Function to generate speech in multiple languages
async function generateMultilingualSpeech() {
  const content = {
    english: {
      text: "Welcome to our multilingual application.",
      voice: "polly_joanna"
    },
    spanish: {
      text: "Bienvenido a nuestra aplicación multilingüe.",
      voice: "polly_lupe"
    },
    french: {
      text: "Bienvenue dans notre application multilingue.",
      voice: "polly_lea"
    }
  };

  const results = {};

  for (const [language, data] of Object.entries(content)) {
    const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: data.text,
        voice: data.voice,
        model: data.voice.startsWith('polly') ? 'polly_neural' : 'google_wavenet'
      })
    });

    results[language] = await response.json();
  }

  return results;
}
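
The loop above sends one request at a time and stops at the first failure. If you have many languages, a concurrent variant may be worth considering. The sketch below uses `Promise.allSettled` so that one failing language does not discard the others. The names `generateAll` and `synthesizeFn` are hypothetical: `synthesizeFn(text, voice)` stands for whatever async function you use to call the text-to-speech endpoint. Whether the API rate-limits concurrent requests is not covered here, so check that before using this pattern at scale.

```javascript
// Hypothetical variant: run one request per language concurrently.
// `content` has the same shape as in generateMultilingualSpeech:
//   { english: { text, voice }, spanish: { text, voice }, ... }
// `synthesizeFn(text, voice)` is a caller-supplied async function.
async function generateAll(content, synthesizeFn) {
  const languages = Object.keys(content);

  // allSettled waits for every request, whether it succeeds or fails.
  const settled = await Promise.allSettled(
    languages.map(language => synthesizeFn(content[language].text, content[language].voice))
  );

  const results = {};
  settled.forEach((outcome, index) => {
    results[languages[index]] = outcome.status === 'fulfilled'
      ? { ok: true, data: outcome.value }
      : { ok: false, error: String(outcome.reason) };
  });
  return results;
}
```

Each entry in the result records whether that language succeeded, so the caller can retry only the failed ones.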

Voice Comparison and Testing

For critical applications, it's often best to compare multiple voices:

// Function to test multiple voices with the same content
async function compareVoices(text, voiceIds) {
  const results = [];

  for (const voiceId of voiceIds) {
    const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: text,
        voice: voiceId,
        // Infer the model from the voice ID prefix
        model: voiceId.startsWith('polly') ? 'polly_neural' :
          voiceId.startsWith('google') ? 'google_wavenet' : 'azure_neural'
      })
    });

    const data = await response.json();
    results.push({
      voiceId: voiceId,
      audioUrl: data.audio_url
    });
  }

  return results;
}

// Example usage
const testText = "This is a test of different voice options for our application.";
const voicesToCompare = [
  'polly_joanna',
  'polly_matthew',
  'google_wavenet_a',
  'azure_guy'
];

compareVoices(testText, voicesToCompare)
  .then(results => {
    console.table(results);
    // Present these to stakeholders for selection
  })
  .catch(error => {
    // Report network or parsing failures instead of leaving the rejection unhandled
    console.error('Voice comparison failed:', error);
  });
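
Note that `compareVoices` infers the model from the voice ID prefix and falls back to `azure_neural` for anything it does not recognize, so a zero-shot or fine-tuned voice ID would be sent with the wrong model. One way to make that explicit is a lookup that fails loudly. This is a sketch only: the prefixes mirror the example voice IDs in this guide and are an assumption about naming, not a documented rule, and `modelForVoice` is a hypothetical helper.

```javascript
// Prefix-to-model mapping. Model names come from this guide; the prefixes are
// assumed from the example voice IDs (polly_joanna, google_wavenet_a, azure_guy).
const MODEL_BY_PREFIX = new Map([
  ['polly', 'polly_neural'],
  ['google', 'google_wavenet'],
  ['azure', 'azure_neural']
]);

// Hypothetical helper: returns the model for a voice ID, or throws if unknown.
function modelForVoice(voiceId) {
  const prefix = voiceId.split('_')[0];
  const model = MODEL_BY_PREFIX.get(prefix);
  if (!model) {
    throw new Error(`No model mapping for voice "${voiceId}"`);
  }
  return model;
}

console.log(modelForVoice('google_wavenet_a'));
// prints "google_wavenet"
```

Where the voices endpoint returns the model alongside each voice, reading it from that response is likely more reliable than inferring it from the ID.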

Building a Voice Selection Interface

For applications where users select voices, consider building a voice browsing interface:

// React component example for a voice selection interface
import { useState, useEffect } from 'react';

function VoiceSelector({ onVoiceSelected }) {
  const [voices, setVoices] = useState([]);
  const [filters, setFilters] = useState({
    gender: '',
    language: 'english',
    accent: ''
  });
  const [selectedVoice, setSelectedVoice] = useState(null);
  const [loading, setLoading] = useState(true);

  // Refetch the voice list whenever a filter changes
  useEffect(() => {
    async function fetchVoices() {
      setLoading(true);

      // Build query string from filters
      const queryParams = new URLSearchParams();
      if (filters.gender) queryParams.append('gender', filters.gender);
      if (filters.language) queryParams.append('language', filters.language);
      if (filters.accent) queryParams.append('accent', filters.accent);

      try {
        const response = await fetch(`/api/voices?${queryParams.toString()}`);
        const data = await response.json();
        setVoices(data.voices);
      } catch (error) {
        console.error('Failed to fetch voices:', error);
      } finally {
        setLoading(false);
      }
    }

    fetchVoices();
  }, [filters]);

  function handleFilterChange(name, value) {
    setFilters(prev => ({
      ...prev,
      [name]: value
    }));
  }

  function handleVoiceSelect(voice) {
    setSelectedVoice(voice);
    onVoiceSelected(voice);
  }

  function playVoiceSample(voice) {
    // Play the sample audio; play() returns a promise that can reject
    // (for example, if the browser blocks playback)
    const audio = new Audio(voice.sample_url);
    audio.play().catch(error => console.error('Failed to play sample:', error));
  }

  return (
    <div className="voice-selector">
      <div className="filters">
        <select
          value={filters.gender}
          onChange={e => handleFilterChange('gender', e.target.value)}
        >
          <option value="">All Genders</option>
          <option value="male">Male</option>
          <option value="female">Female</option>
        </select>

        <select
          value={filters.language}
          onChange={e => handleFilterChange('language', e.target.value)}
        >
          <option value="">All Languages</option>
          <option value="english">English</option>
          <option value="spanish">Spanish</option>
          <option value="french">French</option>
          {/* More languages */}
        </select>

        <select
          value={filters.accent}
          onChange={e => handleFilterChange('accent', e.target.value)}
        >
          <option value="">All Accents</option>
          <option value="american">American</option>
          <option value="british">British</option>
          <option value="australian">Australian</option>
          {/* More accents */}
        </select>
      </div>

      {loading ? (
        <div className="loading">Loading voices...</div>
      ) : (
        <div className="voice-list">
          {voices.map(voice => (
            <div
              key={voice.voicemodel_uuid}
              className={`voice-item ${selectedVoice?.voicemodel_uuid === voice.voicemodel_uuid ? 'selected' : ''}`}
              onClick={() => handleVoiceSelect(voice)}
            >
              <h3>{voice.display_name}</h3>
              <div className="voice-details">
                <span>{voice.gender}</span>
                <span>{voice.accent}</span>
                <span>{voice.language}</span>
              </div>
              <button onClick={(e) => {
                e.stopPropagation();
                playVoiceSample(voice);
              }}>
                Play Sample
              </button>
            </div>
          ))}
        </div>
      )}
    </div>
  );
}
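
The component requests `/api/voices`, a relative path, rather than calling `api.uberduck.ai` directly. A reasonable reading is that your own backend forwards the request, which keeps the API key out of browser code. The sketch below shows the core of such a forwarding step under that assumption. The helper name `buildUpstreamRequest` is hypothetical, and how you wire it into a route depends on your server framework.

```javascript
// Filters the component above is allowed to send upstream.
const ALLOWED_FILTERS = ['gender', 'language', 'accent'];

// Hypothetical server-side helper: turns the browser's query string into the
// URL and headers for the real Uberduck request, adding the secret API key.
function buildUpstreamRequest(clientQueryString, apiKey) {
  const incoming = new URLSearchParams(clientQueryString);
  const outgoing = new URLSearchParams();

  // Copy only known, non-empty filters so clients cannot inject other parameters.
  for (const name of ALLOWED_FILTERS) {
    const value = incoming.get(name);
    if (value) outgoing.append(name, value);
  }

  const query = outgoing.toString();
  return {
    url: 'https://api.uberduck.ai/v1/voices' + (query ? `?${query}` : ''),
    headers: { Authorization: `Bearer ${apiKey}` }
  };
}

const request = buildUpstreamRequest('gender=male&language=english', 'YOUR_API_KEY');
console.log(request.url);
// prints "https://api.uberduck.ai/v1/voices?gender=male&language=english"
```

In a route handler you would pass `request.url` and `request.headers` to `fetch` and relay the JSON response to the browser, reading the key from server-side configuration rather than hard-coding it.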

Conclusion

Selecting the right voice significantly impacts how your content is received. Take time to explore the available options, test different voices with your actual content, and consider your audience's preferences and expectations.

For more information, refer to the Voices API Reference and explore our Text-to-Speech Guide for advanced usage techniques.