Voice Selection Guide
The voice you choose shapes how listeners receive your text-to-speech content. This guide describes the types of voices Uberduck offers, shows how to search and filter the voice library, and suggests voices for common use cases.
Types of Voices
Uberduck offers several types of voices:
Provider Voices
These are pre-built voices from major cloud providers:
- Amazon Polly (AWS): Over 60 voices across 20+ languages
- Google Cloud TTS: Includes Standard, WaveNet, and Neural2 voices
- Azure Speech Service: Microsoft's neural voices
Provider voices are reliable, well-tested, and available for immediate use.
Zero-Shot Voices
These are custom voices created on-demand from audio samples:
- Generated quickly (usually within minutes)
- Require only a short audio sample
- Good for prototyping or one-off projects
Fine-Tuned Voices
These are extensively trained custom voice models:
- Higher quality and more consistent than zero-shot voices
- Require more training data
- Better for production applications
Finding the Right Voice
Using the Voices API Endpoint
The /v1/voices endpoint lets you search and filter voices by various criteria:
# Basic request to get all voices
curl -X GET "https://api.uberduck.ai/v1/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Filtered request for female English voices
curl -X GET "https://api.uberduck.ai/v1/voices?gender=female&language=english" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Search for voices by keyword
curl -X GET "https://api.uberduck.ai/v1/voices?search_term=professional" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Filter by provider model type
curl -X GET "https://api.uberduck.ai/v1/voices?model=polly_neural" \
  -H "Authorization: Bearer YOUR_API_KEY"
Voice Selection Parameters
The most useful parameters for voice selection:
| Parameter | Description | Example Values |
|---|---|---|
| gender | Gender of the voice | male, female |
| language | Language of the voice | english, spanish, french, etc. |
| accent | Accent of the voice | american, british, australian, etc. |
| age | Age category of the voice | child, young, middle_aged, senior |
| model | Model type for provider voices | polly_neural, google_wavenet, azure_neural |
| search_term | Keyword search across name and display name | Any search term |
| tag | Filter by voice category tag | narrative, conversational, character, etc. |
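The parameters in the table above can also be combined programmatically. The following is a minimal sketch, assuming a runtime with the standard `URL` and `fetch` globals (a modern browser or Node 18+). The helper names `buildVoicesUrl` and `listVoices` are hypothetical and not part of Uberduck's API; only the endpoint URL, the `Authorization` header, and the parameter names come from this guide.

```javascript
// Endpoint URL taken from the curl examples in this guide.
const VOICES_URL = 'https://api.uberduck.ai/v1/voices';

// Hypothetical helper: build a /v1/voices URL from a filters object,
// skipping empty values so that unset filters are not sent at all.
function buildVoicesUrl(filters = {}) {
  const url = new URL(VOICES_URL);
  for (const [name, value] of Object.entries(filters)) {
    if (value !== undefined && value !== null && value !== '') {
      url.searchParams.append(name, String(value));
    }
  }
  return url.toString();
}

// Hypothetical helper: request the filtered voice list and fail loudly
// on a non-2xx response instead of parsing an error body as voices.
async function listVoices(filters, apiKey) {
  const response = await fetch(buildVoicesUrl(filters), {
    headers: { Authorization: `Bearer ${apiKey}` }
  });
  if (!response.ok) {
    throw new Error(`Voices request failed with status ${response.status}`);
  }
  return response.json();
}
```

For example, `buildVoicesUrl({ gender: 'female', language: 'english' })` produces the same URL as the filtered curl request shown earlier. Using `URLSearchParams` rather than string concatenation also takes care of escaping search terms that contain spaces or other special characters.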
Voice Categories
Uberduck organizes voices into categories with tags:
- Narrative & Story: Great for audiobooks and storytelling
- Conversational: Natural-sounding voices for dialogues and chatbots
- Characters & Animation: Stylized voices for characters
- Social Media: Voices optimized for social content
- Entertainment & TV: Celebrity and entertainment-style voices
- Advertisement: Professional voices for marketing content
- Educational: Clear, instructional voices
Voice Selection by Use Case
Business Applications
For professional business applications:
- Choose clear, neutral voices
- Prefer provider voices from AWS, Google, or Azure
- Consider regional accents based on your target audience
- Look for voices tagged with "professional" or "business"
Example:
// Professional American female voice for business content
const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Welcome to our quarterly business review.',
    voice: 'polly_joanna', // Professional American female voice
    model: 'polly_neural'
  })
});
Gaming and Entertainment
For gaming and entertainment:
- Look for character voices with distinctive styles
- Consider zero-shot voices for unique characters
- Experiment with emotional and expressive voices
- Try voices tagged with "character" or "entertainment"
Example:
// Stylized character voice for a game
const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Behold, adventurer! The treasure lies beyond the dragon\'s lair!',
    voice: 'character_voice_id', // A character voice from your search
    model: 'compatible_model',
    extended: {
      emotion: 'excited', // If the model supports emotions
      pitch: 0.8 // Slightly deeper voice
    }
  })
});
Educational Content
For educational applications:
- Select clear, articulate voices
- Choose appropriate regional accents based on your audience
- Prefer slower speaking rates for complex content
- Look for voices tagged with "education" or "informative"
Example:
// Clear, educational voice with slower pace
const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Mitochondria are the powerhouses of the cell, responsible for cellular respiration.',
    voice: 'polly_matthew', // Clear, articulate voice
    model: 'polly_neural',
    extended: {
      speed: 0.9 // Slightly slower for educational content
    }
  })
});
Multilingual Applications
For multilingual applications:
- Search for voices in specific languages
- Consider Google's voices for broadest language support
- Test pronunciation of foreign words and phrases
- Use native-speaking voices for each language
Example:
// Function to generate speech in multiple languages
async function generateMultilingualSpeech() {
  // One text and one native-speaking voice per language
  const content = {
    english: {
      text: "Welcome to our multilingual application.",
      voice: "polly_joanna"
    },
    spanish: {
      text: "Bienvenido a nuestra aplicación multilingüe.",
      voice: "polly_lupe"
    },
    french: {
      text: "Bienvenue dans notre application multilingue.",
      voice: "polly_lea"
    }
  };
  const results = {};
  for (const [language, data] of Object.entries(content)) {
    const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: data.text,
        voice: data.voice,
        // Pick the model from the voice ID prefix
        model: data.voice.startsWith('polly') ? 'polly_neural' : 'google_wavenet'
      })
    });
    // Stop on a failed request rather than storing an error body as a result
    if (!response.ok) {
      throw new Error(`Request for ${language} failed with status ${response.status}`);
    }
    results[language] = await response.json();
  }
  return results;
}
Voice Comparison and Testing
For critical applications, it's often best to compare multiple voices:
// Function to test multiple voices with the same content
async function compareVoices(text, voiceIds) {
  const results = [];
  for (const voiceId of voiceIds) {
    const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: text,
        voice: voiceId,
        // Pick the model from the voice ID prefix
        model: voiceId.startsWith('polly') ? 'polly_neural' :
          voiceId.startsWith('google') ? 'google_wavenet' : 'azure_neural'
      })
    });
    // Surface failed requests instead of recording an undefined audio URL
    if (!response.ok) {
      throw new Error(`Request for ${voiceId} failed with status ${response.status}`);
    }
    const data = await response.json();
    results.push({
      voiceId: voiceId,
      audioUrl: data.audio_url
    });
  }
  return results;
}

// Example usage
const testText = "This is a test of different voice options for our application.";
const voicesToCompare = [
  'polly_joanna',
  'polly_matthew',
  'google_wavenet_a',
  'azure_guy'
];
compareVoices(testText, voicesToCompare)
  .then(results => {
    console.table(results);
    // Present these to stakeholders for selection
  })
  .catch(error => console.error('Voice comparison failed:', error));
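The comparison above sends its requests one after another and picks the model from the voice ID prefix inline. As a possible variation, the sketch below factors that prefix lookup into a helper and sends the requests concurrently with `Promise.allSettled`, so that one failing voice does not abort the whole comparison. The names `inferModel` and `compareVoicesConcurrently` are hypothetical; the prefix convention, the endpoint, and the `audio_url` response field are taken from the examples in this guide and are not otherwise verified.

```javascript
// Map a voice ID to a model using the same prefix convention as the
// inline ternaries in this guide's examples (an assumption, not an API rule).
function inferModel(voiceId) {
  if (voiceId.startsWith('polly')) return 'polly_neural';
  if (voiceId.startsWith('google')) return 'google_wavenet';
  return 'azure_neural'; // same fallback as the sequential example
}

// Hypothetical concurrent variant: one request per voice, all in flight at once.
async function compareVoicesConcurrently(text, voiceIds, apiKey) {
  const requests = voiceIds.map(async (voiceId) => {
    const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ text, voice: voiceId, model: inferModel(voiceId) })
    });
    if (!response.ok) {
      throw new Error(`Request for ${voiceId} failed with status ${response.status}`);
    }
    const data = await response.json();
    return { voiceId, audioUrl: data.audio_url };
  });

  // allSettled waits for every request and reports failures per voice.
  const settled = await Promise.allSettled(requests);
  return settled.map((outcome, index) =>
    outcome.status === 'fulfilled'
      ? outcome.value
      : { voiceId: voiceIds[index], error: outcome.reason.message }
  );
}
```

Concurrent requests finish sooner for long voice lists, but they may run into rate limits; the sequential loop remains the safer choice when limits are unknown.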
Building a Voice Selection Interface
For applications where users select voices, consider building a voice browsing interface:
// React component example for a voice selection interface
import { useState, useEffect } from 'react';

function VoiceSelector({ onVoiceSelected }) {
  const [voices, setVoices] = useState([]);
  const [filters, setFilters] = useState({
    gender: '',
    language: 'english',
    accent: ''
  });
  const [selectedVoice, setSelectedVoice] = useState(null);
  const [loading, setLoading] = useState(true);

  // Reload the voice list whenever a filter changes
  useEffect(() => {
    async function fetchVoices() {
      setLoading(true);
      // Build query string from filters, skipping empty values
      const queryParams = new URLSearchParams();
      if (filters.gender) queryParams.append('gender', filters.gender);
      if (filters.language) queryParams.append('language', filters.language);
      if (filters.accent) queryParams.append('accent', filters.accent);
      try {
        const response = await fetch(`/api/voices?${queryParams.toString()}`);
        if (!response.ok) {
          throw new Error(`Request failed with status ${response.status}`);
        }
        const data = await response.json();
        setVoices(data.voices);
      } catch (error) {
        console.error('Failed to fetch voices:', error);
      } finally {
        setLoading(false);
      }
    }
    fetchVoices();
  }, [filters]);

  function handleFilterChange(name, value) {
    setFilters(prev => ({
      ...prev,
      [name]: value
    }));
  }

  function handleVoiceSelect(voice) {
    setSelectedVoice(voice);
    onVoiceSelected(voice);
  }

  function playVoiceSample(voice) {
    // Play the sample audio; play() returns a promise that can reject
    const audio = new Audio(voice.sample_url);
    audio.play().catch(error => console.error('Failed to play sample:', error));
  }
  return (
    <div className="voice-selector">
      <div className="filters">
        <select
          value={filters.gender}
          onChange={e => handleFilterChange('gender', e.target.value)}
        >
          <option value="">All Genders</option>
          <option value="male">Male</option>
          <option value="female">Female</option>
        </select>
        <select
          value={filters.language}
          onChange={e => handleFilterChange('language', e.target.value)}
        >
          <option value="">All Languages</option>
          <option value="english">English</option>
          <option value="spanish">Spanish</option>
          <option value="french">French</option>
          {/* More languages */}
        </select>
        <select
          value={filters.accent}
          onChange={e => handleFilterChange('accent', e.target.value)}
        >
          <option value="">All Accents</option>
          <option value="american">American</option>
          <option value="british">British</option>
          <option value="australian">Australian</option>
          {/* More accents */}
        </select>
      </div>
      {loading ? (
        <div className="loading">Loading voices...</div>
      ) : (
        <div className="voice-list">
          {voices.map(voice => (
            <div
              key={voice.voicemodel_uuid}
              className={`voice-item ${selectedVoice?.voicemodel_uuid === voice.voicemodel_uuid ? 'selected' : ''}`}
              onClick={() => handleVoiceSelect(voice)}
            >
              <h3>{voice.display_name}</h3>
              <div className="voice-details">
                <span>{voice.gender}</span>
                <span>{voice.accent}</span>
                <span>{voice.language}</span>
              </div>
              <button onClick={(e) => {
                e.stopPropagation();
                playVoiceSample(voice);
              }}>
                Play Sample
              </button>
            </div>
          ))}
        </div>
      )}
    </div>
  );
}
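The component requests `/api/voices` rather than calling the Uberduck API directly, which suggests a backend route that forwards the query and adds the API key on the server, keeping the key out of browser code. The following sketch shows only the URL mapping such a route might perform. The function name `toUpstreamVoicesUrl` and the allow-list approach are assumptions for illustration, not part of Uberduck's API; the parameter names come from the Voice Selection Parameters table.

```javascript
// Upstream endpoint from the curl examples in this guide.
const UPSTREAM_VOICES_URL = 'https://api.uberduck.ai/v1/voices';

// Only forward the filter parameters documented in this guide.
const ALLOWED_FILTERS = [
  'gender', 'language', 'accent', 'age', 'model', 'search_term', 'tag'
];

// Hypothetical helper: translate an incoming path such as
// "/api/voices?gender=female" into the upstream Uberduck URL.
function toUpstreamVoicesUrl(requestPath) {
  // The base is a placeholder so that a relative path can be parsed.
  const incoming = new URL(requestPath, 'http://localhost');
  const upstream = new URL(UPSTREAM_VOICES_URL);
  for (const name of ALLOWED_FILTERS) {
    const value = incoming.searchParams.get(name);
    if (value) upstream.searchParams.set(name, value);
  }
  return upstream.toString();
}
```

A server handler would then call `fetch(toUpstreamVoicesUrl(req.url), ...)` with the `Authorization` header set from a server-side secret and relay the JSON response to the component. Dropping unknown parameters keeps the proxy from passing arbitrary client input upstream.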
Conclusion
Selecting the right voice significantly impacts how your content is received. Take time to explore the available options, test different voices with your actual content, and consider your audience's preferences and expectations.
For more information, refer to the Voices API Reference and explore our Text-to-Speech Guide for advanced usage techniques.