Voice Selection Guide

Choosing the right voice is crucial for creating engaging and effective text-to-speech content. This guide covers the types of voices Uberduck offers, how to search and filter them through the API, and how to choose a voice for common use cases.

Types of Voices

Uberduck offers several types of voices:

Provider Voices

These are pre-built voices from major cloud providers:

AWS Polly: Over 60 voices across 20+ languages
Google Cloud TTS: Includes Standard, WaveNet, and Neural2 voices
Azure Speech Service: Microsoft's neural voices

Provider voices are reliable, well-tested, and available for immediate use.

Zero-Shot Voices

These are custom voices created on-demand from audio samples:

Generated quickly (usually within minutes)
Require only a short audio sample
Good for prototyping or one-off projects

Fine-Tuned Voices

These are extensively trained custom voice models:

Higher quality and more consistent than zero-shot voices
Require more training data
Better for production applications
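
The trade-offs between the three voice types can be condensed into a rough decision helper. The sketch below is only one reading of the descriptions above; the function name `recommendVoiceType` and its option names are hypothetical and not part of the Uberduck API.

```javascript
// Hypothetical helper: maps project needs to one of the three voice types
// described above. The option names are illustrative, not Uberduck API fields.
function recommendVoiceType({
  needsCustomVoice = false,
  forProduction = false,
  hasTrainingData = false
} = {}) {
  // No custom voice needed: provider voices are available for immediate use.
  if (!needsCustomVoice) return 'provider';
  // Production use with enough training data: fine-tuned voices are more consistent.
  if (forProduction && hasTrainingData) return 'fine-tuned';
  // Otherwise a short audio sample is enough for a zero-shot voice.
  return 'zero-shot';
}

console.log(recommendVoiceType({ needsCustomVoice: true })); // zero-shot
```

Treat the result as a starting point: a prototype built on a zero-shot voice can move to a fine-tuned voice once more training data is available.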

Finding the Right Voice

Using the Voices API Endpoint

The /v1/voices endpoint allows you to search and filter voices based on various criteria:

# Basic request to get all voices
curl -X GET "https://api.uberduck.ai/v1/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Filtered request for female English voices
curl -X GET "https://api.uberduck.ai/v1/voices?gender=female&language=english" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Search for voices by keyword
curl -X GET "https://api.uberduck.ai/v1/voices?search_term=professional" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Filter by provider model type
curl -X GET "https://api.uberduck.ai/v1/voices?model=polly_neural" \
  -H "Authorization: Bearer YOUR_API_KEY"

Voice Selection Parameters

The most useful parameters for voice selection:

Parameter	Description	Example Values
`gender`	Gender of the voice	`male`, `female`
`language`	Language of the voice	`english`, `spanish`, `french`, etc.
`accent`	Accent of the voice	`american`, `british`, `australian`, etc.
`age`	Age category of the voice	`child`, `young`, `middle_aged`, `senior`
`model`	Model type for provider voices	`polly_neural`, `google_wavenet`, `azure_neural`
`search_term`	Keyword search across name and display name	Any search term
`tag`	Filter by voice category tag	`narrative`, `conversational`, `character`, etc.
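
Several of these parameters can be combined in one request, as the filtered curl example does with `gender` and `language`. The following is a minimal sketch of building such a query in JavaScript. The helper names `buildVoicesUrl` and `listVoices` are hypothetical, and the sketch assumes that any of the parameters in the table may be combined freely, which this guide does not state explicitly.

```javascript
// Filter names taken from the parameter table above.
const VOICE_FILTERS = ['gender', 'language', 'accent', 'age', 'model', 'search_term', 'tag'];

// Builds a /v1/voices URL from a plain object of filters.
// Empty values and keys that are not in the table are skipped.
function buildVoicesUrl(filters = {}, baseUrl = 'https://api.uberduck.ai/v1/voices') {
  const params = new URLSearchParams();
  for (const name of VOICE_FILTERS) {
    const value = filters[name];
    if (value !== undefined && value !== null && value !== '') {
      params.append(name, String(value));
    }
  }
  const query = params.toString();
  return query ? `${baseUrl}?${query}` : baseUrl;
}

// Fetches voices with the given filters. No response shape is assumed here;
// the parsed JSON is returned as-is.
async function listVoices(filters, apiKey) {
  const response = await fetch(buildVoicesUrl(filters), {
    headers: { Authorization: `Bearer ${apiKey}` }
  });
  if (!response.ok) {
    throw new Error(`Voice search failed with status ${response.status}`);
  }
  return response.json();
}

console.log(buildVoicesUrl({ language: 'english', accent: 'british', tag: 'narrative' }));
// → https://api.uberduck.ai/v1/voices?language=english&accent=british&tag=narrative
```

Building the query with `URLSearchParams` keeps search terms that contain spaces or special characters correctly encoded.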

Voice Categories

Uberduck organizes voices into categories with tags:

Narrative & Story: Great for audiobooks, storytelling
Conversational: Natural-sounding voices for dialogues and chatbots
Characters & Animation: Stylized voices for characters
Social Media: Voices optimized for social content
Entertainment & TV: Celebrity and entertainment-style voices
Advertisement: Professional voices for marketing content
Educational: Clear, instructional voices

Voice Selection by Use Case

Business Applications

For professional business applications:

Choose clear, neutral voices
Prefer provider voices from AWS, Google, or Azure
Consider regional accents based on your target audience
Look for voices tagged with "professional" or "business"

Example:

// Professional American female voice for business content
const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Welcome to our quarterly business review.',
    voice: 'polly_joanna', // Professional American female voice
    model: 'polly_neural'
  })
});

Gaming and Entertainment

For gaming and entertainment:

Look for character voices with distinctive styles
Consider zero-shot voices for unique characters
Experiment with emotional and expressive voices
Try voices tagged with "character" or "entertainment"

Example:

// Stylized character voice for a game
const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Behold, adventurer! The treasure lies beyond the dragon\'s lair!',
    voice: 'character_voice_id', // A character voice from your search
    model: 'compatible_model',
    extended: {
      emotion: 'excited', // If the model supports emotions
      pitch: 0.8 // Slightly deeper voice
    }
  })
});

Educational Content

For educational applications:

Select clear, articulate voices
Choose appropriate regional accents based on your audience
Prefer slower speaking rates for complex content
Look for voices tagged with "education" or "informative"

Example:

// Clear, educational voice with slower pace
const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Mitochondria are the powerhouses of the cell, responsible for cellular respiration.',
    voice: 'polly_matthew', // Clear, articulate voice
    model: 'polly_neural',
    extended: {
      speed: 0.9 // Slightly slower for educational content
    }
  })
});

Multilingual Applications

For multilingual applications:

Search for voices in specific languages
Consider Google's voices for broadest language support
Test pronunciation of foreign words and phrases
Use native-speaking voices for each language

Example:

// Function to generate speech in multiple languages
async function generateMultilingualSpeech() {
  const content = {
    english: {
      text: "Welcome to our multilingual application.",
      voice: "polly_joanna"
    },
    spanish: {
      text: "Bienvenido a nuestra aplicación multilingüe.",
      voice: "polly_lupe"
    },
    french: {
      text: "Bienvenue dans notre application multilingue.",
      voice: "polly_lea"
    }
  };
  
  const results = {};
  
  for (const [language, data] of Object.entries(content)) {
    const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: data.text,
        voice: data.voice,
        model: data.voice.startsWith('polly') ? 'polly_neural' : 'google_wavenet'
      })
    });
    
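    // Fail fast on HTTP errors instead of storing an error payload as a result
    if (!response.ok) {
      throw new Error(`Speech request for ${language} failed with status ${response.status}`);
    }

    results[language] = await response.json();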
  }
  
  return results;
}
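
Rather than hard-coding one voice ID per language, the voice for each language can be picked from a list returned by the voices endpoint. The following is a minimal sketch, assuming each voice object carries the `language`, `gender`, and `display_name` fields that the interface example later in this guide reads. The helper name `pickVoicesByLanguage` is hypothetical.

```javascript
// Hypothetical helper: for each requested language, pick the first voice whose
// `language` field matches (case-insensitive), preferring the given gender.
// Languages without any matching voice map to null so callers can handle the gap.
function pickVoicesByLanguage(voices, languages, preferredGender = '') {
  const picks = {};
  for (const language of languages) {
    const matches = voices.filter(
      v => typeof v.language === 'string' &&
           v.language.toLowerCase() === language.toLowerCase()
    );
    const preferred = matches.find(v => v.gender === preferredGender);
    picks[language] = preferred || matches[0] || null;
  }
  return picks;
}

// Sample data for illustration only; real entries come from the voices endpoint.
const sampleVoices = [
  { display_name: 'Voice A', language: 'english', gender: 'male' },
  { display_name: 'Voice B', language: 'english', gender: 'female' },
  { display_name: 'Voice C', language: 'spanish', gender: 'female' }
];

const picks = pickVoicesByLanguage(sampleVoices, ['english', 'spanish', 'french'], 'female');
console.log(picks.english.display_name); // Voice B
console.log(picks.spanish.display_name); // Voice C
console.log(picks.french);               // null
```

Returning `null` for a language with no match makes missing coverage explicit, so the application can fall back to another voice or skip that language.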

Voice Comparison and Testing

For critical applications, generate the same text with several candidate voices and compare the results side by side:

// Function to test multiple voices with the same content
async function compareVoices(text, voiceIds) {
  const results = [];
  
  for (const voiceId of voiceIds) {
    const response = await fetch('https://api.uberduck.ai/v1/text-to-speech', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: text,
        voice: voiceId,
        model: voiceId.startsWith('polly') ? 'polly_neural' : 
               voiceId.startsWith('google') ? 'google_wavenet' : 'azure_neural'
      })
    });
    
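    // Stop on HTTP errors so a failed request is not reported as a missing audio URL
    if (!response.ok) {
      throw new Error(`Request for voice ${voiceId} failed with status ${response.status}`);
    }

    const data = await response.json();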
    results.push({
      voiceId: voiceId,
      audioUrl: data.audio_url
    });
  }
  
  return results;
}

// Example usage
const testText = "This is a test of different voice options for our application.";
const voicesToCompare = [
  'polly_joanna',
  'polly_matthew',
  'google_wavenet_a',
  'azure_guy'
];

compareVoices(testText, voicesToCompare)
  .then(results => {
    console.table(results);
    // Present these to stakeholders for selection
  });
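
The loop above sends one request at a time, and a single failure rejects the whole comparison. One possible variation, sketched below, sends the requests concurrently with `Promise.allSettled` and reports failures per voice. The names `toComparisonRows` and `compareVoicesInParallel` are hypothetical, as is the `synthesize` callback, which stands in for the text-to-speech request. The sketch assumes the response body exposes `audio_url`, as in the example above.

```javascript
// Turns Promise.allSettled() output into one row per voice, keeping failures visible.
function toComparisonRows(voiceIds, settled) {
  return settled.map((outcome, index) =>
    outcome.status === 'fulfilled'
      ? { voiceId: voiceIds[index], audioUrl: outcome.value.audio_url, error: null }
      : { voiceId: voiceIds[index], audioUrl: null, error: String(outcome.reason) }
  );
}

// Requests all voices concurrently. `synthesize(text, voiceId)` is a caller-supplied
// function (hypothetical) that performs the request and resolves to the parsed body.
async function compareVoicesInParallel(text, voiceIds, synthesize) {
  const settled = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections as well.
    voiceIds.map(async id => synthesize(text, id))
  );
  return toComparisonRows(voiceIds, settled);
}

// Usage with a stub in place of a real request:
const fakeSynthesize = async (text, voiceId) => {
  if (voiceId === 'broken_voice') throw new Error('voice not found');
  return { audio_url: `https://example.com/${voiceId}.wav` };
};

compareVoicesInParallel('Test sentence.', ['polly_joanna', 'broken_voice'], fakeSynthesize)
  .then(rows => console.table(rows));
```

If your account is subject to rate limits, the sequential loop may still be the safer choice for long voice lists.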

Building a Voice Selection Interface

For applications where users select voices, consider building a voice browsing interface:

// React component example for a voice selection interface
import { useState, useEffect } from 'react';

function VoiceSelector({ onVoiceSelected }) {
  const [voices, setVoices] = useState([]);
  const [filters, setFilters] = useState({
    gender: '',
    language: 'english',
    accent: ''
  });
  const [selectedVoice, setSelectedVoice] = useState(null);
  const [loading, setLoading] = useState(true);
  
  useEffect(() => {
    async function fetchVoices() {
      setLoading(true);
      
      // Build query string from filters
      const queryParams = new URLSearchParams();
      if (filters.gender) queryParams.append('gender', filters.gender);
      if (filters.language) queryParams.append('language', filters.language);
      if (filters.accent) queryParams.append('accent', filters.accent);
      
      try {
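        // Assumes /api/voices is a route on your own backend that forwards to the
        // Uberduck API, so the API key is never shipped to the browser
        const response = await fetch(`/api/voices?${queryParams.toString()}`);
        if (!response.ok) {
          throw new Error(`Voice request failed with status ${response.status}`);
        }
        const data = await response.json();
        setVoices(data.voices);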
      } catch (error) {
        console.error('Failed to fetch voices:', error);
      } finally {
        setLoading(false);
      }
    }
    
    fetchVoices();
  }, [filters]);
  
  function handleFilterChange(name, value) {
    setFilters(prev => ({
      ...prev,
      [name]: value
    }));
  }
  
  function handleVoiceSelect(voice) {
    setSelectedVoice(voice);
    onVoiceSelected(voice);
  }
  
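  function playVoiceSample(voice) {
    // Play the sample audio; play() returns a promise that rejects if playback is blocked
    const audio = new Audio(voice.sample_url);
    audio.play().catch(error => console.error('Failed to play sample:', error));
  }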
  
  return (
    <div className="voice-selector">
      <div className="filters">
        <select 
          value={filters.gender} 
          onChange={e => handleFilterChange('gender', e.target.value)}
        >
          <option value="">All Genders</option>
          <option value="male">Male</option>
          <option value="female">Female</option>
        </select>
        
        <select 
          value={filters.language} 
          onChange={e => handleFilterChange('language', e.target.value)}
        >
          <option value="">All Languages</option>
          <option value="english">English</option>
          <option value="spanish">Spanish</option>
          <option value="french">French</option>
          {/* More languages */}
        </select>
        
        <select 
          value={filters.accent} 
          onChange={e => handleFilterChange('accent', e.target.value)}
        >
          <option value="">All Accents</option>
          <option value="american">American</option>
          <option value="british">British</option>
          <option value="australian">Australian</option>
          {/* More accents */}
        </select>
      </div>
      
      {loading ? (
        <div className="loading">Loading voices...</div>
      ) : (
        <div className="voice-list">
          {voices.map(voice => (
            <div 
              key={voice.voicemodel_uuid}
              className={`voice-item ${selectedVoice?.voicemodel_uuid === voice.voicemodel_uuid ? 'selected' : ''}`}
              onClick={() => handleVoiceSelect(voice)}
            >
              <h3>{voice.display_name}</h3>
              <div className="voice-details">
                <span>{voice.gender}</span>
                <span>{voice.accent}</span>
                <span>{voice.language}</span>
              </div>
              <button onClick={(e) => {
                e.stopPropagation();
                playVoiceSample(voice);
              }}>
                Play Sample
              </button>
            </div>
          ))}
        </div>
      )}
    </div>
  );
}
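
The component above requests `/api/voices` rather than the Uberduck API directly. One way to provide that route is a small server-side proxy that attaches the API key, sketched below for Node.js 18 or later. The handler, the `UBERDUCK_API_KEY` environment variable name, and the list of forwarded parameters are all assumptions made for illustration, not part of the Uberduck API.

```javascript
// Filters the proxy is willing to forward, taken from the parameter table in this guide.
const FORWARDED_FILTERS = ['gender', 'language', 'accent', 'age', 'model', 'search_term', 'tag'];

// Translates an incoming request URL such as "/api/voices?gender=female" into the
// upstream Uberduck URL, dropping empty or unknown query parameters.
function buildUpstreamUrl(incomingUrl) {
  const incoming = new URL(incomingUrl, 'http://localhost'); // base is only needed for parsing
  const upstream = new URL('https://api.uberduck.ai/v1/voices');
  for (const name of FORWARDED_FILTERS) {
    const value = incoming.searchParams.get(name);
    if (value) upstream.searchParams.set(name, value);
  }
  return upstream.toString();
}

// Request handler compatible with Node's http.createServer(). Route matching is
// omitted for brevity. UBERDUCK_API_KEY is a hypothetical environment variable name.
async function handleVoicesRequest(req, res) {
  try {
    const upstream = await fetch(buildUpstreamUrl(req.url), {
      headers: { Authorization: `Bearer ${process.env.UBERDUCK_API_KEY}` }
    });
    const body = await upstream.text();
    res.writeHead(upstream.status, { 'Content-Type': 'application/json' });
    res.end(body);
  } catch (error) {
    res.writeHead(502, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'Upstream request failed' }));
  }
}

// To serve it: require('node:http').createServer(handleVoicesRequest).listen(3000);
```

Forwarding only the known filter names keeps arbitrary client-supplied parameters from reaching the upstream API.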

Conclusion

Selecting the right voice significantly impacts how your content is received. Take time to explore the available options, test different voices with your actual content, and consider your audience's preferences and expectations.

For more information, refer to the Voices API Reference and explore our Text-to-Speech Guide for advanced usage techniques.
