Create your own custom voice models by training on audio samples. Perfect for businesses wanting branded voices, content creators building character libraries, or developers creating unique voice applications.

Overview

Training a custom voice involves three main steps:
  1. Create the voice: Initialize a new voice model with your training parameters.
  2. Upload training audio: Provide high-quality audio samples for the voice to learn from.
  3. Start training: Begin the AI training process to create your custom voice model.

Custom voice training consumes 10 API credits per minute of training time (specified by maxMinutes). Choose your training duration wisely.

Step 1: Create a Voice Model

Start by creating a new voice with your desired training parameters:
curl -X POST 'https://api.voicedub.ai/v1/me/voices' \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "My Custom Voice",
    "maxMinutes": 10,
    "separate": true,
    "numberOfFiles": 3
  }'
Example response:
{
  "voice": {
    "id": "789e0123-e89b-12d3-a456-426614174000",
    "name": "My Custom Voice",
    "status": "new",
    "separate": true,
    "maxMinutes": 10,
    "requiredCredits": 100,
    "createdAt": "2024-01-15T10:00:00.000Z",
    ...
  },
  "uploadUrls": [
    "https://s3.amazonaws.com/bucket/models/.../vocals/file_0?signature=...",
    "https://s3.amazonaws.com/bucket/models/.../vocals/file_1?signature=...",
    "https://s3.amazonaws.com/bucket/models/.../vocals/file_2?signature=..."
  ]
}

Parameters Explained

  • name (string, required): Display name for your voice model (1-50 characters). Choose something descriptive and unique.
  • numberOfFiles (integer, required): Number of audio files to upload for training (1-5). More diverse files typically yield better results.
  • maxMinutes (integer, default: 10): Maximum training duration in minutes (5-60). More training generally produces better quality but costs more credits (10 credits per minute).
  • separate (boolean, default: true): Whether to use vocal separation during training. Set to false if your training audio already contains isolated vocals.
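The documented ranges and defaults can be checked client-side before calling the API. The sketch below uses a hypothetical helper, `buildCreateVoiceBody` (not part of the VoiceDub API), that applies the limits listed above; the server remains the authority on validation.

```javascript
// Hypothetical client-side helper: builds the JSON body for POST /v1/me/voices
// using the ranges and defaults documented above.
function buildCreateVoiceBody({ name, numberOfFiles, maxMinutes = 10, separate = true }) {
  if (typeof name !== 'string' || name.length < 1 || name.length > 50) {
    throw new RangeError('name must be 1-50 characters');
  }
  if (!Number.isInteger(numberOfFiles) || numberOfFiles < 1 || numberOfFiles > 5) {
    throw new RangeError('numberOfFiles must be an integer from 1 to 5');
  }
  if (!Number.isInteger(maxMinutes) || maxMinutes < 5 || maxMinutes > 60) {
    throw new RangeError('maxMinutes must be an integer from 5 to 60');
  }
  if (typeof separate !== 'boolean') {
    throw new TypeError('separate must be a boolean');
  }
  return { name, maxMinutes, separate, numberOfFiles };
}

// Defaults fill in maxMinutes and separate
console.log(JSON.stringify(buildCreateVoiceBody({ name: 'My Custom Voice', numberOfFiles: 3 })));
// {"name":"My Custom Voice","maxMinutes":10,"separate":true,"numberOfFiles":3}
```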

Step 2: Upload Training Audio

Use the provided upload URLs to upload your training audio files:
curl -X PUT 'UPLOAD_URL_FROM_RESPONSE' \
  -H 'Content-Type: audio/mpeg' \
  --data-binary @training-audio-1.mp3
Repeat this for each training file. The training won’t start until all files are uploaded.
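A minimal sketch of the upload loop with error checking, assuming Node.js 18+ (global `fetch`). `uploadTrainingFiles` and its `fetchImpl` parameter are hypothetical; the parameter only exists so the function can be exercised without network access. It sends `audio/mpeg` like the curl example, which suits MP3 files.

```javascript
// Hypothetical helper: PUTs each training file to its matching upload URL.
// fetchImpl defaults to the global fetch (Node.js 18+).
async function uploadTrainingFiles(uploadUrls, fileBuffers, fetchImpl = globalThis.fetch) {
  if (uploadUrls.length !== fileBuffers.length) {
    throw new Error(`Expected ${uploadUrls.length} files, got ${fileBuffers.length}`);
  }
  for (let i = 0; i < uploadUrls.length; i++) {
    const response = await fetchImpl(uploadUrls[i], {
      method: 'PUT',
      headers: { 'Content-Type': 'audio/mpeg' }, // same header as the curl example
      body: fileBuffers[i]
    });
    if (!response.ok) {
      // Training cannot start until every file is uploaded, so stop on the first failure
      throw new Error(`Upload ${i + 1} failed with HTTP ${response.status}`);
    }
  }
  return uploadUrls.length;
}
```

Usage: read the files with `fs.readFileSync` and call `await uploadTrainingFiles(uploadUrls, buffers)`, keeping the buffers in the same order as the URLs returned in Step 1.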

Training Audio Guidelines

Follow these guidelines for best training results:
Audio Quality:
  • Use high-quality recordings (192kbps+ MP3 or lossless formats)
  • A total of 3–5 minutes works well, but longer files (up to 20 minutes) are also supported
  • Minimize background noise and reverb
  • Ensure consistent audio levels across files
  • Avoid heavily processed or auto-tuned vocals
Content Variety:
  • Include diverse vocal expressions (soft, loud, emotional)
  • Mix different tempos and rhythms
  • Include both sustained notes and quick phrases
  • Vary pitch range throughout the samples
Technical Requirements:
  • Each file: 1-20 minutes duration
  • Total training duration: Up to your specified maxMinutes
  • Supported formats: MP3, WAV, M4A, FLAC, OGG
  • Video files are supported (audio will be extracted)
Only use audio you have legal rights to. Training on copyrighted material without permission is prohibited.
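Some of these requirements can be checked locally before uploading. This is a hypothetical pre-flight sketch: the extension list comes from the supported formats above and the 50MB per-file limit from the Troubleshooting section (read here as decimal megabytes, the more conservative assumption). It does not measure duration, which requires decoding the audio, and it only recognizes the listed audio extensions, so video files would be flagged even though they are supported.

```javascript
const path = require('path');

// Audio formats listed under Technical Requirements; video containers are not covered here.
const SUPPORTED_EXTENSIONS = new Set(['.mp3', '.wav', '.m4a', '.flac', '.ogg']);
// "Max 50MB per file"; decimal megabytes assumed
const MAX_FILE_BYTES = 50 * 1000 * 1000;

// Returns a list of problems; an empty list means the files passed these checks.
function preflightCheck(files) {
  const problems = [];
  if (files.length < 1 || files.length > 5) {
    problems.push(`numberOfFiles must be 1-5, got ${files.length}`);
  }
  for (const { name, sizeBytes } of files) {
    const ext = path.extname(name).toLowerCase();
    if (!SUPPORTED_EXTENSIONS.has(ext)) {
      problems.push(`${name}: unsupported extension "${ext}"`);
    }
    if (sizeBytes > MAX_FILE_BYTES) {
      problems.push(`${name}: ${sizeBytes} bytes exceeds the 50MB limit`);
    }
  }
  return problems;
}
```

Usage: `preflightCheck(paths.map(p => ({ name: p, sizeBytes: fs.statSync(p).size })))`.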

Step 3: Start Training

Once all files are uploaded, start the training process:
curl -X POST 'https://api.voicedub.ai/v1/me/voices/789e0123-e89b-12d3-a456-426614174000/clone' \
  -H 'Authorization: Api-Key YOUR_API_KEY'
Example response:
{
  "voice": {
    "id": "789e0123-e89b-12d3-a456-426614174000",
    "status": "queued",
    "apiCreditsUsed": 100,
    "apiCreditsLeft": 900,
    ...
  }
}
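The same call can be made from Node.js 18+ with `fetch`. In this sketch, `startTraining` is a hypothetical helper that treats any non-2xx status as a failure and returns the status and credit fields shown in the example response above; `fetchImpl` is only there to allow testing without network access.

```javascript
// Hypothetical helper: starts training for an uploaded voice and reports credits.
// fetchImpl defaults to the global fetch (Node.js 18+).
async function startTraining(voiceId, apiKey, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(
    `https://api.voicedub.ai/v1/me/voices/${encodeURIComponent(voiceId)}/clone`,
    { method: 'POST', headers: { 'Authorization': `Api-Key ${apiKey}` } }
  );
  if (!response.ok) {
    throw new Error(`Start training failed with HTTP ${response.status}`);
  }
  const { voice } = await response.json();
  // Field names as shown in the example response
  return { status: voice.status, creditsUsed: voice.apiCreditsUsed, creditsLeft: voice.apiCreditsLeft };
}
```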

Step 4: Monitor Training Progress

Poll the voice status to track training progress:
curl -X GET 'https://api.voicedub.ai/v1/me/voices/789e0123-e89b-12d3-a456-426614174000' \
  -H 'Authorization: Api-Key YOUR_API_KEY'
Example response:
{
  "voice": {
    "id": "789e0123-e89b-12d3-a456-426614174000",
    "name": "My Custom Voice",
    "status": "training",
    "maxMinutes": 10,
    "apiCreditsUsed": 100,
    ...
  }
}

Status Values

  • new - Voice created, waiting for file uploads
  • queued - All files uploaded, waiting in training queue
  • starting - Training initialization beginning
  • processing - AI model training in progress
  • finalizing - Completing training and validating model
  • done - Training complete! Voice ready for use
  • error - Training failed (check errorMessage)
Training typically takes up to maxMinutes plus a couple of minutes of setup time, depending on the amount of training audio, but it can take longer when the training queue is busy.
Poll the voice status at most once every 3 seconds to avoid rate limiting.
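A polling loop can enforce the 3-second floor in code. This is a minimal sketch: `waitForTraining` and its options are hypothetical, `getVoice` stands for any async function that fetches the voice object (for example a wrapper around the GET request above), and the optional timeout counts only the time spent sleeping between polls. Every status other than `done` and `error` is treated as in progress, so the loop does not depend on the exact set of intermediate status names.

```javascript
const MIN_POLL_MS = 3000; // documented minimum interval between status checks

// Hypothetical helper: polls until the voice reaches "done" or "error".
async function waitForTraining(getVoice, { intervalMs = 30000, timeoutMs = Infinity, sleep } = {}) {
  const wait = sleep || (ms => new Promise(resolve => setTimeout(resolve, ms)));
  const delay = Math.max(intervalMs, MIN_POLL_MS); // never poll faster than every 3 s
  let waited = 0; // total time spent sleeping between polls
  for (;;) {
    const voice = await getVoice();
    if (voice.status === 'done') return voice;
    if (voice.status === 'error') {
      throw new Error(`Training failed: ${voice.errorMessage}`);
    }
    // Any other status means training is still in progress
    if (waited + delay > timeoutMs) {
      throw new Error(`Gave up after ${waited} ms; last status "${voice.status}"`);
    }
    await wait(delay);
    waited += delay;
  }
}
```

Because queue load can stretch training beyond maxMinutes plus setup, a generous `timeoutMs` is safer than one set exactly to that estimate.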

Step 5: Use Your Trained Voice

Once training is complete (status: "done"), your voice is ready for dubbing:
{
  "voice": {
    "id": "789e0123-e89b-12d3-a456-426614174000",
    "name": "My Custom Voice", 
    "status": "done",
    "completedAt": "2024-01-15T11:30:00.000Z",
    "apiCreditsUsed": 100,
    ...
  }
}
Now you can use this voice ID in the dubbing API just like any public voice:
curl -X POST 'https://api.voicedub.ai/v1/me/dubs' \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "voiceId": "789e0123-e89b-12d3-a456-426614174000",
    "link": "https://example.com/audio-to-dub.mp3"
  }'
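For Node.js 18+, the same request can be sketched with `fetch`. `createDub` is a hypothetical helper. Passing extra fields such as `pitchShift` through unchanged is an assumption, since their types are not documented on this page. The dub response shape is also not shown here, so the helper simply returns the parsed JSON.

```javascript
// Hypothetical helper mirroring the curl call above. Extra fields are passed
// through as-is; see the API reference for their names and types.
async function createDub(voiceId, link, apiKey, extra = {}, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl('https://api.voicedub.ai/v1/me/dubs', {
    method: 'POST',
    headers: {
      'Authorization': `Api-Key ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ voiceId, link, ...extra })
  });
  if (!response.ok) {
    throw new Error(`Create dub failed with HTTP ${response.status}`);
  }
  return response.json(); // response shape not documented on this page
}
```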
When filtering public voices (languages, genres, styles), multiple values inside the same filter are matched using OR (for example, languages: English or Spanish). Different filter types are combined using AND (for example, languages and genres and styles must all match). See the API reference for details.
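To illustrate the combination rule, here is a local sketch that applies it to voice objects already in memory. It is not the server's implementation, and the `languages`, `genres`, and `styles` array fields on each voice are assumptions for illustration.

```javascript
// Illustrates the documented rule: OR within one filter, AND across filter types.
// Voice field names are assumed for this sketch.
function matchesFilters(voice, filters) {
  return Object.entries(filters).every(([key, wanted]) => {
    if (!wanted || wanted.length === 0) return true; // empty filter: no constraint
    const values = voice[key] || [];
    return wanted.some(w => values.includes(w)); // OR within one filter
  }); // every() gives AND across filter types
}

const voices = [
  { name: 'A', languages: ['English'], genres: ['Pop'], styles: ['Soft'] },
  { name: 'B', languages: ['Spanish'], genres: ['Rock'], styles: ['Soft'] },
  { name: 'C', languages: ['French'], genres: ['Pop'], styles: ['Soft'] }
];
const filters = { languages: ['English', 'Spanish'], genres: ['Pop'] };
console.log(voices.filter(v => matchesFilters(v, filters)).map(v => v.name)); // [ 'A' ]
```

Voice B matches the languages filter (Spanish) but fails the genres filter, and voice C fails the languages filter, so only A satisfies both.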

Complete Example

Here’s a complete Node.js example for training a custom voice:
// Requires Node.js 18+ for the global fetch API
const fs = require('fs');
const apiKey = process.env.VOICEDUB_API_KEY;
const baseUrl = 'https://api.voicedub.ai';

// Throw a descriptive error when a request does not return a 2xx status
async function ensureOk(response, step) {
  if (!response.ok) {
    const details = await response.text();
    throw new Error(`${step} failed (HTTP ${response.status}): ${details}`);
  }
}

async function trainCustomVoice(name, audioFiles, maxMinutes = 10) {
  // Step 1: Create the voice
  const createResponse = await fetch(`${baseUrl}/v1/me/voices`, {
    method: 'POST',
    headers: {
      'Authorization': `Api-Key ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      name: name,
      maxMinutes: maxMinutes,
      separate: true,
      numberOfFiles: audioFiles.length
    })
  });
  await ensureOk(createResponse, 'Create voice');

  const { voice, uploadUrls } = await createResponse.json();
  console.log('Voice created:', voice.id);
  console.log('Required credits:', voice.requiredCredits);

  // Step 2: Upload training files (one upload URL per file)
  for (let i = 0; i < audioFiles.length; i++) {
    const fileData = fs.readFileSync(audioFiles[i]);

    const uploadResponse = await fetch(uploadUrls[i], {
      method: 'PUT',
      headers: { 'Content-Type': 'audio/mpeg' }, // the example files are MP3
      body: fileData
    });
    await ensureOk(uploadResponse, `Upload ${i + 1}`);

    console.log(`Uploaded file ${i + 1}/${audioFiles.length}`);
  }

  // Step 3: Start training
  const cloneResponse = await fetch(`${baseUrl}/v1/me/voices/${voice.id}/clone`, {
    method: 'POST',
    headers: { 'Authorization': `Api-Key ${apiKey}` }
  });
  await ensureOk(cloneResponse, 'Start training');

  console.log('Training started...');

  // Step 4: Poll until training finishes (30 s is well above the 3 s minimum)
  while (true) {
    await new Promise(resolve => setTimeout(resolve, 30000)); // Wait 30 seconds

    const statusResponse = await fetch(`${baseUrl}/v1/me/voices/${voice.id}`, {
      headers: { 'Authorization': `Api-Key ${apiKey}` }
    });
    await ensureOk(statusResponse, 'Status check');

    const { voice: current } = await statusResponse.json();
    console.log('Training status:', current.status);

    if (current.status === 'done') {
      console.log('Training completed! Voice ID:', voice.id);
      return voice.id;
    }
    if (current.status === 'error') {
      throw new Error(`Training failed: ${current.errorMessage}`);
    }
  }
}

// Usage
const audioFiles = [
  './training-audio-1.mp3',
  './training-audio-2.mp3',
  './training-audio-3.mp3'
];

trainCustomVoice('My Custom Voice', audioFiles, 15)
  .then(voiceId => console.log('Success! Voice ID:', voiceId))
  .catch(err => console.error('Error:', err));

Training Duration Guidelines

Choose your maxMinutes based on your quality needs and budget:

Basic (5-10 min)
  • Cost: 50-100 credits
  • Quality: Good for simple voices
  • Best for: Testing, basic character voices

Standard (10-30 min)
  • Cost: 100-300 credits
  • Quality: High-quality results
  • Best for: Most use cases, content creation

Premium (30-60 min)
  • Cost: 300-600 credits
  • Quality: Exceptional quality
  • Best for: Professional projects, complex voices

Voice Quality Tips

  • Quiet space: Record in a quiet room with minimal echo
  • Consistent microphone: Use the same mic for all training files
  • Stable distance: Maintain consistent distance from microphone
  • Audio levels: Keep input levels consistent but avoid clipping
  • Emotional range: Include happy, sad, excited, and calm expressions
  • Pitch variety: Cover the full vocal range of the target voice
  • Speech patterns: Include natural speech or singing rhythm and pacing
  • Phonetic coverage: Ensure good coverage of different sounds
  • Sample rate: 44.1kHz minimum, 48kHz preferred
  • Bit depth: 16-bit minimum, 24-bit preferred
  • Format: WAV or FLAC for best quality, high-bitrate MP3 acceptable
  • Editing: Light noise reduction okay, avoid heavy processing
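For WAV files, the sample rate and bit depth are stored in the file header, so they can be checked before upload. A minimal sketch: `readWavFormat` and `checkWavQuality` are hypothetical helpers that walk the RIFF chunks of a buffer, stop at the `fmt ` chunk, and compare against the 44.1kHz and 16-bit minimums above. It assumes a well-formed RIFF/WAVE file and does not cover MP3, FLAC, or other formats.

```javascript
// Reads channel count, sample rate, and bit depth from a RIFF/WAVE buffer.
function readWavFormat(buf) {
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }
  let offset = 12; // first chunk follows the 12-byte RIFF header
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      // fmt layout: format(2) channels(2) sampleRate(4) byteRate(4) blockAlign(2) bits(2)
      return {
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitsPerSample: buf.readUInt16LE(offset + 22)
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No fmt chunk found');
}

// Compares the header against the documented minimums (44.1 kHz, 16-bit).
function checkWavQuality(buf) {
  const { sampleRate, bitsPerSample } = readWavFormat(buf);
  const issues = [];
  if (sampleRate < 44100) issues.push(`sample rate ${sampleRate} Hz is below 44.1 kHz`);
  if (bitsPerSample < 16) issues.push(`bit depth ${bitsPerSample}-bit is below 16-bit`);
  return issues;
}
```

Usage: `checkWavQuality(fs.readFileSync('./training-audio-1.wav'))` returns an empty array when the header meets both minimums.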

Pricing & Credits

Custom voice training consumes 10 API credits per minute of training time (based on your maxMinutes setting):
  • 5-minute training = 50 credits
  • 10-minute training = 100 credits
  • 30-minute training = 300 credits
  • 60-minute training = 600 credits
The exact cost is shown in the requiredCredits field when you create the voice, before training starts.
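The arithmetic is simple enough to check before creating a voice. This sketch mirrors the list above; `trainingCredits` and `canAffordTraining` are hypothetical helpers, the `requiredCredits` value returned by the API remains the authoritative cost, and where the `creditsLeft` balance comes from (for example `apiCreditsLeft` in an earlier response) is up to you.

```javascript
const CREDITS_PER_MINUTE = 10; // documented training rate

// Credits consumed by a training run with the given maxMinutes (5-60).
function trainingCredits(maxMinutes) {
  if (!Number.isInteger(maxMinutes) || maxMinutes < 5 || maxMinutes > 60) {
    throw new RangeError('maxMinutes must be an integer from 5 to 60');
  }
  return maxMinutes * CREDITS_PER_MINUTE;
}

// True when the available balance covers the training cost.
function canAffordTraining(maxMinutes, creditsLeft) {
  return creditsLeft >= trainingCredits(maxMinutes);
}

for (const m of [5, 10, 30, 60]) {
  console.log(`${m}-minute training = ${trainingCredits(m)} credits`);
}
```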

Troubleshooting

Poor voice quality (common causes and solutions):
  • Low-quality source audio: Use higher-bitrate recordings.
  • Inconsistent audio: Ensure similar recording conditions for all files.
  • Insufficient data: Try increasing maxMinutes for more training material.
  • Poor vocal separation: If your audio has backing tracks, make sure separate is set to true.
Training takes too long:
Training typically takes the length of your maxMinutes setting, plus 1–2 minutes for setup and processing. If your training is taking significantly longer than this, please reach out to our support team for assistance.
Upload failures:
  • File too large: Max 50MB per file; consider compressing the audio.
  • Unsupported format: Use MP3, WAV, M4A, FLAC, or OGG.
  • Network timeout: Try uploading smaller files or check your connection.
Voice doesn't sound right:
  • Try different source material: More diverse audio often helps.
  • Adjust pitch in dubbing: Use the pitchShift parameter when creating dubs.
  • Increase training duration: More training data usually improves quality.
  • Use dry acapella recordings: Training works best with clean vocals free from background noise or music. Our vocal separation can help, but starting with isolated vocals gives the best results.

Voice Management

Once trained, your custom voices:
  • Persist indefinitely - No expiration or maintenance required
  • Are private to your account - Only you can use them for dubbing
  • Can be used an unlimited number of times - Usage incurs standard API dubbing credits, in addition to the training cost
  • Work with all dubbing features - Pitch shifting, vocal separation, etc.
Your trained voice is now ready! Use it in the dubbing API or integrate it into your applications.