Training Custom Voices

Create your own custom voice models by training on audio samples. Perfect for businesses wanting branded voices, content creators building character libraries, or developers creating unique voice applications.

Overview

Training a custom voice involves three main steps:

Create the voice

Initialize a new voice model with your training parameters

Upload training audio

Provide high-quality audio samples for the voice to learn from

Start training

Begin the AI training process to create your custom voice model

Custom voice training consumes 10 API credits per minute of training time (specified by maxMinutes), so weigh the expected quality against the cost when choosing a duration.

Step 1: Create a Voice Model

Start by creating a new voice with your desired training parameters:

curl -X POST 'https://api.voicedub.ai/v1/me/voices' \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "My Custom Voice",
    "maxMinutes": 10,
    "separate": true,
    "numberOfFiles": 3
  }'

Example response:

{
  "voice": {
    "id": "789e0123-e89b-12d3-a456-426614174000",
    "name": "My Custom Voice",
    "status": "new",
    "separate": true,
    "maxMinutes": 10,
    "requiredCredits": 100,
    "createdAt": "2024-01-15T10:00:00.000Z",
    ...
  },
  "uploadUrls": [
    "https://s3.amazonaws.com/bucket/models/.../vocals/file_0?signature=...",
    "https://s3.amazonaws.com/bucket/models/.../vocals/file_1?signature=...",
    "https://s3.amazonaws.com/bucket/models/.../vocals/file_2?signature=..."
  ]
}
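For reference, the same request can be sketched in Node.js. This assumes Node.js 18 or later (for the built-in fetch) and an API key in the VOICEDUB_API_KEY environment variable, as in the complete example later in this guide. `createVoice` is a hypothetical helper, and the optional `fetchImpl` argument exists only so a mock can be injected. Because Step 2 pairs each file with one upload URL, the sketch also checks that the number of returned URLs matches numberOfFiles.

```javascript
// Minimal sketch: create a voice model (Step 1) from Node.js 18+.
// `createVoice` is a hypothetical helper; `fetchImpl` allows injecting a mock.
const VOICEDUB_BASE_URL = 'https://api.voicedub.ai';

async function createVoice(
  { name, numberOfFiles, maxMinutes = 10, separate = true },
  { apiKey = process.env.VOICEDUB_API_KEY, fetchImpl = fetch } = {}
) {
  const response = await fetchImpl(`${VOICEDUB_BASE_URL}/v1/me/voices`, {
    method: 'POST',
    headers: {
      'Authorization': `Api-Key ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ name, maxMinutes, separate, numberOfFiles })
  });
  if (!response.ok) {
    throw new Error(`Create voice failed with HTTP ${response.status}`);
  }

  const { voice, uploadUrls } = await response.json();
  // Step 2 uploads exactly one file per URL, so the counts must match
  if (!Array.isArray(uploadUrls) || uploadUrls.length !== numberOfFiles) {
    const got = Array.isArray(uploadUrls) ? uploadUrls.length : 0;
    throw new Error(`Expected ${numberOfFiles} upload URLs, got ${got}`);
  }
  return { voice, uploadUrls };
}
```

For example, `createVoice({ name: 'My Custom Voice', numberOfFiles: 3 })` sends the same body as the curl request above, with maxMinutes and separate falling back to their documented defaults.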

Parameters Explained

name (string, required)

Display name for your voice model (1-50 characters). Choose something descriptive and unique.

numberOfFiles (integer, required)

Number of audio files to upload for training (1-5). More diverse files typically yield better results.

maxMinutes (integer, default: 10)

Maximum duration of training audio in minutes (5-60). More training data generally produces better quality, but costs more credits.

separate (boolean, default: true)

Whether to use vocal separation during training. Set to false if your training audio already contains isolated vocals.
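Invalid values are cheaper to catch before a request is sent, so a small client-side check can apply the documented defaults and ranges. This is a sketch that assumes the ranges listed above are the only constraints; `buildCreateVoiceBody` is a hypothetical helper, and the API's own validation remains authoritative.

```javascript
// Minimal sketch: apply the documented defaults and ranges for POST /v1/me/voices.
// `buildCreateVoiceBody` is a hypothetical helper; the server remains the source of truth.
function buildCreateVoiceBody({ name, numberOfFiles, maxMinutes = 10, separate = true }) {
  const errors = [];
  // Note: String.length counts UTF-16 code units, which may differ from the server's count
  if (typeof name !== 'string' || name.length < 1 || name.length > 50) {
    errors.push('name must be a string of 1-50 characters');
  }
  if (!Number.isInteger(numberOfFiles) || numberOfFiles < 1 || numberOfFiles > 5) {
    errors.push('numberOfFiles must be an integer from 1 to 5');
  }
  if (!Number.isInteger(maxMinutes) || maxMinutes < 5 || maxMinutes > 60) {
    errors.push('maxMinutes must be an integer from 5 to 60');
  }
  if (typeof separate !== 'boolean') {
    errors.push('separate must be a boolean');
  }
  if (errors.length > 0) {
    throw new RangeError(errors.join('; '));
  }
  return { name, numberOfFiles, maxMinutes, separate };
}
```

Collecting every problem before throwing means a single failed call reports all invalid fields at once, rather than one per attempt.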

Step 2: Upload Training Audio

Use the provided upload URLs to upload your training audio files:

curl -X PUT 'UPLOAD_URL_FROM_RESPONSE' \
  -H 'Content-Type: audio/mpeg' \
  --data-binary @training-audio-1.mp3

Repeat this for each training file. The training won’t start until all files are uploaded.
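In Node.js, the upload loop can be sketched as follows. It assumes Node.js 18 or later and MP3 input, matching the Content-Type in the curl example above; for other formats you may need a different Content-Type. `uploadTrainingFiles` is a hypothetical helper, and `fetchImpl` and `readFile` are injectable only so the function can be tested without network or disk access.

```javascript
// Minimal sketch: upload each local file to its pre-signed URL (Step 2), in order.
// Assumes Node.js 18+ and MP3 input, matching the curl example's Content-Type.
// `uploadTrainingFiles` is a hypothetical helper; `fetchImpl`/`readFile` allow injecting mocks.
const fs = require('fs');

async function uploadTrainingFiles(
  filePaths,
  uploadUrls,
  { fetchImpl = fetch, readFile = fs.promises.readFile } = {}
) {
  if (filePaths.length !== uploadUrls.length) {
    throw new Error(`Got ${filePaths.length} files but ${uploadUrls.length} upload URLs`);
  }
  for (let i = 0; i < filePaths.length; i++) {
    const data = await readFile(filePaths[i]);
    // As in the curl example, no Authorization header: the URL carries its own signature
    const response = await fetchImpl(uploadUrls[i], {
      method: 'PUT',
      headers: { 'Content-Type': 'audio/mpeg' },
      body: data
    });
    if (!response.ok) {
      throw new Error(`Upload of ${filePaths[i]} failed with HTTP ${response.status}`);
    }
  }
}
```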

Training Audio Guidelines

Follow these guidelines for best training results:

Audio Quality:

Use high-quality recordings (192kbps+ MP3 or lossless formats)
A total of 3–5 minutes works well, but longer files (up to 20 minutes) are also supported
Minimize background noise and reverb
Ensure consistent audio levels across files
Avoid heavily processed or auto-tuned vocals

Content Variety:

Include diverse vocal expressions (soft, loud, emotional)
Mix different tempos and rhythms
Include both sustained notes and quick phrases
Vary pitch range throughout the samples

Technical Requirements:

Each file: 1-20 minutes duration
Total training duration: Up to your specified maxMinutes
Supported formats: MP3, WAV, M4A, FLAC, OGG
Video files are supported (audio will be extracted)
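Some of these requirements can be checked before uploading. The sketch below covers only the file extension and the 50 MB per-file limit mentioned under Troubleshooting. It treats 50 MB as 50,000,000 bytes (the more conservative reading, an assumption) and does not measure duration, which would require decoding the audio. Video files are reported as unrecognized because the supported video containers are not listed. `checkTrainingFile` is a hypothetical helper.

```javascript
// Minimal sketch: cheap pre-upload checks against the documented requirements.
// Not checked: per-file duration (needs decoding). Video containers are not listed in
// the guide, so they are reported as unrecognized for manual review.
// `checkTrainingFile` is a hypothetical helper.
const path = require('path');

const AUDIO_EXTENSIONS = new Set(['.mp3', '.wav', '.m4a', '.flac', '.ogg']);
const MAX_FILE_BYTES = 50 * 1000 * 1000; // assumption: decimal megabytes (conservative)

function checkTrainingFile(fileName, sizeBytes) {
  const problems = [];
  const ext = path.extname(fileName).toLowerCase();
  if (!AUDIO_EXTENSIONS.has(ext)) {
    problems.push(`unrecognized extension "${ext}" (expected MP3, WAV, M4A, FLAC, or OGG)`);
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    problems.push(`file is ${(sizeBytes / 1e6).toFixed(1)} MB; the limit is 50 MB`);
  }
  return problems; // empty array means no problems found
}
```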

Only use audio you have legal rights to. Training on copyrighted material without permission is prohibited.

Step 3: Start Training

Once all files are uploaded, start the training process:

curl -X POST 'https://api.voicedub.ai/v1/me/voices/789e0123-e89b-12d3-a456-426614174000/clone' \
  -H 'Authorization: Api-Key YOUR_API_KEY'

Example response:

{
  "voice": {
    "id": "789e0123-e89b-12d3-a456-426614174000",
    "status": "queued",
    "apiCreditsUsed": 100,
    "apiCreditsLeft": 900,
    ...
  }
}
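The equivalent call in Node.js might look like this. It assumes Node.js 18 or later and reads the credit fields shown in the example response above. `startTraining` is a hypothetical helper, and `fetchImpl` is injectable only for testing.

```javascript
// Minimal sketch: start training (Step 3) and return the updated voice record.
// `startTraining` is a hypothetical helper; `fetchImpl` allows injecting a mock.
async function startTraining(
  voiceId,
  { apiKey = process.env.VOICEDUB_API_KEY, fetchImpl = fetch } = {}
) {
  const url = `https://api.voicedub.ai/v1/me/voices/${encodeURIComponent(voiceId)}/clone`;
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Authorization': `Api-Key ${apiKey}` }
  });
  if (!response.ok) {
    throw new Error(`Start training failed with HTTP ${response.status}`);
  }
  const { voice } = await response.json();
  // The example response reports credits charged and credits remaining
  console.log(`Status: ${voice.status}; credits used: ${voice.apiCreditsUsed}, left: ${voice.apiCreditsLeft}`);
  return voice;
}
```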

Step 4: Monitor Training Progress

Poll the voice status to track training progress:

curl -X GET 'https://api.voicedub.ai/v1/me/voices/789e0123-e89b-12d3-a456-426614174000' \
  -H 'Authorization: Api-Key YOUR_API_KEY'

Example response:

{
  "voice": {
    "id": "789e0123-e89b-12d3-a456-426614174000",
    "name": "My Custom Voice",
    "status": "processing",
    "maxMinutes": 10,
    "apiCreditsUsed": 100,
    ...
  }
}

Status Values

new - Voice created, waiting for file uploads
queued - All files uploaded, waiting in training queue
starting - Training initialization beginning
processing - AI model training in progress
finalizing - Completing training and validating model
done - Training complete! Voice ready for use
error - Training failed (check errorMessage)

Training typically takes up to maxMinutes plus a couple of minutes of setup, depending on the amount of training audio, and can take longer when the training queue is busy.

Poll the voice status at most once every 3 seconds to avoid rate limiting.
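A polling loop that respects this limit can be sketched as follows. It treats "done" and "error" as the only terminal statuses from the list above and clamps the interval to at least 3 seconds. `waitForVoice` and its `onStatus`, `fetchImpl`, and `sleep` options are hypothetical names; the last two exist only so the loop can be tested without waiting.

```javascript
// Minimal sketch: poll GET /v1/me/voices/{id} until the status is "done" or "error".
// The interval is clamped to the documented 3-second minimum. `waitForVoice`,
// `onStatus`, `fetchImpl`, and `sleep` are hypothetical names.
const MIN_POLL_INTERVAL_MS = 3000;

async function waitForVoice(
  voiceId,
  {
    apiKey = process.env.VOICEDUB_API_KEY,
    intervalMs = 30000,
    fetchImpl = fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    onStatus = () => {}
  } = {}
) {
  const delay = Math.max(intervalMs, MIN_POLL_INTERVAL_MS);
  const url = `https://api.voicedub.ai/v1/me/voices/${encodeURIComponent(voiceId)}`;
  for (;;) {
    const response = await fetchImpl(url, { headers: { 'Authorization': `Api-Key ${apiKey}` } });
    if (!response.ok) {
      throw new Error(`Status check failed with HTTP ${response.status}`);
    }
    const { voice } = await response.json();
    onStatus(voice.status);
    if (voice.status === 'done') return voice;
    if (voice.status === 'error') throw new Error(`Training failed: ${voice.errorMessage}`);
    await sleep(delay); // new, queued, starting, processing, finalizing: keep waiting
  }
}
```

Since training usually runs for several minutes, a longer default interval (30 seconds here, as in the complete example) keeps request volume low without delaying completion noticeably.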

Step 5: Use Your Trained Voice

Once training is complete (status: "done"), your voice is ready for dubbing:

Example response (when complete):

{
  "voice": {
    "id": "789e0123-e89b-12d3-a456-426614174000",
    "name": "My Custom Voice", 
    "status": "done",
    "completedAt": "2024-01-15T11:30:00.000Z",
    "apiCreditsUsed": 100,
    ...
  }
}

Now you can use this voice ID in the dubbing API just like any public voice:

curl -X POST 'https://api.voicedub.ai/v1/me/dubs' \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "voiceId": "789e0123-e89b-12d3-a456-426614174000",
    "link": "https://example.com/audio-to-dub.mp3"
  }'
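The same dub request can be sketched in Node.js 18 or later. Only voiceId and link are sent, mirroring the curl example; other dubbing options are covered in the API reference. `createDub` is a hypothetical helper, and because this guide does not show the dub response, the sketch returns the parsed JSON as-is.

```javascript
// Minimal sketch: submit a dub with a trained voice, mirroring the curl request above.
// `createDub` is a hypothetical helper; `fetchImpl` allows injecting a mock.
async function createDub(
  voiceId,
  link,
  { apiKey = process.env.VOICEDUB_API_KEY, fetchImpl = fetch } = {}
) {
  const response = await fetchImpl('https://api.voicedub.ai/v1/me/dubs', {
    method: 'POST',
    headers: {
      'Authorization': `Api-Key ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ voiceId, link })
  });
  if (!response.ok) {
    throw new Error(`Create dub failed with HTTP ${response.status}`);
  }
  return response.json(); // Response shape is not shown in this guide
}
```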

When filtering public voices (languages, genres, styles), multiple values inside the same filter are matched using OR (for example, languages: English or Spanish). Different filter types are combined using AND (for example, languages and genres and styles must all match). See the API reference for details.
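The API applies these filters server-side; the sketch below only illustrates the matching rule locally. The voice shape (languages, genres, and styles as string arrays) is a hypothetical stand-in, not the API's response format, and treating an empty filter as "no restriction" is an assumption. `matchesFilters` is a hypothetical helper.

```javascript
// Minimal sketch of the documented rule: OR within one filter, AND across filter types.
// The voice shape ({ languages, genres, styles } as string arrays) is hypothetical.
function matchesFilters(voice, filters) {
  // every(): all filter types must match (AND across filters)
  return Object.entries(filters).every(([key, wanted]) => {
    if (!wanted || wanted.length === 0) return true; // assumption: empty filter does not restrict
    const values = voice[key] || [];
    return wanted.some((value) => values.includes(value)); // any value may match (OR)
  });
}
```

For a voice tagged English / Pop / Soft, `{ languages: ['English', 'Spanish'] }` matches, but `{ languages: ['English'], genres: ['Rock'] }` does not, because the genres filter fails.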

Complete Example

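Here’s a complete Node.js example for training a custom voice. It requires Node.js 18 or later for the built-in fetch and reads the API key from the VOICEDUB_API_KEY environment variable: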

// Requires Node.js 18+ (global fetch)
const fs = require('fs');

const apiKey = process.env.VOICEDUB_API_KEY;
const baseUrl = 'https://api.voicedub.ai';

// Throw a descriptive error if a request did not succeed
async function ensureOk(response, step) {
  if (!response.ok) {
    const body = await response.text();
    throw new Error(`${step} failed (HTTP ${response.status}): ${body}`);
  }
}

async function trainCustomVoice(name, audioFiles, maxMinutes = 10) {
  // Step 1: Create the voice
  const createResponse = await fetch(`${baseUrl}/v1/me/voices`, {
    method: 'POST',
    headers: {
      'Authorization': `Api-Key ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      name,
      maxMinutes,
      separate: true,
      numberOfFiles: audioFiles.length
    })
  });
  await ensureOk(createResponse, 'Create voice');

  const { voice, uploadUrls } = await createResponse.json();
  console.log('Voice created:', voice.id);
  console.log('Required credits:', voice.requiredCredits);

  // Step 2: Upload training files, one pre-signed URL per file, in order.
  // This example assumes MP3 files, hence the audio/mpeg Content-Type.
  for (let i = 0; i < audioFiles.length; i++) {
    const fileData = await fs.promises.readFile(audioFiles[i]);

    const uploadResponse = await fetch(uploadUrls[i], {
      method: 'PUT',
      headers: { 'Content-Type': 'audio/mpeg' },
      body: fileData
    });
    await ensureOk(uploadResponse, `Upload of ${audioFiles[i]}`);

    console.log(`Uploaded file ${i + 1}/${audioFiles.length}`);
  }

  // Step 3: Start training
  const cloneResponse = await fetch(`${baseUrl}/v1/me/voices/${voice.id}/clone`, {
    method: 'POST',
    headers: { 'Authorization': `Api-Key ${apiKey}` }
  });
  await ensureOk(cloneResponse, 'Start training');

  console.log('Training started...');

  // Step 4: Poll every 30 seconds (well above the 3-second minimum) until done or error
  for (;;) {
    await new Promise(resolve => setTimeout(resolve, 30000));

    const statusResponse = await fetch(`${baseUrl}/v1/me/voices/${voice.id}`, {
      headers: { 'Authorization': `Api-Key ${apiKey}` }
    });
    await ensureOk(statusResponse, 'Status check');

    const { voice: current } = await statusResponse.json();
    console.log('Training status:', current.status);

    if (current.status === 'done') {
      console.log('Training completed! Voice ID:', voice.id);
      return voice.id;
    }
    if (current.status === 'error') {
      throw new Error(`Training failed: ${current.errorMessage}`);
    }
  }
}

// Usage
const audioFiles = [
  './training-audio-1.mp3',
  './training-audio-2.mp3',
  './training-audio-3.mp3'
];

trainCustomVoice('My Custom Voice', audioFiles, 15)
  .then(voiceId => console.log('Success! Voice ID:', voiceId))
  .catch(err => console.error('Error:', err));

Training Duration Guidelines

Choose your maxMinutes based on your quality needs and budget:

Basic (5-10 min)

Cost: 50-100 credits
Quality: Good for simple voices
Best for: Testing, basic character voices

Standard (10-30 min)

Cost: 100-300 credits
Quality: High quality results
Best for: Most use cases, content creation

Premium (30-60 min)

Cost: 300-600 credits
Quality: Exceptional quality
Best for: Professional projects, complex voices

Voice Quality Tips

Recording Environment

Quiet space: Record in a quiet room with minimal echo
Consistent microphone: Use the same mic for all training files
Stable distance: Maintain consistent distance from microphone
Audio levels: Keep input levels consistent but avoid clipping

Content Selection

Emotional range: Include happy, sad, excited, and calm expressions
Pitch variety: Cover the full vocal range of the target voice
Speech patterns: Include natural speech or singing rhythm and pacing
Phonetic coverage: Ensure good coverage of different sounds

Technical Quality

Sample rate: 44.1kHz minimum, 48kHz preferred
Bit depth: 16-bit minimum, 24-bit preferred
Format: WAV or FLAC for best quality, high-bitrate MP3 acceptable
Editing: Light noise reduction okay, avoid heavy processing
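For WAV files, sample rate and bit depth can be read directly from the file header and compared with these recommendations. The sketch below walks the RIFF chunks to find the "fmt " chunk; it handles only RIFF/WAVE files (MP3, FLAC, and others need a different parser or a tool such as ffprobe). `readWavFormat` and `meetsRecommendations` are hypothetical helpers, and the thresholds are the minimums listed above.

```javascript
// Minimal sketch: read channels, sample rate, and bit depth from a WAV "fmt " chunk.
// Handles RIFF/WAVE only. `readWavFormat` and `meetsRecommendations` are hypothetical.
function readWavFormat(buffer) {
  if (buffer.length < 12 || buffer.toString('ascii', 0, 4) !== 'RIFF' ||
      buffer.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }
  let offset = 12; // first chunk follows the 12-byte RIFF header
  while (offset + 8 <= buffer.length) {
    const chunkId = buffer.toString('ascii', offset, offset + 4);
    const chunkSize = buffer.readUInt32LE(offset + 4);
    if (chunkId === 'fmt ') {
      if (offset + 8 + 16 > buffer.length) throw new Error('Truncated fmt chunk');
      return {
        channels: buffer.readUInt16LE(offset + 10),
        sampleRate: buffer.readUInt32LE(offset + 12),
        bitsPerSample: buffer.readUInt16LE(offset + 22)
      };
    }
    offset += 8 + chunkSize + (chunkSize % 2); // chunks are padded to an even length
  }
  throw new Error('No fmt chunk found');
}

// Minimums from the list above: 44.1 kHz and 16-bit
function meetsRecommendations({ sampleRate, bitsPerSample }) {
  return sampleRate >= 44100 && bitsPerSample >= 16;
}
```

Because the fmt chunk sits near the start of the file, reading only the first few kilobytes (for example with fs.promises.open and filehandle.read) is usually enough instead of loading the whole recording.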

Pricing & Credits

Custom voice training consumes 10 API credits per minute of training time (based on your maxMinutes setting):

5-minute training = 50 credits
10-minute training = 100 credits
30-minute training = 300 credits
60-minute training = 600 credits

The exact cost is shown in the requiredCredits field when you create the voice, before training starts.
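The arithmetic can be sketched as a pair of helpers: one computes the cost of a given maxMinutes, the other the longest training a credit balance can cover within the documented 5-60 minute range. The constant and function names are illustrative, and requiredCredits returned by the API remains the authoritative figure.

```javascript
// Minimal sketch of the documented pricing: 10 API credits per minute of maxMinutes.
// Names are illustrative; the API's requiredCredits value is authoritative.
const CREDITS_PER_MINUTE = 10;
const MIN_MINUTES = 5;
const MAX_MINUTES = 60;

function trainingCost(maxMinutes) {
  return maxMinutes * CREDITS_PER_MINUTE;
}

// Largest maxMinutes a credit balance can cover, or null if below the 5-minute minimum
function maxAffordableMinutes(availableCredits) {
  const minutes = Math.min(Math.floor(availableCredits / CREDITS_PER_MINUTE), MAX_MINUTES);
  return minutes >= MIN_MINUTES ? minutes : null;
}
```

For example, the 15-minute run in the complete example costs `trainingCost(15)`, which is 150 credits, and a balance of 175 credits covers at most 17 minutes.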

Troubleshooting

Training produced poor-quality results

Common causes and solutions:

Low-quality source audio: Use higher bitrate recordings
Inconsistent audio: Ensure similar recording conditions for all files
Insufficient data: Try increasing maxMinutes for more training material
Poor vocal separation: If audio has backing tracks, ensure separate: true

Training stuck or taking too long

Training typically takes about as long as your maxMinutes setting, plus 1–2 minutes for setup and processing. If your training is taking significantly longer than this, please reach out to our support team for assistance.
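To decide when a job counts as "significantly longer", you could compare elapsed time with the expected duration. This sketch assumes you record, on your side, the time you called the clone endpoint; it uses maxMinutes plus 2 minutes as the expected duration (from the paragraph above) and applies a 2x tolerance for queue load, which is an arbitrary illustrative choice rather than a documented threshold. `isTrainingOverdue` is a hypothetical helper.

```javascript
// Minimal sketch: flag a training job that may be worth raising with support.
// Expected time = maxMinutes + ~2 minutes of setup (per this guide); the 2x
// tolerance is an arbitrary illustrative choice, not a documented limit.
function isTrainingOverdue(startedAt, now, maxMinutes, tolerance = 2) {
  const expectedMs = (maxMinutes + 2) * 60 * 1000;
  return now.getTime() - startedAt.getTime() > expectedMs * tolerance;
}
```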

Upload failures

File too large: Max 50 MB per file; consider compressing the audio
Unsupported format: Use MP3, WAV, M4A, FLAC, or OGG
Network timeout: Try uploading smaller files or check connection
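For transient network timeouts, retrying the upload with exponential backoff is a common approach. The sketch below wraps any async operation; `withRetry` is a hypothetical helper, the defaults (3 retries, starting at 1 second) are illustrative, and `sleep` is injectable only so the helper can be tested quickly. Retrying will not fix a file that is too large or in an unsupported format.

```javascript
// Minimal sketch: retry an async operation with exponential backoff.
// `withRetry` is a hypothetical helper; defaults are illustrative.
async function withRetry(
  operation,
  { retries = 3, baseDelayMs = 1000, sleep = (ms) => new Promise((r) => setTimeout(r, ms)) } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the last error
      await sleep(baseDelayMs * 2 ** attempt); // 1 s, 2 s, 4 s, ... with the defaults
    }
  }
}
```

For example, an upload can be wrapped as `withRetry(() => fetch(url, { method: 'PUT', body: data }).then((r) => { if (!r.ok) throw new Error(`HTTP ${r.status}`); }))`, so that both network errors and failed responses trigger a retry.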

Voice doesn't sound right

Try different source material: More diverse audio often helps
Adjust pitch in dubbing: Use the pitchShift parameter when creating dubs
Increase training duration: More training data usually improves quality
Use dry acapella recordings: Training works best with clean vocals free from background noise or music. Our vocal separation can help, but starting with isolated vocals gives the best results.

Voice Management

Once trained, your custom voices:

Persist indefinitely - No expiration or maintenance required
Are private to your account - Only you can use them for dubbing
Can be used an unlimited number of times - Usage incurs standard API dubbing credits, in addition to the training cost
Work with all dubbing features - Pitch shifting, vocal separation, etc.

Your trained voice is now ready! Use it in the dubbing API or integrate it into your applications.
