Token pricing for each tool

Approximate token cost for Translate Audio, Translate Video, Add Captions, AI Chat and other tools.

Written By Umakhan Magomedov

Last updated About 21 hours ago

This page lists the approximate token cost for each AI tool in VocaLingo. The app always shows an estimate before you start, so you can verify the cost for your specific input.

ℹ️ All prices are estimates. Actual charges may vary slightly based on final processing results.

Translate Video

| Mode | Price |
| --- | --- |
| HeyGen | 5 tokens/sec |
| HeyGen + Enhanced Cloning | 10 tokens/sec |
| ElevenLabs dubbing | About 0.84 tokens/sec |
| ElevenLabs dubbing + Lip Sync | About 1.5 tokens/sec |
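As a rough illustration of how the per-second rates above add up, here is a minimal Python sketch. The helper name `estimate_video_tokens` is hypothetical, and the sketch assumes cost scales linearly with video duration, which this page does not state explicitly. The estimate shown in the app remains the reference.

```python
# Per-second rates copied from the Translate Video table (tokens/sec).
# The two ElevenLabs rates are listed as approximate ("About ...").
VIDEO_RATES = {
    "HeyGen": 5.0,
    "HeyGen + Enhanced Cloning": 10.0,
    "ElevenLabs dubbing": 0.84,
    "ElevenLabs dubbing + Lip Sync": 1.5,
}


def estimate_video_tokens(mode: str, duration_sec: float) -> float:
    """Hypothetical helper: rate x duration, assuming simple linear billing."""
    if duration_sec < 0:
        raise ValueError("duration_sec must be non-negative")
    return VIDEO_RATES[mode] * duration_sec


# Compare all modes for a 60-second video.
for mode in VIDEO_RATES:
    print(f"{mode}: ~{estimate_video_tokens(mode, 60):.1f} tokens")
```

Under these assumptions, a 60-second video costs about 300 tokens with HeyGen and about 50 tokens with ElevenLabs dubbing.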

Add Captions

| Mode | Approximate cost |
| --- | --- |
| Mirage animated templates | 15 tokens per started minute |
| Standard subtitles | Speech recognition + rendering |
| Translated subtitles | Speech recognition + translation + rendering |

Learn more: How Add Captions works.
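The "per started minute" rule for Mirage animated templates can be sketched as follows. This assumes that any partial minute is billed as a full one, which is the usual reading of "started minute" but is not spelled out on this page. The function name is hypothetical.

```python
import math

# Rate from the Add Captions table.
MIRAGE_TOKENS_PER_STARTED_MINUTE = 15


def estimate_mirage_tokens(duration_sec: float) -> int:
    """Hypothetical helper: round the duration up to whole minutes, then bill."""
    if duration_sec < 0:
        raise ValueError("duration_sec must be non-negative")
    started_minutes = math.ceil(duration_sec / 60)
    return started_minutes * MIRAGE_TOKENS_PER_STARTED_MINUTE


for seconds in (60, 61, 150):
    print(f"{seconds} s -> {estimate_mirage_tokens(seconds)} tokens")
```

With this reading, a 61-second clip is billed as two minutes (30 tokens), so trimming a clip to just under a minute boundary can lower the cost.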

Translate Audio

Translate Audio is charged per pipeline step: speech recognition, translation, and voiceover. Voiceover is optional and runs only when you tap play on the Translation tab. Full settings guide: Translate Audio settings.

Speech recognition

| Provider | Price |
| --- | --- |
| ElevenLabs Scribe (default) | 0.0133 tokens/sec |
| OpenAI Transcribe | 0.02 tokens/sec |
| Whisper | 0.01 tokens/sec |

Example: 60-second voice message with ElevenLabs Scribe ≈ 0.8 tokens.
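The arithmetic behind that example is rate times duration. The short Python sketch below reproduces it for all three providers. The helper name is hypothetical and linear per-second billing is an assumption.

```python
# Per-second rates from the Speech recognition table (tokens/sec).
STT_RATES = {
    "ElevenLabs Scribe": 0.0133,
    "OpenAI Transcribe": 0.02,
    "Whisper": 0.01,
}


def estimate_stt_tokens(provider: str, duration_sec: float) -> float:
    """Hypothetical helper: rate x duration, assuming linear billing."""
    if duration_sec < 0:
        raise ValueError("duration_sec must be non-negative")
    return STT_RATES[provider] * duration_sec


# 60 s x 0.0133 tokens/sec = 0.798, which rounds to the 0.8 tokens quoted above.
for provider in STT_RATES:
    print(f"{provider}: ~{estimate_stt_tokens(provider, 60):.2f} tokens")
```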

Translation

| Model | Price |
| --- | --- |
| Gemini Flash Lite | 0.006 tokens/1K chars |
| Gemini Flash | 0.028 tokens/1K chars |
| Gemini 3 (first pipeline translation) | 0.044 tokens/1K chars |
| GPT-4o | 0.156 tokens/1K chars |
| GPT-5 Mini | 0.028 tokens/1K chars |

The first translation after upload always uses Gemini 3. The model selected in Settings applies to re-translations only.
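To get a feel for the per-1K-character rates, the sketch below estimates translation cost from text length. It assumes the charge is pro-rated by character count (2,500 characters counts as 2.5 units); the page does not say whether the app rounds up to whole thousands. The helper name is hypothetical.

```python
# Rates from the Translation table (tokens per 1,000 characters).
TRANSLATION_RATES = {
    "Gemini Flash Lite": 0.006,
    "Gemini Flash": 0.028,
    "Gemini 3": 0.044,  # always used for the first pipeline translation
    "GPT-4o": 0.156,
    "GPT-5 Mini": 0.028,
}


def estimate_translation_tokens(model: str, char_count: int) -> float:
    """Hypothetical helper: pro-rated cost, assuming no rounding to whole 1K blocks."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return TRANSLATION_RATES[model] * char_count / 1000


# A 2,500-character transcript: the first pass versus a GPT-4o re-translation.
first_pass = estimate_translation_tokens("Gemini 3", 2500)
retranslate = estimate_translation_tokens("GPT-4o", 2500)
print(f"First translation (Gemini 3): ~{first_pass:.2f} tokens")
print(f"Re-translation (GPT-4o): ~{retranslate:.2f} tokens")
```

Under this assumption, the first translation of a 2,500-character transcript costs about 0.11 tokens, and a GPT-4o re-translation about 0.39 tokens.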

Voiceover (TTS)

| Provider | Price | Extra fees |
| --- | --- | --- |
| ElevenLabs | 0.01 tokens/sec | None |
| OpenAI | 0.03 tokens/sec | None |
| MiniMax | 0.15 tokens/sec | 150 tokens first clone per voice |
| Qwen | 0.15 tokens/sec | Minimum 5 tokens per request |
| HeyGen v3 | 1.84 tokens/sec | None |

Example: 30-second voiced translation with ElevenLabs ≈ 0.3 tokens. The same text with HeyGen v3 ≈ 55 tokens.
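The extra fees change the picture for short clips, so here is a minimal sketch that folds them into the estimate. Two points are assumptions rather than documented behavior: the Qwen minimum is treated as a floor on the per-request total, and the MiniMax clone fee is added once, on the first use of a voice. The function and parameter names are hypothetical.

```python
# Per-second rates from the Voiceover (TTS) table (tokens/sec).
TTS_RATES = {
    "ElevenLabs": 0.01,
    "OpenAI": 0.03,
    "MiniMax": 0.15,
    "Qwen": 0.15,
    "HeyGen v3": 1.84,
}
MINIMAX_FIRST_CLONE_FEE = 150.0  # charged once per voice
QWEN_MINIMUM_PER_REQUEST = 5.0


def estimate_tts_tokens(provider: str, duration_sec: float,
                        first_clone: bool = False) -> float:
    """Hypothetical helper: per-second cost plus the extra fees listed above."""
    if duration_sec < 0:
        raise ValueError("duration_sec must be non-negative")
    cost = TTS_RATES[provider] * duration_sec
    if provider == "MiniMax" and first_clone:
        cost += MINIMAX_FIRST_CLONE_FEE  # assumption: one-time add-on
    if provider == "Qwen":
        cost = max(cost, QWEN_MINIMUM_PER_REQUEST)  # assumption: floor on the total
    return cost


print(f"ElevenLabs, 30 s: ~{estimate_tts_tokens('ElevenLabs', 30):.1f} tokens")
print(f"HeyGen v3, 30 s: ~{estimate_tts_tokens('HeyGen v3', 30):.1f} tokens")
print(f"Qwen, 10 s: ~{estimate_tts_tokens('Qwen', 10):.1f} tokens")
print(f"MiniMax, 30 s, new voice: ~{estimate_tts_tokens('MiniMax', 30, first_clone=True):.1f} tokens")
```

The first two lines reproduce the example above (0.3 and about 55 tokens). Under the stated assumptions, a 10-second Qwen request is billed at the 5-token minimum, and the first MiniMax voiceover with a new voice is dominated by the 150-token clone fee.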

Other tools

Speech to Text, Video to Text, Text Analysis, Text to Speech, Tourist Translator, Text Translator, AI Calls and AI Chat show a token estimate before you start. Cost depends on duration, selected model, text length and optional summaries.

Frequently asked questions