VOCABLE Documentation
Everything you need to add voice to your AI agents. From first API call to production integration.
What is VOCABLE?
VOCABLE is a cloud API that converts text to speech using open-source AI models. You send text in, you get audio back. One API call, one line of code, real human-sounding voice output.
Unlike proprietary TTS services (ElevenLabs at $180/1M characters, OpenAI at $15-30/1M), VOCABLE runs open-source models on managed GPU infrastructure — delivering the same quality at a fraction of the cost. The models we use (F5-TTS, NVIDIA Magpie) are open-source with 14,000+ GitHub stars. There is no proprietary lock-in at the model layer.
VOCABLE handles the hard parts — GPU provisioning, model serving, authentication, rate limiting, usage tracking — so you can focus on building your application.
Up to 3.6x Cheaper
~$50 per 1M characters vs $180 for ElevenLabs. Open-source models, managed infrastructure.
No Lock-in
Built on open-source models. Standard REST API. Move away anytime with zero migration cost.
Production Ready
API key auth, usage limits, rate limiting, 8 multilingual voices, two model options.
Who is it for?
VOCABLE is built for developers adding voice capabilities to AI applications. If you're building any of the following, VOCABLE is for you:
Voice-enabled AI agents
LangChain, CrewAI, AutoGen, or custom agent frameworks that need to speak to users
Conversational AI products
Chatbots, virtual assistants, or customer service tools that respond with voice
Content creation tools
Podcast generators, audiobook narration, or video voiceover automation
Accessibility features
Screen readers, text-to-speech widgets, or voice output for visually impaired users
Prototyping & demos
Quick voice demos for investor pitches, hackathons, or proof-of-concept builds
IoT & robotics
Giving voice to hardware devices, robots, or embedded systems via HTTP
How It Works
Four steps from text to audio. No GPU setup, no model downloads, no infrastructure.
You send text
Your application makes a POST request to the VOCABLE API with the text you want spoken, the model to use, and optionally a voice selection.
We authenticate & validate
VOCABLE verifies your API key, checks your plan's usage limits and rate limits, and validates the request parameters.
GPU inference runs
The selected open-source model (F5-TTS or NVIDIA Magpie) generates speech on managed GPU infrastructure. Typical latency is 3-15 seconds. First request after idle may take up to 30 seconds due to GPU warmup.
Audio returns to you
A WAV audio file is returned in the HTTP response body. Play it, save it, stream it, or pipe it to your agent's output.
All models are open-source (F5-TTS is MIT licensed, NVIDIA Magpie is available via NVIDIA NIM). There is no proprietary lock-in at the model layer. The API is standard REST — any HTTP client in any language works.
Quick Start
Go from zero to your first voice output in under 5 minutes.
1. Create an account
Join the early access waitlist and we'll send you an invite when we launch. The free tier will include 10,000 characters per month, enough for hundreds of voice outputs.
2. Create an API key
Go to your dashboard and click "Create API Key". Copy the key immediately; it's shown only once. It looks like:
sk-vocable-a1b2c3d4e5f6...
3. Make your first TTS request
Replace YOUR_API_KEY with the key you just copied:
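A minimal sketch of such a request in Python, using only the standard library. The base URL, the `POST /api/tts` path, and the `Bearer` authorization scheme are assumptions made for illustration; the `text`, `model`, and `voice` fields and the `f5-tts` model name are taken from the error reference on this page.

```python
import json
import urllib.request

# Assumptions (not confirmed on this page): the base URL, the POST path,
# and the "Bearer" scheme. Replace them with the values from your dashboard.
BASE_URL = "https://api.vocable.example"
TTS_PATH = "/api/tts"


def build_tts_request(api_key, text, model="f5-tts", voice=None):
    """Build (but do not send) a POST request asking for speech synthesis."""
    payload = {"text": text, "model": model}
    if voice is not None:
        # The voice field is rejected for f5-tts; send it only with nvidia-magpie.
        payload["voice"] = voice
    return urllib.request.Request(
        BASE_URL + TTS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(api_key, text, path="hello.wav"):
    """Send the request and write the returned WAV bytes to `path`."""
    request = build_tts_request(api_key, text)
    # Generous timeout: the first request after idle may take up to 30 seconds.
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return path
```

Under those assumptions, calling `save_speech("YOUR_API_KEY", "Hello from VOCABLE!")` would write the response body to `hello.wav`.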
4. Play the audio
Open hello.wav and you'll hear a natural, human-sounding voice speaking your text. That's it. You just made your first VOCABLE API call.
Try it without code: Log in to your dashboard and use the Voice Playground to type text and hear it spoken. The playground does not require an API key.
Models & Voices
VOCABLE offers two open-source TTS models. Each has different strengths.
Available Voices (NVIDIA Magpie)
| Voice ID | Speaker | Language |
|---|---|---|
| Magpie-Multilingual.EN-US.Aria | Aria (Female) | English (US) |
| Magpie-Multilingual.EN-US.Jason | Jason (Male) | English (US) |
| Magpie-Multilingual.EN-US.Leo | Leo (Male) | English (US) |
| Magpie-Multilingual.DE-DE.Aria | Aria (Female) | German |
| Magpie-Multilingual.DE-DE.Leo | Leo (Male) | German |
| Magpie-Multilingual.ZH-CN.Mia | Mia (Female) | Chinese (Mandarin) |
| Magpie-Multilingual.ES-US.Aria | Aria (Female) | Spanish (US) |
| Magpie-Multilingual.FR-FR.Aria | Aria (Female) | French |
Programmatic access: GET /api/tts/models returns all models and voices as JSON.
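A short sketch of calling that endpoint from Python. The path `/api/tts/models` comes from the line above; the base URL, the `Bearer` scheme, and whether the endpoint needs a key at all are assumptions. The JSON shape is not documented here, so the function simply returns whatever the server sends.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the real host for your account.
BASE_URL = "https://api.vocable.example"
MODELS_PATH = "/api/tts/models"


def build_models_request(api_key=None):
    """Build a GET request for the model and voice catalogue."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: the key, when required, is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        BASE_URL + MODELS_PATH, headers=headers, method="GET"
    )


def fetch_models(api_key=None):
    """Send the request and return the decoded JSON document as-is."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as response:
        return json.load(response)
```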
Integration Examples
VOCABLE is a standard REST API. Any language or framework that can make HTTP requests works out of the box.
Python
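One possible shape for a small Python client, sketched with the standard library only. It returns the WAV bytes on success and, on an HTTP error, raises an exception carrying the `error` field that error responses are documented to include. The URL and the `Bearer` scheme are assumptions, and `VocableError`, `parse_error`, and `synthesize` are hypothetical names, not part of an official SDK.

```python
import json
import urllib.error
import urllib.request

# Hypothetical endpoint; the real URL is not stated on this page.
TTS_URL = "https://api.vocable.example/api/tts"


class VocableError(Exception):
    """Raised when the API answers with a non-2xx status."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def parse_error(status, body):
    """Turn an error response body (bytes) into a VocableError."""
    try:
        message = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        message = ""  # body was not a JSON object
    return VocableError(status, message or body.decode("utf-8", "replace"))


def synthesize(api_key, text, model="f5-tts", voice=None):
    """Return WAV bytes for `text`, or raise VocableError."""
    payload = {"text": text, "model": model}
    if voice is not None:
        payload["voice"] = voice  # only meaningful with nvidia-magpie
    request = urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        raise parse_error(err.code, err.read()) from err
```

Callers can then branch on `VocableError.status`, for example to surface a 401 as a configuration problem and a 504 as a transient one.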
Node.js / TypeScript
LangChain / AI Agent Frameworks
Use VOCABLE as a custom tool in any agent framework. Here's a LangChain example:
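A minimal sketch of such a tool, kept free of framework imports so it runs anywhere. The function has the shape agent frameworks generally expect from a tool: a typed string argument, a descriptive docstring, and a string result. `make_speak_tool` and `synthesize` are hypothetical names; `synthesize` stands for any callable that takes text and returns WAV bytes, such as a thin wrapper around the VOCABLE request.

```python
from typing import Callable


def make_speak_tool(
    synthesize: Callable[[str], bytes], out_path: str = "reply.wav"
) -> Callable[[str], str]:
    """Create a text-in, text-out tool function backed by `synthesize`."""

    def speak(text: str) -> str:
        """Convert text to speech, save it as a WAV file, and report where."""
        audio = synthesize(text)
        with open(out_path, "wb") as handle:
            handle.write(audio)
        # Agents consume tool results as text, so return a short summary
        # rather than the raw audio bytes.
        return f"Saved {len(audio)} bytes of audio to {out_path}"

    return speak
```

In LangChain, a function like `speak` can typically be registered with the `tool` decorator from `langchain_core.tools`; other frameworks accept plain callables in a similar way.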
Multilingual Example
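A small sketch of picking a voice per language. The voice IDs are copied from the voice table on this page and `nvidia-magpie` is the model name used in the error reference; the two-letter language keys and the choice of one default voice per language are conventions made up for this example.

```python
# One default voice per language, taken from the Available Voices table.
# The short language keys are this example's own convention.
MAGPIE_VOICES = {
    "en": "Magpie-Multilingual.EN-US.Aria",
    "de": "Magpie-Multilingual.DE-DE.Aria",
    "zh": "Magpie-Multilingual.ZH-CN.Mia",
    "es": "Magpie-Multilingual.ES-US.Aria",
    "fr": "Magpie-Multilingual.FR-FR.Aria",
}


def build_payload(text, language):
    """Return a request body that speaks `text` in the given language."""
    try:
        voice = MAGPIE_VOICES[language.lower()]
    except KeyError:
        supported = ", ".join(sorted(MAGPIE_VOICES))
        raise ValueError(
            f"unsupported language {language!r}; choose one of: {supported}"
        ) from None
    # Voices are a Magpie feature, so the model is fixed to nvidia-magpie.
    return {"text": text, "model": "nvidia-magpie", "voice": voice}


# Build one request body per language.
greetings = {"en": "Hello!", "de": "Guten Tag!", "fr": "Bonjour !"}
payloads = [build_payload(text, lang) for lang, text in greetings.items()]
```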
API Reference
Error Reference
All errors return JSON with an error field explaining what went wrong.
| Code | Error | Cause | What to do |
|---|---|---|---|
| 400 | Invalid request | Missing or malformed text, model, or voice | Check request body against the API reference above |
| 400 | voice not supported for f5-tts | Sent voice parameter with f5-tts model | Remove voice field, or switch to nvidia-magpie |
| 401 | Unauthorized | Missing, invalid, or revoked API key | Check your Authorization header. Create a new key if needed. |
| 429 | Usage limit exceeded | Monthly character quota reached | Upgrade your plan or wait for reset on the 1st of the month |
| 429 | Rate limit exceeded | Too many requests per minute | Wait for Retry-After seconds, then retry |
| 502 | TTS engine error | Upstream model returned an error | Retry in a few seconds. Try the other model if it persists. |
| 503 | Model not configured | NVIDIA Magpie API not available | Use f5-tts model instead |
| 504 | TTS engine timed out | GPU cold-start or queue delay | Wait 30s and retry. First request after idle is slowest. |
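The retry guidance in the table can be sketched as a small policy function. Several details are assumptions: that `Retry-After` holds a plain number of seconds, that a 429 without that header is the monthly quota error (which retrying cannot fix), and the linear 2-second backoff for 502. `send` is a hypothetical callable returning `(status, headers, body)`, with `headers` as a plain dict.

```python
import time


def retry_delay(status, headers, attempt):
    """Return seconds to wait before retrying, or None if retrying won't help."""
    if status == 429:
        retry_after = headers.get("Retry-After")
        if retry_after is None:
            return None  # assumed to be the monthly quota error
        try:
            return float(retry_after)
        except ValueError:
            return 1.0  # header was not a plain number of seconds
    if status == 504:
        return 30.0  # cold start: the table suggests waiting 30s
    if status == 502:
        return 2.0 * attempt  # "a few seconds", growing with each attempt
    return None  # success, or an error the caller has to fix


def call_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        delay = retry_delay(status, headers, attempt)
        if delay is None or attempt == max_attempts:
            return status, headers, body
        sleep(delay)
```

Passing `sleep` as a parameter keeps the policy testable without real waiting.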
Plans & Limits
Start free. Upgrade when you need more volume.
| Plan | Price | Characters / month | API Keys | Rate Limit | Models |
|---|---|---|---|---|---|
| Free | $0 | 10,000 | 1 | 5 req/min | F5-TTS |
| Starter | $29/mo | 500,000 | 5 | 30 req/min | All models |
| Pro | $99/mo | 2,000,000 | 100 | 120 req/min | All models + 8 voices + 5 languages |
Usage resets on the 1st of each month (UTC). Unused characters do not roll over. Upgrade or downgrade anytime from your dashboard.
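As a rough illustration of the arithmetic behind the table, the sketch below computes the effective price per 1M characters when a plan's full quota is used, and picks the lowest-priced plan that covers a given monthly volume. Prices and quotas are copied from the table; the function names are hypothetical.

```python
# (monthly price in USD, characters per month), copied from the plans table.
PLANS = {
    "Free": (0, 10_000),
    "Starter": (29, 500_000),
    "Pro": (99, 2_000_000),
}


def cost_per_million(plan):
    """Effective USD per 1M characters if the whole quota is used."""
    price, characters = PLANS[plan]
    return price * 1_000_000 / characters


def cheapest_plan(characters_per_month):
    """Return the lowest-priced plan whose quota covers the volume, or None."""
    for name, (price, quota) in sorted(PLANS.items(), key=lambda item: item[1][0]):
        if characters_per_month <= quota:
            return name
    return None  # volume exceeds every listed plan


print(cost_per_million("Starter"))  # 58.0
print(cost_per_million("Pro"))      # 49.5
print(cheapest_plan(600_000))       # Pro
```

Because unused characters do not roll over, the effective rate rises when a plan's quota is only partly used.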
FAQ
What audio format does VOCABLE return?
WAV (PCM, 16-bit, mono). WAV is uncompressed and widely supported — you can convert to MP3 or OGG client-side if needed.
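To check what a response actually contains, Python's standard `wave` module can read the header from the returned bytes. This is a sketch that assumes the body is a plain PCM WAV file as described above; the sample rate is not documented here, so the function reports whatever the file declares. `describe_wav` is a hypothetical helper name.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Read the WAV header from in-memory bytes and summarise the format."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),  # 1 means mono
            "bits_per_sample": wav.getsampwidth() * 8,  # 16 for 16-bit PCM
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }
```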
How fast is the API?
Typical response time is 3-15 seconds depending on text length and model. The first request after an idle period (GPU cold start) can take up to 30 seconds. Subsequent requests in the same session are faster (2-8 seconds).
What languages are supported?
NVIDIA Magpie supports English, German, Chinese (Mandarin), Spanish, and French. F5-TTS supports English only. We plan to add more languages as open-source models improve.
Can I use VOCABLE in production?
Yes. The API includes authentication, rate limiting, usage tracking, and error handling designed for production use. Start with the free tier to validate your integration, then upgrade when you need volume.
What happens when I hit my character limit?
The API returns a 429 status with a clear message showing your current usage and limit. Your existing API keys continue to work — you just can't make new TTS requests until you upgrade or the limit resets on the 1st of the month.
Are the models really open-source?
Yes. F5-TTS code is MIT licensed (pretrained models are CC-BY-NC). NVIDIA Magpie is available via NVIDIA NIM. The model architectures are public and inspectable. VOCABLE provides managed hosting so you don't have to run GPU infrastructure yourself.
How is this different from ElevenLabs or OpenAI TTS?
Three differences: (1) Cost — VOCABLE is up to 3.6x cheaper per character ($50/1M vs $180/1M). (2) Transparency — we use open-source models you can inspect, not proprietary black boxes. (3) No lock-in — standard REST API with no proprietary voice IDs or formats to migrate away from.
Do you store the text I send?
We log the character count, model used, and timestamp for usage tracking. The full text is not stored after the audio is generated. See our privacy policy for details.
Ready to add voice?
VOCABLE is launching soon. Get early access to be the first to try it.