
VOCABLE Documentation

Everything you need to add voice to your AI agents. From first API call to production integration.

What is VOCABLE?

VOCABLE is a cloud API that converts text to speech using open-source AI models. You send text in and get audio back: one HTTP request returns realistic, human-sounding speech.

Proprietary TTS services charge per character (ElevenLabs about $180 per 1M characters, OpenAI $15-30 per 1M). VOCABLE instead runs open-source models on managed GPU infrastructure with comparable voice quality, at roughly $50 per 1M characters, about 3.6x less than ElevenLabs. The models we use are F5-TTS (open-source, 14,000+ GitHub stars) and NVIDIA Magpie (served via NVIDIA NIM). There is no proprietary lock-in at the model layer.

VOCABLE handles the hard parts — GPU provisioning, model serving, authentication, rate limiting, usage tracking — so you can focus on building your application.

Up to 3.6x Cheaper

~$50 per 1M characters vs $180 for ElevenLabs. Open-source models, managed infrastructure.

No Lock-in

Built on open-source models. Standard REST API. Move away anytime with minimal migration effort.

Production Ready

API key auth, usage limits, rate limiting, 8 multilingual voices, two model options.

Who is it for?

VOCABLE is built for developers adding voice capabilities to AI applications. If you're building any of the following, VOCABLE is for you:

Voice-enabled AI agents

LangChain, CrewAI, AutoGen, or custom agent frameworks that need to speak to users

Conversational AI products

Chatbots, virtual assistants, or customer service tools that respond with voice

Content creation tools

Podcast generators, audiobook narration, or video voiceover automation

Accessibility features

Screen readers, text-to-speech widgets, or voice output for visually impaired users

Prototyping & demos

Quick voice demos for investor pitches, hackathons, or proof-of-concept builds

IoT & robotics

Giving voice to hardware devices, robots, or embedded systems via HTTP

How It Works

Four steps from text to audio. No GPU setup, no model downloads, no infrastructure.

You send text

Your application makes a POST request to the VOCABLE API with the text you want spoken, the model to use, and optionally a voice selection.

We authenticate & validate

VOCABLE verifies your API key, checks your plan's usage limits and rate limits, and validates the request parameters.

GPU inference runs

The selected open-source model (F5-TTS or NVIDIA Magpie) generates speech on managed GPU infrastructure. Typical latency is 3-15 seconds. First request after idle may take up to 30 seconds due to GPU warmup.

Audio returns to you

A WAV audio file is returned in the HTTP response body. Play it, save it, stream it, or pipe it to your agent's output.

Both models are open-source: F5-TTS code is MIT licensed (its pretrained weights are CC-BY-NC), and NVIDIA Magpie is available via NVIDIA NIM. The API is standard REST, so any HTTP client in any language works.

Quick Start

Go from zero to your first voice output in under 5 minutes.

1. Create an account

Join the early access waitlist and we'll send you an invite when we launch. The free tier will include 10,000 characters per month, enough for hundreds of short voice outputs.

2. Create an API key

Go to your dashboard and click "Create API Key". Copy the key immediately; it's shown only once. It looks like:

sk-vocable-a1b2c3d4e5f6...
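Hard-coding the key in source files makes it easy to leak. A minimal sketch that reads it from an environment variable instead; the variable name `VOCABLE_API_KEY` and the helper `auth_headers` are our own illustrative choices, not something the API requires:

```python
import os


def auth_headers(env_var: str = "VOCABLE_API_KEY") -> dict:
    """Build the Authorization header from a key stored in the environment.

    VOCABLE_API_KEY is an illustrative variable name, not one the API requires.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your VOCABLE API key")
    # Dashboard keys start with "sk-vocable-"; catch obvious copy-paste mistakes early.
    if not key.startswith("sk-vocable-"):
        raise ValueError(f"{env_var} does not look like a VOCABLE API key")
    return {"Authorization": f"Bearer {key}"}
```

Export the key once in your shell (`export VOCABLE_API_KEY=sk-vocable-...`) and pass `headers=auth_headers()` to your HTTP client.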

3. Make your first TTS request

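Replace YOUR_API_KEY with the key you just copied. This example uses NVIDIA Magpie, which is not included in the Free plan (see Plans & Limits); on the Free plan, set "model" to "f5-tts" and remove the "voice" field.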

curl

curl -X POST https://vocable.draftlabs.org/api/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello! I am VOCABLE. Your open-source voice API.", "model": "nvidia-magpie", "voice": "Magpie-Multilingual.EN-US.Aria"}' \
  -o hello.wav

4. Play the audio

Open hello.wav and you'll hear a natural, human-sounding voice speaking your text. That's it: you just made your first VOCABLE API call.
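If the request fails, `curl -o` still writes the JSON error body to hello.wav, which then won't play. A small standard-library check (the helper name `describe_audio` is hypothetical) that tells audio and errors apart by looking for the `RIFF` header every WAV file starts with:

```python
import io
import json
import wave


def describe_audio(body: bytes) -> dict:
    """Summarize a VOCABLE response body, or raise if it is not WAV audio."""
    if body[:4] != b"RIFF":
        # Errors are JSON objects with an "error" field.
        try:
            message = json.loads(body).get("error", "unknown error")
        except (ValueError, AttributeError):
            message = "response is neither WAV nor JSON"
        raise ValueError(f"Not a WAV file: {message}")
    with wave.open(io.BytesIO(body), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, `describe_audio(pathlib.Path("hello.wav").read_bytes())` returns the channel count, sample width, sample rate, and duration, or raises with the API's error message.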

Try it without code: log into your dashboard and use the Voice Playground to type text and hear it spoken. The playground doesn't require an API key.

Models & Voices

VOCABLE offers two open-source TTS models. Each has different strengths.

F5-TTS (Open Source)

High-quality English TTS from the F5-TTS research project. Uses a reference voice for consistent output.

Languages: English

Voices: 1 (default reference voice)

Latency: 5-15 seconds (GPU warmup on first call)

Best for: English-only projects, cost-sensitive usage

NVIDIA Magpie TTS (NVIDIA NIM, Recommended)

Multilingual TTS from NVIDIA with 8 distinct voices across 5 languages, offering higher quality and more natural prosody than F5-TTS.

Languages: English, German, Chinese, Spanish, French

Voices: 8 (3 male, 5 female)

Latency: 2-8s warm, up to 30s on first request after idle

Best for: Multilingual apps, production quality, voice variety

Available Voices (NVIDIA Magpie)

Voice ID	Speaker	Language
`Magpie-Multilingual.EN-US.Aria`	Aria (Female)	English (US)
`Magpie-Multilingual.EN-US.Jason`	Jason (Male)	English (US)
`Magpie-Multilingual.EN-US.Leo`	Leo (Male)	English (US)
`Magpie-Multilingual.DE-DE.Aria`	Aria (Female)	German
`Magpie-Multilingual.DE-DE.Leo`	Leo (Male)	German
`Magpie-Multilingual.ZH-CN.Mia`	Mia (Female)	Chinese (Mandarin)
`Magpie-Multilingual.ES-US.Aria`	Aria (Female)	Spanish (US)
`Magpie-Multilingual.FR-FR.Aria`	Aria (Female)	French

Programmatic access: GET /api/tts/models returns all models and voices as JSON.
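For code that picks voices by language, the table can be mirrored client-side. A sketch with hypothetical helpers `language_of` and `voices_for`; the voice list is copied from the table above (GET /api/tts/models is the authoritative source), and deriving a code such as `de-DE` from a voice ID assumes the `language` parameter follows the same lowercase-language, uppercase-region pattern as the documented examples:

```python
# Copied from the voice table above; GET /api/tts/models is the live source.
MAGPIE_VOICES = [
    "Magpie-Multilingual.EN-US.Aria",
    "Magpie-Multilingual.EN-US.Jason",
    "Magpie-Multilingual.EN-US.Leo",
    "Magpie-Multilingual.DE-DE.Aria",
    "Magpie-Multilingual.DE-DE.Leo",
    "Magpie-Multilingual.ZH-CN.Mia",
    "Magpie-Multilingual.ES-US.Aria",
    "Magpie-Multilingual.FR-FR.Aria",
]


def language_of(voice_id: str) -> str:
    """Turn the locale segment of a voice ID into a code like "de-DE" (assumed format)."""
    _, locale, _ = voice_id.split(".")
    lang, region = locale.split("-")
    return f"{lang.lower()}-{region}"


def voices_for(language: str) -> list:
    """Return the voice IDs whose locale matches a code such as "de-DE" (case-insensitive)."""
    return [v for v in MAGPIE_VOICES if language_of(v).lower() == language.lower()]
```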

Integration Examples

VOCABLE is a standard REST API. Any language or framework that can make HTTP requests works out of the box.

Python

python

import requests

response = requests.post(
    "https://vocable.draftlabs.org/api/tts",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "text": "Welcome to our platform. How can I help you today?",
        "model": "nvidia-magpie",
        "voice": "Magpie-Multilingual.EN-US.Aria"
    },
    timeout=60,  # leave room for GPU cold starts (up to ~30 s)
)
# Errors come back as JSON, not audio: fail loudly instead of saving them as .wav
response.raise_for_status()

# Save to file
with open("welcome.wav", "wb") as f:
    f.write(response.content)

# Or play directly (requires playsound package)
# from playsound import playsound
# playsound("welcome.wav")

Node.js / TypeScript

javascript

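import { writeFileSync } from "node:fs";

// Top-level await requires an ES module (.mjs file or "type": "module")
const response = await fetch("https://vocable.draftlabs.org/api/tts", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Your order has been confirmed. It will arrive by Friday.",
    model: "nvidia-magpie",
    voice: "Magpie-Multilingual.EN-US.Jason",
  }),
});

// Errors are returned as JSON with an `error` field, not audio
if (!response.ok) {
  const { error } = await response.json();
  throw new Error(`VOCABLE API error ${response.status}: ${error}`);
}

const audioBuffer = Buffer.from(await response.arrayBuffer());
writeFileSync("confirmation.wav", audioBuffer);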

LangChain / AI Agent Frameworks

Use VOCABLE as a custom tool in any agent framework. Here's a LangChain example:

python

from langchain.tools import tool
import requests

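@tool
def speak(text: str) -> str:
    """Convert text to speech using VOCABLE TTS API."""
    response = requests.post(
        "https://vocable.draftlabs.org/api/tts",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"text": text, "model": "nvidia-magpie", "voice": "Magpie-Multilingual.EN-US.Aria"},
        timeout=60,  # first request after idle can take ~30 s
    )
    if not response.ok:
        # Report the API's JSON error to the agent instead of writing it to a .wav file
        return f"TTS failed ({response.status_code}): {response.text}"
    with open("/tmp/agent_speech.wav", "wb") as f:
        f.write(response.content)
    return f"Audio saved ({len(response.content)} bytes)"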

# Use in your agent
# agent = initialize_agent([speak], llm, agent="zero-shot-react-description")
# agent.run("Say hello to the user")

Multilingual Example

python

import requests

API = "https://vocable.draftlabs.org/api/tts"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Generate the same message in 3 languages
messages = [
    ("Hello, welcome!", "Magpie-Multilingual.EN-US.Aria", "en-US"),
    ("Hallo, willkommen!", "Magpie-Multilingual.DE-DE.Aria", "de-DE"),
    ("Hola, bienvenido!", "Magpie-Multilingual.ES-US.Aria", "es-US"),
]

for text, voice, lang in messages:
    response = requests.post(API, headers=HEADERS, json={
        "text": text,
        "model": "nvidia-magpie",
        "voice": voice,
        "language": lang,
    }, timeout=60)
    response.raise_for_status()  # stop on a JSON error instead of saving it as audio
    with open(f"welcome_{lang}.wav", "wb") as f:
        f.write(response.content)
    print(f"{lang}: {len(response.content)} bytes")

API Reference

POST /api/tts

Generate speech audio from text. Returns WAV audio on success, JSON error on failure.

Authentication

Authorization: Bearer sk-vocable-xxx

Request body (JSON)

Field	Type	Required	Description
`text`	string	Yes	Text to speak (1 - 5,000 characters)
`model`	string	No	`"f5-tts"` or `"nvidia-magpie"`. Default: `"f5-tts"`
`voice`	string	No	NVIDIA Magpie only. See voice table above. Rejected if sent with f5-tts.
`language`	string	No	Language code (e.g. `"en-US"`, `"de-DE"`). NVIDIA Magpie only.
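Checking these rules before sending saves a round trip that would end in a 400. A sketch with a hypothetical `build_tts_payload` helper; it assumes the 5,000-character limit matches Python's `len()` (Unicode code points), which this reference doesn't specify, and it rejects `language` with f5-tts because the table marks it Magpie-only, although the server's exact response in that case isn't documented:

```python
from typing import Optional

MODELS = {"f5-tts", "nvidia-magpie"}
MAX_CHARS = 5_000


def build_tts_payload(text: str, model: str = "f5-tts",
                      voice: Optional[str] = None,
                      language: Optional[str] = None) -> dict:
    """Build a POST /api/tts body, applying the documented field rules locally."""
    if not 1 <= len(text) <= MAX_CHARS:
        raise ValueError(f"text must be 1-{MAX_CHARS} characters, got {len(text)}")
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    if model != "nvidia-magpie" and (voice or language):
        # voice with f5-tts is rejected by the API; language is documented as Magpie-only.
        raise ValueError("voice and language are only supported by nvidia-magpie")
    payload = {"text": text, "model": model}
    if voice:
        payload["voice"] = voice
    if language:
        payload["language"] = language
    return payload
```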

Response headers

Header	Description
`Content-Type`	`audio/wav` on success, `application/json` on error
`X-RateLimit-Limit`	Max requests per minute for your plan
`X-RateLimit-Remaining`	Requests remaining in current window
`Retry-After`	Seconds to wait before retrying (only on 429)
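These headers let a client slow down before it hits a 429. A minimal sketch with hypothetical names `RateLimitStatus`, `parse_rate_limit`, and `seconds_to_pause`; it assumes `Retry-After` is a number of seconds (as the table describes) and, when `X-RateLimit-Remaining` reaches 0 without a `Retry-After`, waits a full 60 seconds because the window's reset time isn't exposed:

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: Optional[int]          # X-RateLimit-Limit: max requests per minute
    remaining: Optional[int]      # X-RateLimit-Remaining: requests left in this window
    retry_after: Optional[float]  # Retry-After: seconds to wait (sent on 429)


def _number(value: Optional[str], cast: Callable):
    """Convert a header value with `cast`, returning None if absent or malformed."""
    if value is None:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitStatus:
    # Lower-case names so plain dicts behave like case-insensitive HTTP headers.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=_number(h.get("x-ratelimit-limit"), int),
        remaining=_number(h.get("x-ratelimit-remaining"), int),
        retry_after=_number(h.get("retry-after"), float),
    )


def seconds_to_pause(status: RateLimitStatus) -> float:
    """How long to wait before the next request, under the assumptions above."""
    if status.retry_after is not None:
        return status.retry_after
    if status.remaining == 0:
        return 60.0  # per-minute window; its reset time is not exposed
    return 0.0
```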

POST /api/keys

Create a new API key. Returns the raw key once — store it securely. Requires browser session auth.

GET /api/tts/models

List available TTS models and voices. Public endpoint, no auth required. Use to check model availability.

Error Reference

All errors return JSON with an error field explaining what went wrong.

Code	Error	Cause	What to do
`400`	Invalid request	Missing or malformed text, model, or voice	Check request body against the API reference above
`400`	voice not supported for f5-tts	Sent voice parameter with f5-tts model	Remove voice field, or switch to nvidia-magpie
`401`	Unauthorized	Missing, invalid, or revoked API key	Check your Authorization header. Create a new key if needed.
`429`	Usage limit exceeded	Monthly character quota reached	Upgrade your plan or wait for reset on the 1st of the month
`429`	Rate limit exceeded	Too many requests per minute	Wait for Retry-After seconds, then retry
`502`	TTS engine error	Upstream model returned an error	Retry in a few seconds. Try the other model if it persists.
`503`	Model not configured	NVIDIA Magpie API not available	Use f5-tts model instead
`504`	TTS engine timed out	GPU cold-start or queue delay	Wait 30s and retry. First request after idle is slowest.
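The table splits into errors worth retrying (per-minute rate limit, 502, 504) and errors that need a fix on your side. A minimal retry loop under stated assumptions: `call_with_retry` and `TTSError` are hypothetical names; `send` is any zero-argument callable returning a requests-style response; a 429 carrying `Retry-After` is treated as the per-minute rate limit and a 429 without it as the monthly quota (the header table doesn't say whether both 429 cases include it); 504 waits 30 seconds and 502 backs off from 2 seconds, following the "What to do" column:

```python
import time
from typing import Callable


class TTSError(Exception):
    """Raised for responses that retrying will not fix."""


def _error_text(resp) -> str:
    """Pull the `error` field out of a JSON error body, if there is one."""
    try:
        return resp.json().get("error", "unknown error")
    except (ValueError, AttributeError):
        return "non-JSON error body"


def call_with_retry(send: Callable[[], object], max_attempts: int = 4,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call `send()` until it returns audio, retrying only transient errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    backoff = 2.0
    for attempt in range(1, max_attempts + 1):
        resp = send()
        if 200 <= resp.status_code < 300:
            return resp.content
        retry_after = resp.headers.get("Retry-After")
        if resp.status_code == 429 and retry_after is not None:
            wait = float(retry_after)  # per-minute rate limit: wait as instructed
        elif resp.status_code == 504:
            wait = 30.0                # GPU cold start or queue delay
        elif resp.status_code == 502:
            wait = backoff             # upstream model error: exponential backoff
            backoff *= 2
        else:
            # 400, 401, 503, and a 429 without Retry-After (assumed: monthly quota)
            raise TTSError(f"{resp.status_code}: {_error_text(resp)}")
        if attempt < max_attempts:
            sleep(wait)
    raise TTSError(f"gave up after {max_attempts} attempts (last status {resp.status_code})")
```

With the requests library, pass something like `lambda: requests.post(url, headers=headers, json=payload, timeout=60)` as `send`. Injecting `send` and `sleep` keeps the policy testable without network calls.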

Plans & Limits

Start free. Upgrade when you need more volume.

Plan	Price	Characters / month	API Keys	Rate Limit	Models
Free	$0	10,000	1	5 req/min	F5-TTS
Starter	$29/mo	500,000	5	30 req/min	All models
Pro	$99/mo	2,000,000	100	120 req/min	All models + 8 voices + 5 languages

Usage resets on the 1st of each month (UTC). Unused characters do not roll over. Upgrade or downgrade anytime from your dashboard.
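To see where the ~$50 per 1M characters figure comes from, and to check a batch against your remaining quota before sending it, here is a small worked calculation. The helper names are hypothetical, and `fits_quota` assumes characters are counted as Python string length, which this page doesn't specify. At full utilization, Starter works out to $58 per 1M characters and Pro to $49.50:

```python
# Plan figures from the table above: (monthly price in USD, characters per month).
PLANS = {
    "Starter": (29, 500_000),
    "Pro": (99, 2_000_000),
}


def cost_per_million(price_usd: float, chars_per_month: int) -> float:
    """Effective price per 1M characters if the whole monthly quota is used."""
    return price_usd * 1_000_000 / chars_per_month


def fits_quota(texts, used: int, quota: int) -> bool:
    """True if sending every text keeps this month's usage within the quota."""
    return used + sum(len(t) for t in texts) <= quota
```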

FAQ

What audio format does VOCABLE return?

WAV (PCM, 16-bit, mono). WAV is uncompressed and widely supported — you can convert to MP3 or OGG client-side if needed.
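Because each request is capped at 5,000 characters and returns a standalone WAV, longer text (an audiobook chapter, say) has to be split and the audio joined afterwards. A minimal standard-library sketch; `chunk_text` and `concat_wavs` are hypothetical helpers, sentence splitting is a simple regex rather than a real sentence tokenizer, and joining assumes every chunk comes back with the same channel count, sample width, and rate (expected for one model and voice, but checked here rather than assumed):

```python
import io
import re
import wave


def chunk_text(text: str, limit: int = 5_000) -> list:
    """Split text into pieces of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_wavs(parts: list) -> bytes:
    """Join WAV byte strings that share one audio format into a single WAV."""
    if not parts:
        raise ValueError("need at least one WAV part")
    out = io.BytesIO()
    first = None
    with wave.open(out, "wb") as writer:
        for part in parts:
            with wave.open(io.BytesIO(part), "rb") as reader:
                params = reader.getparams()
                if first is None:
                    first = params
                    writer.setparams(params)
                elif params[:3] != first[:3]:  # channels, sample width, frame rate
                    raise ValueError("WAV parts differ in channels, sample width, or rate")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

Send each chunk as its own request, collect the response bodies in order, and pass the list to `concat_wavs`.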

How fast is the API?

Typical response time is 3-15 seconds depending on text length and model. The first request after an idle period (GPU cold start) can take up to 30 seconds. Subsequent requests while the GPU stays warm are faster (2-8 seconds with NVIDIA Magpie).

What languages are supported?

NVIDIA Magpie supports English, German, Chinese (Mandarin), Spanish, and French. F5-TTS supports English only. We plan to add more languages as open-source models improve.

Can I use VOCABLE in production?

Yes. The API includes authentication, rate limiting, usage tracking, and error handling designed for production use. Start with the free tier to validate your integration, then upgrade when you need volume.

What happens when I hit my character limit?

The API returns a 429 status with a clear message showing your current usage and limit. Your existing API keys continue to work — you just can't make new TTS requests until you upgrade or the limit resets on the 1st of the month.

Are the models really open-source?

Yes. F5-TTS code is MIT licensed (pretrained models are CC-BY-NC). NVIDIA Magpie is available via NVIDIA NIM. The model architectures are public and inspectable. VOCABLE provides managed hosting so you don't have to run GPU infrastructure yourself.

How is this different from ElevenLabs or OpenAI TTS?

Three differences: (1) Cost: VOCABLE is up to 3.6x cheaper per character than ElevenLabs ($50/1M vs $180/1M). (2) Transparency: we use open-source models you can inspect, not proprietary black boxes. (3) No lock-in: a standard REST API with no proprietary voice IDs or formats to migrate away from.

Do you store the text I send?

We log the character count, model used, and timestamp for usage tracking. The full text is not stored after the audio is generated. See our privacy policy for details.

Ready to add voice?

VOCABLE is launching soon. Get early access to be the first to try it.
