Artifact type: API reference documentation
Audience: Software developers integrating with the API
Role: Documentation author
Note: Endpoint names and payload values have been anonymized for portfolio use.
Voice Render API — Create a voice job
Version 1 is the current stable release.
Overview
Use this endpoint to convert text into speech as a single synthesis job. The API validates the request, applies policy checks, reserves quota, and enqueues a rendering job. On success, it returns a job_id, which you can use to retrieve the synthesized audio.
Endpoint
POST https://api.example.com/v1/voice/jobs
Content-Type: application/json
Authentication
All requests must include a valid access token.
Header: Authorization: Bearer <token>
Tokens are issued by your identity provider. If the token is missing, invalid, or expired, the API returns 401 unauthenticated.
Idempotency
This endpoint supports idempotency to prevent duplicate job creation and duplicate quota charges during retries.
Header: Idempotency-Key: <uuid>
Each unique request must include a new UUID value.
Idempotency rules
- Generate one new UUID for each new job you create.
- If you retry the same request (for example, due to a timeout), reuse the same Idempotency-Key.
- If the same key is reused with a different request body, the API returns 409 idempotency_conflict.
- Idempotency keys are retained for 24 hours.
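The rules above can be sketched as a small retry wrapper. This is a minimal illustration, not a reference client: `create_job_with_retry` and the injected `send` callable (which would perform the actual HTTP POST) are hypothetical names, and treating `TimeoutError` as the only retry trigger is an assumption made for brevity.

```python
import uuid
from typing import Callable, Mapping


def create_job_with_retry(
    send: Callable[[Mapping[str, str], dict], dict],
    body: dict,
    token: str,
    max_attempts: int = 3,
) -> dict:
    """Create one job, reusing a single Idempotency-Key across all retries."""
    # The key is generated once per job, outside the retry loop, so every
    # retry of this request carries the same key and the same body.
    headers = {
        "Authorization": f"Bearer {token}",
        "Idempotency-Key": str(uuid.uuid4()),
        "Content-Type": "application/json",
    }
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except TimeoutError as exc:
            # The outcome is unknown: the job may already exist server-side.
            # Retrying with the same key avoids a duplicate job and charge.
            last_error = exc
    raise RuntimeError("job creation did not succeed") from last_error
```

Because the key is created inside the wrapper, calling it again for a different job produces a fresh UUID automatically.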
Request
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <token>, where <token> is a valid access token issued by your identity provider. |
| Idempotency-Key | Yes | UUID (v4) used to prevent duplicate job creation during retries. |
| Content-Type | Yes | Must be application/json. |
Body schema
| Field | Type | Required | Description |
|---|---|---|---|
| input | object | Yes | Input content to synthesize. |
| input.content | string | Yes | Text to synthesize. Minimum length: 1 character. Maximum length: 2,000 characters. |
| input.content_type | string | Yes | Must be text/plain. |
| voice | object | Yes | Voice selection and rendering options. |
| voice.profile_id | string | Yes | Voice profile ID (for example, voice_en_us_001). Retrieve available profiles using GET /v1/voice/profiles. |
| voice.style | string | No | Rendering style. Supported values depend on the selected voice profile. Default: neutral. |
| voice.rate | number | No | Speed multiplier. Range: 0.5–2.0. Default: 1.0. |
| output | object | Yes | Output formatting configuration. |
| output.format | string | Yes | Audio output format. Currently supports: pcm_s16le. |
| output.sample_rate_hz | integer | Yes | Sample rate in Hz (for example, 24000). |
| client | object | No | Optional metadata for correlation. |
| client.request_tag | string | No | Client-defined identifier for tracking requests. |
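As an illustration of the schema, the following sketch assembles a request body and applies the documented constraints before sending. The helper name `build_job_body` is hypothetical, and counting the content length in Unicode code points (what Python's `len()` does) is an assumption; the reference does not say how the API counts characters.

```python
import json

MAX_CONTENT_CHARS = 2000  # documented maximum; the effective limit may vary by plan


def build_job_body(content, profile_id, sample_rate_hz,
                   style=None, rate=None, request_tag=None):
    """Assemble a v1 request body, omitting optional fields that were not set."""
    # Assumption: the length limit is counted in code points, as len() does.
    if not 1 <= len(content) <= MAX_CONTENT_CHARS:
        raise ValueError("input.content must be 1 to 2,000 characters long")
    if rate is not None and not 0.5 <= rate <= 2.0:
        raise ValueError("voice.rate must be between 0.5 and 2.0")

    voice = {"profile_id": profile_id}
    if style is not None:
        voice["style"] = style
    if rate is not None:
        voice["rate"] = rate

    body = {
        "input": {"content": content, "content_type": "text/plain"},
        "voice": voice,
        # pcm_s16le is the only output format the v1 reference lists.
        "output": {"format": "pcm_s16le", "sample_rate_hz": sample_rate_hz},
    }
    if request_tag is not None:
        body["client"] = {"request_tag": request_tag}
    return body


if __name__ == "__main__":
    print(json.dumps(build_job_body("Hello, world.", "voice_en_us_001", 24000), indent=2))
```

Leaving out `voice.style` and `voice.rate` lets the server apply its documented defaults (neutral and 1.0).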
Input validation and limits
Plan limits
- Maximum input length may vary by plan.
- Requests exceeding configured limits return 400 invalid_input.
Character and encoding rules
- Input must be valid UTF-8.
- Most Unicode characters, including emoji, are accepted.
- Invalid byte sequences return 400 invalid_input.
- Rendering quality for emoji or uncommon symbols may vary by voice profile.
- Avoid sending control characters outside standard whitespace.
Standard whitespace includes tab (\t), newline (\n), and carriage return (\r). Other control characters, such as null bytes, may cause validation errors or unpredictable rendering behavior.
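A client can catch most of these problems before sending. The sketch below is one possible pre-flight check, not the server's actual validation logic: `find_content_problems` is a hypothetical helper, and treating every character in Unicode category Cc (other than tab, newline, and carriage return) as disallowed is an assumption about what "control characters" covers.

```python
import unicodedata

ALLOWED_WHITESPACE = {"\t", "\n", "\r"}


def find_content_problems(content: str) -> list[str]:
    """Return human-readable problems found in the text; empty if none."""
    problems = []
    try:
        # A Python str that contains a lone surrogate cannot be encoded
        # as UTF-8, so it could never reach the API as valid UTF-8.
        content.encode("utf-8")
    except UnicodeEncodeError:
        problems.append("not encodable as UTF-8 (lone surrogate)")
    for index, ch in enumerate(content):
        # Category "Cc" covers C0/C1 control characters, including NUL.
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_WHITESPACE:
            problems.append(f"control character U+{ord(ch):04X} at index {index}")
    return problems
```

For example, `find_content_problems("a\x00b")` reports the null byte at index 1, while text containing only printable characters, emoji, and standard whitespace yields an empty list.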
Example request (curl)
The request body must be valid JSON.
curl https://api.example.com/v1/voice/jobs \
  -H "Authorization: Bearer <token>" \
  -H "Idempotency-Key: 8b3f1c8e-3f2b-4ad2-9ef0-2a8c9b2a4f19" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "content": "Hello, world.",
      "content_type": "text/plain"
    },
    "voice": {
      "profile_id": "voice_en_us_001",
      "style": "neutral",
      "rate": 1.0
    },
    "output": {
      "format": "pcm_s16le",
      "sample_rate_hz": 24000
    },
    "client": {
      "request_tag": "msg-001"
    }
  }'
Response
On success, the API returns 200 OK and a JSON body containing the created job identifier.
Response body
| Field | Type | Notes |
|---|---|---|
| job_id | string | UUID (v4) identifying the created job |
Response headers
| Header | Meaning |
|---|---|
| X-Quota-Limit | Total quota available for the current period |
| X-Quota-Remaining | Remaining quota after this request |
| X-Request-Id | Unique server-generated identifier for this request |
Example response
{
  "job_id": "5b2a5b7a-6c2d-4d40-9e7b-1f25c08e6a1f"
}
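A client might read the success response along these lines. This is a sketch under stated assumptions: `parse_create_response` is a hypothetical helper, and the reference does not specify the format of the quota header values, so parsing them as integers is a guess that should be checked against real responses.

```python
import uuid


def parse_create_response(body: dict, headers: dict) -> dict:
    """Extract the job ID and quota information from a successful response."""
    # uuid.UUID raises ValueError if job_id is not a well-formed UUID.
    job_id = uuid.UUID(body["job_id"])
    if job_id.version != 4:
        raise ValueError("job_id is not a version 4 UUID")

    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        # Assumption: quota headers carry plain integer values.
        value = lowered.get(name)
        return int(value) if value is not None else None

    return {
        "job_id": str(job_id),
        "quota_limit": as_int("x-quota-limit"),
        "quota_remaining": as_int("x-quota-remaining"),
        "request_id": lowered.get("x-request-id"),
    }
```

Logging the `X-Request-Id` value alongside the `job_id` makes it easier to correlate a client-side record with a specific server-side request.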
Job lifecycle
Voice jobs are processed asynchronously. After creating a job, poll GET /v1/voice/jobs/{job_id} for status. Most jobs complete within a few seconds. Once the job status is completed, retrieve the audio from GET /v1/voice/jobs/{job_id}/audio.
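The polling step could look roughly like the sketch below. Several details are assumptions: `wait_for_job` and the injected `fetch_status` callable (which would perform the GET request) are hypothetical, the status is assumed to be exposed in a `status` field, and because this reference documents only the completed state, the loop relies on a timeout rather than on recognising failure states.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll the job until its status is "completed" or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status(job_id)
        # Assumption: the status endpoint returns JSON with a "status" field.
        if job.get("status") == "completed":
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not complete within {timeout_s}s")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real delays; production callers can leave them at their defaults.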
Errors
Errors are returned using a consistent JSON envelope.
{
  "error": {
    "code": "string",
    "message": "string",
    "retryable": false
  }
}
Error codes
| HTTP status | error.code | Meaning | Retry guidance |
|---|---|---|---|
| 400 | invalid_input | Request body is invalid (missing fields, wrong types, constraint violations) | Do not retry without correcting the request |
| 401 | unauthenticated | Missing, invalid, or expired access token | Do not retry until authentication is corrected |
| 402 | quota_exceeded | Insufficient quota to create the job | Do not retry until quota is restored |
| 403 | policy_blocked | Content rejected by policy | Do not retry with the same content |
| 409 | idempotency_conflict | Same Idempotency-Key reused with a different request body | Generate a new key for a new request |
| 429 | rate_limited | Too many requests in a short period | Retry with backoff and honor Retry-After if present |
| 503 | service_unavailable | Temporary service or upstream provider failure | Retry with exponential backoff |
Example error response
{
  "error": {
    "code": "rate_limited",
    "message": "Too many requests. Please retry after a short delay.",
    "retryable": true
  }
}
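The retry guidance in the table can be turned into a small decision function. The sketch below is illustrative only: `retry_delay_s` is a hypothetical helper, the base delay and cap are arbitrary values, full jitter is one common backoff variant rather than something this API prescribes, and only the integer-seconds form of Retry-After is handled (an HTTP-date value falls back to backoff).

```python
import random
from typing import Callable, Mapping, Optional


def retry_delay_s(
    error_body: dict,
    headers: Mapping[str, str],
    attempt: int,
    base_s: float = 0.5,
    cap_s: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> Optional[float]:
    """Return how long to wait before retrying, or None if a retry is pointless."""
    # The error envelope states whether the failure is retryable.
    if not error_body.get("error", {}).get("retryable", False):
        return None

    # Honor Retry-After when the server sends it as a number of seconds.
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        return float(retry_after)

    # Otherwise use exponential backoff with full jitter:
    # a random delay between 0 and min(cap, base * 2^attempt).
    return rng() * min(cap_s, base_s * (2 ** attempt))
```

Here `attempt` is zero-based, so with the default settings the upper bound of the delay grows as 0.5 s, 1 s, 2 s, and so on until it reaches the 30 s cap.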
Versioning and migration
This API uses URI versioning (/v1/...). Breaking changes are introduced in new major versions.
Minor, backward-compatible additions (such as new optional fields) do not require a version change.
Version 2 changes
Version 2 restructures the output schema to support future audio formats and extensibility.
In v1:
- output.format
- output.sample_rate_hz
In v2:
- output.audio.format
- output.audio.sample_rate_hz
- output.audio.channels (new, default: 1)
The new structure allows additional output types to be introduced without breaking the top-level schema.
No behavioral changes were made to authentication, idempotency semantics, or retry handling.
Migrating from v1 to v2
- Update the endpoint path to POST /v2/voice/jobs.
- Update the request body:
  - Move output.format to output.audio.format.
  - Move output.sample_rate_hz to output.audio.sample_rate_hz.
  - Optionally specify output.audio.channels.
- Re-run integration tests to confirm:
  - Idempotency behavior remains unchanged.
  - Retry handling for 429 and 503 remains unchanged.
v1 remains available during the deprecation window to support gradual client migration.
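The request body change can be expressed as a small transformation. The sketch below assumes the v2 body differs from v1 only in the output object, as described above; `v1_body_to_v2` is a hypothetical helper name.

```python
import copy
from typing import Optional


def v1_body_to_v2(v1_body: dict, channels: Optional[int] = None) -> dict:
    """Return a v2 request body built from a v1 body, leaving the input untouched."""
    v2_body = copy.deepcopy(v1_body)
    v1_output = v2_body.pop("output")

    # Move the two v1 output fields under output.audio.
    audio = {
        "format": v1_output["format"],
        "sample_rate_hz": v1_output["sample_rate_hz"],
    }
    # output.audio.channels is optional in v2 and defaults to 1 server-side,
    # so it is only sent when the caller sets it explicitly.
    if channels is not None:
        audio["channels"] = channels

    v2_body["output"] = {"audio": audio}
    return v2_body
```

A body converted this way is intended for POST /v2/voice/jobs; because the body differs from the v1 request, a v2 request should be sent with its own new Idempotency-Key.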
Related endpoints
- GET /v1/voice/jobs/{job_id}
  Retrieve the current status and metadata for a voice job.
- GET /v1/voice/jobs/{job_id}/audio
  Retrieve the rendered audio output for a completed job.
- GET /v1/quota
  Retrieve current quota limits and remaining usage for the authenticated account.