Create a New Transcription Job
This endpoint allows you to create a new transcription job by providing an audio file URL, a webhook callback URL, and optional configuration parameters.
Endpoint
POST https://api.speechischeap.com/v2/jobs/
Authentication
Requires authentication using a Bearer token in the HTTP header:
Authorization: Bearer YOUR_API_KEY
Request Body
The request body should be a JSON object with the following parameters:
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
input_url | string | - | The URL of the audio file to transcribe (required). The audio must be between six seconds and 24 hours long. |
can_label_audio | boolean | false | When enabled, includes an audio classification label in the transcription. See add-ons for pricing. |
can_parse_speakers | boolean | false | When enabled, adds speaker_id to each segment based on the speaker's voice. See add-ons for pricing. |
can_parse_words | boolean | false | When enabled, includes a timecode for every word in the transcription. See add-ons for pricing. |
hotwords | string | "" | Specific words or phrases to help improve transcription accuracy. |
is_private | boolean | false | When enabled, redacts the original input_url for privacy. |
language | string | "" | Two-letter ISO 639-1 language code (e.g., "en"). If not provided, the language will be auto-detected from the first segment. |
minimum_confidence | number | 0.5 | Filter out segments that fall below this confidence threshold. Applies both to transcriptions and to non-speech audio labels when can_label_audio is true. |
prompt | string | "" | Custom prompt to adjust transcription style. Should match the audio language. |
segment_duration | number | 30 | Duration of each transcription segment in seconds. Must be between six and 30 seconds. |
user_agent | string | [internal] | Custom user agent header to fetch the audio file. |
webhook_url | string | "" | The URL that receives the transcription results via a POST request. |
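The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name validate_job_params is hypothetical, and it assumes the documented segment_duration range is inclusive at both ends.

```python
from typing import Any, Dict, List


def validate_job_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a job request body (empty if none).

    Only constraints stated in the parameter table are checked.
    """
    problems: List[str] = []

    # input_url is the only required parameter.
    if not params.get("input_url"):
        problems.append('Missing required "input_url" parameter')

    # segment_duration defaults to 30; the range is assumed to be inclusive.
    duration = params.get("segment_duration", 30)
    if not 6 <= duration <= 30:
        problems.append("segment_duration must be between 6 and 30 seconds")

    # language is optional; when present it should be a two-letter code.
    language = params.get("language", "")
    if language and not (len(language) == 2 and language.isalpha()):
        problems.append("language must be a two-letter ISO 639-1 code")

    return problems
```

An empty list means the body passes these local checks; the server remains the authority on what it accepts.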
Example Request
Using cURL
curl --request POST \
--url https://api.speechischeap.com/v2/jobs/ \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"input_url": "https://example.com/audio-file.mp3",
"webhook_url": "https://your-domain.com/webhook"
}'
Including all optional parameters
{
"input_url": "https://example.com/audio-file.mp3",
"webhook_url": "https://your-domain.com/webhook",
"can_label_audio": true,
"can_parse_speakers": true,
"can_parse_words": true,
"hotwords": "AI, ML, Neural Networks",
"language": "en",
"minimum_confidence": 0,
"prompt": "This is a technical discussion about artificial intelligence",
"segment_duration": 15
}
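Using Python
The same request can be sketched with Python's standard library. The helper names build_job_request and create_job are illustrative, not part of any official client; the endpoint, headers, and parameter names are taken from this page.

```python
import json
import urllib.request

API_URL = "https://api.speechischeap.com/v2/jobs/"


def build_job_request(api_key: str, input_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a job."""
    # Optional parameters such as webhook_url or language are passed as keywords.
    payload = {"input_url": input_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(api_key: str, input_url: str, **options) -> dict:
    """Send the request and return the parsed JSON body of the response."""
    request = build_job_request(api_key, input_url, **options)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, create_job("YOUR_API_KEY", "https://example.com/audio-file.mp3", webhook_url="https://your-domain.com/webhook") mirrors the cURL call above.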
Response
Synchronous Job Responses
The API uses standard HTTP status codes to indicate the outcome of your request:
Status Code | Description |
---|---|
202 Accepted | The job was successfully created and is being processed |
400 Bad Request | Missing or invalid parameters |
401 Unauthorized | Invalid or missing authentication token |
500 Internal Server Error | Server-side error occurred |
New Job Response
When a new transcription job is created successfully, you'll receive a 202 Accepted status code with the following response:
{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {},
"status": "PENDING"
}
Error Responses
Error responses include a message explaining what went wrong. For example:
HTTP/1.1 400 Bad Request
Content-Type: text/plain
Missing required "input_url" parameter
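A client might separate the two synchronous outcomes as follows. This is a sketch under the assumption that every non-202 response carries a plain-text message like the one above; JobCreationError and parse_create_response are hypothetical names.

```python
import json


class JobCreationError(Exception):
    """Raised when the API answers with anything other than 202 Accepted."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def parse_create_response(status_code: int, body: str) -> dict:
    """Return the new job as a dict, or raise JobCreationError."""
    if status_code == 202:
        # Success bodies are JSON with id, output, and status keys.
        return json.loads(body)
    # Error bodies are assumed to be plain text, as in the example above.
    raise JobCreationError(status_code, body.strip())
```

Keeping the status code on the exception lets callers treat 400 and 401 as caller errors while handling 500 separately.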
Asynchronous Webhook Responses
Success
When the job completes successfully, the webhook will receive a completion response similar to the following:
{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {
"segments": [
{
"confidence": 0.987,
"end": 12.345,
"id": 1,
"language": "en (99.95%)",
"processing_duration_in_s": 0.321,
"seek": 1234.5,
"start": 1.234,
"text": "This is an example of some transcribed text output.",
"words": null
}
]
},
"status": "COMPLETED"
}
Failure
If the job fails to complete, the webhook will receive the following error response:
{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {
"error": "Some error message"
},
"status": "FAILED"
}
Cancelation
If the job is canceled, the webhook will receive the following confirmation response:
{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {},
"status": "CANCELED"
}
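A webhook receiver can branch on the status field to tell the three payloads apart. The sketch below only parses the request body; the function name summarize_webhook is hypothetical, and wiring it into a web framework is left out.

```python
import json
from typing import Any, Dict


def summarize_webhook(raw_body: bytes) -> Dict[str, Any]:
    """Reduce a webhook payload to the fields a caller usually acts on."""
    event = json.loads(raw_body)
    output = event.get("output") or {}
    summary: Dict[str, Any] = {"id": event["id"], "status": event["status"]}

    if event["status"] == "COMPLETED":
        # Join the text of every returned segment into one transcript string.
        segments = output.get("segments", [])
        summary["text"] = " ".join(s["text"].strip() for s in segments)
    elif event["status"] == "FAILED":
        summary["error"] = output.get("error", "")
    # CANCELED carries an empty output, so there is nothing more to extract.
    return summary
```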
Add-ons Responses
The output may include additional values depending on the add-ons used.
Parse Speakers
Returns one segment per speaker, including the speaker_id string for each segment. May be empty ("") when used together with the Label Audio add-on if the segment has no speech:
{ "speaker_id": "A" }
Parse Words
Returns an array of words within each segment. Includes the start and end timestamps and text contents:
{
"words": [
{
"start": 2.345,
"end": 2.567,
"text": "hi"
}
]
}
Label Audio
Returns the audio classification label for each segment:
{ "label": "music" }
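Because each add-on field appears only when the matching option is enabled, and words can be null, code that reads segments should tolerate missing values. The following is a minimal sketch; describe_segment is a hypothetical helper, and it assumes the add-on fields sit directly on each segment object as described above.

```python
from typing import Any, Dict, List


def describe_segment(segment: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the add-on fields of one segment, with safe fallbacks."""
    # words may be absent or null when Parse Words is not enabled.
    words: List[Dict[str, Any]] = segment.get("words") or []
    return {
        # speaker_id may be absent, or "" for a segment without speech.
        "speaker_id": segment.get("speaker_id", ""),
        # label is only present with the Label Audio add-on.
        "label": segment.get("label"),
        "word_texts": [word["text"] for word in words],
    }
```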
Notes
- The input_url audio file must be publicly accessible
- For best results, omit all optional parameters and let the system auto-configure based on the input
- When language is not provided, the system will auto-detect the audio language of each segment
- Missing segments indicate that some of them fell below the minimum_confidence threshold
- The confidence value in non-speech segments refers to the confidence of the audio classifier
- Enabling is_private may limit our ability to troubleshoot since input_url will be masked
- You are only charged for successfully completed transcriptions