Create a New Transcription Job
This endpoint allows you to create a new transcription job. You can provide the audio to be transcribed in one of two ways:
- Provide a URL: Submit a URL to a publicly accessible audio file. This is the standard method and is suitable for most use cases.
- Upload a file: Upload an audio file directly. This method supports files up to 2 GiB and is ideal for large files or local media.
Method 1: Creating a Job via URL
This method allows you to create a new transcription job by providing an audio file URL, a webhook callback URL, and optional configuration parameters.
Endpoint
POST https://api.speechischeap.com/v2/jobs/
Authentication
Requires authentication using a Bearer token in the HTTP header:
Authorization: Bearer YOUR_API_KEY
Request Body
The request body should be a JSON object with the following parameters:
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| input_url | string | - | The URL of the audio file to transcribe (required). The audio must be between six seconds and 24 hours long. |
| can_label_audio | boolean | false | When enabled, includes an audio classification label in the transcription. See add-ons for pricing. |
| can_parse_speakers | boolean | false | When enabled, adds speaker_id to each segment based on the speaker's voice. See add-ons for pricing. |
| can_parse_words | boolean | false | When enabled, includes a timecode for every word in the transcription. See add-ons for pricing. |
| can_stream_output | boolean | false | When enabled, streams the transcription results. |
| is_private | boolean | false | When enabled, immediately redacts the original input_url and deletes the segments after 12 hours. |
| language | string | "" | Two-letter ISO 639-1 language code (e.g., "en"). If not provided, the language will be auto-detected from the first segment. |
| minimum_confidence | number | 0.5 | Filters out segments that fall below this confidence threshold. Applies both to transcriptions and to non-speech audio labels when can_label_audio is true. |
| output_format | string | "" | Transcription output format: srt, vtt, or webvtt. |
| segment_duration | number | 30 | Duration of each transcription segment in seconds. Must be between six and 30 seconds. |
| user_agent | string | [internal] | Custom User-Agent header to use when fetching the audio file. |
| webhook_url | string | "" | The URL to which the transcription results are sent via a POST request. |
Example Request
Using cURL to create a new async job
curl --request POST \
--url https://api.speechischeap.com/v2/jobs/ \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"input_url": "https://example.com/audio-file.mp3",
"webhook_url": "https://your-domain.com/webhook"
}'
Using cURL to create a new streaming job
curl --request POST \
--url https://api.speechischeap.com/v2/jobs/ \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"input_url": "https://example.com/audio-file.mp3",
"webhook_url": "https://your-domain.com/webhook",
"can_stream_output": true
}' \
--no-buffer
Including all optional parameters
{
"input_url": "https://example.com/audio-file.mp3",
"webhook_url": "https://your-domain.com/webhook",
"can_label_audio": true,
"can_parse_speakers": true,
"can_parse_words": true,
"can_stream_output": true,
"is_private": true,
"language": "en",
"minimum_confidence": 0,
"output_format": "srt",
"segment_duration": 15,
"user_agent": "Mozilla/5.0"
}
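The request shape above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper names `build_job_payload` and `build_job_request` are hypothetical, and the client-side checks only mirror the ranges listed in the parameters table (the server remains the source of truth). The request is prepared but not sent.

```python
import json
import urllib.request

API_URL = "https://api.speechischeap.com/v2/jobs/"
ALLOWED_FORMATS = {"", "srt", "vtt", "webvtt"}  # values from the parameters table


def build_job_payload(input_url, **options):
    """Return a JSON-ready dict after checking the documented ranges."""
    payload = {"input_url": input_url, **options}
    duration = payload.get("segment_duration", 30)
    if not 6 <= duration <= 30:
        raise ValueError("segment_duration must be between 6 and 30 seconds")
    if payload.get("output_format", "") not in ALLOWED_FORMATS:
        raise ValueError("output_format must be srt, vtt, or webvtt")
    return payload


def build_job_request(api_key, payload):
    """Prepare (but do not send) the POST request for a URL-based job."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would submit the job; any other HTTP client works equally well.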
Response
The API uses standard HTTP status codes to indicate the outcome of your request for non-streaming jobs.
| Status Code | Description |
|---|---|
| 202 Accepted | The job was successfully created and is being processed |
| 400 Bad Request | Missing or invalid parameters |
| 401 Unauthorized | Missing or invalid authentication token |
| 500 Internal Server Error | Server-side error occurred |
New Async Job Response
When a new transcription job is created successfully without can_stream_output, you'll receive a 202 Accepted status code with the following response:
{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {},
"status": "PENDING"
}
New Streaming Job Response
When can_stream_output is true, the API will stream Server-Sent Events (SSE). Each event is a JSON object prefixed with data: and terminated by two newline characters.
data: <json_payload>
The JSON payload contains a type field that indicates the type of event. The following types are supported:
- initial: Sent once at the beginning of the stream, containing initial job details.
{
"device": "cuda",
"duration": 4321, // in seconds
"request": {
"input_url": "https://example.com/audio-file.mp3",
"webhook_url": "https://your-domain.com/webhook",
"can_stream_output": true,
/* etc... */
},
"type": "initial"
}
- progress: Sent periodically to indicate the transcription progress.
{
"message": "Processing...",
"processed_segments": 120,
"total_segments": 360,
"type": "progress"
}
- segment: Sent when a new audio segment has been transcribed.
{
"confidence": 0.987,
"end": 12.345,
"id": 1,
"language": "en (99.95%)",
"processing_duration_in_s": 0.321,
"seek": 1234.5,
"start": 1.234,
"text": "This is an example of some transcribed text output.",
"type": "segment",
"words": null
}
- final: Sent once at the end of the stream to indicate completion.
{
"type": "final"
}
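One way to consume these events is sketched below. It assumes the stream is available as an iterable of decoded text lines (for example, an HTTP response body read line by line); the function names `iter_sse_events` and `collect_transcript` are illustrative and not part of the API.

```python
import json


def iter_sse_events(lines):
    """Yield one decoded JSON payload per SSE event.

    An event is made of one or more `data:` lines and ends at a blank line.
    """
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):  # SSE strips a single leading space
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            yield json.loads("\n".join(data_lines))
            data_lines = []
    if data_lines:  # stream closed without a trailing blank line
        yield json.loads("\n".join(data_lines))


def collect_transcript(events):
    """Join the text of every segment event until the final event arrives."""
    texts = []
    for event in events:
        if event.get("type") == "segment":
            texts.append(event["text"])
        elif event.get("type") == "final":
            break
    return " ".join(texts)
```

Events of type `initial` and `progress` are ignored by `collect_transcript`; a caller that wants a progress bar could read `processed_segments` and `total_segments` from the `progress` events instead.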
Streaming SRT and VTT
If you specify an output_format of srt or vtt (webvtt is an alias for vtt), the stream will be in the respective format. Each line of the output is prefixed with data: .
Example SRT Stream:
data: 1
data: 00:00:00,000 --> 00:00:06,960
data: Hello and welcome to the Speech is Cheap docs
data:
data: 2
data: 00:00:06,960 --> 00:00:07,680
data: [music]
data:
Example VTT Stream:
data: WEBVTT
data:
data: 1
data: 00:00:00.000 --> 00:00:02.880
data: Hello world
data:
data: 2
data: 00:00:03.360 --> 00:05:07.440
data: [thunder]
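To turn such a stream back into a subtitle file, the `data:` prefix has to be removed from each line. The sketch below assumes the stream is read as an iterable of text lines and that lines without the prefix (such as event separators) carry no subtitle content; `subtitle_text_from_stream` is a hypothetical helper name.

```python
def subtitle_text_from_stream(lines):
    """Rebuild SRT or VTT text from lines prefixed with `data:`."""
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # assumed to be an event separator, not subtitle content
        value = line[len("data:"):]
        if value.startswith(" "):  # drop the single space after the colon
            value = value[1:]
        out.append(value)
    return "\n".join(out) + "\n" if out else ""
```

A bare `data:` line becomes an empty line in the result, which preserves the blank lines that separate subtitle cues.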
Method 2: Creating a Job via File Upload
This method allows you to create a new transcription job by uploading an audio file directly from your machine. This endpoint is a wrapper around the Jobs API that first handles the file upload and then makes a request to the main Jobs API on your behalf.
Endpoint
POST https://upload.speechischeap.com/v2/jobs/
Authentication
Requires authentication using a Bearer token in the HTTP header:
Authorization: Bearer YOUR_API_KEY
Request Body
The request body must be sent as multipart/form-data.
Parameters
All parameters from the URL-based method are supported. The main difference is the requirement of input_file instead of input_url.
| Parameter | Type | Default | Description |
|---|---|---|---|
| input_file | file | - | The audio file to transcribe (required). Must be under 2 GiB in size and between six seconds and 24 hours long. |
| can_stream_output | boolean | false | When enabled, streams the transcription results after the file upload is complete. |
| webhook_url | string | "" | The URL to which the transcription results are sent via a POST request. |
| … | … | … | All other parameters from the URL-based method are also accepted as form fields. |
Example Request
curl --request POST \
--url https://upload.speechischeap.com/v2/jobs/ \
--header 'Authorization: Bearer YOUR_API_KEY' \
--form 'input_file=@/path/to/your/audio-file.mp3' \
--form 'webhook_url=https://your-domain.com/webhook' \
--no-buffer
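For clients that cannot shell out to cURL, the multipart/form-data body can be built by hand. The following is a minimal standard-library sketch; `encode_multipart` is a hypothetical helper, and it holds the whole file in memory, so it is only suitable for small files.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     content_type="application/octet-stream"):
    """Return (body, content_type_header) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # Plain text form fields such as webhook_url.
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The file part, e.g. input_file.
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    parts.append(header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned header value goes into the `Content-Type` request header alongside the `Authorization` header. For files approaching the 2 GiB limit, an HTTP client that streams the file from disk is a better fit than this in-memory approach.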
Response
The API uses standard HTTP status codes to indicate the outcome of your request.
| Status Code | Description |
|---|---|
| 200 OK | The file upload is initiated |
| 400 Bad Request | Missing or invalid parameters |
| 401 Unauthorized | Missing or invalid authentication token |
| 500 Internal Server Error | Server-side error occurred |
After the initial HTTP response, the upload endpoint always sends a Server-Sent Events (SSE) stream (text/event-stream). This stream provides real-time feedback on the file upload progress and, optionally, the transcription progress.
The stream begins with upload events indicating the file upload progress:
- upload: Sent periodically during the file upload process. The initial event will have bytes_uploaded: 0, and subsequent events will show the progress.
data: {"type":"upload","message":"Uploading...","bytes_uploaded":0}
data: {"type":"upload","message":"Uploading...","bytes_uploaded":16384}
After the file upload is complete, the stream's behavior depends on the can_stream_output parameter:
- If can_stream_output is false (default): The API will send a final event containing the standard asynchronous job response and then close the connection.
{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {},
"status": "PENDING"
}
- If can_stream_output is true: The stream will remain open and continue with the same transcription events as the URL-based streaming method (initial, progress, segment, and final). Please see the New Streaming Job Response section under Method 1 for details on these events.
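For the default (non-streaming) case, a client might read the upload stream roughly as follows. This sketch assumes that each event fits on a single `data:` line and that the job response arrives as a `data:` event like the upload events do; neither detail is confirmed above, and `read_upload_stream` is a hypothetical name.

```python
import json


def read_upload_stream(lines, on_progress=None):
    """Report upload progress and return the job response dict, if any."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines between events
        event = json.loads(line[len("data:"):])
        if event.get("type") == "upload":
            if on_progress is not None:
                on_progress(event["bytes_uploaded"])
        elif "id" in event and "status" in event:
            return event  # the standard asynchronous job response
    return None  # stream ended before a job response was seen
```

The returned dictionary has the same `id`, `output`, and `status` keys as the asynchronous job response, so the `id` can be stored and matched against later webhook deliveries.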
General Information
Error Responses
Error responses include a message explaining what went wrong. For example:
HTTP/1.1 400 Bad Request
Content-Type: text/plain
Missing required "input_url" parameter
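Because error bodies are plain text rather than JSON, a client can surface them directly. The sketch below shows one possible approach; the `JobRequestError` class and the rule that only 5xx responses are worth retrying are assumptions for illustration, not documented behavior.

```python
class JobRequestError(Exception):
    """Raised when the API answers with a non-success status code."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message

    @property
    def is_retryable(self):
        # Assumption: 4xx responses need a corrected request, 5xx may be transient.
        return self.status >= 500


def check_response(status, body_text):
    """Return silently on 200/202, otherwise raise with the plain-text message."""
    if status in (200, 202):
        return
    raise JobRequestError(status, body_text.strip())
```

Calling `check_response(400, 'Missing required "input_url" parameter')` raises an error whose `message` attribute carries the server's explanation.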
Asynchronous Webhook Responses
If a webhook_url is provided, the following responses will be sent to that URL upon job completion.
Success
When the job completes successfully, the webhook will receive a completion response. If no output_format was specified, the response will be a JSON object:
{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {
"segments": [
{
"confidence": 0.987,
"end": 12.345,
"id": 1,
"language": "en (99.95%)",
"processing_duration_in_s": 0.321,
"seek": 1234.5,
"start": 1.234,
"text": "This is an example of some transcribed text output.",
"words": null
}
]
},
"status": "COMPLETED"
}
If an output_format was specified, the webhook will receive the transcription in that format with the corresponding Content-Type header (application/x-subrip for SRT, text/vtt for VTT).
Failure
If the job fails to complete, the webhook will receive the following error response:
{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {
"error": "Some error message"
},
"status": "FAILED"
}
Cancelation
If the job is canceled, the webhook will receive the following confirmation response:
{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {},
"status": "CANCELED"
}
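A webhook receiver has to distinguish these deliveries. The sketch below classifies a delivery from its Content-Type header and raw body; `handle_webhook` and the tuple it returns are illustrative choices, and the HTTP server that would call it is left out.

```python
import json


def handle_webhook(content_type, body):
    """Classify a webhook delivery as (kind, detail, data)."""
    media_type = content_type.split(";")[0].strip().lower()
    # Subtitle deliveries arrive as plain text when output_format was set.
    if media_type == "application/x-subrip":
        return ("subtitles", "srt", body.decode("utf-8"))
    if media_type == "text/vtt":
        return ("subtitles", "vtt", body.decode("utf-8"))
    payload = json.loads(body)
    status = payload["status"]
    if status == "COMPLETED":
        return ("completed", payload["id"], payload["output"]["segments"])
    if status == "FAILED":
        return ("failed", payload["id"], payload["output"]["error"])
    if status == "CANCELED":
        return ("canceled", payload["id"], None)
    raise ValueError(f"unexpected job status: {status}")
```

Here `body` is the raw request body as bytes, which `json.loads` accepts directly.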
Add-ons Responses
The output may include additional values depending on the add-ons used.
Parse Speakers
Returns one segment per speaker, including the speaker_id string for each segment. The speaker_id may be empty ("") when used together with the Label Audio add-on if the segment has no speech:
{ "speaker_id": "A" }
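As an illustration of how speaker_id might be used, the sketch below merges consecutive segments from the same speaker into turns. It assumes each segment dictionary carries `speaker_id` and `text` keys as shown in this document; `speaker_turns` is a hypothetical helper.

```python
def speaker_turns(segments):
    """Merge consecutive segments that share a speaker_id into (speaker, text) turns."""
    turns = []
    for segment in segments:
        speaker = segment.get("speaker_id", "")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous segment: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + segment["text"])
        else:
            turns.append((speaker, segment["text"]))
    return turns
```

Segments with an empty speaker_id are grouped under the empty string, so non-speech stretches stay separate from the speakers around them.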
Parse Words
Returns an array of words within each segment. Includes the start and end timestamps and text contents:
{
"words": [
{
"start": 2.345,
"end": 2.567,
"text": "hi"
}
]
}
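Since `words` is null when the add-on is not enabled, code that walks the word timings should tolerate both cases. A minimal sketch, with `iter_words` as a hypothetical helper name:

```python
def iter_words(segments):
    """Yield (start, end, text) for every word across all segments."""
    for segment in segments:
        # "words" is null (None) unless can_parse_words was enabled.
        for word in segment.get("words") or []:
            yield word["start"], word["end"], word["text"]
```

The resulting flat sequence is convenient for word-level highlighting or for searching the transcript by timestamp.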
Label Audio
Returns the audio classification label for each segment. In JSON output, this is a label field:
{ "label": "music" }
In SRT and VTT outputs, non-speech segments are represented with the label in square brackets, for example [music]. Silence is represented as an empty string.
Notes
- The input_url audio file must be publicly accessible
- For best results, omit all optional parameters and let the system auto-configure based on the input
- When language is not provided, the system will auto-detect the audio language of each segment
- Missing segments indicate that some of them fell below the minimum_confidence threshold
- The confidence value in non-speech segments refers to the confidence of the audio classifier
- Enabling is_private may limit our ability to troubleshoot since input_url will be masked
- You are only charged for successfully completed transcriptions