
Create a New Transcription Job

This endpoint allows you to create a new transcription job. You can provide the audio to be transcribed in one of two ways:

  1. Provide a URL: Submit a URL to a publicly accessible audio file. This is the standard method and is suitable for most use cases.
  2. Upload a file: Upload an audio file directly. This method supports files up to 2 GiB and is ideal for large files or local media.

Method 1: Creating a Job via URL

This method allows you to create a new transcription job by providing an audio file URL, a webhook callback URL, and optional configuration parameters.

Endpoint

POST https://api.speechischeap.com/v2/jobs/

Authentication

Requires authentication using a Bearer token in the HTTP header:

Authorization: Bearer YOUR_API_KEY

Request Body

The request body should be a JSON object with the following parameters:

Parameters

Parameter | Type | Default | Description
--- | --- | --- | ---
input_url | string | - | The URL of the audio file to transcribe (required). The audio must be between six seconds and 24 hours long.
can_label_audio | boolean | false | When enabled, includes an audio classification label in the transcription. See add-ons for pricing.
can_parse_speakers | boolean | false | When enabled, adds speaker_id to each segment based on the speaker's voice. See add-ons for pricing.
can_parse_words | boolean | false | When enabled, includes a timecode for every word in the transcription. See add-ons for pricing.
can_stream_output | boolean | false | When enabled, streams the transcription results.
is_private | boolean | false | When enabled, immediately redacts the original input_url and deletes the segments after 12 hours.
language | string | "" | Two-letter ISO 639-1 language code (e.g., "en"). If not provided, the language will be auto-detected from the first segment.
minimum_confidence | number | 0.5 | Filters out segments that fall below this confidence threshold. Applies both to transcriptions and to non-speech audio labels when can_label_audio is true.
output_format | string | "" | Transcription output format: srt, vtt, or webvtt.
segment_duration | number | 30 | Duration of each transcription segment in seconds. Must be between six and 30 seconds.
user_agent | string | [internal] | Custom user agent header used to fetch the audio file.
webhook_url | string | "" | The URL to which the transcription results are sent via a POST request.
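
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of the official API: the helper name build_job_payload is hypothetical, and the 0 to 1 range for minimum_confidence is an assumption inferred from the default of 0.5 rather than something the documentation states.

```python
ALLOWED_OUTPUT_FORMATS = {"", "srt", "vtt", "webvtt"}


def build_job_payload(input_url, **options):
    """Build a JSON-serializable request body, checking documented limits.

    Hypothetical helper: only the parameter names and limits come from the
    parameter table; the validation itself is an illustration.
    """
    if not input_url:
        raise ValueError("input_url is required")

    segment_duration = options.get("segment_duration")
    if segment_duration is not None and not 6 <= segment_duration <= 30:
        raise ValueError("segment_duration must be between 6 and 30 seconds")

    if options.get("output_format", "") not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError("output_format must be srt, vtt, or webvtt")

    language = options.get("language", "")
    if language and len(language) != 2:
        raise ValueError("language must be a two-letter ISO 639-1 code")

    # Assumption: confidence is a fraction in [0, 1]; the docs only show 0.5.
    minimum_confidence = options.get("minimum_confidence")
    if minimum_confidence is not None and not 0 <= minimum_confidence <= 1:
        raise ValueError("minimum_confidence is assumed to be between 0 and 1")

    payload = {"input_url": input_url}
    payload.update(options)
    return payload
```

Calling build_job_payload("https://example.com/audio-file.mp3", segment_duration=45) would raise a ValueError locally instead of producing a 400 Bad Request from the API.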

Example Request

Using cURL to create a new async job

curl --request POST \
--url https://api.speechischeap.com/v2/jobs/ \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"input_url": "https://example.com/audio-file.mp3",
"webhook_url": "https://your-domain.com/webhook"
}'
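
For readers who prefer Python, the same asynchronous request could be prepared with the standard library roughly as follows. This is a sketch rather than an official client: the function name make_job_request is hypothetical, and the network call is left commented out so that nothing is sent when the snippet runs.

```python
import json
import urllib.request

API_URL = "https://api.speechischeap.com/v2/jobs/"


def make_job_request(api_key, payload):
    """Prepare (but do not send) the POST request for a new job."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_job_request(
    "YOUR_API_KEY",
    {
        "input_url": "https://example.com/audio-file.mp3",
        "webhook_url": "https://your-domain.com/webhook",
    },
)

# Sending is left commented out so the sketch has no side effects:
# with urllib.request.urlopen(request) as response:
#     job = json.load(response)  # expected to contain "id" and "status"
```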

Using cURL to create a new streaming job

curl --request POST \
--url https://api.speechischeap.com/v2/jobs/ \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"input_url": "https://example.com/audio-file.mp3",
"webhook_url": "https://your-domain.com/webhook",
"can_stream_output": true
}' \
--no-buffer

Including all optional parameters

{
"input_url": "https://example.com/audio-file.mp3",
"webhook_url": "https://your-domain.com/webhook",
"can_label_audio": true,
"can_parse_speakers": true,
"can_parse_words": true,
"can_stream_output": true,
"is_private": true,
"language": "en",
"minimum_confidence": 0,
"output_format": "srt",
"segment_duration": 15,
"user_agent": "Mozilla/5.0"
}

Response

The API uses standard HTTP status codes to indicate the outcome of your request for non-streaming jobs.

Status Code | Description
--- | ---
202 Accepted | The job was successfully created and is being processed
400 Bad Request | Missing or invalid parameters
401 Unauthorized | Missing or invalid authentication token
500 Internal Server Error | Server-side error occurred

New Async Job Response

When a new transcription job is created successfully without can_stream_output, you'll receive a 202 Accepted status code with the following response:

{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {},
"status": "PENDING"
}

New Streaming Job Response

When can_stream_output is true, the API will stream Server-Sent Events (SSE). Each event is a JSON object prefixed with data: and terminated by two newline characters.

data: <json_payload>

The JSON payload contains a type field that indicates the type of event. The following types are supported:

  • initial: Sent once at the beginning of the stream, containing initial job details.

    {
    "device": "cuda",
    "duration": 4321, // in seconds
    "request": {
    "input_url": "https://example.com/audio-file.mp3",
    "webhook_url": "https://your-domain.com/webhook",
    "can_stream_output": true,
    /* etc... */
    },
    "type": "initial"
    }
  • progress: Sent periodically to indicate the transcription progress.

    {
    "message": "Processing...",
    "processed_segments": 120,
    "total_segments": 360,
    "type": "progress"
    }
  • segment: Sent when a new audio segment has been transcribed.

    {
    "confidence": 0.987,
    "end": 12.345,
    "id": 1,
    "language": "en (99.95%)",
    "processing_duration_in_s": 0.321,
    "seek": 1234.5,
    "start": 1.234,
    "text": "This is an example of some transcribed text output.",
    "type": "segment",
    "words": null
    }
  • final: Sent once at the end of the stream to indicate completion.

    {
    "type": "final"
    }
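
One way to consume these events is to split the stream on empty lines, decode each data: payload, and dispatch on its type field. The sketch below assumes the real payloads are plain JSON (the comments in the initial example above are read as annotations) and uses hypothetical helper names; it works on any iterable of text lines, such as a response body read line by line.

```python
import json


def iter_sse_events(lines):
    """Yield one decoded JSON payload per SSE event.

    `lines` is any iterable of text lines. An empty line ends the
    current event.
    """
    data_parts = []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if line.startswith("data:"):
            # Per the SSE format, one optional space after the colon is dropped.
            value = line[len("data:"):]
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            yield json.loads("\n".join(data_parts))
            data_parts = []
    if data_parts:  # stream ended without a trailing empty line
        yield json.loads("\n".join(data_parts))


def collect_transcript(events):
    """Join the text of all segment events until the final event arrives."""
    texts = []
    for event in events:
        if event.get("type") == "segment":
            texts.append(event["text"])
        elif event.get("type") == "final":
            break
    return " ".join(texts)
```

A caller would typically pass the decoded response lines to iter_sse_events and feed the result to collect_transcript, or handle progress events separately to drive a progress indicator.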

Streaming SRT and VTT

If you specify an output_format of srt or vtt (webvtt is an alias for vtt), the stream will be in the respective format. Each line of the output is prefixed with data: .

Example SRT Stream:

data: 1
data: 00:00:00,000 --> 00:00:06,960
data: Hello and welcome to the Speech is Cheap docs
data:

data: 2
data: 00:00:06,960 --> 00:00:07,680
data: [music]
data:

Example VTT Stream:

data: WEBVTT
data:

data: 1
data: 00:00:00.000 --> 00:00:02.880
data: Hello world

data: 2
data: 00:00:03.360 --> 00:05:07.440
data: [thunder]
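
To turn such a stream back into a subtitle file, the data: prefix has to be removed and the blocks rejoined. The following sketch assumes that each SSE event (a run of data: lines ended by an empty line) carries exactly one subtitle block, which matches the two examples above but is not stated explicitly; the function name is hypothetical.

```python
def sse_to_subtitles(lines):
    """Rebuild subtitle text from a stream of `data:`-prefixed lines.

    Assumption: each SSE event holds one subtitle block (or the WEBVTT
    header), as in the SRT and VTT examples.
    """
    blocks, current = [], []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # One optional space after the colon is part of the prefix.
            current.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and current:
            blocks.append("\n".join(current).rstrip("\n"))
            current = []
    if current:  # stream ended without a trailing empty line
        blocks.append("\n".join(current).rstrip("\n"))
    # Subtitle blocks are separated by exactly one blank line.
    return "\n\n".join(blocks) + "\n"
```

Normalizing on events rather than on raw lines means the same function handles both examples, even though only the SRT stream sends an explicit empty data: line after each cue.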

Method 2: Creating a Job via File Upload

This method allows you to create a new transcription job by uploading an audio file directly from your machine. This endpoint is a wrapper around the Jobs API that first handles the file upload and then makes a request to the main Jobs API on your behalf.

Endpoint

POST https://upload.speechischeap.com/v2/jobs/

Authentication

Requires authentication using a Bearer token in the HTTP header:

Authorization: Bearer YOUR_API_KEY

Request Body

The request body must be sent as multipart/form-data.

Parameters

All parameters from the URL-based method are supported. The main difference is the requirement of input_file instead of input_url.

Parameter | Type | Default | Description
--- | --- | --- | ---
input_file | file | - | The audio file to transcribe (required). Must be under 2 GiB in size and between six seconds and 24 hours long.
can_stream_output | boolean | false | When enabled, streams the transcription results after the file upload is complete.
webhook_url | string | "" | The URL to which the transcription results are sent via a POST request.

All other parameters from the URL-based method are also accepted as form fields.

Example Request

curl --request POST \
--url https://upload.speechischeap.com/v2/jobs/ \
--header 'Authorization: Bearer YOUR_API_KEY' \
--form 'input_file=@/path/to/your/audio-file.mp3' \
--form 'webhook_url=https://your-domain.com/webhook' \
--no-buffer

Response

The API uses standard HTTP status codes to indicate the outcome of your request.

Status Code | Description
--- | ---
200 OK | The file upload is initiated
400 Bad Request | Missing or invalid parameters
401 Unauthorized | Missing or invalid authentication token
500 Internal Server Error | Server-side error occurred

After the initial HTTP response, the upload endpoint always sends a Server-Sent Events (SSE) stream (text/event-stream). This stream provides real-time feedback on the file upload progress and, optionally, the transcription progress.

The stream begins with upload events indicating the file upload progress:

  • upload: Sent periodically during the file upload process. The initial event will have bytes_uploaded: 0, and subsequent events will show the progress.

    data: {"type":"upload","message":"Uploading...","bytes_uploaded":0}

    data: {"type":"upload","message":"Uploading...","bytes_uploaded":16384}

After the file upload is complete, the stream's behavior depends on the can_stream_output parameter:

  • If can_stream_output is false (default): The API will send a final event containing the standard asynchronous job response and then close the connection.

    {
    "id": "00000000-1111-7222-b333-444444444444-sic",
    "output": {},
    "status": "PENDING"
    }
  • If can_stream_output is true: The stream will remain open and continue with the same transcription events as the URL-based streaming method (initial, progress, segment, and final). Please see the New Streaming Job Response section under Method 1 for details on these events.
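
For the default case (can_stream_output is false), the upload stream can be reduced to the last reported byte count and the job response. The sketch below works on already decoded event payloads (one dict per data: line). It assumes the job response is recognizable by its id and status keys, as in the example above, since the documentation does not say whether that event also carries a type field; the function name is hypothetical.

```python
def summarize_upload_stream(events):
    """Return (bytes_uploaded, job) from the decoded events of an upload stream.

    `events` is an iterable of dicts, one per decoded `data:` payload.
    Assumption: the job response is identified by its "id" and "status" keys.
    """
    bytes_uploaded = 0
    job = None
    for event in events:
        if event.get("type") == "upload":
            bytes_uploaded = max(bytes_uploaded, event.get("bytes_uploaded", 0))
        elif "id" in event and "status" in event:
            job = event
            break  # with can_stream_output false, the connection closes here
    return bytes_uploaded, job
```

If job is still None after the stream ends, the upload was interrupted before a job was created.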


General Information

Error Responses

Error responses include a message explaining what went wrong. For example:

HTTP/1.1 400 Bad Request
Content-Type: text/plain

Missing required "input_url" parameter

Asynchronous Webhook Responses

If a webhook_url is provided, the following responses will be sent to that URL upon job completion.

Success

When the job completes successfully, the webhook will receive a completion response. If no output_format was specified, the response will be a JSON object:

{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {
"segments": [
{
"confidence": 0.987,
"end": 12.345,
"id": 1,
"language": "en (99.95%)",
"processing_duration_in_s": 0.321,
"seek": 1234.5,
"start": 1.234,
"text": "This is an example of some transcribed text output.",
"words": null
}
]
},
"status": "COMPLETED"
}

If an output_format was specified, the webhook will receive the transcription in that format with the corresponding Content-Type header (application/x-subrip for SRT, text/vtt for VTT).

Failure

If the job fails to complete, the webhook will receive the following error response:

{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {
"error": "Some error message"
},
"status": "FAILED"
}

Cancelation

If the job is canceled, the webhook will receive the following confirmation response:

{
"id": "00000000-1111-7222-b333-444444444444-sic",
"output": {},
"status": "CANCELED"
}
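
A webhook receiver has to distinguish these cases. The sketch below is a framework-independent illustration with a hypothetical function name: it takes the raw request body and the Content-Type header, and it assumes that failure and cancelation notices are always JSON, even when an output_format was requested.

```python
import json


def handle_webhook(body, content_type="application/json"):
    """Interpret a webhook POST body and return a (status, result) tuple.

    Hypothetical helper: `body` is the raw request body as bytes and
    `content_type` is the value of the incoming Content-Type header.
    """
    media_type = content_type.split(";")[0].strip().lower()

    # Subtitle output arrives as plain text rather than JSON.
    if media_type in ("application/x-subrip", "text/vtt"):
        return "COMPLETED", body.decode("utf-8")

    payload = json.loads(body)
    status = payload["status"]
    output = payload.get("output", {})

    if status == "COMPLETED":
        return status, output.get("segments", [])
    if status == "FAILED":
        return status, output.get("error")
    if status == "CANCELED":
        return status, None
    raise ValueError(f"Unexpected status: {status}")
```

The function can be called from any HTTP framework's request handler; keeping it free of framework code makes it straightforward to unit test with the example payloads above.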

Add-on Responses

The output may include additional values depending on the add-ons used.

Parse Speakers

Returns one segment per speaker, with a speaker_id string on each segment. When used together with the Label Audio add-on, speaker_id may be an empty string ("") if the segment contains no speech:

{ "speaker_id": "A" }

Parse Words

Returns an array of words within each segment. Includes the start and end timestamps and text contents:

{
"words": [
{
"start": 2.345,
"end": 2.567,
"text": "hi"
}
]
}

Label Audio

Returns the audio classification label for each segment. In JSON output, this is a label field:

{ "label": "music" }

In SRT and VTT outputs, non-speech segments are represented with the label in square brackets, for example [music]. Silence is represented as an empty string.
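
As an illustration of working with these fields in JSON output, the sketch below groups segment text by speaker_id and computes per-word durations. The helper names are hypothetical, and the code relies only on the fields shown above: speaker_id on each segment, and words as either null or a list of objects with start, end, and text.

```python
def text_by_speaker(segments):
    """Concatenate segment text per speaker_id, skipping segments without one."""
    grouped = {}
    for segment in segments:
        speaker_id = segment.get("speaker_id", "")
        if not speaker_id:  # "" marks a segment with no speech
            continue
        grouped.setdefault(speaker_id, []).append(segment.get("text", ""))
    return {speaker: " ".join(parts) for speaker, parts in grouped.items()}


def word_durations(segment):
    """Return (text, duration_in_seconds) pairs for a segment's words.

    `words` is null (None) when the Parse Words add-on is not enabled,
    in which case an empty list is returned.
    """
    words = segment.get("words") or []
    return [(w["text"], round(w["end"] - w["start"], 3)) for w in words]
```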

Notes

  • The input_url audio file must be publicly accessible
  • For best results, omit all optional parameters and let the system auto-configure based on the input
  • When language is not provided, the system will auto-detect the audio language of each segment
  • Gaps in the returned segments indicate that the missing segments fell below the minimum_confidence threshold
  • The confidence value in non-speech segments refers to the confidence of the audio classifier
  • Enabling is_private may limit our ability to troubleshoot since input_url will be masked
  • You are only charged for successfully completed transcriptions