Create a New Transcription Job

This endpoint allows you to create a new transcription job by providing an audio file URL, a webhook callback URL, and optional configuration parameters.

Endpoint

POST https://api.speechischeap.com/v2/jobs/

Authentication

Requires authentication using a Bearer token in the HTTP header:

Authorization: Bearer YOUR_API_KEY

Request Body

The request body should be a JSON object with the following parameters:

Parameters

Parameter | Type | Default | Description
input_url | string | - | The URL of the audio file to transcribe (required). The audio must be between six seconds and 24 hours long.
can_label_audio | boolean | false | When enabled, includes an audio classification label in the transcription. See add-ons for pricing.
can_parse_speakers | boolean | false | When enabled, adds speaker_id to each segment based on the speaker's voice. See add-ons for pricing.
can_parse_words | boolean | false | When enabled, includes a timecode for every word in the transcription. See add-ons for pricing.
hotwords | string | "" | Specific words or phrases to help improve transcription accuracy.
is_private | boolean | false | When enabled, redacts the original input_url for privacy.
language | string | "" | Two-letter ISO 639-1 language code (e.g., "en"). If not provided, the language will be auto-detected from the first segment.
minimum_confidence | number | 0.5 | Filters out segments that fall below this confidence threshold. Applies both to transcriptions and to non-speech audio labels when can_label_audio is true.
prompt | string | "" | Custom prompt to adjust transcription style. Should match the audio language.
segment_duration | number | 30 | Duration of each transcription segment in seconds. Must be between six and 30 seconds.
user_agent | string | [internal] | Custom user agent header used to fetch the audio file.
webhook_url | string | "" | The URL that receives the transcription results via a POST request.
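
The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: validate_job_params is a hypothetical helper, and it checks only the rules stated on this page (input_url is required, segment_duration must be between 6 and 30, and language is a two-letter code or empty). The server remains the source of truth.

```python
def validate_job_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical client-side helper based only on the documented constraints.
    """
    problems = []

    # input_url is the only parameter documented as required.
    if not params.get("input_url"):
        problems.append('missing required "input_url"')

    # segment_duration is documented as six to 30 seconds (default 30).
    duration = params.get("segment_duration", 30)
    is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
    if not is_number or not 6 <= duration <= 30:
        problems.append('"segment_duration" must be a number between 6 and 30')

    # language is a two-letter ISO 639-1 code; "" means auto-detect.
    language = params.get("language", "")
    if language:
        is_two_letters = (
            isinstance(language, str)
            and len(language) == 2
            and language.isascii()
            and language.isalpha()
        )
        if not is_two_letters:
            problems.append('"language" must be a two-letter code or empty')

    return problems
```

Calling validate_job_params({"input_url": "https://example.com/audio-file.mp3"}) returns an empty list, while an empty dict yields a single problem about the missing input_url.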

Example Request

Using cURL

curl --request POST \
  --url https://api.speechischeap.com/v2/jobs/ \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "input_url": "https://example.com/audio-file.mp3",
    "webhook_url": "https://your-domain.com/webhook"
  }'

Including optional parameters

{
  "input_url": "https://example.com/audio-file.mp3",
  "webhook_url": "https://your-domain.com/webhook",
  "can_label_audio": true,
  "can_parse_speakers": true,
  "can_parse_words": true,
  "hotwords": "AI, ML, Neural Networks",
  "language": "en",
  "minimum_confidence": 0,
  "prompt": "This is a technical discussion about artificial intelligence",
  "segment_duration": 15
}
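
Using Python

The same request can be assembled with Python's standard library. This is a sketch rather than an official client: build_job_request is a hypothetical helper name, and it only builds the request object so that sending it (for example with urllib.request.urlopen) stays an explicit step for the caller.

```python
import json
import urllib.request

API_URL = "https://api.speechischeap.com/v2/jobs/"


def build_job_request(api_key: str, input_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST request for a new transcription job.

    Hypothetical helper; `options` are passed through as optional parameters.
    """
    payload = {"input_url": input_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_job_request(
    "YOUR_API_KEY",
    "https://example.com/audio-file.mp3",
    webhook_url="https://your-domain.com/webhook",
    language="en",
    segment_duration=15,
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```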

Response

Synchronous Job Responses

The API uses standard HTTP status codes to indicate the outcome of your request:

Status Code | Description
202 Accepted | The job was successfully created and is being processed
400 Bad Request | Missing or invalid parameters
401 Unauthorized | Invalid or missing authentication token
500 Internal Server Error | Server-side error occurred

New Job Response

When a new transcription job is created successfully, you'll receive a 202 Accepted status code with the following response:

{
  "id": "00000000-1111-7222-b333-444444444444-sic",
  "output": {},
  "status": "PENDING"
}

Error Responses

Error responses include a message explaining what went wrong. For example:

HTTP/1.1 400 Bad Request
Content-Type: text/plain

Missing required "input_url" parameter
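
Because a successful response carries JSON while the error example above is plain text, a client may want to branch on the status code before parsing the body. The sketch below assumes that every non-202 response body is a plain-text message, as in the 400 example; interpret_create_response is a hypothetical helper, not part of the API.

```python
import json


def interpret_create_response(status_code: int, body: str) -> dict:
    """Normalize the status code and raw body of a create-job call.

    Assumption: only 202 responses carry JSON; anything else is treated
    as a plain-text error message.
    """
    if status_code == 202:
        job = json.loads(body)
        return {"ok": True, "job_id": job["id"], "status": job["status"]}
    return {"ok": False, "http_status": status_code, "error": body.strip()}
```

For the 202 example above, the result has ok set to True and status set to "PENDING"; for the 400 example, ok is False and error holds the message text.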

Asynchronous Webhook Responses

Success

When the job completes successfully, the webhook will receive a completion response similar to the following:

{
  "id": "00000000-1111-7222-b333-444444444444-sic",
  "output": {
    "segments": [
      {
        "confidence": 0.987,
        "end": 12.345,
        "id": 1,
        "language": "en (99.95%)",
        "processing_duration_in_s": 0.321,
        "seek": 1234.5,
        "start": 1.234,
        "text": "This is an example of some transcribed text output.",
        "words": null
      }
    ]
  },
  "status": "COMPLETED"
}
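
A completed payload can be flattened into a readable transcript by walking output.segments. The sketch below assumes each segment carries the start, end, and text fields shown in the example above; format_transcript is a hypothetical helper, and sorting by start is a precaution rather than a documented requirement.

```python
def format_transcript(payload: dict) -> str:
    """Render one line per segment as "[start-end] text".

    Assumes segments look like the example completion payload.
    """
    segments = payload.get("output", {}).get("segments", [])
    lines = []
    for segment in sorted(segments, key=lambda s: s["start"]):
        lines.append(f"[{segment['start']:.3f}-{segment['end']:.3f}] {segment['text']}")
    return "\n".join(lines)
```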

Failure

If the job fails to complete, the webhook will receive the following error response:

{
  "id": "00000000-1111-7222-b333-444444444444-sic",
  "output": {
    "error": "Some error message"
  },
  "status": "FAILED"
}

Cancelation

If the job is canceled, the webhook will receive the following confirmation response:

{
  "id": "00000000-1111-7222-b333-444444444444-sic",
  "output": {},
  "status": "CANCELED"
}
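
A webhook receiver can branch on the status field to tell the three outcomes apart. The following sketch covers only the statuses shown on this page (COMPLETED, FAILED, CANCELED) and works on an already-parsed JSON body; handle_webhook is a hypothetical function, and wiring it into a web framework is left out.

```python
def handle_webhook(payload: dict) -> str:
    """Summarize a webhook payload according to its status.

    Hypothetical handler; any status not shown on this page is reported
    as unhandled rather than guessed at.
    """
    status = payload.get("status")
    job_id = payload.get("id", "<unknown>")
    output = payload.get("output") or {}

    if status == "COMPLETED":
        count = len(output.get("segments", []))
        return f"{job_id}: completed with {count} segment(s)"
    if status == "FAILED":
        return f"{job_id}: failed ({output.get('error', 'no error message')})"
    if status == "CANCELED":
        return f"{job_id}: canceled"
    return f"{job_id}: unhandled status {status!r}"
```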

Add-on Responses

The output may include additional values depending on the add-ons used.

Parse Speakers

Returns one segment per speaker and adds a speaker_id string to each segment. The speaker_id may be empty ("") when this add-on is used together with the Label Audio add-on and the segment contains no speech:

{ "speaker_id": "A" }

Parse Words

Returns an array of words within each segment. Each word includes its start and end timestamps and its text:

{
  "words": [
    {
      "start": 2.345,
      "end": 2.567,
      "text": "hi"
    }
  ]
}

Label Audio

Returns the audio classification label for each segment:

{ "label": "music" }
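
Since the add-on fields are present only when the matching option is enabled, reading them defensively avoids key errors. The sketch below assumes the field names shown above (speaker_id, words, label) and treats a null words value or an empty speaker_id as absent; describe_segment is a hypothetical helper.

```python
def describe_segment(segment: dict) -> dict:
    """Collect the optional add-on fields of one segment.

    Missing fields, a null "words" value, and an empty "speaker_id"
    are all reported as None or zero.
    """
    words = segment.get("words") or []  # null when Parse Words is off
    return {
        "speaker": segment.get("speaker_id") or None,  # "" means no speech
        "label": segment.get("label"),
        "word_count": len(words),
        "first_word_start": words[0]["start"] if words else None,
    }
```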

Notes

  • The input_url audio file must be publicly accessible
  • For best results, omit all optional parameters and let the system auto-configure based on the input
  • When language is not provided, the system will auto-detect the audio language of each segment
  • Gaps in the returned segments indicate that those segments fell below the minimum_confidence threshold
  • The confidence value in non-speech segments refers to the confidence of the audio classifier
  • Enabling is_private may limit our ability to troubleshoot since input_url will be masked
  • You are only charged for successfully completed transcriptions