Detect Caption API
Pricing
25 credits per request
Fixed cost regardless of video length or number of detected captions.
Overview
The Detect Caption API automatically detects and extracts text regions (captions/subtitles) from video frames using OCR (Optical Character Recognition) technology. This is useful for identifying existing captions in videos, content moderation, or text extraction workflows.
Endpoint
- URL: POST https://api.revidapi.com/paid/detect-caption
- Method: POST
Request
Headers
- x-api-key: Required. Your API key for authentication.
- Content-Type: Required. Must be application/json.
Body Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| video_url | string (URI) | URL of the video file to analyze |
Optional Parameters
| Parameter | Type | Description |
|---|---|---|
| sample_rate | number | Number of frames per second to analyze. Default: 1 (1 frame per second) |
| min_confidence | number | Minimum confidence score (0-1) for text detection. Default: 0.5 |
| language | string | Language code for OCR (e.g., en, vi, auto). Default: auto |
| region | object | Specific region to analyze (x, y, width, height). If not provided, the entire frame is analyzed |
| output_format | string | Output format: json, srt, vtt. Default: json |
| webhook_url | string (URI) | URL to receive the result when processing is complete |
| id | string | Custom identifier for tracking the request |
Region Object (Optional)
| Parameter | Type | Description |
|---|---|---|
| x | integer | X coordinate of the region |
| y | integer | Y coordinate of the region |
| width | integer | Width of the region in pixels |
| height | integer | Height of the region in pixels |
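When captions sit in a fixed band, the region object can be derived from the frame dimensions. The following is a minimal sketch, not part of the API: `bottom_band_region` is a hypothetical helper, and it assumes that x and y are pixel offsets measured from the top-left corner of the frame, which the table above does not state.

```python
def bottom_band_region(frame_width: int, frame_height: int, fraction: float = 0.25) -> dict:
    """Return a region object covering the bottom `fraction` of the frame.

    Assumption (not stated in the table above): x and y are measured in
    pixels from the top-left corner of the frame.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    band_height = int(frame_height * fraction)
    return {
        "x": 0,
        "y": frame_height - band_height,
        "width": frame_width,
        "height": band_height,
    }


# Bottom quarter of a 1920x1080 frame.
region = bottom_band_region(1920, 1080)
print(region)  # {'x': 0, 'y': 810, 'width': 1920, 'height': 270}
```

The resulting dictionary can be passed as the `region` field of the request body.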
Example Request
{
  "video_url": "https://example.com/video.mp4",
  "sample_rate": 1,
  "min_confidence": 0.7,
  "language": "en",
  "output_format": "json",
  "webhook_url": "https://example.com/webhook",
  "id": "detect-caption-123"
}
curl -X POST "https://api.revidapi.com/paid/detect-caption" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "video_url": "https://example.com/video.mp4",
    "sample_rate": 1,
    "min_confidence": 0.7,
    "language": "en",
    "output_format": "json",
    "webhook_url": "https://example.com/webhook",
    "id": "detect-caption-123"
  }'
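The same request can be issued from Python. This is a minimal sketch using only the standard library; `build_request` and `send` are hypothetical helper names, and the 30-second timeout is an arbitrary choice rather than a documented limit.

```python
import json
import urllib.request

API_URL = "https://api.revidapi.com/paid/detect-caption"


def build_request(api_key: str, video_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Detect Caption endpoint."""
    payload = {"video_url": video_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(
    "YOUR_API_KEY",
    "https://example.com/video.mp4",
    sample_rate=1,
    min_confidence=0.7,
    language="en",
)
# result = send(request)  # uncomment to perform the call (costs 25 credits)
```

Building the request separately from sending it makes the payload easy to inspect before any credits are spent. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx status codes, so error bodies must be read from the exception.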
Response
Immediate Response (202 Accepted)
When a webhook URL is provided, the API returns an immediate acknowledgment with a task_id:
{
  "code": 202,
  "id": "detect-caption-123",
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "processing"
}
Success Response (via Webhook or Direct)
JSON Format Response
{
  "code": 200,
  "id": "detect-caption-123",
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "response": {
    "detections": [
      {
        "timestamp": 5.5,
        "text": "Hello World",
        "confidence": 0.95,
        "bbox": {
          "x": 100,
          "y": 400,
          "width": 200,
          "height": 50
        }
      },
      {
        "timestamp": 10.2,
        "text": "Welcome to RevidAPI",
        "confidence": 0.88,
        "bbox": {
          "x": 150,
          "y": 400,
          "width": 300,
          "height": 50
        }
      }
    ],
    "total_detections": 2,
    "output_url": "https://storage.example.com/results/detections.json"
  },
  "message": "success"
}
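A response of this shape can be reduced to a time-ordered list of captions. The sketch below assumes the field names shown in the example above (`response.detections`, `timestamp`, `text`, `confidence`); `captions_from_response` is a hypothetical helper, and its client-side confidence filter is separate from the `min_confidence` request parameter.

```python
import json


def captions_from_response(body: str, min_confidence: float = 0.0) -> list[tuple[float, str]]:
    """Return (timestamp, text) pairs sorted by time, dropping weak detections."""
    payload = json.loads(body)
    if payload.get("code") != 200:
        raise ValueError(f"unexpected response: {payload.get('message')}")
    detections = payload["response"]["detections"]
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return sorted((d["timestamp"], d["text"]) for d in kept)


# Trimmed copy of the example response, with detections deliberately out of order.
sample = json.dumps({
    "code": 200,
    "response": {
        "detections": [
            {"timestamp": 10.2, "text": "Welcome to RevidAPI", "confidence": 0.88},
            {"timestamp": 5.5, "text": "Hello World", "confidence": 0.95},
        ],
        "total_detections": 2,
    },
    "message": "success",
})
print(captions_from_response(sample, min_confidence=0.9))  # [(5.5, 'Hello World')]
```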
Error Responses
Invalid Request (400)
{
  "code": 400,
  "id": "detect-caption-123",
  "message": "Invalid request: 'video_url' is a required property"
}
Authentication Error (401)
{
  "code": 401,
  "message": "Invalid API key"
}
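Client code can branch on the `code` field of the decoded body. The following sketch covers only the codes documented on this page (200, 202, 400, 401); `check_response` and the choice of Python exception types are illustrative assumptions, and any other code is treated as unexpected.

```python
def check_response(payload: dict) -> dict:
    """Return the payload if it signals success or acceptance, else raise."""
    code = payload.get("code")
    if code in (200, 202):
        return payload
    message = payload.get("message", "unknown error")
    if code == 400:
        raise ValueError(f"bad request: {message}")
    if code == 401:
        raise PermissionError(f"authentication failed: {message}")
    raise RuntimeError(f"unexpected code {code}: {message}")


accepted = check_response({"code": 202, "task_id": "550e8400-e29b-41d4-a716-446655440000", "message": "processing"})
print(accepted["task_id"])
```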
Workflow Recommendation
For asynchronous processing:
- Create Task: Send POST request to create the task
- Wait: Add a wait node (30-45 seconds) to allow server processing time
- Check Status: Use the GET endpoint to check task status: GET https://api.revidapi.com/paid/get/job/status/{task_id}
- Retrieve Result: Once status is "completed", retrieve the detection results from the response
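The wait-then-poll steps above can be sketched as a small loop. The status endpoint's response body is not shown on this page, so this sketch assumes it is JSON with a top-level "status" field; the "processing" value in the demo, the polling interval, and the poll limit are also assumptions. `wait_for_task` is a hypothetical helper, and the HTTP call is injected as `fetch_status` so the caller decides how to authenticate.

```python
import time
from typing import Callable

STATUS_URL = "https://api.revidapi.com/paid/get/job/status/{task_id}"


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], dict],
    initial_wait: float = 30.0,
    poll_interval: float = 10.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Wait, then poll the status URL until the task reports "completed".

    `fetch_status` takes the full status URL and returns the decoded JSON body.
    Assumption: that body carries a top-level "status" field.
    """
    url = STATUS_URL.format(task_id=task_id)
    sleep(initial_wait)
    for _ in range(max_polls):
        body = fetch_status(url)
        if body.get("status") == "completed":
            return body
        sleep(poll_interval)
    raise TimeoutError(f"task {task_id} not completed after {max_polls} polls")


# Demo with a stand-in fetcher that completes on the third poll (no network, no real sleeping).
replies = iter([{"status": "processing"}, {"status": "processing"}, {"status": "completed"}])
result = wait_for_task(
    "550e8400-e29b-41d4-a716-446655440000",
    fetch_status=lambda url: next(replies),
    sleep=lambda seconds: None,
)
print(result["status"])  # completed
```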
Usage Notes
- Fixed Pricing: This endpoint charges a fixed 25 credits per request, regardless of video length.
- Sample Rate: Lower sample rates (e.g., 0.5 fps) analyze fewer frames and process faster, but may miss some captions.
- Confidence Threshold: Adjust min_confidence to filter out low-quality detections.
- Language Detection: Use auto for automatic language detection, or specify language codes for better accuracy.
- Region Analysis: Specify a region to analyze only a specific area of the video (useful for fixed caption positions).
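The sample-rate trade-off can be made concrete with a little arithmetic. This sketch assumes frames are sampled at evenly spaced intervals, which this page does not confirm; both function names are hypothetical.

```python
import math


def frames_analyzed(duration_seconds: float, sample_rate: float = 1.0) -> int:
    """Approximate count of frames sampled, assuming evenly spaced sampling."""
    return math.ceil(duration_seconds * sample_rate)


def shortest_reliably_seen_caption(sample_rate: float) -> float:
    """Seconds a caption must stay on screen to span at least one sample."""
    return 1.0 / sample_rate


# A 2-minute video at three different sample rates.
for rate in (0.5, 1, 2):
    print(rate, frames_analyzed(120, rate), shortest_reliably_seen_caption(rate))
# 0.5 60 2.0
# 1 120 1.0
# 2 240 0.5
```

Under that assumption, halving the sample rate halves the number of frames analyzed but doubles the minimum on-screen time a caption needs in order to be caught reliably.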
Common Issues
- Low Confidence: Captions with low contrast or small text may have lower confidence scores
- Processing Time: Higher sample rates increase processing time
- Video Format: Ensure video_url is accessible and in a supported format
Best Practices
- Use Webhooks: Always use webhooks for better reliability
- Unique IDs: Provide unique id values for tracking
- Sample Rate Tuning: Start with 1 fps and adjust based on your needs
- Confidence Filtering: Use appropriate confidence thresholds to balance detection rate and accuracy
- Region Specification: If captions appear in a fixed location, specify the region for faster processing
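One simple way to produce unique id values is to append a random UUID to a readable prefix. This is only a sketch; `new_request_id` and the "detect-caption" prefix are illustrative, and the API is not known to require any particular id format.

```python
import uuid


def new_request_id(prefix: str = "detect-caption") -> str:
    """Return a tracking id that is unique per call (random UUID4 suffix)."""
    return f"{prefix}-{uuid.uuid4()}"


print(new_request_id())
```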