Detect Caption API

Pricing

25 credits per request

Fixed cost regardless of video length or number of detected captions.

Overview

The Detect Caption API automatically detects and extracts text regions (captions/subtitles) from video frames using OCR (Optical Character Recognition) technology. This is useful for identifying existing captions in videos, content moderation, or text extraction workflows.

Endpoint

URL: https://api.revidapi.com/paid/detect-caption
Method: POST

Request

Headers

x-api-key: Required. Your API key for authentication.
Content-Type: Required. Must be application/json.

Body Parameters

Required Parameters

Parameter	Type	Description
`video_url`	string (URI)	URL of the video file to analyze

Optional Parameters

Parameter	Type	Description
`sample_rate`	number	Number of frames per second to analyze. Default: `1` (1 frame per second)
`min_confidence`	number	Minimum confidence score (0-1) for text detection. Default: `0.5`
`language`	string	Language code for OCR (e.g., `en`, `vi`, `auto`). Default: `auto`
`region`	object	Specific region to analyze (x, y, width, height). If not provided, analyzes entire frame
`output_format`	string	Output format: `json`, `srt`, `vtt`. Default: `json`
`webhook_url`	string (URI)	URL to receive the result when processing is complete
`id`	string	Custom identifier for tracking the request

Region Object (Optional)

Parameter	Type	Description
`x`	integer	X coordinate of the region
`y`	integer	Y coordinate of the region
`width`	integer	Width of the region in pixels
`height`	integer	Height of the region in pixels
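
As an illustration, a region covering the bottom band of the frame, where subtitles commonly sit, could be computed as in the sketch below. The helper name `bottom_band_region` is hypothetical. The sketch also assumes that `x` and `y` give the region's top-left corner in pixels, measured from the top-left of the frame, which the table above does not state.

```python
def bottom_band_region(frame_width: int, frame_height: int, fraction: float = 0.25) -> dict:
    """Return a region dict covering the bottom `fraction` of the frame.

    Assumption: (x, y) is the top-left corner of the region, in pixels,
    measured from the top-left corner of the frame.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    height = int(frame_height * fraction)
    return {
        "x": 0,
        "y": frame_height - height,
        "width": frame_width,
        "height": height,
    }


# Request body that restricts analysis to the bottom quarter of a 1920x1080 frame.
payload = {
    "video_url": "https://example.com/video.mp4",
    "region": bottom_band_region(1920, 1080),
}
```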

Example Request

{
  "video_url": "https://example.com/video.mp4",
  "sample_rate": 1,
  "min_confidence": 0.7,
  "language": "en",
  "output_format": "json",
  "webhook_url": "https://example.com/webhook",
  "id": "detect-caption-123"
}

curl -X POST "https://api.revidapi.com/paid/detect-caption" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "video_url": "https://example.com/video.mp4",
    "sample_rate": 1,
    "min_confidence": 0.7,
    "language": "en",
    "output_format": "json",
    "webhook_url": "https://example.com/webhook",
    "id": "detect-caption-123"
  }'
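
The same request can be sketched in Python using only the standard library. The function names `build_request` and `submit` are illustrative and not part of any official client. The request is built but not sent, so running the snippet does not consume credits.

```python
import json
import urllib.request

API_URL = "https://api.revidapi.com/paid/detect-caption"


def build_request(api_key: str, video_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for the detect-caption endpoint."""
    payload = {"video_url": video_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_request(
    "YOUR_API_KEY",
    "https://example.com/video.mp4",
    sample_rate=1,
    min_confidence=0.7,
    language="en",
    output_format="json",
    webhook_url="https://example.com/webhook",
    id="detect-caption-123",
)
# result = submit(request)  # uncomment to call the API (consumes credits)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx HTTP statuses, so a real caller would wrap `submit` in a `try`/`except` block.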

Response

Immediate Response (202 Accepted)

When a webhook URL is provided, the API returns an immediate acknowledgment with a task_id:

{
  "code": 202,
  "id": "detect-caption-123",
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "processing"
}

Success Response (via Webhook or Direct)

JSON Format Response

{
  "code": 200,
  "id": "detect-caption-123",
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "response": {
    "detections": [
      {
        "timestamp": 5.5,
        "text": "Hello World",
        "confidence": 0.95,
        "bbox": {
          "x": 100,
          "y": 400,
          "width": 200,
          "height": 50
        }
      },
      {
        "timestamp": 10.2,
        "text": "Welcome to RevidAPI",
        "confidence": 0.88,
        "bbox": {
          "x": 150,
          "y": 400,
          "width": 300,
          "height": 50
        }
      }
    ],
    "total_detections": 2,
    "output_url": "https://storage.example.com/results/detections.json"
  },
  "message": "success"
}
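
A client might turn the `detections` array into typed records and apply its own confidence cut-off. The sketch below uses the field names from the sample response above. The `Detection` class and `parse_detections` function are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    timestamp: float  # value of the "timestamp" field
    text: str
    confidence: float
    bbox: dict  # keys: x, y, width, height


def parse_detections(result: dict, min_confidence: float = 0.0) -> list[Detection]:
    """Extract detections from a success response, ordered by timestamp."""
    items = result.get("response", {}).get("detections", [])
    detections = [
        Detection(d["timestamp"], d["text"], d["confidence"], d["bbox"])
        for d in items
        if d["confidence"] >= min_confidence
    ]
    return sorted(detections, key=lambda d: d.timestamp)


# Trimmed copy of the sample response shown above.
sample = {
    "code": 200,
    "response": {
        "detections": [
            {"timestamp": 5.5, "text": "Hello World", "confidence": 0.95,
             "bbox": {"x": 100, "y": 400, "width": 200, "height": 50}},
            {"timestamp": 10.2, "text": "Welcome to RevidAPI", "confidence": 0.88,
             "bbox": {"x": 150, "y": 400, "width": 300, "height": 50}},
        ],
        "total_detections": 2,
    },
}

for det in parse_detections(sample, min_confidence=0.9):
    print(f"{det.timestamp:>6.1f}  {det.confidence:.2f}  {det.text}")
```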

Error Responses

Invalid Request (400)

{
  "code": 400,
  "id": "detect-caption-123",
  "message": "Invalid request: 'video_url' is a required property"
}

Authentication Error (401)

{
  "code": 401,
  "message": "Invalid API key"
}
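
One way to handle these bodies is to branch on the `code` field, as in the sketch below. `DetectCaptionError` and `check_response` are hypothetical names. The sketch reads only the `code` and `message` fields shown in the examples above, and it does not assume that the HTTP status line carries the same value.

```python
class DetectCaptionError(Exception):
    """Raised when a response body carries an error code (hypothetical helper)."""

    def __init__(self, code, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def check_response(body: dict) -> str:
    """Return 'processing' for 202 and 'success' for 200; raise otherwise."""
    code = body.get("code")
    if code == 202:
        return "processing"
    if code == 200:
        return "success"
    raise DetectCaptionError(code, body.get("message", "unknown error"))


try:
    check_response({"code": 401, "message": "Invalid API key"})
except DetectCaptionError as err:
    print(f"request rejected ({err.code}): {err.message}")
```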

Workflow Recommendation

For asynchronous processing:

1. Create Task: Send the POST request to create the task.
2. Wait: Add a wait node (30-45 seconds) to allow server processing time.
3. Check Status: Use the GET endpoint to check the task status: GET https://api.revidapi.com/paid/get/job/status/{task_id}
4. Retrieve Result: Once the status is "completed", retrieve the detection results from the response.
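
The steps above can be sketched as a wait-then-poll loop. Several details here are assumptions rather than documented behaviour: that the status endpoint accepts the same `x-api-key` header, that its JSON body exposes a top-level `status` field, and the polling interval and attempt limit. The function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "https://api.revidapi.com/paid/get/job/status/{task_id}"


def fetch_status(task_id: str, api_key: str) -> dict:
    """GET the job status. Assumes the same x-api-key header as the POST."""
    req = urllib.request.Request(
        STATUS_URL.format(task_id=task_id), headers={"x-api-key": api_key}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_result(
    task_id: str,
    fetch: Callable[[str], dict],
    initial_wait: float = 30.0,
    interval: float = 10.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Wait, then poll `fetch(task_id)` until the job reports completion.

    Assumption: the status body has a top-level "status" field that becomes
    "completed"; adjust the check to the actual response shape.
    """
    sleep(initial_wait)
    for _ in range(max_attempts):
        body = fetch(task_id)
        if body.get("status") == "completed":
            return body
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not complete after {max_attempts} polls")


# Example call (performs network requests):
# result = wait_for_result(task_id, lambda t: fetch_status(t, "YOUR_API_KEY"))
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without network access or real delays.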

Usage Notes

Fixed Pricing: This endpoint charges a fixed 25 credits per request, regardless of video length.
Sample Rate: Lower sample rates (e.g., 0.5 fps) analyze fewer frames and process faster, but may miss some captions.
Confidence Threshold: Raise min_confidence to filter out low-confidence detections, or lower it to keep more text at the risk of false positives.
Language Detection: Use auto for automatic language detection, or specify a language code (e.g., en, vi) for better accuracy.
Region Analysis: Specify a region to analyze only a specific area of the video (useful for fixed caption positions).
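
To see how sample_rate affects the amount of work, the number of frames analyzed can be estimated as video duration times sample rate. This is a back-of-the-envelope sketch: the function name is hypothetical, and the exact frame count the service processes is not documented.

```python
import math


def estimated_frames(duration_seconds: float, sample_rate: float = 1.0) -> int:
    """Rough number of frames analyzed: duration multiplied by sampled fps."""
    if duration_seconds < 0 or sample_rate <= 0:
        raise ValueError("duration must be >= 0 and sample_rate must be > 0")
    return math.ceil(duration_seconds * sample_rate)


# A 2-minute video at three different sample rates.
for rate in (0.5, 1, 2):
    print(rate, estimated_frames(120, rate))
```

Under this estimate, halving the sample rate halves the frames analyzed, while the credit cost stays at 25 per request.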

Common Issues

Low Confidence: Captions with low contrast or small text may have lower confidence scores.
Processing Time: Higher sample rates increase processing time.
Video Format: Ensure video_url is accessible and in a supported format.

Best Practices

Use Webhooks: Provide a webhook_url so the result is delivered when processing completes, rather than relying only on timed polling
Unique IDs: Provide unique id values for tracking
Sample Rate Tuning: Start with 1 fps and adjust based on your needs
Confidence Filtering: Use appropriate confidence thresholds to balance detection rate and accuracy
Region Specification: If captions appear in a fixed location, specify the region for faster processing
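
A minimal webhook receiver could look like the sketch below, using only the Python standard library. It assumes the result arrives as an HTTP POST with a JSON body shaped like the success response above, including a `task_id` field. The delivery method is not documented here, so treat that as an assumption. All names are hypothetical, and results are kept in memory purely for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RESULTS = {}  # task_id -> webhook payload (in-memory, illustration only)


def record_result(raw_body: bytes) -> str:
    """Store a webhook payload keyed by task_id and return that task_id."""
    payload = json.loads(raw_body.decode("utf-8"))
    task_id = payload["task_id"]
    RESULTS[task_id] = payload
    return task_id


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        try:
            record_result(self.rfile.read(length))
            self.send_response(200)
        except (ValueError, KeyError):
            # Malformed JSON or a body without task_id.
            self.send_response(400)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()


# serve()  # uncomment to start listening on port 8000
```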