Detect Caption API

Pricing

25 credits per request

Fixed cost regardless of video length or number of detected captions.

Overview

The Detect Caption API detects and extracts text regions (captions or subtitles) from video frames using optical character recognition (OCR). It is useful for identifying existing captions in videos, content moderation, and text extraction workflows.

Endpoint

  • URL: https://api.revidapi.com/paid/detect-caption
  • Method: POST

Request

Headers

  • x-api-key: Required. Your API key for authentication.
  • Content-Type: Required. Must be application/json.

Body Parameters

Required Parameters

Parameter  Type          Description
video_url  string (URI)  URL of the video file to analyze

Optional Parameters

Parameter       Type          Description
sample_rate     number        Number of frames per second to analyze. Default: 1
min_confidence  number        Minimum confidence score (0-1) for text detection. Default: 0.5
language        string        Language code for OCR (e.g., en, vi), or auto for automatic detection. Default: auto
region          object        Region of the frame to analyze (x, y, width, height). If omitted, the entire frame is analyzed
output_format   string        Output format: json, srt, or vtt. Default: json
webhook_url     string (URI)  URL that receives the result when processing is complete
id              string        Custom identifier for tracking the request

Region Object (Optional)

Parameter  Type     Description
x          integer  X coordinate of the region
y          integer  Y coordinate of the region
width      integer  Width of the region in pixels
height     integer  Height of the region in pixels
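
As an illustration, a region covering the bottom strip of the frame, where subtitles often sit, could be computed as in the sketch below. The helper name bottom_region is hypothetical. The sketch also assumes that x and y give the top-left corner of the region, measured in pixels from the top-left of the frame, which this page does not state explicitly.

```python
def bottom_region(frame_width: int, frame_height: int, fraction: float = 0.25) -> dict:
    """Return a region dict covering the bottom `fraction` of the frame.

    Assumption: (x, y) is the top-left corner of the region and the origin
    is the top-left of the frame. Verify this against real responses.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    height = round(frame_height * fraction)
    return {"x": 0, "y": frame_height - height, "width": frame_width, "height": height}


# Bottom quarter of a 1920x1080 frame
region = bottom_region(1920, 1080)
print(region)  # {'x': 0, 'y': 810, 'width': 1920, 'height': 270}
```

The resulting dict can be passed as the region value in the request body.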

Example Request

{
  "video_url": "https://example.com/video.mp4",
  "sample_rate": 1,
  "min_confidence": 0.7,
  "language": "en",
  "output_format": "json",
  "webhook_url": "https://example.com/webhook",
  "id": "detect-caption-123"
}

curl -X POST "https://api.revidapi.com/paid/detect-caption" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "video_url": "https://example.com/video.mp4",
    "sample_rate": 1,
    "min_confidence": 0.7,
    "language": "en",
    "output_format": "json",
    "webhook_url": "https://example.com/webhook",
    "id": "detect-caption-123"
  }'
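
The same request can be assembled in Python with only the standard library. This is a minimal sketch, not an official client: the function name build_detect_caption_request is hypothetical, and the call that actually sends the request is left commented out so the example runs without network access.

```python
import json
import urllib.request

API_URL = "https://api.revidapi.com/paid/detect-caption"


def build_detect_caption_request(api_key: str, video_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Detect Caption endpoint.

    `options` holds optional body parameters such as sample_rate,
    min_confidence, language, region, output_format, webhook_url, or id.
    """
    body = {"video_url": video_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_detect_caption_request(
    "YOUR_API_KEY",
    "https://example.com/video.mp4",
    sample_rate=1,
    min_confidence=0.7,
    language="en",
    output_format="json",
    webhook_url="https://example.com/webhook",
    id="detect-caption-123",
)

# Sending is commented out so the sketch runs offline:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```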

Response

Immediate Response (202 Accepted)

When a webhook URL is provided, the API returns an immediate acknowledgment with a task_id:

{
  "code": 202,
  "id": "detect-caption-123",
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "processing"
}

Success Response (via Webhook or Direct)

JSON Format Response

{
  "code": 200,
  "id": "detect-caption-123",
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "response": {
    "detections": [
      {
        "timestamp": 5.5,
        "text": "Hello World",
        "confidence": 0.95,
        "bbox": {
          "x": 100,
          "y": 400,
          "width": 200,
          "height": 50
        }
      },
      {
        "timestamp": 10.2,
        "text": "Welcome to RevidAPI",
        "confidence": 0.88,
        "bbox": {
          "x": 150,
          "y": 400,
          "width": 300,
          "height": 50
        }
      }
    ],
    "total_detections": 2,
    "output_url": "https://storage.example.com/results/detections.json"
  },
  "message": "success"
}
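
Once the JSON result arrives, the detections list can be filtered client-side. The sketch below assumes the payload shape shown above. The helper name captions_above is hypothetical, and the 0.9 threshold is only an illustrative value.

```python
def captions_above(payload: dict, min_confidence: float = 0.9) -> list[tuple[float, str]]:
    """Return (timestamp, text) pairs, in time order, whose confidence
    is at least `min_confidence`."""
    detections = payload.get("response", {}).get("detections", [])
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return [(d["timestamp"], d["text"]) for d in sorted(kept, key=lambda d: d["timestamp"])]


# Trimmed copy of the sample response (bbox omitted for brevity)
payload = {
    "code": 200,
    "response": {
        "detections": [
            {"timestamp": 5.5, "text": "Hello World", "confidence": 0.95},
            {"timestamp": 10.2, "text": "Welcome to RevidAPI", "confidence": 0.88},
        ],
        "total_detections": 2,
    },
}
print(captions_above(payload))  # [(5.5, 'Hello World')]
```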

Error Responses

Invalid Request (400)

{
  "code": 400,
  "id": "detect-caption-123",
  "message": "Invalid request: 'video_url' is a required property"
}

Authentication Error (401)

{
  "code": 401,
  "message": "Invalid API key"
}
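
Every payload shown on this page carries a numeric code field, so a client can branch on it. The sketch below is one way to do that. The names DetectCaptionError and classify_response are hypothetical, and it assumes that any code other than 200 or 202 is an error shaped like the two examples above.

```python
class DetectCaptionError(Exception):
    """Raised for any payload whose code is not 200 or 202."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def classify_response(payload: dict) -> str:
    """Return "processing" for 202, "success" for 200, otherwise raise."""
    code = payload.get("code")
    if code == 202:
        return "processing"
    if code == 200:
        return "success"
    raise DetectCaptionError(code, payload.get("message", "unknown error"))


try:
    classify_response({"code": 401, "message": "Invalid API key"})
except DetectCaptionError as err:
    print(err)  # 401: Invalid API key
```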

Workflow Recommendation

For asynchronous processing:

  1. Create Task: Send the POST request to create the task and note the task_id in the response
  2. Wait: Allow 30-45 seconds for server processing (for example, with a wait node in a workflow tool)
  3. Check Status: Check the task status with GET https://api.revidapi.com/paid/get/job/status/{task_id}
  4. Retrieve Result: Once the status is "completed", retrieve the detection results from the response
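
The steps above could be sketched in Python as follows. Several details are assumptions, because this page does not show the status response: that it is JSON with a top-level "status" field, and that the status endpoint accepts the same x-api-key header as the POST call. The names fetch_status and wait_for_result, the 10-second polling interval, and the limit of 12 checks are all hypothetical.

```python
import json
import time
import urllib.request

STATUS_URL = "https://api.revidapi.com/paid/get/job/status/{task_id}"


def fetch_status(task_id: str, api_key: str) -> dict:
    """GET the job status. Assumes the same x-api-key header as the POST call."""
    req = urllib.request.Request(
        STATUS_URL.format(task_id=task_id), headers={"x-api-key": api_key}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(task_id, fetch, initial_wait=30.0, interval=10.0,
                    max_checks=12, sleep=time.sleep):
    """Wait, then poll `fetch(task_id)` until the job reports "completed".

    Assumption: the status payload has a top-level "status" field.
    """
    sleep(initial_wait)
    for attempt in range(max_checks):
        payload = fetch(task_id)
        if payload.get("status") == "completed":
            return payload
        if attempt < max_checks - 1:
            sleep(interval)
    raise TimeoutError(f"task {task_id} not completed after {max_checks} checks")


# Example call (commented out because it needs network access and a real task):
# result = wait_for_result(task_id, lambda t: fetch_status(t, "YOUR_API_KEY"))
```

Passing fetch and sleep as arguments keeps the loop easy to exercise with stand-ins before pointing it at the real endpoint.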

Usage Notes

  1. Fixed Pricing: This endpoint charges a fixed 25 credits per request, regardless of video length.
  2. Sample Rate: Lower sample rates (e.g., 0.5 fps) analyze fewer frames and process faster, but may miss some captions.
  3. Confidence Threshold: Adjust min_confidence to filter out low-quality detections.
  4. Language Detection: Use auto for automatic language detection, or specify a language code (e.g., en) when the language is known, for better accuracy.
  5. Region Analysis: Specify a region to analyze only a specific area of the video (useful for fixed caption positions).
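
To make the sample-rate trade-off concrete, the number of frames analyzed can be estimated as video duration times sample_rate. This is a rough sketch with a hypothetical helper name. The exact frames the service samples are not documented here, and the cost remains 25 credits per request whatever the rate.

```python
import math


def frames_analyzed(duration_seconds: float, sample_rate: float = 1.0) -> int:
    """Estimate frames sampled: duration (s) times frames per second, rounded up."""
    return math.ceil(duration_seconds * sample_rate)


# A 2-minute video at three different sample rates
for rate in (0.5, 1, 2):
    print(rate, frames_analyzed(120, rate))
# 0.5 60
# 1 120
# 2 240
```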

Common Issues

  1. Low Confidence: Captions with low contrast or small text may have lower confidence scores
  2. Processing Time: Higher sample rates increase processing time
  3. Video Format: Ensure the video at video_url is accessible to the API and is in a supported format

Best Practices

  1. Use Webhooks: Provide a webhook_url so the result is delivered when processing completes, rather than relying only on fixed waits and polling
  2. Unique IDs: Provide unique id values for tracking
  3. Sample Rate Tuning: Start with 1 fps and adjust based on your needs
  4. Confidence Filtering: Use appropriate confidence thresholds to balance detection rate and accuracy
  5. Region Specification: If captions appear in a fixed location, specify the region for faster processing