OCR API Endpoint

Version 2.197 (Release Notes ↗)

Description

OCR API: Extract text from images or video frames with high accuracy. Supports multiple languages and offers specialized endpoints for document scanning and PDF conversion. Given an input image or video frame containing human-readable characters, the endpoint detects the input language and extracts the text content. OCR stands for Optical Character Recognition. PixLab uses state-of-the-art processing algorithms, so expect accurate results given a good-quality image.
For more specialized tasks, such as scanning government-issued documents like passports or ID cards, use the /docscan endpoint instead. If you are dealing with PDF documents, first convert them to raw images via the /pdftoimg endpoint.

HTTP Methods

GET

HTTP Parameters

Required

Fields Type Description
img URL Input image URL. To upload an image directly from your app, call store first and use the link it returns. Only the JPEG, PNG, and BMP file formats are supported; convert is particularly helpful if your image is in a different format.
key String Your PixLab API Key ↗.

Optional

Fields Type Description
lang String Input language code, if known. Leave this field unset if you do not know the input language. The supported BCP-47 language codes as of this release are: en (English), de (German), ar (Arabic), he (Modern Hebrew), hi (Hindi), fr (French), cs (Czech), da (Danish), nl (Dutch), fi (Finnish), el (Greek), hu (Hungarian), it (Italian), ja (Japanese), ko (Korean), nb (Norwegian), pl (Polish), pt (Portuguese), zh-Hans (Chinese Simplified), zh-Hant (Chinese Traditional), ru (Russian), es (Spanish), sv (Swedish), tr (Turkish).
orientation Boolean Detect and correct text orientation in the input image.
nl Boolean Output new line (\n) character on each detected line.
br Boolean Output HTML line break (</br>) on each detected line.
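
The required and optional parameters above can be assembled into a query string before the request is sent. The following is a minimal sketch, not an official client: `build_ocr_params` and `SUPPORTED_LANGS` are hypothetical names introduced here, and the set of codes is copied from the `lang` row. The sketch assumes booleans can be sent the way Python serializes them (`True`), which is what the code sample on this page would also send; the API's accepted boolean spellings are not stated here.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Language codes copied from the `lang` parameter description (lowercase "ja").
SUPPORTED_LANGS = {
    "en", "de", "ar", "he", "hi", "fr", "cs", "da", "nl", "fi", "el", "hu",
    "it", "ja", "ko", "nb", "pl", "pt", "zh-Hans", "zh-Hant", "ru", "es",
    "sv", "tr",
}


def build_ocr_params(img: str, key: str, lang: Optional[str] = None,
                     orientation: bool = False, nl: bool = False,
                     br: bool = False) -> Dict[str, object]:
    """Build the query parameters for a GET request to the OCR endpoint.

    Hypothetical helper: optional fields are only included when set, so an
    unknown input language simply leaves `lang` out of the request.
    """
    if lang is not None and lang not in SUPPORTED_LANGS:
        raise ValueError(f"Unsupported language code: {lang!r}")

    params: Dict[str, object] = {"img": img, "key": key}
    if lang is not None:
        params["lang"] = lang
    # Assumption: flags are sent only when enabled, serialized as "True".
    for name, enabled in (("orientation", orientation), ("nl", nl), ("br", br)):
        if enabled:
            params[name] = True
    return params


def build_ocr_url(params: Dict[str, object]) -> str:
    """Return the full request URL with a percent-encoded query string."""
    return "https://api.pixlab.io/ocr?" + urlencode(params)
```

Passing the dictionary to `requests.get(..., params=params)` would produce the same query string; `build_ocr_url` is only useful when the URL itself needs to be logged or handed to another HTTP client.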

HTTP Response

application/json

This endpoint returns a JSON response. The following fields are included in the response body:
Fields Type Description
status Integer HTTP status code. 200 indicates success, any other code indicates failure.
output String Extracted text content from the input image.
lang String BCP-47 language code of detected text.
bbox Array Bounding box coordinates for each extracted word. Each array element contains:
{
word: Extracted text,
x: Top-left X coordinate,
y: Top-left Y coordinate,
w: Rectangle width,
h: Rectangle height
}

Use these coordinates with the drawrectangles or crop endpoints. The bbox field is available on the Prod plan and higher tiers.
error String Error details when status ≠ 200.
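
The response fields above can be unpacked as follows. This is a sketch under stated assumptions: `parse_ocr_response` is a hypothetical helper, the coordinates are assumed to be numeric pixel values, and `bbox` is treated as optional because it is tied to the plan tier. Each box is converted from (x, y, w, h) to (left, top, right, bottom) corners, a form that is often convenient when drawing or cropping regions.

```python
from typing import Any, Dict, List, Tuple

# (word, left, top, right, bottom); numeric pixel coordinates are assumed.
Box = Tuple[str, float, float, float, float]


def parse_ocr_response(data: Dict[str, Any]) -> Tuple[str, List[Box]]:
    """Return (text, boxes) from a decoded OCR response body.

    Raises RuntimeError with the `error` field when `status` is not 200.
    """
    if data.get("status") != 200:
        raise RuntimeError(data.get("error", "unknown OCR error"))

    boxes: List[Box] = []
    # `bbox` may be missing on lower plan tiers, so fall back to an empty list.
    for item in data.get("bbox") or []:
        left, top = item["x"], item["y"]
        # Convert top-left corner plus size into opposite corners.
        boxes.append((item["word"], left, top,
                      left + item["w"], top + item["h"]))
    return data.get("output", ""), boxes


if __name__ == "__main__":
    # Hypothetical payload shaped like the field table; not real API output.
    sample = {
        "status": 200,
        "output": "Hello",
        "lang": "en",
        "bbox": [{"word": "Hello", "x": 10, "y": 20, "w": 50, "h": 12}],
    }
    text, boxes = parse_ocr_response(sample)
    print(text, boxes)
```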

Code Samples


import requests
from typing import Dict, Any


def extract_text_from_image(image_url: str, api_key: str) -> None:
    """Extract text from an image using PixLab OCR API."""
    params = {
        'img': image_url,
        'orientation': True,  # Correct text orientation
        'nl': True,  # Output new lines if any
        'key': api_key
    }
    
    try:
        # A timeout keeps the call from hanging indefinitely on network issues
        response = requests.get('https://api.pixlab.io/ocr', params=params, timeout=30)
        response.raise_for_status()
        data: Dict[str, Any] = response.json()
        
        if data['status'] != 200:
            print(f"Error: {data['error']}")
            return
        
        print(f"Input language: {data['lang']}")
        print(f"Text Output: {data['output']}")
        
        # bbox is only returned on Prod plan and higher, so default to an empty list
        for box in data.get('bbox', []):
            print(f"Word: {box['word']}")
            print(f"Bounding box - X: {box['x']} Y: {box['y']} "
                  f"Width: {box['w']} Height: {box['h']}")
    
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")


if __name__ == "__main__":
    extract_text_from_image(
        image_url="http://quotesten.com/wp-content/uploads/2016/06/Confucius-Quote.jpg",
        api_key="PIXLAB_API_KEY"
    )