API Endpoint Access URL
https://api.pixlab.io/llmparse
Description
The LLM Parse API endpoint processes and parses documents in a variety of formats, including Word documents, Excel spreadsheets, HTML, and PDF. It generates clean Markdown output (an LLM-friendly format) suitable for use with your preferred Large Language Model. Because it handles the files produced by popular office applications, it offers intelligent content understanding and eliminates the need for complex file-format parsing, allowing your LLM to focus on the core content. Key features of the LLM Parse API include:
- Parsing of multiple document formats, including PDF, DOCX, PPTX, XLSX, HTML, and more
- Advanced PDF understanding, including page layout, reading order, table structure, code, formulas, image classification, and more
- Unified, expressive document representation format
- Default export to LLM-friendly formats, including Markdown and JSON, for easy analysis without input-format noise such as unnecessary HTML tags and binary metadata
- SDK-Free REST API for easy plug-and-play integrations with external apps and libraries
- Extensive OCR support for scanned PDFs and images
- Metadata extraction, including title, authors, references & language
For image analysis, we recommend the PixLab QUERY, TAG-IMG, and DESCRIBE API endpoints, along with our comprehensive suite of Vision Language Model (vLM) API endpoints.
HTTP Methods
GET, POST
HTTP Parameters
Required
Fields | Type | Description |
---|---|---|
doc | URL | URL of the input document to be parsed. If you want to upload your document directly from your app, submit a multipart/form-data POST request instead (refer to the POST Request Body section below). The document must be in a supported format, such as office documents (XLS, DOC), PDF, HTML, plain text, JSON, etc. |
key | String | Your PixLab API Key. You can also embed your key in the WWW-Authenticate HTTP header and omit this parameter if you prefer. |
output | String | Desired LLM-friendly output format. Supported output formats as of this release are Markdown (default) and JSON. |
Optional
Fields | Type | Description |
---|---|---|
max_tokens | Integer | The maximum number of output tokens to generate. Defaults to no limit. |
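The parameters above can be combined into a single GET request. The following is a minimal Python sketch using only the standard library. It assumes the output values are passed as lowercase markdown or json and that the reply body is the JSON object described under HTTP Response. The names build_llmparse_url and parse_remote_document are hypothetical helpers, not part of any PixLab SDK.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the documentation above.
LLMPARSE_ENDPOINT = "https://api.pixlab.io/llmparse"


def build_llmparse_url(doc_url, api_key, output="markdown", max_tokens=None):
    """Build a GET URL from the documented doc, key, output and max_tokens parameters."""
    # Assumption: output values are sent as lowercase "markdown" or "json".
    params = {"doc": doc_url, "key": api_key, "output": output}
    # max_tokens is optional; omitting it keeps the default (no limit).
    if max_tokens is not None:
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        params["max_tokens"] = max_tokens
    # urlencode percent-escapes the document URL so it travels as a single value.
    return LLMPARSE_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_remote_document(doc_url, api_key, output="markdown", max_tokens=None, timeout=120):
    """Send the GET request and return the decoded JSON reply as a dict."""
    url = build_llmparse_url(doc_url, api_key, output, max_tokens)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, parse_remote_document("https://example.com/report.pdf", "YOUR_API_KEY", max_tokens=4096) should return the reply as a dictionary whose output field holds the parsed text. Percent-encoding the doc value matters because the document URL may itself contain query strings.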
POST Request Body
This section outlines the requirements for POST requests when uploading your documents directly from your apps.
Allowed Content-Types:
multipart/form-data
The default MIME type when uploading your documents for parsing using the POST method is multipart/form-data. See the REST API code samples or the PixLab GitHub Repository for a working example.
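For local files, the multipart/form-data upload can be done with the Python standard library alone. This sketch builds the request body by hand. The form field name file for the document, and sending key and output as ordinary form fields, are assumptions, since this page does not name the upload fields; check the PixLab GitHub Repository samples for the exact names. The names encode_multipart and upload_document are hypothetical.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

LLMPARSE_ENDPOINT = "https://api.pixlab.io/llmparse"


def encode_multipart(fields, file_field, filename, file_bytes,
                     content_type="application/octet-stream"):
    """Return (body, content_type_header) for a multipart/form-data request."""
    boundary = "----pixlab" + uuid.uuid4().hex
    delim = b"--" + boundary.encode("ascii")
    parts = []
    # Plain text form fields (here: key and output).
    for name, value in fields.items():
        parts.append(delim)
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        parts.append(b"")
        parts.append(str(value).encode("utf-8"))
    # The document itself, sent as a file part.
    parts.append(delim)
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode("utf-8")
    )
    parts.append(f"Content-Type: {content_type}".encode("ascii"))
    parts.append(b"")
    parts.append(file_bytes)
    # Closing delimiter, followed by a trailing CRLF after the join.
    parts.append(delim + b"--")
    parts.append(b"")
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_document(path, api_key, output="markdown", timeout=300):
    """POST a local file to llmparse and return the decoded JSON reply."""
    with open(path, "rb") as fh:
        data = fh.read()
    ctype_guess = mimetypes.guess_type(path)[0] or "application/octet-stream"
    # Hypothetical field name "file": the documentation above does not name it.
    body, ctype = encode_multipart({"key": api_key, "output": output},
                                   "file", os.path.basename(path), data, ctype_guess)
    req = urllib.request.Request(LLMPARSE_ENDPOINT, data=body,
                                 headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the body manually keeps the example dependency-free; in practice a third-party HTTP client that supports multipart uploads produces the same wire format with less code.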
HTTP Response
application/json
The default response format is the PixLab Simple vLM Response Format, which is unified across our vLM API endpoints and suitable for most applications. It includes only the essential information: the LLM-friendly output text of the parsed document in your preferred format, token counts, etc.
PixLab Simple vLM Response Format
{
"status": 200,
"id": "6783E34342",
"output": "LLM friendly output",
"format": "markdown or JSON",
"object": "llm-parse",
"created": 1694623155,
"model": "pix-llm",
"total_input_tokens": 2048,
"total_output_tokens": 1057
}
Fields | Type | Description |
---|---|---|
status | Integer | HTTP 200 indicates success. Any other code indicates failure. |
id | String | Random ID identifying the generated response output. |
output | String | LLM-friendly text output of the parsed document, in either Markdown or JSON format. |
format | String | Output format of the parsed document: either Markdown or JSON. |
object | String | Invoked vLM API endpoint. |
created | Timestamp | Creation timestamp of the generated output. |
model | String | Underlying LLM model ID/name. |
total_input_tokens | Integer | Total number of ingested (input) tokens. |
total_output_tokens | Integer | Total number of output tokens. |
error | String | Error description when status != 200. |
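Client code should check the status field before reading output. The following is a minimal Python sketch, assuming the reply is the JSON object shown above; LLMParseError and extract_output are hypothetical names.

```python
import json


class LLMParseError(Exception):
    """Raised when the reply's status field is not 200."""


def extract_output(reply_text):
    """Decode an llmparse reply and return (output, format, token_usage)."""
    reply = json.loads(reply_text)
    if reply.get("status") != 200:
        # Per the field table, error carries the description when status != 200.
        raise LLMParseError(
            f"status {reply.get('status')}: {reply.get('error', 'unknown error')}"
        )
    usage = {
        "input": reply.get("total_input_tokens", 0),
        "output": reply.get("total_output_tokens", 0),
    }
    return reply["output"], reply.get("format"), usage
```

When JSON output is requested, the output field is documented as a String, so it may need a second json.loads call before use; verify this against a live response.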
Code Samples
For a comprehensive list of production-ready code samples, please consult the PixLab GitHub Repository: https://github.com/symisc/pixlab.
Similar API Endpoints
tagimg, nsfw, docscan, image-embed, chat, llm-tools, answer, describe, text-embed, llm-tool-call, query, coder