API Endpoint Access URL
https://api.pixlab.io/llmparse
Description
The LLM Parse API converts documents into LLM-ready output for retrieval, summarization, RAG ingestion, search, and automation workflows. Send a document URL to https://api.pixlab.io/llmparse, choose an output format, and PixLab queues a parsing job that extracts the document into clean Markdown, structured JSON, or plain text.
The endpoint is backed by PixLab's docparse operation. It downloads the source document, converts it with layout-aware parsing, and immediately returns a jobId. Poll the job until its status is completed (or failed).
- Parse PDF, DOCX, PPTX, XLSX, HTML, and other office-style documents into LLM-friendly content
- Export parsed documents as md, json, or text
- Preserve useful document structure such as headings, reading order, tables, lists, and sections where possible
- Queue long-running document parsing jobs asynchronously and poll by jobId
- Use a simple SDK-free POST endpoint for backend services, automation workers, RAG pipelines, and document ingestion systems
- Reduce file-format noise before sending content to LLMs, vector databases, search indexes, or downstream analysis tools
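Because Markdown output keeps headings where possible, a common RAG ingestion step is to split result.data into heading-scoped chunks before embedding. The sketch below is not part of the PixLab API. It is a minimal client-side example that assumes ATX-style headings (# to ######), and chunk_markdown_by_heading is a hypothetical helper. It does not special-case # lines inside fenced code, so treat it as a starting point.

```python
from __future__ import annotations

import re

# Matches ATX-style Markdown headings such as "# Title" or "## Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_markdown_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into chunks, one per heading, keeping the heading trail.

    Text that appears before the first heading gets an empty heading trail.
    """
    chunks = []
    path: list[str] = []  # current heading trail, e.g. ["Report", "Results"]
    buffer: list[str] = []

    def flush():
        text = "\n".join(buffer).strip()
        if text:
            chunks.append({"headings": list(path), "text": text})
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk that belongs to the previous heading
            level = len(match.group(1))
            # Drop headings at the same or a deeper level, then push this one.
            del path[level - 1:]
            path.append(match.group(2))
        else:
            buffer.append(line)
    flush()
    return chunks
```

Keeping the heading trail next to each chunk lets a retriever show where a passage came from without re-parsing the document.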
For image analysis, use the QUERY, TAG-IMG, and DESCRIBE endpoints. For raw OCR extraction from images, use the OCR endpoint. For embedded image text translation, use Image Text Translation.
HTTP Methods
POST
HTTP Parameters
Required
| Fields | Type | Description |
|---|---|---|
| key | String | Your PixLab API Key. Alternatively, pass the key in the WWW-Authenticate HTTP header and omit this parameter. |
| url | URL | Publicly reachable URL of the input document to parse. The backend also accepts downloadUrl as an alias. The document should be a PDF, DOCX, PPTX, XLSX, HTML, text, or another supported office/document format. |
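A minimal sketch of assembling the required fields on the client. build_llmparse_request is a hypothetical helper, not a PixLab SDK function, and the exact WWW-Authenticate header format is an assumption (the bare key as the header value); verify it before relying on header authentication.

```python
from __future__ import annotations

from urllib.parse import urlparse


def build_llmparse_request(api_key: str, document_url: str,
                           key_in_header: bool = False) -> tuple[dict, dict]:
    """Return (json_body, headers) for a POST to https://api.pixlab.io/llmparse."""
    if urlparse(document_url).scheme not in ("http", "https"):
        # The endpoint downloads the document itself, so it needs a public http(s) URL.
        raise ValueError(f"expected a public http(s) URL, got {document_url!r}")
    headers = {"Content-Type": "application/json"}
    body = {"url": document_url}  # the backend also accepts "downloadUrl" as an alias
    if key_in_header:
        # Assumption: the WWW-Authenticate header carries the bare API key.
        headers["WWW-Authenticate"] = api_key
    else:
        body["key"] = api_key
    return body, headers
```

The returned pair can be passed to requests.post("https://api.pixlab.io/llmparse", json=body, headers=headers, timeout=60).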
Optional
| Fields | Type | Description |
|---|---|---|
| format | String | Desired output format. Supported values are md, json, and text. Defaults to md. |
| extension | String | File extension hint the parser uses when opening the downloaded document, for example pdf, docx, xlsx, pptx, or html. Defaults to pdf. |
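The optional fields can be checked before submitting. The sketch below rejects unsupported format values and, as a hypothetical convenience, guesses the extension hint from the URL path, falling back to the documented pdf default. The guessing is our own heuristic, not server behavior: URLs that do not end in the file name (signed links, download endpoints) fall back to pdf, so pass extension explicitly for those.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Values listed in the format parameter description.
SUPPORTED_FORMATS = {"md", "json", "text"}


def optional_params(document_url: str, fmt: str = "md") -> dict:
    """Validate `format` and derive an `extension` hint from the URL path.

    Falls back to "pdf" (the documented default) when the path has no suffix.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"format must be one of {sorted(SUPPORTED_FORMATS)}, got {fmt!r}")
    suffix = PurePosixPath(urlparse(document_url).path).suffix  # e.g. ".docx"
    extension = suffix.lstrip(".").lower() or "pdf"
    return {"format": fmt, "extension": extension}
```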
POST Request Body
The llmparse endpoint accepts POST requests only. Submit a JSON body containing your API key (unless it is sent in a header), the input document URL, and optionally the output format and extension hint.
Allowed Content-Types:
application/json
Documents are processed asynchronously. The initial response returns a jobId; poll the job endpoint (the accepted response's message points to /job/{jobId}) until the job status becomes completed or failed.
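Since results arrive asynchronously, a polling loop with an overall deadline avoids waiting forever on a stuck job. This sketch takes the status fetch as a callable so it can be tested offline. The exponential backoff, the 300-second default, and the wait_for_job name are assumptions, not documented limits.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}


def wait_for_job(fetch_status: Callable[[], dict], timeout: float = 300.0,
                 first_delay: float = 2.0, max_delay: float = 15.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status() until the job reaches a terminal status.

    The delay doubles after each non-terminal poll, capped at max_delay.
    Raises TimeoutError if the job is still queued/processing at the deadline.
    """
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL:
            return status  # caller decides how to handle "failed"
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {status.get('status')!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

With requests, the fetch callable could be lambda: requests.get(f"https://api.pixlab.io/job/{job_id}", params={"key": API_KEY}, timeout=30).json(), which polls the same job URL used by the code samples on this page.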
HTTP Response
application/json
The LLM Parse API starts an asynchronous document parsing job. A successful POST returns a queued job identifier immediately. Use the returned jobId to poll for completion. When the job is completed, the result contains the requested output format and parsed document data.
Accepted Job Response
```json
{
  "rc": true,
  "status": "accepted",
  "jobId": "doc_01hx9z3p9r6n6k2a",
  "message": "Job queued. Poll /job/{jobId} for results."
}
```
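A small sketch for handling the accepted response: it checks rc, returns jobId, and builds the poll URL from the /job/{jobId} path mentioned in message. The error key name on rejection is not documented here, so, like the code samples, it tries err and then error. Both helper names are hypothetical.

```python
import json
from urllib.parse import quote


def accepted_job_id(response_body: str) -> str:
    """Return the jobId from an accepted llmparse response, or raise on rejection."""
    payload = json.loads(response_body)
    if not payload.get("rc"):
        # Assumption: the error text is under "err" or "error", as in the code samples.
        raise RuntimeError(payload.get("err") or payload.get("error") or "LLM parse job was not accepted")
    return payload["jobId"]


def job_poll_url(job_id: str, base: str = "https://api.pixlab.io/job") -> str:
    """Build the poll URL, assuming the /job/{jobId} route named in "message"."""
    return f"{base}/{quote(job_id, safe='')}"
```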
Completed Job Result
```json
{
  "status": "completed",
  "result": {
    "format": "md",
    "data": "# Parsed document\n\nClean LLM-ready Markdown output..."
  }
}
```
| Fields | Type | Description |
|---|---|---|
| rc | Boolean | True when the parsing job was accepted; false when the request failed validation. |
| status | String | accepted in the initial response. When polling the job, one of queued, processing, completed, or failed. |
| jobId | String | Identifier used to poll the document parsing job until completion. |
| result.format | String | Output format returned by the completed job: md, json, or text. |
| result.data | String or Object | Parsed document output. The md and text formats return strings; the json format returns structured document data. |
| result.error | String | Error message when the job fails, for example if the document cannot be downloaded or exceeds the maximum allowed size. |
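Because result.data is a string for md and text but structured data for json, client code usually needs one normalization step before storing the output. This hypothetical helper serializes JSON results and chooses a file suffix; the suffix mapping is our own convention, not something the API returns.

```python
from __future__ import annotations

import json

# File suffix per documented output format (the mapping itself is our convention).
SUFFIX = {"md": ".md", "json": ".json", "text": ".txt"}


def result_to_text(job_status: dict) -> tuple[str, str]:
    """Return (suffix, text) for a completed job; raise for failed or unfinished jobs.

    md/text results are strings; json results are structured data and are
    serialized here so every format can be written to a file the same way.
    """
    result = job_status.get("result") or {}
    if job_status.get("status") == "failed":
        raise RuntimeError(result.get("error", "Document parsing failed"))
    if job_status.get("status") != "completed":
        raise ValueError(f"job not finished: {job_status.get('status')!r}")
    fmt, data = result["format"], result["data"]
    text = data if isinstance(data, str) else json.dumps(data, ensure_ascii=False, indent=2)
    return SUFFIX.get(fmt, ".txt"), text
```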
Code Samples
```python
import time

import requests

API_KEY = "PIXLAB_API_KEY"
SUBMIT_URL = "https://api.pixlab.io/llmparse"
JOB_URL = "https://api.pixlab.io/job"

# Start an async LLM document parsing job.
submit = requests.post(
    SUBMIT_URL,
    json={
        "key": API_KEY,
        "url": "https://example.com/report.pdf",
        "format": "md",     # Optional: md, json, or text. Defaults to md.
        "extension": "pdf"  # Optional parser hint. Defaults to pdf.
    },
    timeout=60
)
job = submit.json()
if not job.get("rc"):
    raise RuntimeError(job.get("err") or job.get("error") or "LLM parse job was not accepted")

job_id = job["jobId"]
print(f"Queued document parsing job: {job_id}")

# Poll until the parser completes or fails.
while True:
    status = requests.get(f"{JOB_URL}/{job_id}", params={"key": API_KEY}, timeout=30).json()
    state = status.get("status")
    if state == "completed":
        result = status["result"]
        print(result["format"])
        print(result["data"])  # str for md/text, dict for json
        break
    if state == "failed":
        raise RuntimeError((status.get("result") or {}).get("error", "Document parsing failed"))
    time.sleep(2)  # still queued or processing
```
```javascript
const API_KEY = 'PIXLAB_API_KEY';
const SUBMIT_URL = 'https://api.pixlab.io/llmparse';
const JOB_URL = 'https://api.pixlab.io/job';

async function parseDocumentForLlm() {
  const submitResponse = await fetch(SUBMIT_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      key: API_KEY,
      url: 'https://example.com/report.pdf',
      format: 'md',     // Optional: md, json, or text. Defaults to md.
      extension: 'pdf'  // Optional parser hint. Defaults to pdf.
    })
  });
  const job = await submitResponse.json();
  if (!job.rc) {
    throw new Error(job.err || job.error || 'LLM parse job was not accepted');
  }
  console.log(`Queued document parsing job: ${job.jobId}`);

  while (true) {
    const statusResponse = await fetch(`${JOB_URL}/${job.jobId}?key=${encodeURIComponent(API_KEY)}`);
    const status = await statusResponse.json();
    if (status.status === 'completed') {
      console.log(status.result.format);
      console.log(status.result.data);
      return status.result;
    }
    if (status.status === 'failed') {
      throw new Error(status.result?.error || 'Document parsing failed');
    }
    await new Promise(resolve => setTimeout(resolve, 2000));
  }
}

parseDocumentForLlm().catch(console.error);
```
```php
<?php
$apiKey = 'PIXLAB_API_KEY';
$submitUrl = 'https://api.pixlab.io/llmparse';
$jobUrl = 'https://api.pixlab.io/job';

// Start an async LLM document parsing job.
$payload = json_encode([
    'key' => $apiKey,
    'url' => 'https://example.com/report.pdf',
    'format' => 'md',     // Optional: md, json, or text. Defaults to md.
    'extension' => 'pdf'  // Optional parser hint. Defaults to pdf.
]);

$ch = curl_init($submitUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
$response = curl_exec($ch);
if ($response === false) {
    throw new Exception('Request failed: ' . curl_error($ch));
}
curl_close($ch);

$job = json_decode($response, true);
if (empty($job['rc'])) {
    throw new Exception($job['err'] ?? $job['error'] ?? 'LLM parse job was not accepted');
}
$jobId = $job['jobId'];
echo "Queued document parsing job: {$jobId}" . PHP_EOL;

// Poll until the parser completes or fails.
while (true) {
    $statusJson = file_get_contents($jobUrl . '/' . urlencode($jobId) . '?key=' . urlencode($apiKey));
    if ($statusJson === false) {
        throw new Exception('Could not fetch job status');
    }
    $status = json_decode($statusJson, true);
    if (($status['status'] ?? '') === 'completed') {
        echo $status['result']['format'] . PHP_EOL;
        // md/text results are strings; json results decode to arrays.
        echo is_string($status['result']['data']) ? $status['result']['data'] : json_encode($status['result']['data']);
        break;
    }
    if (($status['status'] ?? '') === 'failed') {
        throw new Exception($status['result']['error'] ?? 'Document parsing failed');
    }
    sleep(2);
}
```
Similar API Endpoints
tagimg, nsfw, docscan, image-embed, chat, llm-tools, answer, describe, text-embed, llm-tool-call, query, coder