Get Your Documents Ready for Gen AI
Docling converts messy documents into structured data and simplifies downstream document and AI processing by detecting tables, formulas, and reading order, running OCR, and much more.
pip install docling
Why Choose Docling?
Advanced PDF Understanding
Detect page layout, reading order, table structure, code, formulas, and image classification with state-of-the-art models.
Multiple Formats
Parse PDF, DOCX, PPTX, XLSX, HTML, Markdown, images, audio files, and more into a unified format.
AI Integrations
Plug-and-play integrations with LangChain, LlamaIndex, Crew AI, Haystack, and other popular frameworks.
Local Execution
Run everything locally for sensitive data and air-gapped environments. Your data stays private.
Extensive OCR
Advanced OCR support for scanned PDFs and images with high-accuracy text extraction.
Simple CLI
Convert documents directly from your terminal with a simple, powerful command-line interface.
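From a shell, a typical call looks like `docling --to json --output ./out report.pdf` (run `docling --help` for the full option list). The sketch below drives the same command from a Python script; `docling_cli_args` and `run_docling_cli` are hypothetical helper names, and the set of accepted `--to` values is deliberately limited to a few common ones.

```python
import shutil
import subprocess
from pathlib import Path


def docling_cli_args(source: str, to: str = "md", output: Path = Path("out")) -> list[str]:
    """Build the argument list for one `docling` CLI call."""
    if to not in {"md", "json", "html", "text"}:
        raise ValueError(f"unsupported output format: {to}")
    return ["docling", "--to", to, "--output", str(output), source]


def run_docling_cli(source: str, to: str = "md", output: Path = Path("out")) -> None:
    """Run the CLI if installed; check=True raises CalledProcessError on failure."""
    if shutil.which("docling") is None:
        raise FileNotFoundError("`docling` is not on PATH; install it with `pip install docling`")
    subprocess.run(docling_cli_args(source, to, output), check=True)
```

Passing an argument list (not a shell string) avoids quoting problems with file names that contain spaces.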
Quick Start
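from docling.document_converter import DocumentConverter
# The source can be a local file path or a URL; here it is a PDF hosted on arXiv.
source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
# convert() returns a ConversionResult; its .document attribute is the parsed DoclingDocument.
doc = converter.convert(source).document
print(doc.export_to_markdown())  # Markdown export of the parsed document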
Supported Document Formats
PDF, DOCX, PPTX, XLSX, HTML, Markdown, images, audio files, and many more.
Export Formats
Export your parsed documents to formats that simplify processing and ingestion into AI, RAG, and agentic systems.
Text
Plain text extraction
Markdown
Structured markdown with tables and formatting
HTML
Rich HTML output with styling
JSON
Lossless JSON representation
DocTags
Structured document tags format
Use Cases
Docling powers document processing across various industries and applications.
Research & Academia
Process research papers, extract citations, analyze academic documents, and convert PDFs to structured formats for literature reviews and knowledge extraction.
Legal Document Processing
Extract key information from legal documents, contracts, and case files. Maintain document structure while enabling search and analysis.
Business Intelligence
Convert business reports, financial statements, and presentations into structured data for analysis, reporting, and decision-making.
Healthcare & Medical Records
Process medical documents, patient records, and research papers while maintaining privacy with local processing capabilities.
Content Management
Convert documents to Markdown or HTML for content management systems, documentation sites, and knowledge bases.
AI & Machine Learning
Prepare documents for RAG systems, fine-tuning datasets, and AI model training. Extract structured data for machine learning pipelines.
Powered by Advanced Technology
Heron Layout Model
State-of-the-art layout detection model for fast and accurate PDF parsing. Understands complex page structures, multi-column layouts, and reading order.
Visual Language Models
Support for VLMs like GraniteDocling with MLX acceleration on Apple Silicon. Enhanced understanding of document content through vision-language integration.
Advanced OCR
High-accuracy OCR for scanned documents and images. Supports multiple languages and character sets with intelligent text extraction.
Table Detection
Intelligent table extraction preserving cell relationships, headers, and data structure. Export tables in multiple formats for downstream processing.
Performance & Scalability
Fast Processing
Optimized pipelines for quick document conversion. Process documents in seconds, not minutes.
Batch Processing
Handle multiple documents efficiently with parallel processing support. Scale from single documents to large document collections.
Memory Efficient
Stream large documents without loading everything into memory. Process documents of any size efficiently.
Cross-Platform
Works seamlessly on macOS, Linux, and Windows. Supports both x86_64 and arm64 architectures including Apple Silicon.
Join the Community
Docling is part of the LF AI & Data Foundation and backed by IBM Research. Join thousands of developers building the future of document processing.
Open Source
MIT licensed and open source. Contribute, fork, and use freely in your projects.
Active Development
Regular updates with new features, improvements, and bug fixes. Active community support and contributions.
Enterprise Ready
Used by enterprises worldwide. Production-ready with comprehensive documentation and support.
Ready to Get Started?
Join thousands of developers using Docling to process documents for AI applications.