Surya is an OCR and document layout analysis toolkit developed by Vik Paruchuri and maintained by Datalab, first released in January 2024 under the GPL-3.0 license. It covers text recognition in more than 90 languages, document layout detection, reading order prediction, table recognition, and equation detection, so structured information can be extracted from document images with a single toolkit.
Surya has no cloud API dependencies: it runs entirely on local hardware and supports both CPU and GPU inference. It is commonly used for digitizing scanned documents, extracting text from PDFs with complex layouts, and building automated document processing pipelines.
License information is provided as a guide and is not legal advice.