What license does GLM-OCR use?

This model is released under the MIT License, a short and permissive open-source license that allows commercial use, modification, and redistribution.

Can I use GLM-OCR commercially?

Yes. Under the terms of the MIT License, you can use this model freely for commercial purposes. When redistributing it, you must retain the copyright notice and the license text.

GLM-OCR – Try, Compare & Deploy

GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder-decoder architecture by Zhipu AI. The model combines a 0.4B-parameter CogViT visual encoder pre-trained on large-scale image-text data, a lightweight cross-modal connector with efficient token downsampling, and a 0.5B-parameter GLM language decoder, totaling 0.9B parameters.

To address the inefficiency of standard autoregressive decoding in deterministic OCR tasks, GLM-OCR introduces a Multi-Token Prediction (MTP) mechanism that predicts multiple tokens per step, significantly improving decoding throughput while keeping memory overhead low through shared parameters.

Training proceeds through four stages: visual encoder pretraining with MIM, CLIP, and distillation objectives; vision-language pretraining on document parsing, grounding, and VQA data; supervised fine-tuning on curated OCR datasets covering text, formula, table, and key information extraction; and full-task reinforcement learning to improve accuracy and structural consistency.
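As a rough illustration of why predicting several tokens per step helps throughput, the toy calculation below counts decoder steps for a fixed-length output. It is only a sketch under an idealized assumption, namely that every predicted token is kept; the function name `decoding_steps` and the example lengths are hypothetical and do not describe GLM-OCR's actual MTP implementation.

```python
import math


def decoding_steps(num_tokens: int, tokens_per_step: int) -> int:
    """Count decoder steps needed to emit num_tokens.

    Idealized assumption: each step yields exactly tokens_per_step
    usable tokens (the last step may yield fewer).
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    return math.ceil(num_tokens / tokens_per_step)


# Hypothetical output length, chosen only for illustration.
output_length = 1000
for k in (1, 2, 4):
    print(f"{k} token(s) per step: {decoding_steps(output_length, k)} steps")
```

With one token per step the step count equals the output length; larger step sizes shrink it proportionally in this idealized setting. Real speedups depend on how many predicted tokens are actually accepted.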

At the system level, GLM-OCR adopts a two-stage pipeline in which PP-DocLayout-V3 first performs layout analysis, followed by parallel region-level recognition. This design enables robust handling of diverse document layouts including tables, formulas, and multi-column text. The model supports document parsing and targeted recognition tasks, producing structured outputs in Markdown, JSON, and LaTeX formats across more than 100 languages. GLM-OCR scores 94.62 on the OmniDocBench V1.5 benchmark, 94.0 on OCRBench, and 96.5 on UniMERNet for formula recognition.
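The two-stage flow described above (layout analysis first, then region-level recognition in parallel) can be sketched roughly as follows. This is a minimal sketch, not the actual GLM-OCR pipeline: `detect_layout` and `recognize_region` are hypothetical stand-ins for the layout model and the OCR model, the `Region` fields are assumed, and thread-based parallelism is just one possible choice.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Region:
    kind: str    # e.g. "text", "table", "formula"
    bbox: tuple  # (x0, y0, x1, y1) in page pixels
    order: int   # reading order assigned by the layout stage


def detect_layout(page):
    """Hypothetical stand-in for the layout-analysis stage."""
    # A real detector would inspect the page image; this returns fixed regions.
    return [
        Region("text", (0, 0, 600, 200), 0),
        Region("table", (0, 220, 600, 500), 1),
        Region("formula", (0, 520, 600, 600), 2),
    ]


def recognize_region(page, region):
    """Hypothetical stand-in for region-level recognition."""
    return f"<{region.kind} content from {region.bbox}>"


def parse_page(page, max_workers=4):
    # Stage 1: split the page into typed regions.
    regions = detect_layout(page)
    # Stage 2: regions are independent, so recognize them concurrently.
    # pool.map returns results in the same order as the input regions.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(lambda r: recognize_region(page, r), regions))
    # Reassemble the page in reading order.
    ordered = sorted(zip(regions, outputs), key=lambda pair: pair[0].order)
    return "\n\n".join(text for _, text in ordered)


print(parse_page(page=None))
```

Because each region is recognized independently, the second stage scales with the number of workers, and a failure in one region does not have to block the others.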

Category	Passed / Total	Pass Rate
Handwritten Math	10 / 10	100%
Text Recognition	27 / 30	90%
License Plate Recognition	27 / 30	90%
Focused Scene OCR	87 / 99	87.9%
VQA & Extraction	49 / 60	81.7%
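The percentages in the table follow directly from the passed and total counts. The short sketch below recomputes them, assuming each score is simply passed divided by total, rounded to one decimal place; the dictionary layout and the `pass_rate` helper are illustrative and not part of any GLM-OCR tooling.

```python
# (passed, total) per category, copied from the table above.
results = {
    "Handwritten Math": (10, 10),
    "Text Recognition": (27, 30),
    "License Plate Recognition": (27, 30),
    "Focused Scene OCR": (87, 99),
    "VQA & Extraction": (49, 60),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


rates = {name: pass_rate(p, t) for name, (p, t) in results.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```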
