Found 238 repositories(showing 30)
NanoNets
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
Topdu
OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commercial-grade OCR and Document Parsing systems, and faithful reproductions of the core implementations from a wide range of academic papers.
getomni-ai
OCR Benchmark
NiceRingNode
[arXiv 25] OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities
SCUT-DLVCLab
[ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
mbzuai-oryx
[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
video-db
Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments
EKYCSolutions
A standardized benchmark dataset for Khmer Optical Character Recognition (OCR) engine.
Studio-Intrinsic
No description available
datalab-to
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
ayushbits
Source and Data of our EMNLP Paper 'A Benchmark and Dataset for Post-OCR text correction in Sanskrit'
andyhuo520
OCR Benchmark: GLM-OCR vs PaddleOCR-VL-1.5 on OmniDocBench — comprehensive evaluation across text, tables, formulas, and handwriting
soduco
All the material (paper, code, dataset, results) of our DAS 2022 paper (OCR+NER benchmark)
In this project, we implemented an Korean optical character regocnition(OCR) algorithm that can detect and recognize Korean text in images such as signboards, book covers and etc. using the pre-trained model of deep text recognition benchmark provided by Clova AI. We used the data of the DACON SW-central university joint AI contest and AI Hub.
Hegghammer
Replication materials for the article "OCR with Tesseract, Amazon Textract, and Google Document AI: A Benchmarking Experiment"
hgbrian
a small OCR benchmark for biological sequences
MahmoudSalah
KORIE: A Multi-Task Benchmark for Detection, OCR, and IE on Korean Retail Receipts
errajibadr
No description available
axeld5
From Text Dataset to OCR Dataset / Benchmark
A9T9
OCR benchmark test images (English, Chinese, Number detection)
Surya-OCR-Hardware-Benchmarking is a repository dedicated to evaluating and analyzing the performance of the Surya OCR model across different hardware configurations. It provides tools and scripts for benchmarking GPU and CPU performance with various batch sizes, aiming to optimize OCR tasks for efficiency and speed in real-world applications.
This Kannada OCR benchmarking dataset contains 250 images, carefully chosen to have various kinds of recognition challenges. Some of the pages have italics and bold characters. Some of them have Halegannada poems and text; others are letterpress-printed pages, where the vowel modifiers appear as separate symbols and do not touch the consonants they go with. Some pages have interspersed English words; still others have tables with a lot of numeric data. In addition, there are old pages containing either a lot of broken characters or many words with two or more characters merged into a single connected component.
johnidm
This repository includes the results of my latest OCR study
andererka
Code repository for the paper: 'MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features'
shijincai
The industry's first "Open Source OCR Arena," a free, no-login utility for one-click benchmarking of 7 top-tier models (Marker, MinerU, MonkeyOCR, Docling, Dolphin, OCRFlux, PP-StructureV3) on your PDF/image files, specializing in PDF-to-Markdown conversion.
Aryia-Behroziuan
The classical problem in computer vision, image processing, and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity. Different varieties of the recognition problem are described in the literature:[citation needed] Object recognition (also called object classification) – one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene. Blippar, Google Goggles and LikeThat provide stand-alone programs that illustrate this functionality. Identification – an individual instance of an object is recognized. Examples include identification of a specific person's face or fingerprint, identification of handwritten digits, or identification of a specific vehicle. Detection – the image data are scanned for a specific condition. Examples include detection of possible abnormal cells or tissues in medical images or detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used for finding smaller regions of interesting image data which can be further analyzed by more computationally demanding techniques to produce a correct interpretation. Currently, the best algorithms for such tasks are based on convolutional neural networks. An illustration of their capabilities is given by the ImageNet Large Scale Visual Recognition Challenge; this is a benchmark in object classification and detection, with millions of images and 1000 object classes used in the competition.[29] Performance of convolutional neural networks on the ImageNet tests is now close to that of humans.[29] The best algorithms still struggle with objects that are small or thin, such as a small ant on a stem of a flower or a person holding a quill in their hand. They also have trouble with images that have been distorted with filters (an increasingly common phenomenon with modern digital cameras). By contrast, those kinds of images rarely trouble humans. Humans, however, tend to have trouble with other issues. For example, they are not good at classifying objects into fine-grained classes, such as the particular breed of dog or species of bird, whereas convolutional neural networks handle this with ease[citation needed]. Several specialized tasks based on recognition exist, such as: Content-based image retrieval – finding all images in a larger set of images which have a specific content. The content can be specified in different ways, for example in terms of similarity relative a target image (give me all images similar to image X), or in terms of high-level search criteria given as text input (give me all images which contain many houses, are taken during winter, and have no cars in them). Computer vision for people counter purposes in public places, malls, shopping centres Pose estimation – estimating the position or orientation of a specific object relative to the camera. An example application for this technique would be assisting a robot arm in retrieving objects from a conveyor belt in an assembly line situation or picking parts from a bin. Optical character recognition (OCR) – identifying characters in images of printed or handwritten text, usually with a view to encoding the text in a format more amenable to editing or indexing (e.g. ASCII). 2D code reading – reading of 2D codes such as data matrix and QR codes. Facial recognition Shape Recognition Technology (SRT) in people counter systems differentiating human beings (head and shoulder patterns) from objects
FastAccounting
OCR correction benchmark
OCR-D
Benchmarking OCR-D workflows in Docker
gitwzl
Menu OCR and translation evaluation benchmark for VLLMs
taggun
Desktop app to perform tests and benchmark for receipt OCR