Found 85 repositories(showing 30)
rednote-hilab
Multilingual Document Layout Parsing in a Single Vision-Language Model
FL33TW00D
No description available
akam-ot
Training and Fine-tuning code for DotsOCR
sljeff
Python client for dots.ocr with vLLM and Replicate backend support. Minimal deps, no file I/O.
wjbmattingly
No description available
muhammadsohaib60
Our project is based on one of the most important application of machine learning i.e. pattern recognition. Optical character recognition or optical character reader is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo or from subtitle text superimposed on an image. We are working on developing an OCR for URDU. We studied a couple of research papers related to our project. So far, we have found that Both Arabic and Urdu are written in Perso-Arabic script; at the written level, therefore, they share similarities. The styles of Arabic and Persian writing have a heavy influence on the Urdu script. There are 6 major styles for writing Arabic, Persian and Pashto as well. Urdu is written in Naskh writing style which is most famous of all. Optical character recognition (OCR) is the process of converting an image of text, such as a scanned paper document or electronic fax file, into computer-editable text [1]. The text in an image is not editable: the letters are made of tiny dots (pixels) that together form a picture of text. During OCR, the software analyzes an image and converts the pictures of the characters to editable text based on the patterns of the pixels in the image. After OCR, the converted text can be exported and used with a variety of word-processing, page layout and spreadsheet applications [2]. One of the main aims of OCR is to emulate the human ability to read at a much faster rate by associating symbolic identities with images of characters. Its potential applications include Screen Readers, Refreshable Braille Displays [3], reading customer filled forms, reading postal address off envelops, archiving and retrieving text etc. OCR’s ultimate goal is to develop a communication interface between the computer and its potential users. Urdu is the national language of Pakistan. It is a language that is understood by over 300 million people belonging to Pakistan, India and Bangladesh. Due to its historical database of literature, there is definitely a need to devise automatic systems for conversion of this literature into electronic form that may be accessible on the worldwide web. Although much work has been done in the field of OCR, Urdu and other languages using the Arabic script like Farsi, Urdu and Arabic, have received least attention. This is due in part to a lack of interest in the field and in part to the intricacies of the Arabic script. Owing to this state of indifference, there remains a huge amount of Urdu and Arabic literature unattended and rotting away on some old shelves. The proposed research aims to develop workable solutions to many of the problems faced in realization of an OCR designed specifically for Urdu Noori Nastaleeq Script, which is widely used in Urdu newspapers, governmental documents and books. The underlying processes first isolate and classify ligatures based on certain carefully chosen special, contour and statistical features and eventually recognize them with the aid of Feed-Forward Back Propagation Neural Networks. The input to the system is a monochrome bitmap image file of Urdu text written in Noori Nastaleeq and the output is the equivalent text converted to an editable text file.
tmzncty
这是一个基于 DotsOCR 库开发的 OCR(光学字符识别)处理工具箱,包含 PDF 转 Word (DOCX) 的完整应用。 本项目旨在提供简单易用的工具,帮助用户将 PDF 文档或图片转换为可编辑的 Word 文档或 Markdown 格式,支持复杂的版面分析(如表格、公式、图片等)。
PRITHIVSAKTHIUR
A Gradio-based demo application for comparing state-of-the-art OCR models: DeepSeek-OCR, Dots.OCR, HunyuanOCR, and Nanonets-OCR2-3B.
sandraschi
FastMCP server providing advanced OCR capabilities with current state-of-the-art models (DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, Qwen-Image-Layered decomposition), WIA scanner control, and multi-format document processing for PDFs, CBZ comics, and images.
thaoluon
No description available
yolain
A custom node for ComfyUI that provides text extraction via the DotsOCR engine.
Amir-Alipour
Run the dots.ocr AI model fully on CPU
ChaosAIs
A demo application connected with local dots.ocr service
jason-ni
Documents and publications of the Dots.OCR.Runner App
Thangtran27
Finetuning dots.ocr with LoRa and QLoRa
LuSrackhall
一个使用Pixi管理依赖的dots.ocr快速部署方案
chriscarrollsmith
Run the open-source dots.ocr OCR model as a Modal function
PRITHIVSAKTHIUR
This Gradio application demonstrates the capabilities of the "dots.ocr" model, a powerful multilingual document parser.
as49537023
dots_ocr
SilviyaDangol
simplified workflow for creating dataset for finetuning dotsocr
bariscelikW
dotsOCR and LLM integration for exam grading
manzolo
Dockerized OCR with vLLM - dots.ocr vision-language model for document text extraction
knightofcookies
No description available
neosun100
No description available
sljeff
No description available
CesarPetrescu
No description available
HassanMSh
An open-source tool to OCR Arabic scanned books and organize historical records into a searchable chronological database.
EEJQX
文档图像解析,集成mineru和dots.ocr,用于构建文档图像ocr数据集pipline
hbksilver
Create a workflow that: Reads the 6th page of "Session 11 - exercise 2 - UiPathOrchestratorAzureInstallationGuide2016.1" (in the attachment) Reads the 2nd page of "Session 11 - exercise 2 - ScannedDoc" (in the attachment) Sends an email with both documents attached and with the text from point 1 and text from point 2 as Body. Practice 2 - Walkthrough The first PDF document we are trying to extract text from is a native pdf, (the text is selectable, the doc is created from a Word document probably, etc) so we will use the dedicated activity for this case, Read PDF Text. We click on the dots next to File Name property and browse to the location we have the Orchestrator Installation Guide document saved at. We replace the Range property value from “All” to 6, in order to read just the page 6 from the document. We create a variable in the Output, to save the retrieved text for later. The second pdf document we are trying to extract from is a scanned pdf (everything is a picture, you cannot select any element, etc) so we will use the dedicated activity for this case, Read PDF With OCR. @e click on the dots next to File Name property and browse to the location we have the “Scanned.pdf” document saved at. We replace the Range property value from “All” to 6, in order to read just the page 6 from the document. We create a variable in the Output, to save the retrieved text for later. We add a Send Outlook Mail Message (can be any Send XXX Mail Message activity)set the To property. Set the Subject property. Set the Body property to the concatenated variables that keep the text pieces read from the two documents installationPDFText + invoicePDFText. To open the workflow with UiPath Studio, follow these steps: Click on the Download button below and save the archive on your local machine Extract the files Open UiPath Studio From the Start tab, click Open Browse for the extracted files Click on the xaml fileFrom the Design tab, hit Run
babara6666
PDF ocr with dots.ocr