Found 125 repositories (showing 30)
Dicklesworthstone
Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking, and markdown formatting of scanned PDFs
ARahim3
Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.
yigitkonur
PDF to markdown using vision LLMs — tables, layouts, and structure preserved
abgulati
Kosmos-2.5 is a multimodal LLM (MLLM) specialized in OCR on images. However, its strict software requirements and Python-script-based invocation make it difficult to use in application development. Here it has been containerized and exposed through an API, which makes it much easier to use.
samestrin
A Python-based REST API for PDF OCR using AI models with PyTorch and Transformers that runs in a Docker container.
MyRockae
An asynchronous service that processes file uploads, extracts text with OCR, and calls external LLM APIs to generate quizzes, flashcards, and other interactive educational content. It is designed for efficient file handling and reliable data transfer to third-party AI services for real-time content generation.
am009
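LLM-based PDF OCR tool and Markdown/LaTeX article translation tool. Supports paragraph-by-paragraph translation and direct proofreading, including math formulas. Built on large language model (LLM) APIs.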
Abhishek-B-R
Med-Remind is a web-based tool that scans handwritten or printed doctor's prescriptions and automatically creates timely medication reminders in your Google Calendar. Using image-to-text AI (OCR plus LLMs) and the Google Calendar API, RxReminder bridges the gap between paper prescriptions and digital health management.
ceodaniyal
Free OCR powered by LLMs using OpenRouter — extract text from images with no API costs. Works with image URLs and Base64 inputs using free vision-capable models.
Jaruphat
A complete, free Docker setup for automated Thai document processing with n8n, FFmpeg, Tesseract OCR, and an Ollama LLM. Extracts structured data from PDFs and images into Google Sheets without API costs.
laked0601
A research project for analysing the data held in the public domain at the UK Companies House register. Uses a combination of OCR, OpenAI's LLM APIs and Python.
cherjr
A simple NormCap-like app that performs OCR with an LLM (via the OpenRouter API)
fmancini
OCR app for receipts and invoices, with LLM review using either a local LLM through Ollama or the OpenAI API
Noob-Developer-Real
A Django-based college project that integrates third-party OCR and LLM APIs to extract and translate text from uploaded documents. Built to explore backend development, API integration, and real-world deployment limitations.
Temiloluwa
A full-stack serverless solution for translating document images between languages using AWS Lambda, S3, SQS, API Gateway, DynamoDB, and advanced AI (OCR/LLMs). Includes a Next.js web frontend, REST API (API Gateway), and shared infrastructure/CI/CD support for rapid, production-grade AI deployments on AWS.
LiveisFpv
No description available
An end-to-end starter you can run, extend, and deploy. Supports images and PDFs and returns normalized JSON with per-field confidence. Includes a batch endpoint, optional API key authentication, and a small demo page.
jaffer-hussain
n8n, ocr.space API, LLM (Gemini/OpenAI), Google Sheets
Invoice data recognition with Drive API OCR + LLM text completion
koljam
LLM-powered OCR for Papra via any OpenAI-compatible vision API
zaker-amin
Android app that helps visually impaired users understand English and Turkish texts in images using OCR, TTS, and Gemini LLM API
ahmedembeddedxx
Contextual OCR is a small API-based application that uses PyTesseract and the DeepSeek-R1 API to extract text from PDFs and then refine it with a backend LLM. It is an open-source alternative to GPT-4o-mini contextual OCR.
KohenAvocats
MCP server that provides OCR capabilities to LLMs via the Google Cloud Vision API. Reads scanned PDFs, handwritten text, and images in any orientation.
Ailzr
GUI built with Fyne; calls a local paddle-ocr and the LLM API provided by Ollama to perform translation
eslinko
A modular AI connector framework that allows seamless integration with multiple AI APIs (OCR, LLMs, Speech-to-Text, Image Processing). Build your AI pipelines like LEGO!
ceodaniyal
Free, offline OCR using local LLMs with Ollama. Convert images to text with vision-enabled models running entirely on your machine — no cloud, no API costs, full privacy.
Astrio12345
A Python-based intelligent document reader that uses OpenCV and Tesseract OCR to extract text from images, and integrates Hugging Face LLM APIs for text translation and summarization.
eujuliu
This API receives PDFs of electricity bills, performs OCR with an LLM, extracts structured information, and generates energy and financial indicators ready for analysis and dashboards.
shivamsharma-1996
Scan food and cosmetic ingredients with your camera. Uses Firebase ML Kit for OCR (optical character recognition) and an LLM API to assess and rate ingredient risk levels.
subikshan2006
A fully offline AI assistant combining voice commands, local LLMs (LLaMA), vision (OCR, image captioning), and system control. Supports natural voice Q&A over documents and screenshots, an app launcher, and PDF search, all without external API calls. Stack: Python, LangChain, LLaMA.cpp, OCR, Whisper, TTS, FAISS