Search Results

Found 125 repositories (showing 30)

llm_aided_ocr

Dicklesworthstone

💛75

Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking, and markdown formatting of scanned PDFs

2.9k

204

NOASSERTION

Python

Updated 2 days ago

ai-assist, llama2, llm, +3

mlx-tune

ARahim3

🧡67

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

1.0k

Apache-2.0

Python

Updated 1 hour ago

apple-silicon, deep-learning, huggingface, +17

api-llm-ocr

yigitkonur

🧡66

PDF to markdown using vision LLMs — tables, layouts, and structure preserved

890

NOASSERTION

Python

Updated 5 days ago

document-ai, fastapi, ocr, +5

kosmos-2_5-containerized

Kosmos-2.5 is a multimodal LLM (MLLM) specializing in image OCR. However, its stringent software requirements and Python-script-based invocation make it difficult to use for application development. Here, it has been containerized and made available via an API, which makes it considerably easier to use.

AGPL-3.0

Python

Updated 1 month ago

llm-pdf-ocr-api

samestrin

❤️45

A Python-based REST API for PDF OCR using AI models with PyTorch and Transformers that runs in a Docker container.

MIT

Python

Updated 2 months ago

ai, ai-ocr, api, +12

Backend.AIGenerator.WebJob

MyRockae

❤️35

An asynchronous service that processes file uploads, extracts text using OCR, and calls external LLM APIs to generate quizzes, flashcards, and other interactive educational content. It is designed for efficient file handling and reliable data transfer to third-party AI services for real-time content generation.

MIT

Python

Updated 1 month ago

api, django-rest-framework, gemini-api, +4

LLM-online-tool

am009

❤️45

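LLM PDF OCR tool and Markdown/LaTeX article translation tool. Supports paragraph-by-paragraph translation and direct proofreading. Supports math formulas. Based on large language model (LLM) APIs.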

JavaScript

Updated 1 month ago

llm, ocr, pdf-ocr, +1

Med-Remind

Abhishek-B-R

❤️45

Med-Remind is a web-based tool that scans handwritten or printed doctor prescriptions and automatically creates timely medication reminders in your Google Calendar. Powered by advanced Image-to-Text AI (OCR + LLMs) and Google Calendar API, RxReminder bridges the gap between paper prescriptions and digital health management.

TypeScript

Updated 2 months ago

free-llm-image-to-text

ceodaniyal

🧡65

Free OCR powered by LLMs using OpenRouter — extract text from images with no API costs. Works with image URLs and Base64 inputs using free vision-capable models.

Python

Updated 2 hours ago

ai-ocr, api-integration, computer-vision, +10

n8n-ffmpeg-tesseract-ollama

Jaruphat

❤️45

Complete FREE Docker setup for automated Thai document processing with n8n, FFmpeg, Tesseract OCR, and Ollama LLM. Extract structured data from PDFs/images to Google Sheets without API costs!

Updated 2 months ago

CompaniesHouseGPT-Public

laked0601

❤️35

A research project for analysing the data held in the public domain at the UK Companies House register. Uses a combination of OCR, OpenAI's LLM APIs and Python.

MIT

Python

Updated 10 months ago

screen-ocr-llm

cherjr

❤️40

A simple NormCap-like app that performs OCR with an LLM (via the OpenRouter API)

MIT

Python

Updated 4 months ago

grok, ocr, python, +1

app_lectura_boletas_facturas

fmancini

❤️35

OCR app for receipts and invoices, with review by a local LLM via Ollama or the OpenAI API

MIT

Python

Updated 11 months ago

ClarityRx

Noob-Developer-Real

❤️40

A Django-based college project that integrates third-party OCR and LLM APIs to extract and translate text from uploaded documents. Built to explore backend development, API integration, and real-world deployment limitations.

HTML

Updated 1 month ago

Document-Image-Translator

Temiloluwa

❤️35

A full-stack serverless solution for translating document images between languages using AWS Lambda, S3, SQS, API Gateway, DynamoDB, and advanced AI (OCR/LLMs). Includes a Next.js web frontend, REST API (API Gateway), and shared infrastructure/CI/CD support for rapid, production-grade AI deployments on AWS.

Python

Updated 4 months ago

LLM-OCR-API

LiveisFpv

❤️35

No description available

Python

Updated 2 months ago

Marksheet-Extraction-API--FastAPI---OCR---LLM-

Bhavani3839

❤️40

End‑to‑end starter you can run, extend, and deploy. Supports images & PDFs, returns normalized JSON with per‑field confidence. Includes batch endpoint, API key auth (optional), and a tiny demo page.

HTML

Updated 2 months ago

Invoice-Automation

jaffer-hussain

❤️35

n8n, ocr.space API, LLM (Gemini / OpenAI), Google Sheets

Updated 3 months ago

gas-invoice-analyst

xd2

❤️35

Invoice data recognition with Drive API OCR + LLM text completion

JavaScript

Updated 8 months ago

papra-ai

koljam

🧡60

LLM-powered OCR for Papra via any OpenAI-compatible vision API

TypeScript

Updated 3 days ago

dms, papra

PicToSpeech

zaker-amin

❤️35

Android app that helps visually impaired users understand English and Turkish texts in images using OCR, TTS, and Gemini LLM API

Java

Updated 7 months ago

android-app, apk, java, +4

contextual-ocr

ahmedembeddedxx

❤️40

Contextual OCR is a small API-based application that uses PyTesseract and DeepSeek r1 APIs to extract text from PDFs and refine it with a backend LLM. It is an open-source version of gpt-4o-mini context OCR.

Apache-2.0

Jupyter Notebook

Updated 1 year ago

mcp-server-google-vision

KohenAvocats

❤️40

MCP server providing OCR capabilities to LLMs via Google Cloud Vision API - Read scanned PDFs, handwritten text, and images with any orientation

MIT

Python

Updated 3 months ago

claude, google-vision, handwriting-recognition, +6

llmTranslator

Ailzr

🧡50

GUI built with fyne; calls local paddle-ocr and the LLM API provided by ollama to perform translation

MIT

Updated 1 month ago

z888-ai-hub

eslinko

❤️40

A modular AI connector framework that allows seamless integration with multiple AI APIs (OCR, LLMs, Speech-to-Text, Image Processing). Build your AI pipelines like LEGO!

MIT

Updated 1 year ago

local-llm-ocr-ollama

ceodaniyal

❤️30

Free, offline OCR using local LLMs with Ollama. Convert images to text with vision-enabled models running entirely on your machine — no cloud, no API costs, full privacy.

Python

Updated 1 month ago

ai-ocr, computer-vision, free-ocr, +11

AI_Document_Analyzer

Astrio12345

❤️45

A Python-based intelligent document reader that uses OpenCV and Tesseract OCR to extract text from images, and integrates Hugging Face LLM APIs for text translation and summarization.

HTML

Updated 2 months ago

electricity-bill-extractor

eujuliu

🧡55

This API was developed to receive PDFs of electricity bills, perform OCR with LLM, extract structured information, and generate energy and financial indicators ready for analysis and dashboards.

GPL-3.0

TypeScript

Updated 3 weeks ago

RiskCheck

shivamsharma-1996

❤️35

Scan food and cosmetic ingredients with your camera. Uses Firebase ML Kit for OCR (optical character recognition) and an LLM API to assess and rate ingredient risk levels.

Kotlin

Updated 8 months ago

Multi-Modal-AI-Assistant-Jarvis-style-

subikshan2006

❤️35

A fully offline AI assistant combining voice commands, local LLMs (LLaMA), vision (OCR, image captioning), and system control. Enables natural voice Q&A from documents and screenshots, an app launcher, and PDF search without API usage. Stack: Python, LangChain, LLaMA.cpp, OCR, Whisper, TTS, FAISS

Python

Updated 9 months ago
