Found 18 repositories (showing 18)
livefiredev
No description available
Baskar-forever
PDF Table Extractor is a Python project that extracts tables from scanned PDF documents using optical character recognition (OCR) and image processing techniques.
sfkbstnc
A professional, modular, and open-source Python command-line tool to extract data from PDFs — including plain text, tables, images, and OCR content — using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.
shellatheresyapandiangan
A Python package for extracting tables from images and PDFs using OCR. Requires external tools like pdfimages, tesseract, and mogrify. Modules handle PDF-to-image conversion, table detection, cell extraction, OCR, and CSV generation. Includes a demo for testing with sample images.
Manikandan-2205
InvoAI is an AI-powered Invoice OCR Automation System built with Python. It extracts key details and tables from PDF or image invoices using OCR and machine learning, returning clean JSON via API with an interactive UI for visualization and validation.
fatima0773
No description available
No description available
tdiprima
Python CLI that extracts text, tables, and OCR'd images from PDFs, with optional OpenAI summarization.
Shetteemah
A Python-based OCR mini-project to extract text, tables, and specific scores from scanned medical record images, built to explore OCR applications.
sinanguyer
This Python script uses OCR to extract tabular data from images, removing table lines and enhancing text clarity with image processing. The extracted data is structured into a pandas DataFrame and can be saved to an Excel file, automating data extraction from scanned documents.
sameeraherath
Convert scanned documents (images or PDFs) into clean CSV files using OCR and Python. A lightweight package that extracts text, tables, and structured data from hard copy documents and exports them into CSV format.
Fenil5786
This Python script processes a PDF document to extract financial data such as revenue, profit before tax, and profit after tax. It utilizes OCR for image-based text extraction and structured data extraction from tables.
ericearl
TabularOCR is a Python library that provides an easy-to-use Optical Character Recognition (OCR) solution for extracting tables from images and PDFs. It offers flexible output options, allowing you to export the extracted data in CSV, XLSX, or other spreadsheet formats.
Nandana-pramod
An AI-powered Image-Based Invoice & Form Filler Agent that extracts structured data like product details, codes, quantities, and totals from both table and non-table invoices. Uses OCR (Tesseract) and Python for text recognition, enabling automated data entry into digital systems.
anooj-gandham
multimodalparser is a versatile Python library for extracting structured data from various file types, including PDFs, images, Word documents, Excel files, JSON, CSV, and plain text. It supports text extraction, OCR, table parsing, and metadata retrieval, making it ideal for multimodal data processing and analysis.
akshayds23
A modern, responsive FastAPI application that turns a plain questions.txt file and an optional dataset (CSV/XLSX/JSON/Parquet/PDF/Images/DB) into structured answers. Under the hood, Curia Logica convenes a council of models (OpenAI, Gemini, Claude), extracts tables from PDFs and images (PyMuPDF + OCR), generates runnable Python to compute results,
shubhampandey013
A standalone Python-based solution for extracting structured data from complex web pages, including JavaScript-rendered content, HTML tables, and image-based information using OCR. The project converts visually rich and unstructured web data into clean, LLM-ready JSON format through a clear and extensible extraction pipeline.
OCR (Optical Character Recognition) lets you 'recognise' text within a document, whether it is an image, a PDF or a table. With OCR, you can extract this text for additional processing or NLP, or fit it into your regular workflow. The great thing about EasyOCR (which is shown in the project) is that it works with Python and is quite accurate without any fine-tuning, which means you can spend less time processing and more time doing the fun stuff.
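As a rough illustration of the EasyOCR workflow described above, here is a minimal sketch. It is not taken from the project itself: the helper names `read_image` and `confident_text` and the 0.5 confidence threshold are assumptions made for this example. It relies only on EasyOCR's `Reader(...).readtext(...)` call, which returns a list of (bounding box, text, confidence) tuples.

```python
from typing import List, Sequence, Tuple

# One EasyOCR detection: (four [x, y] corner points, recognised text, confidence).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def read_image(path: str, languages: Sequence[str] = ("en",)) -> List[Detection]:
    """Run EasyOCR on an image file and return its raw detections.

    Hypothetical helper for illustration; not part of the listed project.
    """
    # Imported lazily so the filtering helper below works without EasyOCR installed.
    import easyocr

    # Building a Reader loads the models and downloads weights on first use.
    reader = easyocr.Reader(list(languages))
    return reader.readtext(path)


def confident_text(detections: List[Detection], min_confidence: float = 0.5) -> List[str]:
    """Return the text of detections at or above min_confidence.

    The 0.5 default is an assumed threshold, not a library recommendation.
    """
    return [text for _bbox, text, conf in detections if conf >= min_confidence]
```

Calling `confident_text(read_image("scan.png"))` on a scanned image (the file name is a placeholder) would give the recognised strings ready for further processing; the threshold will likely need tuning per document type.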