Search Results

Found 18 repositories (showing 18)

ocr-extract-table-from-image-python

livefiredev

❤️25

No description available

Python

Updated 3 months ago

TableExtractor-Advanced-PDF-Table-Extraction

Baskar-forever

❤️40

PDF Table Extractor is a Python project that extracts tables from scanned PDF documents using optical character recognition (OCR) and image processing techniques.

MIT

Jupyter Notebook

Updated 4 months ago

ocr-python, scanedpdf-extraction, table-extraction, +2

pdf-extractor-cli

sfkbstnc

❤️40

A professional, modular, and open-source Python command-line tool to extract data from PDFs — including plain text, tables, images, and OCR content — using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.

MIT

Python

Updated 5 months ago

pdf, pdf-extractor, pdf-ocr-extraction, +4

ocr_masterdata

shellatheresyapandiangan

❤️40

A Python package for extracting tables from images and PDFs using OCR. Requires external tools like pdfimages, tesseract, and mogrify. Modules handle PDF-to-image conversion, table detection, cell extraction, OCR, and CSV generation. Includes a demo for testing with sample images.

MIT

Python

Updated 1 year ago

InvoAI

Manikandan-2205

❤️45

InvoAI is an AI-powered Invoice OCR Automation System built with Python. It extracts key details and tables from PDF or image invoices using OCR and machine learning, returning clean JSON via API with an interactive UI for visualization and validation.

JavaScript

Updated 2 months ago

ocr-extract-table-from-image-python

fatima0773

❤️20

No description available

Python

Updated 1 year ago

Extract_Table_Data_from_Image_Using_python-Open-CV_and_OCR

jewel-106

❤️25

No description available

Python

Updated 1 year ago

pdf_to_text

tdiprima

🧡65

Python CLI that extracts text, tables, and OCR'd images from PDFs, with optional OpenAI summarization.

Python

Updated 1 day ago

ocr, openai, pdf, +2

mini-OCR

Shetteemah

❤️40

A Python-based OCR mini-project to extract text, tables, and specific scores from scanned medical record images, built to explore OCR applications.

MIT

Jupyter Notebook

Updated 7 months ago

image-to-table-generator

sinanguyer

❤️35

This Python script uses OCR to extract tabular data from images, removing table lines and enhancing text clarity with image processing. The extracted data is structured into a pandas DataFrame and can be saved to an Excel file, automating data extraction from scanned documents.

Python

Updated 1 year ago

scan2csv

sameeraherath

❤️35

Convert scanned documents (images or PDFs) into clean CSV files using OCR and Python. A lightweight package that extracts text, tables, and structured data from hard-copy documents and exports them to CSV format.

Updated 7 months ago

Finance-Report-Analyzer

Fenil5786

❤️35

This Python script processes a PDF document to extract financial data such as revenue, profit before tax, and profit after tax. It utilizes OCR for image-based text extraction and structured data extraction from tables.

Python

Updated 8 months ago

TabularOCR

ericearl

❤️40

TabularOCR is a Python library that provides an easy-to-use Optical Character Recognition (OCR) solution for extracting tables from images and PDFs. It offers flexible output options, allowing you to export the extracted data in CSV, XLSX, or other spreadsheet formats.

MIT

Updated 1 year ago

Image-Based-Invoice-and-Form-Filler-Agent

Nandana-pramod

❤️35

An AI-powered Image-Based Invoice & Form Filler Agent that extracts structured data such as product details, codes, quantities, and totals from both table and non-table invoices. Uses OCR (Tesseract) and Python for text recognition, enabling automated data entry into digital systems.

Python

Updated 7 months ago

multimodalparser

anooj-gandham

❤️40

multimodalparser is a versatile Python library for extracting structured data from various file types, including PDFs, images, Word documents, Excel files, JSON, CSV, and plain text. It supports text extraction, OCR, table parsing, and metadata retrieval, making it ideal for multimodal data processing and analysis.

MIT

Updated 1 year ago

Curia-logica

akshayds23

🧡50

A modern, responsive FastAPI application that turns a plain questions.txt file and an optional dataset (CSV/XLSX/JSON/Parquet/PDF/Images/DB) into structured answers. Under the hood, Curia Logica convenes a council of models (OpenAI, Gemini, Claude), extracts tables from PDFs and images (PyMuPDF + OCR), generates runnable Python to compute results,

Apache-2.0

Python

Updated 2 months ago

Web-Data-Extraction-Pipeline

shubhampandey013

❤️45

A standalone Python-based solution for extracting structured data from complex web pages, including JavaScript-rendered content, HTML tables, and image-based information using OCR. The project converts visually rich and unstructured web data into clean, LLM-ready JSON format through a clear and extensible extraction pipeline.

Python

Updated 2 months ago

Optical-Character-Recognition-with-EasyOCR-and-Python

Shubham654

❤️35

OCR, also known as Optical Character Recognition, allows you to 'recognise' text within a document, whether it be an image, a PDF or a table. Leveraging OCR, you can extract this text for additional processing, NLP, or your regular workflow. The great thing about EasyOCR (which is shown in the project) is that it works with Python and is quite accurate without any fine-tuning, so you can spend less time processing and more time doing the fun stuff.

Jupyter Notebook

Updated 4 years ago

