Search Results

Found 365 repositories (showing 30)

contextgem

shcherbak-ai

🧡68

ContextGem: Effortless LLM extraction from documents

1.8k

150

Apache-2.0

Python

Updated 3 days ago

ai, contract-analysis, data-extraction, +15

graphrag-rs

GraphRAG-rs is a high-performance, state-of-the-art Rust implementation of GraphRAG (Graph-based Retrieval Augmented Generation) that builds knowledge graphs from documents and enables natural language querying with configurable entity extraction and local LLM integration

226

MIT

Rust

Updated 8 hours ago

mmore

swiss-ai

🧡56

Massive Multimodal Open RAG & Extraction: a scalable multimodal pipeline for processing, indexing, and querying multimodal documents. Ever needed to take 8000 PDFs, 2000 videos, and 500 spreadsheets and feed them to an LLM as a knowledge base? Well, MMORE is here to help you!

202

Apache-2.0

Python

Updated 3 days ago

LLMAIx

KatherLab

💛70

Document Information Extraction & Anonymization using local LLMs

155

AGPL-3.0

Python

Updated 3 days ago

marie-ai

marieai

🧡55

Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing

MIT

Python

Updated 1 day ago

docker, document-layout-analysis, document-parser, +14

llm-document-extraction

brandonrobertz

❤️40

A proof of concept tool for using local LLMs to transform messy text documents into structured JSON

MIT

Python

Updated 6 months ago

inDox

osllmai

❤️25

The Indox Ecosystem offers integrated AI tools for data workflows. Our four components (IndoxArcg, IndoxMiner, IndoxJudge, and IndoxGen) enhance AI applications with advanced retrieval, extraction, evaluation, and generation capabilities, supporting multiple document formats and LLM providers.

AGPL-3.0

Jupyter Notebook

Updated 4 months ago

ai, document, index, +5

rag-eval

sundi133

❤️30

Automated extraction [ET] & generation of high-quality datasets based on your documents (PDF, CSV, JSON, text files, etc.) to evaluate any LLM app endpoint

Apache-2.0

Python

Updated 2 months ago

Image_KIE_LLM

jiangnanboy

❤️35

Uses an LLM (large language model) to extract key information from cards, certificates, and bills. Key Information Extraction from Image with LLM (large language model). Basically, it can extract key information from all bills and documents.

MIT

Python

Updated 3 months ago

image, key-information-extraction, kie, +3

FoundationModelsOCR

AviTsadok

❤️45

iOS demo app using Apple’s FoundationModels to extract data from scanned invoices. Combines Vision for image processing with LLM-powered field extraction. Runs fully on-device. Ideal for expense tracking, finance apps, or smart document parsing.

Swift

Updated 1 month ago

HD-LoA-Prompting

hzzhou01

❤️30

LLMs Learn Task Heuristics from Demonstrations: A Heuristic-Driven Prompting Strategy for Document-Level Event Argument Extraction (ACL 2024)

Python

Updated 3 months ago

rag-pdf-chatbot

nilesh325

🧡50

RAG‑PDF‑Analyzer is a Streamlit chatbot that lets users upload PDFs and query them with natural language. It uses PyPDF2 for text extraction, HuggingFace embeddings with FAISS for semantic search, and Mistral LLM via LangGraph to deliver context‑aware answers from documents.

Python

Updated 5 days ago

RAG-based-Intelligent-Conversational-AI-Agent-for-Knowledge-Extraction-Using-LangChain-Gemini-LLM

Pavansomisetty21

🧡50

This project implements a Retrieval-Augmented Generation (RAG) based conversational AI agent designed for intelligent knowledge extraction from PDF documents, leveraging LangChain and Google’s Gemini LLM.

MIT

Jupyter Notebook

Updated 1 month ago

ai-agent, ai-agents, ai-agents-framework, +17

biomed-extractor

ElenJ

🧡55

Biomedical Document Assistant: An LLM-Powered Information Extraction and Summarization Tool for Clinical Studies

MIT

Jupyter Notebook

Updated 3 weeks ago

ClaudeSkills

kay-ou

💛70

A Claude Skill collection designed for task automation, document parsing, and intelligent dispatch, with utilities for webpage-to-Markdown conversion, API doc parsing, and structured JSON extraction, optimized for LLM-driven workflows

MIT

Python

Updated 6 days ago

claude-code, claude-skills, claude-skills-creator, +2

CDoc

ChatDocDev

❤️35

CDoc lets you chat with your documents using local LLMs, combining Ollama, ChromaDB, and LangChain for offline, secure, and efficient information extraction. Perfect for researchers, developers, and professionals seeking quick insights from their documents.

Python

Updated 1 year ago

chormadb, fastapi, langchain, +6

agentic-rag-legal-challenge

TagirRamilevich

🧡50

My ARLC 2026 solution: RAG pipeline for answering legal questions over 300+ DIFC documents. Hybrid retrieval, cross-encoder reranking, deterministic extraction, LLM with grounding verification.

Python

Updated 1 week ago

idp-workflow

lordlinus

❤️35

Intelligent Document Processing pipeline powered by Azure Durable Functions — 6-step orchestration with dual-model extraction, human-in-the-loop review, and multi-provider LLM support (Azure OpenAI, Claude, Azure AI Models).

Python

Updated 1 week ago

azure-functions, document-processing, durable-execution, +1

agentic-doc-extraction-system

DevJadhav

❤️45

Build an enterprise-grade Agentic Document Extraction System that leverages an LLM Council Architecture where multiple AI models collaborate, deliberate, and reach consensus on document extraction tasks—similar to how a panel of experts would analyze complex documents together.

MIT

Python

Updated 1 month ago

LLMCleanPDFReader

uallende

❤️35

This NLP project leverages a quantised LLM to read and correct text extracted from PDFs. Ideal for students, professionals, and data scientists, it helps clean up and organize text data from various documents. Built to run even on small GPUs with 8GB VRAM, it's a fun learning project aimed at making PDF text extraction smarter and cleaner.

Jupyter Notebook

Updated 1 year ago

doc2dataset

ltphen

🧡60

Transform documents into LLM fine-tuning datasets with intelligent extraction and quality filtering

MIT

Python

Updated 2 weeks ago

document-cutter

hifiylang

💛70

document-cutter is a production-ready semantic document chunking service built for RAG and knowledge extraction. It supports multi-format parsing, OCR, PDF image-region understanding, token-first chunking, and hybrid boundary refinement with rules, embeddings, and LLMs to generate high-quality, retrieval-friendly chunks from complex documents.

MIT

Python

Updated 5 days ago

CADInsight_AI

Pree3105

❤️45

Built an intelligent CAD analysis pipeline combining 3D mesh processing, engineering document extraction, and LLM reasoning to generate manufacturability, defect, and quality insights. Developed synthetic datasets and evaluated multiple ML models, selecting Random Forest for geometric classification and integrating into an analytics layer.

JavaScript

Updated 2 months ago

Information-extraction-in-official-documents-using-LLMs

MathieuDesponds

❤️35

Assessed MistralAI-7B capabilities for document information extraction while ensuring client confidentiality, using In-Context Learning, Chain-of-Thought, and LoRA fine-tuning. Developed cost-effective strategies for deploying LLMs in production environments.

Jupyter Notebook

Updated 1 year ago

master-thesis, mistral-7b, nlp

docex

archerprotect

❤️40

Dead simple document extraction OCR powered by LLMs

MIT

Python

Updated 10 months ago

SuperNova

ixabrar

❤️45

AI-powered educational tutor using RAG with semantic document search, persistent memory extraction, and personalized learning powered by Groq LLM

TypeScript

Updated 1 month ago

doc-intelligence-platform

Naikbhavesh123

🧡65

This project is an Enterprise Multilingual AI Document Intelligence Platform designed for comprehensive document analysis and fraud verification. Its processing pipeline combines various ML models, including EasyOCR for text extraction, LayoutLMv3 for field extraction, ViT for document classification, and Claude LLM for intelligent text correction.

Python

Updated 2 days ago

docflow.ai

jaopaulomilitao

❤️35

A complete pipeline that receives documents (PDFs or images), extracts text with OCR, and lets you ask questions about the subject using an LLM. 🤖📃

Jupyter Notebook

Updated 6 months ago

InfoAPI

RhizoNymph

❤️35

An information indexing and retrieval API for LLMs and agents. Uses FastAPI, MinIO, OpenSearch, and Qdrant (with ColBERT embeddings via FastEmbed). Uses an LLM with structured output for document classification and type-specific metadata extraction. Exposes index_document and search routes.

Python

Updated 8 months ago

agents, indexing, llms, +2

kep

IBM

❤️35

Knowledge Extraction Pipeline is a modular and extensible repository designed to extract structured knowledge from scientific and technical documents using large language models (LLMs). It allows users to define flexible JSON schemas and examples, and to perform robust extraction with detailed error handling.

Apache-2.0

Python

Updated 2 months ago
