Found 365 repositories (showing 30)
shcherbak-ai
ContextGem: Effortless LLM extraction from documents
automataIA
GraphRAG-rs is a high-performance, state-of-the-art Rust implementation of GraphRAG (Graph-based Retrieval Augmented Generation) that builds knowledge graphs from documents and enables natural language querying with configurable entity extraction and local LLM integration
swiss-ai
Massive Multimodal Open RAG & Extraction: a scalable multimodal pipeline for processing, indexing, and querying multimodal documents. Ever needed to take 8000 PDFs, 2000 videos, and 500 spreadsheets and feed them to an LLM as a knowledge base? Well, MMORE is here to help you!
KatherLab
Document Information Extraction & Anonymization using local LLMs
marieai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing
brandonrobertz
A proof of concept tool for using local LLMs to transform messy text documents into structured JSON
osllmai
The Indox Ecosystem offers integrated AI tools for data workflows. Our four components (IndoxArcg, IndoxMiner, IndoxJudge, and IndoxGen) enhance AI applications with advanced retrieval, extraction, evaluation, and generation capabilities, supporting multiple document formats and LLM providers.
sundi133
Automated extraction [ET] and generation of high-quality datasets from your documents (PDF, CSV, JSON, text files, etc.) to evaluate any LLM app endpoint
jiangnanboy
Uses an LLM (large language model) to extract key information from images of cards, certificates, and bills/receipts. In principle, it can extract key information from any bill or document.
AviTsadok
iOS demo app using Apple’s FoundationModels to extract data from scanned invoices. Combines Vision for image processing with LLM-powered field extraction. Runs fully on-device. Ideal for expense tracking, finance apps, or smart document parsing.
hzzhou01
LLMs Learn Task Heuristics from Demonstrations: A Heuristic-Driven Prompting Strategy for Document-Level Event Argument Extraction (ACL 2024)
nilesh325
RAG-PDF-Analyzer is a Streamlit chatbot that lets users upload PDFs and query them in natural language. It uses PyPDF2 for text extraction, HuggingFace embeddings with FAISS for semantic search, and a Mistral LLM via LangGraph to deliver context-aware answers from documents.
This project implements a Retrieval-Augmented Generation (RAG)-based conversational AI agent designed for knowledge extraction from PDF documents, leveraging LangChain and Google's Gemini LLM.
ElenJ
Biomedical Document Assistant: An LLM-Powered Information Extraction and Summarization Tool for Clinical Studies
kay-ou
A Claude Skill collection designed for task automation, document parsing, and intelligent dispatch, including utilities for webpage-to-Markdown conversion, API doc parsing, and structured JSON extraction, optimized for LLM-driven workflows.
ChatDocDev
CDoc lets you chat with your documents using local LLMs, combining Ollama, ChromaDB, and LangChain for offline, secure, and efficient information extraction. Perfect for researchers, developers, and professionals seeking quick insights from their documents.
TagirRamilevich
My ARLC 2026 solution: RAG pipeline for answering legal questions over 300+ DIFC documents. Hybrid retrieval, cross-encoder reranking, deterministic extraction, LLM with grounding verification.
lordlinus
Intelligent Document Processing pipeline powered by Azure Durable Functions — 6-step orchestration with dual-model extraction, human-in-the-loop review, and multi-provider LLM support (Azure OpenAI, Claude, Azure AI Models).
DevJadhav
Build an enterprise-grade Agentic Document Extraction System that leverages an LLM Council Architecture where multiple AI models collaborate, deliberate, and reach consensus on document extraction tasks—similar to how a panel of experts would analyze complex documents together.
uallende
This NLP project leverages a quantised LLM to read and correct text extracted from PDFs. Ideal for students, professionals, and data scientists, it helps clean up and organize text data from various documents. Built to run even on small GPUs with 8GB VRAM, it's a fun learning project aimed at making PDF text extraction smarter and cleaner.
ltphen
Transform documents into LLM fine-tuning datasets with intelligent extraction and quality filtering
hifiylang
document-cutter is a production-ready semantic document chunking service built for RAG and knowledge extraction. It supports multi-format parsing, OCR, PDF image-region understanding, token-first chunking, and hybrid boundary refinement with rules, embeddings, and LLMs to generate high-quality, retrieval-friendly chunks from complex documents.
Pree3105
Built an intelligent CAD analysis pipeline combining 3D mesh processing, engineering document extraction, and LLM reasoning to generate manufacturability, defect, and quality insights. Developed synthetic datasets and evaluated multiple ML models, selecting Random Forest for geometric classification and integrating into an analytics layer.
MathieuDesponds
Assessed MistralAI-7B's capabilities for document information extraction while ensuring client confidentiality, using In-Context Learning, Chain-of-Thought prompting, and LoRA fine-tuning. Developed cost-effective strategies for deploying LLMs in production environments.
archerprotect
Dead-simple document extraction (OCR) powered by LLMs
ixabrar
AI-powered educational tutor using RAG with semantic document search, persistent memory extraction, and personalized learning powered by Groq LLM
Naikbhavesh123
This project is an Enterprise Multilingual AI Document Intelligence Platform designed for comprehensive document analysis and fraud verification. Its processing pipeline combines various ML models, including EasyOCR for text extraction, LayoutLMv3 for field extraction, ViT for document classification, and Claude LLM for intelligent text correction.
jaopaulomilitao
A complete pipeline that receives documents (PDFs or images), extracts their text with OCR, and lets you ask questions about the subject using an LLM. 🤖📃
RhizoNymph
A tool for indexing and retrieving information for LLMs and agents. Uses FastAPI, MinIO, OpenSearch, and Qdrant (with ColBERT embeddings via FastEmbed). Uses an LLM with structured output for document classification and type-specific metadata extraction. Exposes index_document and search routes.
IBM
Knowledge Extraction Pipeline is a modular and extensible repository designed to extract structured knowledge from scientific and technical documents using large language models (LLMs). It allows users to define flexible JSON schemas and examples and to perform robust extraction with detailed error handling.