Found 16,852 repositories (showing 30)
apify
Crawlee: a web scraping and browser automation library for Node.js, written in JavaScript and TypeScript, for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Supports headful and headless modes and proxy rotation.
apify
Crawlee: a web scraping and browser automation library for Python for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports headful and headless modes and proxy rotation.
weiwill88
🧠 A RAG framework implemented in pure native Python | FAISS + BM25 hybrid retrieval | Supports Ollama / SiliconFlow | Well suited to beginners learning the basics
shibing624
RAG for local LLMs: chat with PDF/doc/txt files (ChatPDF). A pure native RAG implementation built on a local LLM, embedding model, and reranker model; supports GraphRAG and requires no third-party agent libraries.
raghavan
RAG-based tool for indexing and searching PDF text data using the OpenAI API and a FAISS (Facebook AI Similarity Search) index, designed for fast information retrieval and high search accuracy.
RafalWilinski
Fullstack "Chat with your PDFs" RAG (Retrieval Augmented Generation) app built fully on Cloudflare
tonykipkemboi
A full-stack demo showcasing a local RAG (Retrieval Augmented Generation) pipeline to chat with your PDFs.
laiso
Generate comprehensive PDFs of entire websites, ideal for RAG.
NoEdgeAI
A Python wrapper for the Doc2X API that also includes local text processing (to improve PDF recall in RAG).
swiss-ai
Massive Multimodal Open RAG & Extraction (MMORE): a scalable multimodal pipeline for processing, indexing, and querying multimodal documents. Ever needed to take 8000 PDFs, 2000 videos, and 500 spreadsheets and feed them to an LLM as a knowledge base? Well, MMORE is here to help you!
shubham0204
A custom RAG pipeline for multi-document QA over PDF/DOCX documents, on Android
Azure
The GPT-RAG Data Ingestion service automates processing of diverse documents (PDFs, images, spreadsheets, transcripts, and SharePoint content), readying them for Azure AI Search. It applies smart chunking, generates text and image embeddings, and enables rich, multimodal retrieval.
veyliss
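A local-first AI knowledge base system (RAG) for connecting local documents to assisted search and LLM chat workflows. Currently supports md, txt, and pdf (text-based) files.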
KylinMountain
Converts files into Markdown to help RAG pipelines and LLMs understand them. Based on markitdown and MinerU, which provide high-quality PDF parsing.
ArmaanSeth
A multi-PDF chatbot based on a RAG architecture that lets users upload multiple PDFs and ask questions about them.
curiousily
Completely local RAG. Chat with your PDF documents using an open LLM, through a UI built with LangChain, Streamlit, Ollama (Llama 3.1), and Qdrant, plus advanced methods like reranking and semantic chunking.
iamarunbrahma
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
sudarshan-koirala
Simple Chainlit UI for running LLMs locally using Ollama and LangChain
itanishqshelar
SmartRAG is a privacy-first multimodal RAG system that lets you chat intelligently with your documents, images, and audio. Upload PDFs, Word files, or recordings and get accurate, context-aware answers, all processed locally on your device with no external APIs.
laxmimerit
This repository contains implementations of Retrieval-Augmented Generation (RAG) in Jupyter notebooks. It includes examples of building chatbots with and without history, processing PDFs with RAG, and using DeepSeek models for local RAG and financial document analysis.
nico-martin
A web app that uses Retrieval Augmented Generation (RAG) and Large Language Models to interact with a PDF directly in the browser.
thu-vu92
In this project, I explored how to extract structured information from PDF documents, using Langchain and OpenAI models
QuentinFuxa
Agentic RAG platform purpose-built for small language models (SLM). Robust PDF/SQL search
leoneversberg
A local LLM chatbot with RAG for PDF input files
hasan-py
Chat with PDF using LangChain, Streamlit, Ollama (for LLM inference), and PDFPlumber. Overall, it is an example of a Retrieval-Augmented Generation (RAG) system using the DeepSeek R1 model.
aahepburn
An open‑source desktop RAG application that enables semantic search across your Zotero library. Easily discover conceptually related papers and ideas within your PDF collection using local or cloud‑based LLMs. The app provides source attribution, metadata filtering, and seamless integration with Zotero. Runs on macOS and Linux; Windows support is limited.
Niez-Gharbi
Build your own Custom RAG Chatbot using Gradio, Langchain and Llama2
microsoft
No description available
A PDF search ingestion RAG application with Docker + LangChain.js + Gemini
LianjiaTech
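Document Parser: supports PDF, TXT, DOC, DOCX, Markdown, and other file formats, efficiently extracting and parsing content into a standard document tree structure. Built-in PDF Parser, Text Parser, and Word Parser help power intelligent applications such as RAG, knowledge bases, and full-text search.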
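Several of the projects above (for example weiwill88's framework) advertise FAISS + BM25 hybrid retrieval. The following is a minimal sketch of that idea in pure Python, not taken from any listed repository: Okapi BM25 keyword scoring over text chunks, fused with a second ranking via reciprocal rank fusion (RRF). The dense ranking is a hypothetical stand-in for results from a vector index such as FAISS, and the choice of RRF, the naive tokenizer, and the k1/b/k defaults are assumptions rather than what any of these projects necessarily uses.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase/whitespace tokenizer (assumption; real pipelines use better ones).
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # The "+1" inside the log keeps IDF positive (Lucene-style variant).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several rankings of document ids; higher fused score ranks first."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (k + rank + 1)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = [
    "FAISS builds a vector index",
    "BM25 ranks documents by term frequency",
    "Ollama runs local LLMs",
]
scores = bm25_scores("bm25 term frequency", docs)
# Sort document ids by BM25 score, best first (ties keep their original order).
bm25_ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
dense_ranking = [2, 1, 0]  # hypothetical ranking from a vector index such as FAISS
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # [1, 2, 0]
```

Fusing ranks rather than raw scores sidesteps the fact that BM25 scores and vector similarities live on different scales; a weighted sum of normalized scores is a common alternative.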