Search Results

Found 16,852 repositories (showing 30)

crawlee

apify

💚98

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

22.7k

1.3k

Apache-2.0

TypeScript

Updated 15 minutes ago

apify, automation, crawler +14

crawlee-python

apify

💛81

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

8.7k

702

Apache-2.0

Python

Updated 5 minutes ago

apify, automation, beautifulsoup +14

Local_Pdf_Chat_RAG

weiwill88

🧡67

🧠 A RAG framework implemented in pure native Python | FAISS + BM25 hybrid retrieval | Supports Ollama / SiliconFlow | Suitable for beginners getting started

870

159

Python

Updated 10 hours ago

bm25, chinese-nlp, deepseek +5
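The "FAISS + BM25 hybrid retrieval" named in the Local_Pdf_Chat_RAG description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: BM25 is written from scratch with common default parameters (k1 = 1.5, b = 0.75), the dense ranking that FAISS would produce is replaced by a hypothetical hard-coded list, and the two rankings are merged with reciprocal rank fusion (k = 60). The fusion method is an assumption; the repository may combine scores differently.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase/whitespace tokenizer (assumption); Chinese text
    # would need a proper word segmenter instead.
    return text.lower().split()


class BM25:
    """Okapi BM25 lexical scorer over a small in-memory corpus."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for d in self.docs for term in set(d))
        # IDF with +1 inside the log so weights stay non-negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, idx: int) -> float:
        doc = self.docs[idx]
        tf = Counter(doc)
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            denom = f + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def rank(self, query: str) -> list[int]:
        # Document indices ordered from best to worst BM25 score.
        scores = [self.score(query, i) for i in range(len(self.docs))]
        return sorted(range(len(self.docs)), key=lambda i: scores[i], reverse=True)


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    # Reciprocal rank fusion: each ranking adds 1 / (k + rank) per document.
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


docs = [
    "BM25 is a lexical ranking function",
    "FAISS performs vector similarity search",
    "hybrid retrieval combines BM25 and vector search",
]
lexical = BM25(docs).rank("bm25 vector search")  # [2, 1, 0]
dense = [1, 0, 2]  # hypothetical ranking from a vector index such as FAISS
print(rrf([lexical, dense]))  # [1, 2, 0]
```

Rank fusion is used here because BM25 scores and vector similarities live on different scales; fusing ranks rather than raw scores sidesteps score normalization.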

ChatPDF

shibing624

🧡62

RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF. A pure native RAG implementation built on a local LLM, embedding model, and reranker model; supports GraphRAG and requires no third-party agent libraries.

844

144

Apache-2.0

Python

Updated 1 week ago

chatdoc, chatpdf, graphrag +4

PdfGptIndexer

raghavan

🧡61

RAG-based tool for indexing and searching PDF text data using OpenAI API and FAISS (Facebook AI Similarity Search) index, designed for rapid information retrieval and superior search accuracy.

677

MIT

Python

Updated 1 week ago

cloudflare-rag

RafalWilinski

🧡66

Fullstack "Chat with your PDFs" RAG (Retrieval Augmented Generation) app built fully on Cloudflare

596

TypeScript

Updated 5 days ago

chatgpt, cloudflare, llm +1

ollama_pdf_rag

tonykipkemboi

💛72

A full-stack demo showcasing a local RAG (Retrieval Augmented Generation) pipeline to chat with your PDFs.

503

190

MIT

TypeScript

Updated 6 days ago

langchain, nextjs, ollama +3

site2pdf

laiso

💛71

Generate comprehensive PDFs of entire websites, ideal for RAG.

302

MIT

TypeScript

Updated 5 hours ago

pdfdeal

NoEdgeAI

🧡60

A Python wrapper for the Doc2X API that also comes with local text processing (to improve PDF recall in RAG).

285

MIT

Python

Updated 2 weeks ago

doc2x, ocr, pdf +1

mmore

swiss-ai

🧡56

Massive Multimodal Open RAG & Extraction: a scalable multimodal pipeline for processing, indexing, and querying multimodal documents. Ever needed to take 8000 PDFs, 2000 videos, and 500 spreadsheets and feed them to an LLM as a knowledge base? Well, MMORE is here to help you!

202

Apache-2.0

Python

Updated 2 days ago

OnDevice-RAG-Android

shubham0204

🧡65

A custom RAG pipeline for multi-document QA from PDF/DOCX documents, in Android

179

Apache-2.0

Kotlin

Updated 5 days ago

android, large-language-models, on-device-ml +2

gpt-rag-ingestion

Azure

💛71

The GPT-RAG Data Ingestion service automates processing of diverse documents—PDFs, images, spreadsheets, transcripts, and SharePoint—readying them for Azure AI Search. It applies smart chunking, generates text and image embeddings, and enables rich, multimodal retrieval.

170

MIT

Python

Updated 5 days ago

ai-localbase

veyliss

🧡65

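A local-first AI knowledge base system (RAG) for connecting local documents to assisted search and LLM chat workflows. Currently supports md, txt, and pdf (text) files.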

146

MIT

Updated 1 day ago

ai, go, knowledge-base +4

markify

KylinMountain

🧡50

Converts files into Markdown to help RAG pipelines and LLMs understand them. Based on markitdown and MinerU, which provides a high-quality PDF parser.

133

NOASSERTION

Python

Updated 1 month ago

markdown, pdf, rag

ChatPDF

ArmaanSeth

💛71

A multi-PDF chatbot based on a RAG architecture that allows users to upload multiple PDFs and ask questions about them.

128

Apache-2.0

Python

Updated 1 day ago

ragbase

curiousily

🧡61

Completely local RAG. Chat with your PDF documents (with an open LLM) through a UI that uses LangChain, Streamlit, Ollama (Llama 3.1), Qdrant, and advanced methods like reranking and semantic chunking.

123

MIT

Python

Updated 3 days ago

langchain, llama3, llm +4

pdf-to-markdown

iamarunbrahma

💛70

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

122

MIT

Python

Updated 13 hours ago

document-conversion, document-processing, information-retrieval +8

rag-chat-with-pdf

sudarshan-koirala

💛70

Simple Chainlit UI for running LLMs locally using Ollama and LangChain

119

MIT

Python

Updated 2 days ago

SmartRAG

itanishqshelar

💛70

SmartRAG is a privacy-first multimodal RAG system that lets you chat intelligently with your documents, images, and audio. Upload PDFs, Word files, or recordings and get accurate, context-aware answers, all processed locally on your device with no external APIs.

111

MIT

Python

Updated 20 hours ago

ollama-chatbot

laxmimerit

🧡56

This repository contains implementations of Retrieval-Augmented Generation (RAG) in Jupyter notebooks. It includes examples of building chatbots with and without history, processing PDFs with RAG, and using DeepSeek models for local RAG and financial document analysis.

108

Jupyter Notebook

Updated 1 week ago

ask-my-pdf

nico-martin

🧡55

A web app that uses Retrieval Augmented Generation (RAG) and Large Language Models to interact with a PDF directly in the browser.

105

MIT

TypeScript

Updated 2 weeks ago

rag, transformers-js, webai +1

structured-rag-pdf

thu-vu92

🧡56

In this project, I explored how to extract structured information from PDF documents, using Langchain and OpenAI models

103

Jupyter Notebook

Updated 2 weeks ago

PolyRAG

QuentinFuxa

🧡55

Agentic RAG platform purpose-built for small language models (SLM). Robust PDF/SQL search

MIT

Python

Updated 2 weeks ago

llm-chatbot-rag

leoneversberg

🧡60

A local LLM chatbot with RAG for PDF input files

MIT

Jupyter Notebook

Updated 3 weeks ago

chatbot, llm, nlp +1

chat-with-pdf-RAG

hasan-py

🧡50

Chat with PDF using LangChain, Streamlit, Ollama (for LLM inference), and PDFPlumber. Overall, it is an example of a Retrieval-Augmented Generation (RAG) system built on the DeepSeek R1 model.

Python

Updated 2 weeks ago

RAG-Assistant-for-Zotero

aahepburn

🧡60

An open‑source desktop RAG application that enables semantic search across your Zotero library. Easily discover conceptually related papers and ideas within your PDF collection using local or cloud‑based LLMs. The app provides source attribution, metadata filtering, and seamless integration with Zotero. macOS and Linux. Windows support limited.

NOASSERTION

Python

Updated 39 minutes ago

natural-language-processing, ollama-api, rag-chatbot +2

PDF-RAG-with-Llama2-and-Gradio

Niez-Gharbi

💛70

Build your own Custom RAG Chatbot using Gradio, Langchain and Llama2

Apache-2.0

Python

Updated 4 days ago

chatbot, chroma, generative-ai +6

RAG-PDF-Analyzer-WPF-Sample

microsoft

❤️45

No description available

MIT

Updated 3 weeks ago

rag-search-ingestion-langchainjs-gemini

glaucia86

🧡60

A PDF search ingestion RAG application with Docker + LangChain.js + Gemini

MIT

PowerShell

Updated 2 weeks ago

bella-domify

LianjiaTech

🧡65

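Document Parser: supports PDF, TXT, DOC, DOCX, Markdown, and other file formats, efficiently extracting and parsing content to generate a standard document tree structure. Includes built-in PDF Parser, Text Parser, and Word Parser components to support intelligent applications such as RAG, knowledge bases, and full-text search.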

GPL-2.0

Python

Updated 8 hours ago

document-parser, parser, pdf-parser
