Found 95 repositories (showing 30)
oxylabs
Structured data gathering from any website using AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio Python SDK for intelligent web data gathering.
vifreefly
Write web scrapers in Ruby using a clean, AI-assisted DSL. Kimurai uses AI to figure out where the data lives, then caches the selectors and scrapes with pure Ruby. Get the intelligence of an LLM without the per-request latency or token costs.
patrickloeber
A list of useful Open Source tools and scrapers to gather data for LLMs
oxylabs
Structured data gathering from any website using AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio JS SDK for intelligent web data gathering.
neuledge
⚡️ Real-time Knowledge Graph for AI Agents. Connect LLMs to verified weather, stock, and currency data via instant tool-calling. No API keys, no scrapers, just grounded facts in <100ms.
Pangolin-spg
Real-time Web Scraper Skill for OpenClaw & AI Agents. Fetch live data from Google Search, Amazon, and Walmart using Pangolinfo's AI Mode API. Returns LLM-ready JSON/Markdown.
olddove-laoge
AI-Powered Web Scraper - Uses an LLM to automatically analyze a page's hierarchical structure and intelligently crawl the data you want.
AndreaBozzo
Next-gen AI scraper — LLM-powered structured data extraction
D-artisan
An AI-powered web scraper application that leverages free LLM providers to perform intelligent web scraping based on user prompts. The system allows users to input scraping instructions via a user-friendly UI, process web data using LLM APIs, and output results in professional formats 🚀
ACM-VIT
A flexible web scraper that intelligently adapts to different website structures using multiple extraction strategies (newspaper3k, readability-lxml, BeautifulSoup, and optional headless rendering). It outputs clean, structured data for RAG pipelines or local LLMs, with an optional extension to automatically build RAG indexes from web queries.
jessaminesimple608
🌐 Streamline web scraping with Scraper MCP, a server that optimizes content for AI by filtering data and reducing token usage for LLMs.
x-hannibal
An intelligent, context-aware web search filter for Open WebUI. EasySearch bypasses noisy standard web scrapers, utilizing parallel fetching, structural HTML cleaning, and dynamic context-awareness to feed your LLM only the highest quality data.
soorajInsights
The AI Web Scraper is a Python tool that combines web scraping and AI to extract specific data from websites using natural language prompts. It uses Selenium, BeautifulSoup, and LangChain for scraping and parsing, with support for Ollama (LLM) for advanced content processing, handling challenges like captchas and IP bans.
devBhas
DevCrawler - An LLM Friendly Web Crawler & Data Scraper
leoshan
InferenceX Data Scraper - Automatically collects LLM inference performance benchmark data from the SemiAnalysis InferenceX platform
Newton-Maina
An AI-powered web scraper that uses Selenium for dynamic content extraction and local LLMs (via Ollama) to parse unstructured HTML into clean, structured data.
ntvinh2005
A web scraper that collects data from a Wikipedia page and stores it in smaller chunks for later retrieval. These small chunks of text can be used as data to train Large Language Models. It also scrapes text from a network of related topics surrounding the main topic, helping to build a sufficient dataset for LLM training quickly and automatically.
Tanguy9862
Developed a Python-based web scraper leveraging generative AI with LangChain and GPT-4o-mini to extract and classify FDA drug approval data. Processed over 1,770 records, dynamically categorizing medications and treatment areas using LLMs to simplify complex medical information into actionable insights.
kartiksharma1202
A web scraper application integrated with LLMs for processing scraped data.
luminati-io
How to integrate LangChain with Bright Data's Web Scraper API for efficient web scraping and real-world LLM data enrichment.
gail-mar
Project Pipeline: LinkedIn Scraper → Job Dataset → EDA → LLM Generator → Streamlit App. The project moves from data collection → analysis → automated application generation.
Gauravmangate27
AI-Powered Scraper Bot is an intelligent web scraping and data processing system that leverages Large Language Models (LLMs) such as GPT.
Burton-David
AI-powered Chamber of Commerce directory scraper using local LLM (Qwen2-7B) for B2B lead generation. Extracts business data with adaptive navigation and structured output.
mr-veyrion
Python-based news scraper & sentiment analyzer. Fetches India metro news, uses OpenRouter LLMs for analysis, and serves data via Flask to a Three.js globe UI.
Cloudiu9
A web scraper designed to extract structured data from a specific site, with options to process the content using an LLM or export it as JSON.
NomanAhmed234
AI Web Scraper + LLM-Powered Insights is a powerful, open-source Streamlit web application that allows users to scrape data from any website simply by entering its URL.
koldo66crack
LLM-based RentHop scraper for VeloCity (Columbia startup). Phase 1: hardcoded pagination collects listing URLs. Phase 2: Gemini LLM extracts structured rental data from HTML. Helps students find affordable housing with smart deduplication, multi-API key support, and incremental updates.
pphothidaen
A Zero-Waste Agentic AI Distributed Architecture. Orchestrating Local LLMs (Mac M4), Enterprise Data Lakes (DS224+), and Legacy Edge Scrapers (ARM) into a unified "Stream of Consciousness." Decoupled, token-efficient, and privacy-first. “Knowledge is not a static state, but a continuous process of arising and ceasing.”
ZenXen7
This project is a Python-based web scraper utilizing the Ollama Language Model (LLM) to enhance web scraping capabilities with natural language processing. The scraper efficiently extracts data from websites and uses Ollama’s advanced language model to parse, clean, and analyze the data.
mangoon5
A news aggregation site that uses a combination of web scraping, an SQL database, and an LLM to generate summaries of news articles. The web scraper is built using BeautifulSoup and aggregates data from various news outlets. The data is stored in an SQLite3 database.