Search Results

Found 95 repositories (showing 30)

oxylabs-ai-studio-py

oxylabs

💛73

Structured data gathering from any website using AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio python SDK for intelligent web data gathering.

2.7k

MIT

Python

Updated 4 hours ago

ai-crawler, ai-scraper, ai-scraping +9

kimuraframework

vifreefly

🧡58

Write web scrapers in Ruby using a clean, AI-assisted DSL. Kimurai uses AI to figure out where the data lives, then caches the selectors and scrapes with pure Ruby. Get the intelligence of an LLM without the per-request latency or token costs.

1.1k

161

MIT

Ruby

Updated 1 week ago

antidetect-browser, camoufox, crawler +6
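The Kimurai description above (an LLM locates the data once, the selectors are cached, and later pages are scraped without the model) is a general selector-caching pattern. The sketch below illustrates that pattern under stated assumptions; it is not Kimurai's Ruby API. A "selector" is simplified to a single CSS class name, extraction uses Python's stdlib `html.parser`, and `fake_locate` is a hypothetical stand-in for the LLM call.

```python
from html.parser import HTMLParser
from typing import Callable


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains `css_class`.

    Simplification: void tags written without a closing slash (e.g. <br>)
    inside a matched element would unbalance the depth counter.
    """

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching element
        self.results: list[str] = []
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self.css_class in classes:
            self.depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


class SelectorCache:
    """Ask an expensive locator once per (site, field), then reuse its answer."""

    def __init__(self, locate: Callable[[str, str], str]) -> None:
        self.locate = locate  # hypothetical stand-in for an LLM call
        self.cache: dict[tuple[str, str], str] = {}
        self.locator_calls = 0

    def scrape(self, site: str, field: str, html: str) -> list[str]:
        key = (site, field)
        if key not in self.cache:
            self.locator_calls += 1
            self.cache[key] = self.locate(field, html)
        parser = ClassTextExtractor(self.cache[key])
        parser.feed(html)
        parser.close()
        return parser.results


def fake_locate(field: str, html: str) -> str:
    # Hypothetical: a real system would prompt an LLM with the HTML here.
    return field


cache = SelectorCache(fake_locate)
page1 = '<div><span class="price">$10</span><span class="name">A</span></div>'
page2 = '<div><span class="price">$12</span></div>'
first = cache.scrape("shop.example", "price", page1)
second = cache.scrape("shop.example", "price", page2)
```

Both pages are scraped, but the locator runs only once; that single call is where the per-request latency and token cost would otherwise recur.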

llm-data-scrapers

patrickloeber

🧡55

A list of useful open-source tools and scrapers to gather data for LLMs

248

Updated 1 week ago

oxylabs-ai-studio-js

oxylabs

💛70

MIT

TypeScript

Updated 3 days ago

ai-crawler, ai-map, ai-scraper +11

graph

neuledge

🧡65

⚡️ Real-time Knowledge Graph for AI Agents. Connect LLMs to verified weather, stock, and currency data via instant tool-calling. No API keys, no scrapers, just grounded facts in <100ms.

Apache-2.0

TypeScript

Updated 1 day ago

ai-agents, graph-rag, knowledge-graph +7

openclaw-skill-pangolinfo

Pangolin-spg

💛70

Real-time Web Scraper Skill for OpenClaw & AI Agents. Fetch live data from Google Search, Amazon, and Walmart using Pangolinfo's AI Mode API. Returns LLM-ready JSON/Markdown.

MIT

Python

Updated 6 days ago

CrawlMind

olddove-laoge

🧡65

AI-powered web data extraction tool - uses an LLM to automatically analyze a page's hierarchical structure and intelligently crawl whatever types of data you want.

JavaScript

Updated 5 days ago

Ares

AndreaBozzo

❤️40

Next-gen AI scraper — LLM-powered structured data extraction

Apache-2.0

Rust

Updated 1 week ago

claude-code-skills, docker, json-schema +5

dartisan-ai-webscraper

D-artisan

❤️30

An AI-powered web scraper application that leverages free LLM providers to perform intelligent web scraping based on user prompts. The system allows users to input scraping instructions via a user-friendly UI, process web data using LLM APIs, and output results in professional formats 🚀

Python

Updated 6 months ago

scrag

ACM-VIT

❤️20

A flexible web scraper that intelligently adapts to different website structures using multiple extraction strategies (newspaper3k, readability-lxml, BeautifulSoup, and optional headless rendering). It outputs clean, structured data for RAG pipelines or local LLMs, with an optional extension to automatically build RAG indexes from web queries.

MIT

Python

Updated 4 months ago

forktober, hacktoberfest, hacktoberfest-accepted +1
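The scrag entry describes trying multiple extraction strategies in turn. A minimal sketch of that fallback chain, assuming each strategy is a function from HTML to text (or `None`) and the first result with enough text wins. The two regex-based strategies and the `min_chars` threshold are hypothetical stand-ins, not the newspaper3k or readability-lxml APIs, and regex is only a crude substitute for a real HTML parser.

```python
import re
from typing import Callable, Optional

Strategy = tuple[str, Callable[[str], Optional[str]]]


def strip_tags(fragment: str) -> str:
    """Crude tag stripper: drop markup, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return re.sub(r"\s+", " ", text).strip()


def from_article(html: str) -> Optional[str]:
    """Hypothetical strategy 1: text of the first <article> element."""
    m = re.search(r"<article\b[^>]*>(.*?)</article>", html, re.S | re.I)
    return strip_tags(m.group(1)) if m else None


def from_body(html: str) -> Optional[str]:
    """Hypothetical strategy 2: all text in <body>, or the whole document."""
    m = re.search(r"<body\b[^>]*>(.*?)</body>", html, re.S | re.I)
    return strip_tags(m.group(1) if m else html)


def extract(
    html: str, strategies: list[Strategy], min_chars: int = 20
) -> tuple[Optional[str], str]:
    """Return (strategy name, text) from the first strategy yielding enough text."""
    for name, strategy in strategies:
        try:
            text = strategy(html)
        except Exception:
            continue  # a failing strategy should not abort the chain
        if text and len(text) >= min_chars:
            return name, text
    return None, ""


STRATEGIES: list[Strategy] = [("article", from_article), ("body", from_body)]
```

Ordering the list from most specific to most general means the cleanest extractor is tried first, and the generic one only runs when it fails.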

scraper-mcp

jessaminesimple608

💛70

🌐 Streamline web scraping with Scraper MCP, a server that optimizes content for AI by filtering data and reducing token usage for LLMs.

MIT

Python

Updated 4 hours ago

agents, ai, camoufox +15

open-webui-easysearch

x-hannibal

🧡60

An intelligent, context-aware web search filter for Open WebUI. EasySearch bypasses noisy standard web scrapers, utilizing parallel fetching, structural HTML cleaning, and dynamic context-awareness to feed your LLM only the highest quality data.

MIT

Python

Updated 1 week ago

open-webui, open-webui-functions

AI-Web-Scraper

soorajInsights

❤️35

The AI Web Scraper is a Python tool that combines web scraping and AI to extract specific data from websites using natural language prompts. It uses Selenium, BeautifulSoup, and LangChain for scraping and parsing, with support for Ollama (LLM) for advanced content processing, handling challenges like captchas and IP bans.

Python

Updated 9 months ago

DevCrawler

devBhas

❤️40

DevCrawler - An LLM Friendly Web Crawler & Data Scraper

MIT

Python

Updated 4 months ago

ai, datascraping, llm-training +1

inferencex-scraper

leoshan

🧡55

InferenceX Data Scraper - automatically collects LLM inference performance benchmark data from the SemiAnalysis InferenceX platform

Python

Updated 1 week ago

AI-Scraper-Parser

Newton-Maina

❤️45

An AI-powered web scraper that uses Selenium for dynamic content extraction and local LLMs (via Ollama) to parse unstructured HTML into clean, structured data.

Python

Updated 2 months ago

WikiScraper_for_LLM

ntvinh2005

❤️40

Web scraper that extracts data from a Wikipedia page and stores it in smaller chunks for later retrieval. These small chunks of text can be used as data to train large language models. It also scrapes text from a network of related topics surrounding the main topic, helping to build a sufficient dataset for LLM training quickly and automatically.

MIT

JavaScript

Updated 9 months ago
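WikiScraper_for_LLM stores scraped text "in smaller chunks for later retrieval". One common way to do that is fixed-size word windows with overlap. The sketch below assumes that approach; the function name and the default sizes are illustrative, not taken from the repository.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words; neighbours share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary fully present in at least one chunk, at the cost of some duplicated text.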

AI-Powered-FDA-Drug-Scraper

Tanguy9862

❤️45

Developed a Python-based web scraper leveraging generative AI with LangChain and GPT-4o-mini to extract and classify FDA drug approval data. Processed over 1,770 records, dynamically categorizing medications and treatment areas using LLMs to simplify complex medical information into actionable insights.

MIT

Python

Updated 2 months ago

data-classification, data-normalization, gpt-4o-mini +4

web_scrapper

kartiksharma1202

❤️35

A web scraper application integrated with LLMs for processing scraped data.

Python

Updated 10 months ago

langchain-web-scraping

luminati-io

❤️30

How to integrate LangChain with Bright Data's Web Scraper API for efficient web scraping and real-world LLM data enrichment.

Updated 10 months ago

captcha-solving, langchain, python +5

final_project

gail-mar

🧡65

Project pipeline: LinkedIn Scraper → Job Dataset → EDA → LLM Generator → Streamlit App. The project moves from data collection → analysis → automated application generation.

Jupyter Notebook

Updated 4 days ago

AI-Powered-Scraper-Bot

Gauravmangate27

❤️40

AI-Powered Scraper Bot is an intelligent web scraping and data processing system that leverages Large Language Models (LLMs) such as GPT.

MIT

Updated 7 months ago

lead-gen-pipeline

Burton-David

❤️30

AI-powered Chamber of Commerce directory scraper using local LLM (Qwen2-7B) for B2B lead generation. Extracts business data with adaptive navigation and structured output.

Python

Updated 4 months ago

ai, b2b, business-intelligence +7

Realtime_Sentiment_News_Analyzer

mr-veyrion

🧡50

Python-based news scraper & sentiment analyzer. Fetches India metro news, uses OpenRouter LLMs for analysis, and serves data via Flask to a Three.js globe UI.

MIT

Python

Updated 1 month ago

WebScraperLLM

Cloudiu9

❤️30

A web scraper designed to extract structured data from a specific site, with options to process the content using an LLM or export it as JSON.

Python

Updated 4 months ago

scraper, scraping, web

auto-web-scraper

NomanAhmed234

❤️35

AI Web Scraper + LLM-Powered Insights is a powerful, open-source Streamlit web application that allows users to scrape data from any website simply by entering its URL.

Python

Updated 6 months ago

beautifulsoup, data-extraction, python-scraper +4

LLM-Web-Scraping

koldo66crack

❤️35

LLM-based RentHop scraper for VeloCity (Columbia startup). Phase 1: hardcoded pagination collects listing URLs. Phase 2: Gemini LLM extracts structured rental data from HTML. Helps students find affordable housing with smart deduplication, multi-API key support, and incremental updates.

HTML

Updated 5 months ago
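The LLM-Web-Scraping entry mentions collecting listing URLs in Phase 1 and "smart deduplication". One plausible way to deduplicate at the URL stage is to compare normalized URLs. This is a sketch assuming that approach; the normalization rules and the `TRACKING_PARAMS` set are hypothetical choices, not the project's code. It uses only `urllib.parse` from the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking parameters to ignore when comparing URLs.
TRACKING_PARAMS = frozenset({"utm_source", "utm_medium", "utm_campaign"})


def normalize_url(url: str) -> str:
    """Lowercase scheme/host, drop trailing slash, sort query, drop fragment."""
    parts = urlsplit(url)
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each normalized form, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Deduplicating before Phase 2 means each distinct listing is sent to the LLM only once, which also avoids spending API quota on repeats.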

CittaProject

pphothidaen

🧡50

A Zero-Waste Agentic AI Distributed Architecture. Orchestrating Local LLMs (Mac M4), Enterprise Data Lakes (DS224+), and Legacy Edge Scrapers (ARM) into a unified "Stream of Consciousness." Decoupled, token-efficient, and privacy-first. “Knowledge is not a static state, but a continuous process of arising and ceasing.”

Python

Updated 2 weeks ago

Webscraper-with-LLM

ZenXen7

❤️35

This project is a Python-based web scraper that uses a large language model (LLM) served through Ollama to enhance web scraping with natural language processing. The scraper efficiently extracts data from websites and uses the Ollama-hosted model to parse, clean, and analyze the data.

Python

Updated 4 months ago

llm, ollama-api, webscraper

daily-digest

mangoon5

❤️35

A news aggregation site that uses a combination of web scraping, an SQL database, and an LLM to generate summaries of news articles. The web scraper is built using BeautifulSoup and aggregates data from various news outlets. The data is stored in an SQLite3 database.

Python

Updated 8 months ago
