Found 33,233 repositories (showing 30)
firecrawl
🔥 The Web Data API for AI - Power AI agents with clean web data
unclecode
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
NaiboWang
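A visual no-code/code-free web crawler/spider. 易采集: a visual browser automation testing, data collection, and crawler tool that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.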
ScrapeGraphAI
Python scraper based on AI
apify
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
binux
A Powerful Spider (Web Crawler) System in Python.
shengqiangzhang
Some very interesting Python crawler examples that are friendly to beginners, mainly scraping sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ.
crawlab-team
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
code4craft
A scalable web crawler framework for Java.
ssssssss-team
A next-generation crawler platform that defines crawler workflows graphically, so crawlers can be built without writing code.
apify
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
BruceDone
A collection of awesome web crawlers and spiders in different languages
adithya-s-k
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
bda-research
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
firecrawl
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
hakluke
Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
yasserg
Open Source Web Crawler for Java
hiddendevj
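Collection of China illegal cases about web crawlers. This project compiles news, materials, and laws and regulations on lawsuits and violations involving web crawler developers in mainland China. It aims to help crawler practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines.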
bitmagnet-io
A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.
jasonxtn
The Ultimate Information Gathering Toolkit
internetarchive
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
apache
Apache Nutch is an extensible and scalable web crawler
CrawlScript
WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web, so you can set up a multi-threaded web crawler in less than 5 minutes.
Qianlitp
A powerful browser crawler for web vulnerability scanners
oxylabs
Crawl a website starting from a URL, find relevant pages, and extract data – all guided by your natural language prompt.
oxylabs
Structured data gathering from any website using AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio python SDK for intelligent web data gathering.
xtuhcy
Easy-to-use lightweight web crawler
fhamborg
news-please - an integrated web crawler and information extractor for news that just works
spider-rs
Web crawler and scraper for Rust
sjdirect
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.