Search Results

Found 33,233 repositories (showing 30)

firecrawl

💚95

🔥 The Web Data API for AI - Power AI agents with clean web data

105.1k

6.9k

AGPL-3.0

TypeScript

Updated 7 minutes ago

ai, ai-agents, ai-crawler, +16

crawl4ai

unclecode

💚100

🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

63.5k

6.5k

Apache-2.0

Python

Updated 22 minutes ago

EasySpider

NaiboWang

💚100

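A visual no-code/code-free web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically, without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.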

44.1k

5.4k

AGPL-3.0

JavaScript

Updated 2 minutes ago

batch-processing, batch-script, code-free, +17

Scrapegraph-ai

ScrapeGraphAI

💚95

Python scraper based on AI

23.2k

2.0k

MIT

Python

Updated 1 hour ago

ai-crawler, ai-scraping, ai-search, +17

crawlee

Crawlee: A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

22.7k

1.3k

Apache-2.0

TypeScript

Updated 1 hour ago

apify, automation, crawler, +14

pyspider

binux

💚95

A Powerful Spider (Web Crawler) System in Python.

17.0k

3.7k

Apache-2.0

Python

Updated 14 hours ago

crawler, python

examples-of-web-crawlers

shengqiangzhang

💚100

Some interesting Python crawler examples that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ.

14.6k

3.8k

MIT

HTML

Updated 3 hours ago

agent-pool, crawler, example, +12

crawlab

crawlab-team

💚97

Distributed web crawler admin platform for managing spiders regardless of language or framework.

12.2k

1.9k

BSD-3-Clause

Updated 6 hours ago

crawlab, crawler, crawling-tasks, +10

webmagic

code4craft

💚97

A scalable web crawler framework for Java.

11.7k

4.2k

Apache-2.0

Java

Updated 22 hours ago

crawler, framework, java, +1

spider-flow

ssssssss-team

💚96

A next-generation crawler platform that defines crawler workflows graphically, so crawlers can be built without writing code.

11.3k

2.2k

MIT

Java

Updated 12 hours ago

crawler, jsoup, spider, +6

crawlee-python

apify

💛81

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

8.7k

702

Apache-2.0

Python

Updated 19 hours ago

apify, automation, beautifulsoup, +14

awesome-crawler

BruceDone

💛85

A collection of awesome web crawlers and spiders in different languages

7.2k

746

MIT

Updated 11 hours ago

awesome, crawler, node-crawler, +4

omniparse

adithya-s-k

💛77

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

6.8k

539

GPL-3.0

Python

Updated 16 hours ago

ingestion-api, ocr, omniparser, +5

node-crawler

bda-research

💛86

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

6.8k

873

MIT

TypeScript

Updated 1 day ago

cheerio, crawler, extract-data, +4

firecrawl-mcp-server

firecrawl

💛83

🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.

6.0k

670

MIT

JavaScript

Updated 1 hour ago

batch-processing, claude, content-extraction, +11

hakrawler

hakluke

💛75

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

5.0k

539

GPL-3.0

Updated 8 minutes ago

bugbounty, crawling, hacking, +4

crawler4j

yasserg

💚90

Open Source Web Crawler for Java

4.6k

1.9k

Apache-2.0

Java

Updated 12 hours ago

Crawler_Illegal_Cases_In_China

hiddendevj

💛73

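A collection of legal cases in China involving web crawlers. This project gathers news, materials, and laws and regulations concerning lawsuits and violations involving web crawler developers in mainland China. It aims to help crawler practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines.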

4.6k

315

HTML

Updated 13 hours ago

china, crawler, law

bitmagnet

bitmagnet-io

💛76

A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.

3.9k

226

MIT

Updated 12 hours ago

bittorrent, dht, prowlarr, +7

Argus

jasonxtn

💛78

The Ultimate Information Gathering Toolkit

3.4k

410

MIT

Python

Updated 18 hours ago

cms-detection, directory-finder, dns-lookup, +12

heritrix3

internetarchive

💛81

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

3.2k

782

NOASSERTION

Java

Updated 12 hours ago

heritrix, java, warc, +1

nutch

apache

💛81

Apache Nutch is an extensible and scalable web crawler

3.1k

1.3k

Apache-2.0

Java

Updated 12 hours ago

apache, crawling, hadoop, +3

WebCollector

CrawlScript

💛87

WebCollector is an open-source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.

3.1k

1.4k

GPL-3.0

Java

Updated 2 days ago

crawlergo

Qianlitp

💛78

A powerful browser crawler for web vulnerability scanners

3.0k

499

GPL-3.0

Updated 4 days ago

arsenal, blackhat, chrome-devtools, +8

ai-crawler-py

oxylabs

🧡63

Crawl a website starting from a URL, find relevant pages, and extract data – all guided by your natural language prompt.

2.8k

Updated 46 minutes ago

ai, ai-agents, ai-crawler, +5

oxylabs-ai-studio-py

oxylabs

💛73

Structured data gathering from any website using AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio python SDK for intelligent web data gathering.

2.7k

MIT

Python

Updated 1 hour ago

ai-crawler, ai-scraper, ai-scraping, +9

gecco

xtuhcy

💛81

Easy-to-use lightweight web crawler

2.5k

877

MIT

Java

Updated 12 hours ago

crawler, dynamic, fastjson, +3

news-please

fhamborg

💛77

news-please - an integrated web crawler and information extractor for news that just works

2.4k

450

Apache-2.0

Python

Updated 20 hours ago

cc-news, ccnews, commoncrawl, +17

spider

spider-rs

🧡69

Web crawler and scraper for Rust

2.4k

194

MIT

Rust

Updated 1 hour ago

ai-agent, automation, crawler, +6

abot

sjdirect

💛78

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

2.3k

554

Apache-2.0

Updated 3 days ago

abot, abot-nuget, c-sharp, +17
