Found 110,376 repositories (showing 30)
scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
D4Vinci
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
apify
Crawlee: A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
dzhng
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the simplest implementation of a deep research agent - e.g. an agent that can refine its research direction over time and deep dive into a topic.
getmaxun
🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥
clips
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
apify
Crawlee: A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
lorien
List of libraries, tools and APIs for web scraping and data processing.
go-rod
A Chrome DevTools Protocol driver for web automation and scraping.
firecrawl
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
MontFerret
Declarative web scraping
adbar
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
niespodd
Analysis of Bot Protection systems with available countermeasures 👿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
REMitchell
Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do
dotnetcore
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework.
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
geziyor
Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.
oxylabs
Structured data gathering from any website using an AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio Python SDK for intelligent web data gathering.
lorien
Web Scraping Framework
emadehsan
Getting started with Puppeteer and Chrome Headless for Web Scraping
AnotiaWang
(Supports DeepSeek R1) An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models.
firecrawl
🔥 Visual workflow builder for AI agents powered by Firecrawl - drag-and-drop web scraping pipelines with real-time execution
howie6879
Async Python 3.6+ web scraping micro-framework based on asyncio
TheWebScrapingClub
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
Ge0rg3
A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.
propublica
A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
josh0xA
Open Source Intelligence Interface for Deep Web Scraping
tidyverse
Simple web scraping for R
yhat
A simple, higher level interface for Go web scraping.
ulixee
The web browser built for scraping