Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.
Stars: 8.7k
Forks: 703
Watchers: 8.7k
Open Issues: 82
Commit counts (labels not recovered): 305, 282, 174, 120, 105, 49, 39, 24, 7, 3
cf6737c chore(release): Update changelog and package version [skip ci]
8b53e27 fix: Apply SQLite optimizations to the custom `connection_string` in `SqlStorageClient` (#1837)
2c691d0 test: Fix flaky event manager tests by replacing sleep with wait (#1830)
d5715f3 chore(release): Update changelog and package version [skip ci]
2efb668 fix: Prevent premature `EventManager` shutdown when multiple crawlers share it (#1810)
f6be00b docs: Fix version switching for API reference pages (#1823)
e2d7069 test: Increase sleep tolerance in request_max_duration test for Windows CI (#1827)
41c21b7 chore(release): Update changelog and package version [skip ci]
e86794a fix(file-system): Reclaim orphaned in-progress requests on RQ recovery (#1825)
a3b635e chore(deps): update rhysd/actionlint action to v1.7.12 (#1826)
286fa3a chore(release): Update changelog and package version [skip ci]