Async web crawler built in Python: multi-worker, robots.txt-compliant, and JS-aware. Stores crawl data in PostgreSQL and ships with a CLI and a Streamlit dashboard for inspection and export.
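The multi-worker design described above can be sketched with `asyncio` and a shared queue. This is a minimal illustration, not the repo's actual API; names like `worker` and `crawl` are hypothetical, and the fetch step is stubbed out:

```python
import asyncio

async def worker(name, queue, results):
    # Hypothetical worker loop: pull URLs from the shared queue until it drains.
    while True:
        try:
            url = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        # A real crawler would fetch and parse the page here; we just record it.
        results.append((name, url))
        queue.task_done()

async def crawl(urls, n_workers=3):
    queue = asyncio.Queue()
    for u in urls:
        queue.put_nowait(u)
    results = []
    # Run several workers concurrently against the same queue.
    await asyncio.gather(*(worker(f"w{i}", queue, results) for i in range(n_workers)))
    return results

pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
```

Because all workers share one queue, adding capacity is just a matter of raising `n_workers`; the queue serializes hand-out of URLs without extra locking.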
Stars: 0 · Forks: 0 · Watchers: 0 · Open Issues: 0

48 commits
Recent commits:

- `b4dbb04` fix: increased empty attempt number to prevent workers leaving early
- `0d76f92` perf: limit concurrent playwright pages with semaphore to reduce memory usage
- `a36cd9d` fix: prevent concurrent robots.txt fetches with per-domain locking
- `c75a88a` refactor: introduce producer/asyncio.Queue to eliminate deadlocks on concurrent inserts
- `c8ee1de` perf: increase connection pool size and add httpx limits for high concurrency
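The per-domain locking fix mentioned in commit `a36cd9d` can be illustrated with one `asyncio.Lock` per domain: only the first worker to reach a domain fetches its robots.txt, and concurrent workers wait on the lock and then reuse the cached result. The `RobotsCache` class and the stubbed fetch are assumptions for this sketch, not the repo's real code:

```python
import asyncio
from urllib.parse import urlsplit

class RobotsCache:
    def __init__(self):
        self._locks = {}   # one asyncio.Lock per domain
        self._cache = {}   # domain -> cached robots.txt rules
        self.fetch_count = 0

    async def get(self, url):
        domain = urlsplit(url).netloc
        lock = self._locks.setdefault(domain, asyncio.Lock())
        async with lock:
            if domain not in self._cache:
                self.fetch_count += 1       # stands in for the network fetch
                await asyncio.sleep(0)      # yield control, as a real fetch would
                self._cache[domain] = f"rules for {domain}"
            return self._cache[domain]

async def main():
    cache = RobotsCache()
    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    # Three concurrent requests to the same domain trigger only one fetch.
    await asyncio.gather(*(cache.get(u) for u in urls))
    return cache.fetch_count

fetches = asyncio.run(main())
```

Without the lock, all three workers could miss the cache simultaneously and hammer the same robots.txt endpoint; with it, the fetch happens exactly once per domain.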
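Commit `0d76f92` caps concurrent Playwright pages with a semaphore. The idea generalizes to any expensive resource: acquire the semaphore before opening a page, release it when done, and memory stays bounded no matter how many URLs are queued. This sketch replaces the actual `page.goto()` call with a sleep so it runs standalone; the `render` function and `tracker` dict are illustrative:

```python
import asyncio

async def render(url, sem, tracker):
    # The semaphore bounds how many "pages" are open at once,
    # mirroring a cap on concurrent Playwright pages.
    async with sem:
        tracker["open"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["open"])
        await asyncio.sleep(0.01)   # stands in for page.goto() + rendering
        tracker["open"] -= 1
        return url

async def main():
    sem = asyncio.Semaphore(2)      # at most 2 simultaneous pages
    tracker = {"open": 0, "peak": 0}
    urls = [f"https://example.com/{i}" for i in range(6)]
    await asyncio.gather(*(render(u, sem, tracker) for u in urls))
    return tracker["peak"]

peak = asyncio.run(main())
```

Six tasks contend for two slots, so the observed peak concurrency never exceeds the semaphore's limit.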
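The producer/`asyncio.Queue` refactor in commit `c75a88a` points at a common pattern: instead of letting every worker write to the database directly, workers enqueue rows and a single consumer task performs the inserts, so there is no contention on the connection. A minimal sketch of that single-writer pattern, with a list standing in for the database and a `None` sentinel for shutdown (both assumptions of this example):

```python
import asyncio

async def writer(queue, stored):
    # Single consumer: all inserts funnel through this one task,
    # so concurrent producers never contend for the database.
    while True:
        row = await queue.get()
        if row is None:            # sentinel: producers are finished
            queue.task_done()
            return
        stored.append(row)         # stands in for an INSERT statement
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    stored = []
    writer_task = asyncio.create_task(writer(queue, stored))
    # Producers (crawl workers) enqueue results as they finish pages.
    for i in range(5):
        await queue.put({"url": f"https://example.com/{i}"})
    await queue.put(None)          # signal the writer to stop
    await writer_task
    return stored

rows = asyncio.run(main())
```

Serializing writes through one task is a simple way to eliminate the concurrent-insert deadlocks the commit message describes, at the cost of making the writer a throughput bottleneck if inserts are slow.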