Async web crawler built in Python: multi-worker, robots.txt-compliant, and JS-aware. Stores crawl data in PostgreSQL and ships with a CLI and a Streamlit dashboard for inspection and export.
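The multi-worker design described above can be sketched with `asyncio` and a shared queue. This is a minimal illustration, not the repo's actual API; names like `worker` and `crawl` are hypothetical, and the fetch step is stubbed out:

```python
import asyncio

async def worker(name, queue, results):
    # Hypothetical worker loop: pull URLs from the shared queue until it drains.
    while True:
        try:
            url = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        # A real crawler would fetch and parse the page here; we just record it.
        results.append((name, url))
        queue.task_done()

async def crawl(urls, n_workers=3):
    queue = asyncio.Queue()
    for u in urls:
        queue.put_nowait(u)
    results = []
    # Run several workers concurrently against the same queue.
    await asyncio.gather(*(worker(f"w{i}", queue, results) for i in range(n_workers)))
    return results

pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
```

Because all workers share one queue, adding capacity is just a matter of raising `n_workers`; the queue serializes hand-out of URLs without extra locking.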
Stars: 0 · Forks: 0 · Watchers: 0 · Open Issues: 0

48 commits
Recent commits:

- `b4dbb04` fix: increased empty attempt number to prevent workers leaving early
- `0d76f92` perf: limit concurrent playwright pages with semaphore to reduce memory usage
- `a36cd9d` fix: prevent concurrent robots.txt fetches with per-domain locking
- `c75a88a` refactor: introduce producer/asyncio.Queue to eliminate deadlocks on concurrent inserts
- `c8ee1de` perf: increase connection pool size and add httpx limits for high concurrency
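The per-domain locking fix mentioned in commit `a36cd9d` can be illustrated with one `asyncio.Lock` per domain: only the first worker to reach a domain fetches its robots.txt, and concurrent workers wait on the lock and then reuse the cached result. The `RobotsCache` class and the stubbed fetch are assumptions for this sketch, not the repo's real code:

```python
import asyncio
from urllib.parse import urlsplit

class RobotsCache:
    def __init__(self):
        self._locks = {}   # one asyncio.Lock per domain
        self._cache = {}   # domain -> cached robots.txt rules
        self.fetch_count = 0

    async def get(self, url):
        domain = urlsplit(url).netloc
        lock = self._locks.setdefault(domain, asyncio.Lock())
        async with lock:
            if domain not in self._cache:
                self.fetch_count += 1       # stands in for the network fetch
                await asyncio.sleep(0)      # yield control, as a real fetch would
                self._cache[domain] = f"rules for {domain}"
            return self._cache[domain]

async def main():
    cache = RobotsCache()
    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    # Three concurrent requests to the same domain trigger only one fetch.
    await asyncio.gather(*(cache.get(u) for u in urls))
    return cache.fetch_count

fetches = asyncio.run(main())
```

Without the lock, all three workers could miss the cache simultaneously and hammer the same robots.txt endpoint; with it, the fetch happens exactly once per domain.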
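Commit `0d76f92` caps concurrent Playwright pages with a semaphore. The idea generalizes to any expensive resource: acquire the semaphore before opening a page, release it when done, and memory stays bounded no matter how many URLs are queued. This sketch replaces the actual `page.goto()` call with a sleep so it runs standalone; the `render` function and `tracker` dict are illustrative:

```python
import asyncio

async def render(url, sem, tracker):
    # The semaphore bounds how many "pages" are open at once,
    # mirroring a cap on concurrent Playwright pages.
    async with sem:
        tracker["open"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["open"])
        await asyncio.sleep(0.01)   # stands in for page.goto() + rendering
        tracker["open"] -= 1
        return url

async def main():
    sem = asyncio.Semaphore(2)      # at most 2 simultaneous pages
    tracker = {"open": 0, "peak": 0}
    urls = [f"https://example.com/{i}" for i in range(6)]
    await asyncio.gather(*(render(u, sem, tracker) for u in urls))
    return tracker["peak"]

peak = asyncio.run(main())
```

Six tasks contend for two slots, so the observed peak concurrency never exceeds the semaphore's limit.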
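The producer/`asyncio.Queue` refactor in commit `c75a88a` points at a common pattern: instead of letting every worker write to the database directly, workers enqueue rows and a single consumer task performs the inserts, so there is no contention on the connection. A minimal sketch of that single-writer pattern, with a list standing in for the database and a `None` sentinel for shutdown (both assumptions of this example):

```python
import asyncio

async def writer(queue, stored):
    # Single consumer: all inserts funnel through this one task,
    # so concurrent producers never contend for the database.
    while True:
        row = await queue.get()
        if row is None:            # sentinel: producers are finished
            queue.task_done()
            return
        stored.append(row)         # stands in for an INSERT statement
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    stored = []
    writer_task = asyncio.create_task(writer(queue, stored))
    # Producers (crawl workers) enqueue results as they finish pages.
    for i in range(5):
        await queue.put({"url": f"https://example.com/{i}"})
    await queue.put(None)          # signal the writer to stop
    await writer_task
    return stored

rows = asyncio.run(main())
```

Serializing writes through one task is a simple way to eliminate the concurrent-insert deadlocks the commit message describes, at the cost of making the writer a throughput bottleneck if inserts are slow.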