Found 37 repositories (showing 30)
apache
A scalable, mature and versatile web crawler based on Apache Storm
commoncrawl
News crawling with StormCrawler - stores content as WARC
DigitalPebble
Crawl configurations for benchmarking / testing StormCrawler
DigitalPebble
Resources for running StormCrawler with Docker services
sebastian-nagel
Process web archives (WARC format) with StormCrawler and index content into OpenSearch
DigitalPebble
Ansible playbook for deploying a Storm cluster
apache
Source for the Apache StormCrawler web site
anveshv18
StormCrawler Documentation
cnf271
StormCrawler with Elasticsearch
HPI-BP2017N2
Based on StormCrawler; crawls a list of domains and hands the pages to a data store
desp0916
Learning StormCrawler
DigitalPebble
WARC resources for StormCrawler
cnf271
StormCrawler with SQS
jzonthemtn
StormCrawler and OpenSearch
Solr with hybrid retrieval (BM25 + vectors) from Chorus + Learning-to-Rank (LTR) second-pass re-rank (via Hello-LTR configs) + (Optional) a crawler (StormCrawler) that feeds Solr
ivanandrejic
No description available
nvshik
No description available
Tiago4k
StormCrawler connected to Elasticsearch
jwaxbny
Learn StormCrawler with Elasticsearch
NeonNobleTech9
Test project with StormCrawler
jordillachmrf
No description available
omisolaidowu
No description available
idowupremz
No description available
sharon-yuan
Uses a Storm topology to implement a web crawler
mojtaba-elephant-maker
No description available
ZenRows
No description available
an-snatcher
No description available
zhang-jian-19400
Scrape the data from http://weather.unisys.com/hurricane/atlantic/index.php
DuskoPre
No description available
kanwarkakkar
Modified Storm-Crawler ES