Found 406 repositories (showing 30)
getmaxun
🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥
fugary
A new Douban metadata source plugin for Calibre. Douban no longer provides public book APIs, so this plugin obtains its data via web crawling instead.
0xMassi
Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
JonasSchroeder
Crawl public Instagram data using R scripts without an API access token. See InstaCrawlR Instructions.pdf.
ScrapeGraphAI
Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction
franloza
A modern REST API backend for match and odds data crawled from multiple sites, built with FastAPI, MongoDB as the database, Motor as the async MongoDB client, Scrapy as the crawler, and Docker.
WordPress
Identifies and collects data on CC-licensed content across web crawl data and public APIs.
justoneapi
justoneapi Data API Services. We provide APIs for: Xiaohongshu, Red, Redbook, Rednote, Taobao, JD.com, Douyin (E-commerce), Douyin (Videos), Kuaishou, Pugongying, Xingtu, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, SHEIN, Shopee, Baidu Index, Boss Zhipin, Zhaopin, Lagou, Toutiao, Facebook
gurtejrehal
Falcon Search was created to aid the National Crime Records Bureau, with the need for an efficient AI data crawler that collects classified data from the web based on given keywords in mind. It is a SaaS web data integration (WDI) platform that converts unstructured web data into a structured format by extracting, preparing, and integrating web data in areas of crime for consumption by criminal investigation agencies. Falcon provides a visual environment for automating the workflow of extracting and transforming web data. After the target website URL is specified, the web data extraction module provides a visual environment for designing automated harvesting workflows, going beyond HTML/XML parsing of static content to automate end-user interactions and yield data that would otherwise not be immediately visible. Once data is extracted, the software provides full data preparation capabilities for harmonizing and cleansing it. For consuming the results, Falcon offers several options: its own visualization and dashboarding module helps criminal investigators gain the insights they need, and its APIs offer full access to everything the platform can do, allowing web data to be integrated directly. Falcon can crawl ten million links and scrape one million links per month using Celery workers, and could exceed these numbers if run on standard cloud platforms.
soruly
Crawl data from anilist API and store as JSON file
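A minimal sketch of this crawl-and-store pattern, assuming the public AniList GraphQL endpoint at graphql.anilist.co; the query fields and output path here are illustrative, not taken from the repo:

```python
import json
import urllib.request

ANILIST_URL = "https://graphql.anilist.co"  # public GraphQL endpoint

# Illustrative query: one page of anime ids and romaji titles.
QUERY = """
query ($page: Int) {
  Page(page: $page, perPage: 50) {
    media(type: ANIME) { id title { romaji } }
  }
}
"""

def build_request(page: int) -> urllib.request.Request:
    """Build a POST request carrying the GraphQL query as a JSON body."""
    payload = json.dumps({"query": QUERY, "variables": {"page": page}}).encode()
    return urllib.request.Request(
        ANILIST_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def save_json(data, path: str) -> None:
    """Store one crawled page as a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    # Network call kept under __main__ so the helpers can be reused/tested offline.
    with urllib.request.urlopen(build_request(1)) as resp:
        save_json(json.load(resp), "anilist_page_1.json")
```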
jincheng9
R API for Crawling Stock and Index Data from Sina Finance
ChukwuEmekaAjah
A JavaScript library that models essential HTML DOM API methods and properties relevant for extracting data from crawled web pages or XML documents
streettraffic
StreetTraffic is a Python package that crawls the traffic flow data of your favorite routes and cities using the API provided by HERE.com.
bigsk1
Integrates Supabase with Crawl4AI and AI chat to create a powerful web crawling and semantic search solution, with Streamlit-based Supabase data visualization. Runs entirely in Docker; includes an API and more!
pinkpixel-dev
A Model Context Protocol (MCP) compliant server designed for comprehensive web research. It uses Tavily's Search and Crawl APIs to gather detailed information on a given topic, then structures this data in a format perfect for LLMs to create high-quality markdown documents.
We will process unstructured data from the web (obtained by crawling some sample websites), possibly by installing Apache Solr locally and manually feeding it web pages. We can use the Stanford NLP API or MetaMind API to extract semantics from the unstructured text. After extracting semantics, we can construct a structured data format, probably RDF/XML/OWL, and visualize the resulting graph using Gruff.
apigate-in
ReSearch is a web crawler and search engine that crawls a predefined set of domains, indexes the content of the pages, and provides a search API to query the indexed data.
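The index-then-query core of such a search engine can be sketched with a toy in-memory inverted index; the tokenizer and AND-only query semantics below are deliberate simplifications (a real engine would add stemming, ranking, and persistence):

```python
import re
from collections import defaultdict

class TinyIndex:
    """Toy inverted index: maps each token to the set of page URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, url: str, text: str) -> None:
        """Index one crawled page under its URL."""
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[token].add(url)

    def search(self, query: str) -> set:
        """Return pages containing every query term (AND semantics)."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        results = self.postings[terms[0]].copy()
        for term in terms[1:]:
            results &= self.postings[term]
        return results
```

A search API endpoint would then be a thin wrapper that calls `search` and serializes the matching URLs.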
LyzrCore
Lyzr Crawl is a high-performance web crawling API built with Go that enables developers to extract content and discover URLs from websites at scale. Part of the Lyzr.ai ecosystem, it provides a robust solution for web data extraction with real-time progress monitoring.
by-oneself
A Python script for crawling comments from Weibo using the Weibo API. The script retrieves post data from a MySQL database, then fetches the comments for each post and saves them back to the database.
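The fetch-and-save-back loop described here can be sketched as follows; `sqlite3` stands in for MySQL, and `fetch_comments` is a placeholder for the real Weibo API call (both are assumptions, not the repo's actual code):

```python
import sqlite3

def fetch_comments(post_id: int) -> list:
    """Placeholder for the Weibo API call; a real version would issue an HTTP request."""
    return [f"comment on post {post_id}"]

def crawl_comments(conn: sqlite3.Connection) -> None:
    """Read post ids from the posts table, fetch their comments, save them back."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments (post_id INTEGER, body TEXT)"
    )
    for (post_id,) in conn.execute("SELECT id FROM posts"):
        rows = [(post_id, body) for body in fetch_comments(post_id)]
        conn.executemany("INSERT INTO comments VALUES (?, ?)", rows)
    conn.commit()
```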
obeone
Firecrawl UI interacts with the Firecrawl API. It allows you to scrape pages, launch crawls, and extract structured data through a simple web interface.
nurainir
Simple scripts for crawling data from a social media platform without using its API.
datacollectionspecialist
In this article, we will introduce two methods for crawling Google Scholar data: manual crawling (Scrapy/Selenium) and the Scrapeless API.
Maders
Infrastructure for crawling, exposing an API for, and visualizing Fragment.com/numbers data.
yang1young
Crawl GitHub data with and without the API.
aymenhmaidiwastaken
Self-contained SEO analysis tool with a web dashboard. Crawl any website, analyze 8 categories (technical, on-page, content, structured data, performance, security, accessibility, links), get a 0-100 score, and receive ready-to-use fixes — no external APIs needed.
ScrapeGraphAI
CLI for AI-powered web scraping, data extraction, search, and crawling powered by the ScrapeGraph AI API. Supports smart scraping, agentic browser automation, markdownify, sitemap discovery, and JSON mode for piping to AI agents.
sidhenriksen
This repo contains an API for pulling data from Fightmetric, a crawler that crawls the website and puts the data into an SQLite database, and a predictive model that attempts to predict fight outcomes based on fighter stats.
Hubs-App
Hubs is a content crawler application on Android. It provides APIs to crawl web content and display the data.
EndlessInternational
The Firecrawl gem implements a lightweight interface to the Firecrawl.dev API, which takes a URL, crawls it, and returns HTML, markdown, or structured data. It is of particular value when used with LLMs for grounding.
linkRachit
Analyzed the different ways of marketing through Facebook by crawling and using Facebook APIs to fetch data about sponsored pages, posts, and redirection to websites. A web application was developed containing subsections analyzing each type of advertisement (pages, posts, site redirection), along with a comparison between pages and posts. The goal was to report whether an advertisement was favorable or not, based on the available data.