Found 406 repositories (showing 30)
getmaxun
🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥
fugary
A new Douban metadata source plugin for Calibre. Douban no longer provides public book APIs, so this plugin obtains its data via web crawling instead.
0xMassi
Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
JonasSchroeder
Crawl public Instagram data using R scripts without an API access token. See InstaCrawlR Instructions.pdf.
ScrapeGraphAI
Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction
franloza
A modern REST API backend for match and odds data crawled from multiple sites, built with FastAPI, MongoDB as the database, Motor as the async MongoDB client, Scrapy as the crawler, and Docker.
WordPress
Identifies and collects data on CC-licensed content across web crawl data and public APIs.
justoneapi
justoneapi Data API Services. We provide APIs for: Xiaohongshu, Red, Redbook, Rednote, Taobao, JD.com, Douyin (E-commerce), Douyin (Videos), Kuaishou, Pugongying, Xingtu, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, SHEIN, Shopee, Baidu Index, Boss Zhipin, Zhaopin, Lagou, Toutiao, Facebook
gurtejrehal
Falcon Search was created to aid the National Crime Records Bureau, with the need for an efficient AI data crawler that collects classified data from the web based on given keywords in mind. It is a SaaS web data integration (WDI) platform that converts unstructured web data into a structured format by extracting, preparing, and integrating web data in areas of crime for consumption by criminal investigation agencies. Falcon provides a visual environment for automating the workflow of extracting and transforming web data. After the target website URL is specified, the web data extraction module provides a visual environment for designing automated harvesting workflows, going beyond HTML/XML parsing of static content to automate end-user interactions and yield data that would otherwise not be immediately visible. Once data is extracted, the software provides full data preparation capabilities for harmonizing and cleansing it. For consuming the results, Falcon offers several options: its own visualization and dashboarding module helps criminal investigators gain the insights they need, and its APIs offer full access to everything the platform can do, allowing web data to be integrated directly. Falcon can crawl ten million links and scrape one million links per month using Celery workers, and could exceed these numbers if run on standard cloud platforms.
soruly
Crawl data from anilist API and store as JSON file
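A minimal sketch of this crawl-and-store pattern, assuming the public AniList GraphQL endpoint at graphql.anilist.co; the query fields and output path here are illustrative, not taken from the repo:

```python
import json
import urllib.request

ANILIST_URL = "https://graphql.anilist.co"  # public GraphQL endpoint

# Illustrative query: one page of anime ids and romaji titles.
QUERY = """
query ($page: Int) {
  Page(page: $page, perPage: 50) {
    media(type: ANIME) { id title { romaji } }
  }
}
"""

def build_request(page: int) -> urllib.request.Request:
    """Build a POST request carrying the GraphQL query as a JSON body."""
    payload = json.dumps({"query": QUERY, "variables": {"page": page}}).encode()
    return urllib.request.Request(
        ANILIST_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def save_json(data, path: str) -> None:
    """Store one crawled page as a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    # Network call kept under __main__ so the helpers can be reused/tested offline.
    with urllib.request.urlopen(build_request(1)) as resp:
        save_json(json.load(resp), "anilist_page_1.json")
```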
jincheng9
R API for Crawling Stock and Index Data from Sina Finance
ChukwuEmekaAjah
A JavaScript library that models essential HTML DOM API methods and properties relevant for extracting data from crawled web pages or XML documents
streettraffic
StreetTraffic is a Python package that crawls the traffic flow data of your favorite routes and cities using the API provided by HERE.com.
bigsk1
Integrates Supabase with Crawl4AI and AI chat to create a powerful web crawling and semantic search solution, with Streamlit-based Supabase data visualization. Runs entirely in Docker; includes an API and more!
pinkpixel-dev
A Model Context Protocol (MCP) compliant server designed for comprehensive web research. It uses Tavily's Search and Crawl APIs to gather detailed information on a given topic, then structures this data in a format perfect for LLMs to create high-quality markdown documents.
We will process unstructured data from the web (obtained by crawling some sample websites), possibly by installing Apache Solr locally and manually feeding it web pages. We can use the Stanford NLP API or MetaMind API to extract semantics from the unstructured text. After extracting semantics, we can construct a structured data format, probably RDF/XML/OWL, and visualize the resulting graph using Gruff.
apigate-in
ReSearch is a web crawler and search engine that crawls a predefined set of domains, indexes the content of the pages, and provides a search API to query the indexed data.
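The index-then-query core of such a search engine can be sketched with a toy in-memory inverted index; the tokenizer and AND-only query semantics below are deliberate simplifications (a real engine would add stemming, ranking, and persistence):

```python
import re
from collections import defaultdict

class TinyIndex:
    """Toy inverted index: maps each token to the set of page URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, url: str, text: str) -> None:
        """Index one crawled page under its URL."""
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[token].add(url)

    def search(self, query: str) -> set:
        """Return pages containing every query term (AND semantics)."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        results = self.postings[terms[0]].copy()
        for term in terms[1:]:
            results &= self.postings[term]
        return results
```

A search API endpoint would then be a thin wrapper that calls `search` and serializes the matching URLs.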
LyzrCore
Lyzr Crawl is a high-performance web crawling API built with Go that enables developers to extract content and discover URLs from websites at scale. Part of the Lyzr.ai ecosystem, it provides a robust solution for web data extraction with real-time progress monitoring.
by-oneself
A Python script for crawling comments from Weibo using the Weibo API. The script retrieves post data from a MySQL database, then fetches the comments for each post and saves them back to the database.
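The fetch-and-save-back loop described here can be sketched as follows; `sqlite3` stands in for MySQL, and `fetch_comments` is a placeholder for the real Weibo API call (both are assumptions, not the repo's actual code):

```python
import sqlite3

def fetch_comments(post_id: int) -> list:
    """Placeholder for the Weibo API call; a real version would issue an HTTP request."""
    return [f"comment on post {post_id}"]

def crawl_comments(conn: sqlite3.Connection) -> None:
    """Read post ids from the posts table, fetch their comments, save them back."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments (post_id INTEGER, body TEXT)"
    )
    for (post_id,) in conn.execute("SELECT id FROM posts"):
        rows = [(post_id, body) for body in fetch_comments(post_id)]
        conn.executemany("INSERT INTO comments VALUES (?, ?)", rows)
    conn.commit()
```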
obeone
Firecrawl UI interacts with the Firecrawl API. It allows you to scrape pages, launch crawls, and extract structured data through a simple web interface.
nurainir
Simple scripts for crawling data from a social media platform without using its API.
datacollectionspecialist
In this article, we will introduce two methods for crawling Google Scholar data: manual crawling (Scrapy/Selenium) and the Scrapeless API.
Maders
Infrastructure for crawling, exposing an API for, and visualizing Fragment.com/numbers data.
yang1young
Crawl GitHub data with and without the API.
aymenhmaidiwastaken
Self-contained SEO analysis tool with a web dashboard. Crawl any website, analyze 8 categories (technical, on-page, content, structured data, performance, security, accessibility, links), get a 0-100 score, and receive ready-to-use fixes — no external APIs needed.
ScrapeGraphAI
CLI for AI-powered web scraping, data extraction, search, and crawling powered by the ScrapeGraph AI API. Supports smart scraping, agentic browser automation, markdownify, sitemap discovery, and JSON mode for piping to AI agents.
sidhenriksen
This repo contains an API for pulling data from Fightmetric, a crawler that crawls the website and puts the data into an SQLite database, and a predictive model that attempts to predict fight outcomes based on fighter stats.
Hubs-App
Hubs is a content crawler application on Android. It provides APIs to crawl web content and display the data.
EndlessInternational
The Firecrawl gem implements a lightweight interface to the Firecrawl.dev API, which takes a URL, crawls it, and returns HTML, markdown, or structured data. It is of particular value when used with LLMs for grounding.
linkRachit
Analyzed the different ways of marketing through Facebook by crawling and using Facebook APIs to fetch data about sponsored pages, posts, and redirection to websites. A web application was developed containing subsections analyzing each type of advertisement (pages, posts, site redirection), along with a comparison between pages and posts. The goal was to report whether an advertisement was favorable or not, based on the available data.