Found 28,259 repositories (showing 30)
TheWebScrapingClub
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
scrapy
Scrapy project to scrape public web directories (educational) [DEPRECATED]
istresearch
This Scrapy project uses Redis and Kafka to create a distributed on-demand scraping cluster.
This project is a Python script that scrapes data from a Gumroad site, generates a colorful and well-designed HTML page using OpenAI's GPT-4 model, and deploys the generated page to Vercel.
essamamdani
This project provides a powerful web scraping tool that fetches search results and converts them into Markdown format using FastAPI, SearXNG, and Browserless. It includes the capability to use proxies for web scraping and handles HTML content conversion to Markdown efficiently.
ading2210
A Python script to scrape OpenAI API keys that are exposed on public Replit projects.
sakship31
Django project to scrape a news website using Beautiful Soup and display the results in a template.
imon333
GitHub Project: AI Job Application Automation 🚀 This project automates job searching, CV creation, and applications using Python, n8n, Selenium, and OpenAI. It scrapes LinkedIn, Indeed, StepStone, generates a custom CV & cover letter, and auto-applies to jobs. Integrated with Google Sheets/Airtable & Email alerts.
rmax
Scrapy project based on dirbot to show how to use Twisted's adbapi to store the scraped data in MySQL.
theailifestyle
This project is a Streamlit-based web application that leverages OpenAI's Assistants API to provide a ChatGPT-like experience. Users can have real-time conversations with the AI, upload documents to be used as context, and even scrape and convert website content to PDFs to enrich the AI's knowledge base.
lvgalvao
No description available
Viveckh
A Machine Learning Project implemented from scratch which involves web scraping, data engineering, exploratory data analysis and machine learning to predict housing prices in New York Tri-State Area.
ejbills
OpenArtemis is a privacy-focused, open-source Reddit frontend built with SwiftUI that relies on web scraping.
jcwill415
Scrape, analyze & visualize stock market data for the S&P500 using Python. Build a basic trading strategy using machine learning to assess company performance and determine buy, sell, or hold. README and instructions available in Spanish. This is a working repo, with plans to expand the project from technical analysis to fundamental analysis.
billimarie
Civic Tech & Data AI For Good project. Tracks prosecutor election messaging, mass incarceration indicators, and historical context. Utilizes Agentic AI for data scraping.
jimpick
Demo project showing how to create a simple web scraping service using AWS Lambda and API Gateway
linux-scraping
grsecurity is the most advanced Linux kernel hardening patchset. This repository, not affiliated with the upstream project, aggregates most available grsecurity patches applied to consistent Linux source trees. Check the signatures with https://github.com/linux-scraping/verify-sig
hazemabdelkawy
SunnahGPT is a natural language processing (NLP) project aimed at scraping hadith data from the popular website sunnah.com and applying OpenAI's GPT-3.5 model to generate textual embeddings for each hadith
superryeti
This repo is a part of blog series on several web scraping projects where we will explore scraping techniques to crawl data from simple websites to websites using advanced protection.
casper-hansen
Web scraping Reddit without using the Reddit API, building a dataset, and using that dataset for a machine learning project.
israel-dryer
A collection of web scraping projects to practice your skills or build a portfolio
Patotricks15
Repository for my data science projects (Web scraping and automation + exploratory data analysis + machine learning + recommendation system)
EseToni
An open-source Python library that provides programmatic access to LinkedIn using web scraping. Inspired by the now-private linkedin-api, this project aims to offer a community-driven alternative for authentication, profile searches, messaging, and data extraction—ensuring an accessible and regularly updated LinkedIn integration. 🚀
Y0oshi
Project Eyes On is a high-speed, multi-threaded surveillance tool by Y0oshi (@rde0) for locating open IP cameras worldwide. Unifies Google Dorking and Directory Scraping into a single OSINT engine.
danielsaban
Data Engineering/Scraping Project. Creating a detailed Sports Relational Database for the Top European Soccer Leagues.
Echobob
All projects in the topic awesome scraped by See Urchin and post-processed by Firm.Watch AI/ML Filters
sintaxi
A screen-scraping API for nhl.com (this project was created as a demonstration)
livgust
Open-source project using Node.js and Puppeteer to scrape websites for COVID vaccine availability in Massachusetts. Can be modified to suit other areas and needs.
tobim-dev
This project implements a REST API of the Cookidoo® website. For example, you can retrieve recipe information for a specific recipe or information for all recipes on your weekly schedule. To get the information, the Cookidoo® website is scraped.
abbas99-hub
This repository contains the code and instructions to build a job recommendation system using machine learning. The system is designed to provide personalized job recommendations based on user preferences and historical job data. The data for this project is scraped from Glassdoor, and the system is deployed using the Azure cloud platform.