Found 1,050 repositories (showing 30)
propublica
A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
lorey
🤖 Scrape data from HTML websites automatically by just providing examples
schasins
A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.
joelbarmettlerUZH
Scraping in python made easy - receive the content you like in just one line of code
jpbulman
Recipe websites have too much clutter, this scrapes *just* the recipe
cosmocatalano
A PHP script that scrapes Twitter.com and returns nicely-formatted JSON—just like an API
CriticalHunter
Scrape data about an entire channel or just a playlist, or get stats about your own watch history.
program with Python, how to create amazing data visualizations, and how to use machine learning with Python! Here are just a few of the topics we will be learning: programming with Python; NumPy with Python; using pandas DataFrames to solve complex tasks; using pandas to handle Excel files; web scraping with Python; connecting Python to SQL; using matplotlib and seaborn for data visualizations; using plotly for interactive visualizations; and machine learning with scikit-learn, including: linear regression, k-nearest neighbors, k-means clustering, decision trees, random forests, natural language processing, neural nets and deep learning, support vector machines, and much, much more!
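As a minimal, self-contained sketch of one topic the course blurb above lists (linear regression), here is closed-form simple linear regression in plain Python; scikit-learn's `LinearRegression` generalizes this to many features. The sample data is made up for illustration.

```python
# Closed-form simple linear regression: fit y = a + b*x by least squares,
# using only the standard library.
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unscaled; the common
    # factor 1/n cancels in the ratio below).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

# Perfectly linear sample data: y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(round(a, 6), round(b, 6))  # -> 1.0 2.0
```

With scikit-learn, the equivalent is `LinearRegression().fit(X, y)` followed by reading `intercept_` and `coef_`.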
ScrapingBee
Collect structured responses from a ChatGPT scraper by sending a prompt with valid ChatGPT scraping API credentials. Enable live search, inject HTML context, and automate intelligent scraper ChatGPT workflows in just a few parameters.
johnnagro
Spider is a Web spidering library for Ruby. It handles the robots.txt, scraping, collecting, and looping so that you can just handle the data.
# **ABSTRACT**

Main Objective: The main agenda of this project is to:
• Perform extensive Exploratory Data Analysis (EDA) on the Zomato dataset.
• Build an appropriate machine learning model that will help Zomato restaurants predict their ratings based on certain features.
• Deploy the machine learning model via Flask so that it can be used to make live predictions of restaurant ratings.

A step-by-step guide is attached to this document, as well as a video explanation of each concept. Zomato is one of the best online food delivery apps; it gives users ratings and reviews of restaurants all over India. These ratings and reviews are considered among the most important factors in deciding how good a restaurant is. We will therefore use a real-time dataset with the various features a user would look into regarding a restaurant. We will be considering Bengaluru in this analysis.

Content: The basic idea of analyzing the Zomato dataset is to get a fair idea of the factors affecting the establishment of different types of restaurants in different parts of Bengaluru, as well as the aggregate rating of each restaurant. Bengaluru has more than 12,000 restaurants, serving dishes from all over the world. With new restaurants opening each day, the industry hasn't been saturated yet, and demand is increasing day by day. In spite of the increasing demand, it has become difficult for new restaurants to compete with established ones, most of which serve the same food. Bengaluru is the IT capital of India, and most people here depend mainly on restaurant food, as they don't have time to cook for themselves. With such an overwhelming demand for restaurants, it has therefore become important to study the demography of a location: what kind of food is more popular in a locality, and whether the entire locality prefers vegetarian food.
If yes, is that locality populated by a particular community, e.g. Jains, Marwaris, or Gujaratis, who are mostly vegetarian? This kind of analysis can be done using the data, by studying factors such as:
• Location of the restaurant
• Approximate price of food
• Whether it is a theme-based restaurant
• Which locality of the city serves a given cuisine with the maximum number of restaurants
• The needs of people who are striving to get the best cuisine in the neighborhood
• Whether a particular neighborhood is famous for its own kind of food

"Just so that you have a good meal the next time you step out."

The data is accurate to what was available on the Zomato website as of 15 March 2019. The data was scraped from Zomato in two phases. After going through the structure of the website, I found that each neighborhood has 6-7 categories of restaurants, viz. Buffet, Cafes, Delivery, Desserts, Dine-out, Drinks & nightlife, and Pubs and bars.

Phase I: Only the URL, name, and address of each restaurant were extracted, as these were visible on the front page. The URLs of the restaurants were recorded in a CSV file so that the data could later be extracted individually for each restaurant. This made the extraction process easier and reduced the load on my machine. The data for each neighborhood and each category can be found here.

Phase II: The recorded data for each restaurant and category was read, and data for each restaurant was scraped individually. 15 variables were scraped in this phase: for each neighborhood and category, online_order, book_table, rate, votes, phone, location, rest_type, dish_liked, cuisines, approx_cost (for two people), reviews_list, and menu_item were extracted. See section 5 for more details about the variables.

Acknowledgements: The data was scraped entirely for educational purposes. Note that I don't claim any copyright over the data.
All copyrights for the data are owned by Zomato Media Pvt. Ltd. Source: Kaggle
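The two-phase extraction described in the abstract above can be sketched with only the standard library: Phase I pulls the restaurant URL and name from a listing page and records them in a CSV; Phase II reads that CSV back and visits each URL individually. The HTML structure here (the `res-card` class name and the sample URL) is hypothetical — Zomato's real markup differs.

```python
# A minimal sketch of the two-phase scraping approach, stdlib only.
import csv
import io
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Phase I: pull each restaurant's URL and name from a listing page."""
    def __init__(self):
        super().__init__()
        self.rows = []          # (url, name) tuples
        self._in_link = False
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # "res-card" is a made-up class name standing in for the real markup.
        if tag == "a" and attrs.get("class") == "res-card":
            self._in_link = True
            self._href = attrs.get("href")

    def handle_data(self, data):
        if self._in_link and data.strip():
            self.rows.append((self._href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

# Phase I: record URL and name in a CSV so each page can be fetched later,
# one restaurant at a time, instead of holding everything in memory.
listing_html = '<a class="res-card" href="/bangalore/truffles">Truffles</a>'
parser = ListingParser()
parser.feed(listing_html)

buf = io.StringIO()  # stands in for the CSV file on disk
writer = csv.writer(buf)
writer.writerow(["url", "name"])
writer.writerows(parser.rows)

# Phase II: read the recorded CSV back; each URL would then be fetched and
# the 15 per-restaurant variables (rate, votes, cuisines, ...) scraped.
buf.seek(0)
recorded = list(csv.DictReader(buf))
print(recorded[0]["name"])  # -> Truffles
```

Splitting the crawl this way is what the abstract credits with reducing the load on the machine: the cheap listing pass and the expensive per-restaurant pass can run (and fail, and resume) independently.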
carte-data
A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable front end that's just HTML.
m92vyas
Just mention what you want and it will extract/scrape data from the Web. Useful for creating an AI web search + extraction/scraping agent, RAG with web data, etc.
anthophilee
Information gathering tool.

USES: SpiderFoot can be used offensively (e.g. in a red team exercise or penetration test) for reconnaissance of your target, or defensively to gather information about what you or your organisation might have exposed over the Internet. You can target the following entities in a SpiderFoot scan: IP address, domain/sub-domain name, hostname, network subnet (CIDR), ASN, e-mail address, phone number, username, person's name, Bitcoin address.

SpiderFoot's 200+ modules feed each other in a publisher/subscriber model to ensure maximum data extraction, to do things like: host/sub-domain/TLD enumeration/extraction; email address, phone number and human name extraction; Bitcoin and Ethereum address extraction; checks for susceptibility to sub-domain hijacking; DNS zone transfers; threat intelligence and blacklist queries; API integration with SHODAN, HaveIBeenPwned, GreyNoise, AlienVault, SecurityTrails, etc.; social media account enumeration; S3/Azure/DigitalOcean bucket enumeration/scraping; IP geolocation; web scraping and web content analysis; image, document and binary file metadata analysis; dark web searches; port scanning and banner grabbing; data breach searches; and so much more.

INSTALLING & RUNNING: To install and run SpiderFoot, you need at least Python 3.6 and a number of Python libraries, which you can install with pip. We recommend you install a packaged release, since master will often have bleeding-edge features and modules that aren't fully tested.

Stable build (packaged release):
$ wget https://github.com/smicallef/spiderfoot/archive/v3.3.tar.gz
$ tar zxvf v3.3.tar.gz
$ cd spiderfoot
~/spiderfoot$ pip3 install -r requirements.txt
~/spiderfoot$ python3 ./sf.py -l 127.0.0.1:5001

Development build (cloning the git master branch):
$ git clone https://github.com/smicallef/spiderfoot.git
$ cd spiderfoot
$ pip3 install -r requirements.txt
~/spiderfoot$ python3 ./sf.py -l 127.0.0.1:5001

Check out the documentation and our asciinema videos for more tutorials.
COMMUNITY Whether you're a contributor, user or just curious about SpiderFoot and OSINT in general, we'd love to have you join our community! SpiderFoot now has a Discord server for chat, and a Discourse server to serve as a more permanent knowledge base.
psq
Spider is a Web spidering library for Ruby. It handles the robots.txt, scraping, collecting, and looping so that you can just handle the data.
nuhmanpk
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
GabriellBP
Just a simple WhatsApp Web scraper to collect data from conversations.
Gabryxx7
What to do when you end up single during a pandemic in 2020? Create an AI to deal with dating apps for you! Analyse bios, messages, pictures and more! Or just use this as a desktop client for Tinder (and Bumble) or to scrape some data for research purposes!
yatin94
JustDial scraper to scrape all the requested data, including business name, address, email address and phone number.
yassineyahyaoui
This list is just an updated version of the list from the "gayanvoice/top-github-users" repo. I just scraped it, updated the total contribution counts, and sorted it.
ianramzy
📖 Using deep learning and scraping to analyze/summarize articles! Just drop in any URL!
GilangSan
just save my scrape results
amruthpillai
An automation script written in Node.js, powered by Puppeteer, that scrapes multiple pages of Justdial (an Indian Yellow Pages website) and exports specific information in CSV format.
kal1gh0st
Scrape emails, phone numbers and social media accounts from a website. You can use the found information to gather more information or just find ways to contact the site.
clasense4
Scraping bhinneka.com, just for fun
VladDBA
Just some neat little tricks to mess with silly little content-scraping, copyright-infringing bots.
omersayak
ProxyHarvester is a bash script designed to scrape, deduplicate, and test SOCKS5 proxies from a variety of sources. Whether you're a developer, researcher, or just someone in need of reliable proxy services, ProxyHarvester makes it easy to gather and verify proxies for your needs.
theperu
A simple script that takes all the ETFs listed on the JustETF website and parses them into an easy-to-navigate format.
CarlosUlisesOchoa
An automated tool that generates Notion pages from bookmarks or just a list of URLs, using BeautifulSoup for web scraping, OpenAI's GPT model for refining page content, and the Notion API for page creation.
SajawalFareedi
A very simple but useful bot for scraping users from Instagram. It scrapes every single follower, following, and post. Not just that, it also scrapes all the comments and their likes, tagged users, and comment replies.