Found 1,050 repositories (showing 30)
propublica
A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
lorey
🤖 Scrape data from HTML websites automatically by just providing examples
schasins
A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.
joelbarmettlerUZH
Scraping in python made easy - receive the content you like in just one line of code
jpbulman
Recipe websites have too much clutter, this scrapes *just* the recipe
cosmocatalano
A PHP script that scrapes Twitter.com and returns nicely-formatted JSON—just like an API
CriticalHunter
Scrape data about an entire channel or just a playlist, or get stats about your own watch history.
program with Python, how to create amazing data visualizations, and how to use machine learning with Python! Here are just a few of the topics we will be learning: programming with Python; NumPy with Python; using pandas DataFrames to solve complex tasks; using pandas to handle Excel files; web scraping with Python; connecting Python to SQL; using matplotlib and seaborn for data visualizations; using plotly for interactive visualizations; and machine learning with scikit-learn, including: linear regression, k-nearest neighbors, k-means clustering, decision trees, random forests, natural language processing, neural nets and deep learning, support vector machines, and much, much more!
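As a minimal, self-contained sketch of one topic the course blurb above lists (linear regression), here is closed-form simple linear regression in plain Python; scikit-learn's `LinearRegression` generalizes this to many features. The sample data is made up for illustration.

```python
# Closed-form simple linear regression: fit y = a + b*x by least squares,
# using only the standard library.
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unscaled; the common
    # factor 1/n cancels in the ratio below).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

# Perfectly linear sample data: y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(round(a, 6), round(b, 6))  # -> 1.0 2.0
```

With scikit-learn, the equivalent is `LinearRegression().fit(X, y)` followed by reading `intercept_` and `coef_`.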
ScrapingBee
Collect structured responses from a ChatGPT scraper by sending a prompt with valid ChatGPT scraping API credentials. Enable live search, inject HTML context, and automate intelligent scraper ChatGPT workflows in just a few parameters.
johnnagro
Spider is a Web spidering library for Ruby. It handles the robots.txt, scraping, collecting, and looping so that you can just handle the data.
# **ABSTRACT**

Main Objective: The main agenda of this project is to:
• Perform extensive Exploratory Data Analysis (EDA) on the Zomato dataset.
• Build an appropriate machine learning model that will help Zomato restaurants predict their ratings based on certain features.
• Deploy the machine learning model via Flask so that it can be used to make live predictions of restaurant ratings.

A step-by-step guide is attached to this document, as well as a video explanation of each concept. Zomato is one of the best online food delivery apps; it gives users ratings and reviews of restaurants all over India. These ratings and reviews are considered among the most important factors in deciding how good a restaurant is. We will therefore use a real-time dataset with the various features a user would look into regarding a restaurant. We will be considering Bengaluru in this analysis.

Content: The basic idea of analyzing the Zomato dataset is to get a fair idea of the factors affecting the establishment of different types of restaurants in different parts of Bengaluru, as well as the aggregate rating of each restaurant. Bengaluru has more than 12,000 restaurants, serving dishes from all over the world. With new restaurants opening each day, the industry hasn't been saturated yet, and demand is increasing day by day. In spite of the increasing demand, it has become difficult for new restaurants to compete with established ones, most of which serve the same food. Bengaluru is the IT capital of India, and most people here depend mainly on restaurant food, as they don't have time to cook for themselves. With such an overwhelming demand for restaurants, it has therefore become important to study the demography of a location: what kind of food is more popular in a locality, and whether the entire locality prefers vegetarian food.
If yes, is that locality populated by a particular community, e.g. Jains, Marwaris, or Gujaratis, who are mostly vegetarian? This kind of analysis can be done using the data, by studying factors such as:
• Location of the restaurant
• Approximate price of food
• Whether it is a theme-based restaurant
• Which locality of the city serves a given cuisine with the maximum number of restaurants
• The needs of people who are striving to get the best cuisine in the neighborhood
• Whether a particular neighborhood is famous for its own kind of food

"Just so that you have a good meal the next time you step out."

The data is accurate to what was available on the Zomato website as of 15 March 2019. The data was scraped from Zomato in two phases. After going through the structure of the website, I found that each neighborhood has 6-7 categories of restaurants, viz. Buffet, Cafes, Delivery, Desserts, Dine-out, Drinks & nightlife, and Pubs and bars.

Phase I: Only the URL, name, and address of each restaurant were extracted, as these were visible on the front page. The URLs of the restaurants were recorded in a CSV file so that the data could later be extracted individually for each restaurant. This made the extraction process easier and reduced the load on my machine. The data for each neighborhood and each category can be found here.

Phase II: The recorded data for each restaurant and category was read, and data for each restaurant was scraped individually. 15 variables were scraped in this phase: for each neighborhood and category, online_order, book_table, rate, votes, phone, location, rest_type, dish_liked, cuisines, approx_cost (for two people), reviews_list, and menu_item were extracted. See section 5 for more details about the variables.

Acknowledgements: The data was scraped entirely for educational purposes. Note that I don't claim any copyright over the data.
All copyrights for the data are owned by Zomato Media Pvt. Ltd. Source: Kaggle
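The two-phase extraction described in the abstract above can be sketched with only the standard library: Phase I pulls the restaurant URL and name from a listing page and records them in a CSV; Phase II reads that CSV back and visits each URL individually. The HTML structure here (the `res-card` class name and the sample URL) is hypothetical — Zomato's real markup differs.

```python
# A minimal sketch of the two-phase scraping approach, stdlib only.
import csv
import io
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Phase I: pull each restaurant's URL and name from a listing page."""
    def __init__(self):
        super().__init__()
        self.rows = []          # (url, name) tuples
        self._in_link = False
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # "res-card" is a made-up class name standing in for the real markup.
        if tag == "a" and attrs.get("class") == "res-card":
            self._in_link = True
            self._href = attrs.get("href")

    def handle_data(self, data):
        if self._in_link and data.strip():
            self.rows.append((self._href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

# Phase I: record URL and name in a CSV so each page can be fetched later,
# one restaurant at a time, instead of holding everything in memory.
listing_html = '<a class="res-card" href="/bangalore/truffles">Truffles</a>'
parser = ListingParser()
parser.feed(listing_html)

buf = io.StringIO()  # stands in for the CSV file on disk
writer = csv.writer(buf)
writer.writerow(["url", "name"])
writer.writerows(parser.rows)

# Phase II: read the recorded CSV back; each URL would then be fetched and
# the 15 per-restaurant variables (rate, votes, cuisines, ...) scraped.
buf.seek(0)
recorded = list(csv.DictReader(buf))
print(recorded[0]["name"])  # -> Truffles
```

Splitting the crawl this way is what the abstract credits with reducing the load on the machine: the cheap listing pass and the expensive per-restaurant pass can run (and fail, and resume) independently.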
carte-data
A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable front end that's just HTML.
m92vyas
Just mention what you want and it will extract/scrape data from the Web. Useful for creating an AI web search + extraction/scraping agent, RAG with web data, etc.
anthophilee
Information gathering tool.

USES: SpiderFoot can be used offensively (e.g. in a red team exercise or penetration test) for reconnaissance of your target, or defensively to gather information about what you or your organisation might have exposed over the Internet. You can target the following entities in a SpiderFoot scan: IP address, domain/sub-domain name, hostname, network subnet (CIDR), ASN, e-mail address, phone number, username, person's name, Bitcoin address.

SpiderFoot's 200+ modules feed each other in a publisher/subscriber model to ensure maximum data extraction, to do things like: host/sub-domain/TLD enumeration/extraction; email address, phone number and human name extraction; Bitcoin and Ethereum address extraction; checks for susceptibility to sub-domain hijacking; DNS zone transfers; threat intelligence and blacklist queries; API integration with SHODAN, HaveIBeenPwned, GreyNoise, AlienVault, SecurityTrails, etc.; social media account enumeration; S3/Azure/DigitalOcean bucket enumeration/scraping; IP geolocation; web scraping and web content analysis; image, document and binary file metadata analysis; dark web searches; port scanning and banner grabbing; data breach searches; and so much more.

INSTALLING & RUNNING: To install and run SpiderFoot, you need at least Python 3.6 and a number of Python libraries, which you can install with pip. We recommend you install a packaged release, since master will often have bleeding-edge features and modules that aren't fully tested.

Stable build (packaged release):
$ wget https://github.com/smicallef/spiderfoot/archive/v3.3.tar.gz
$ tar zxvf v3.3.tar.gz
$ cd spiderfoot
~/spiderfoot$ pip3 install -r requirements.txt
~/spiderfoot$ python3 ./sf.py -l 127.0.0.1:5001

Development build (cloning the git master branch):
$ git clone https://github.com/smicallef/spiderfoot.git
$ cd spiderfoot
$ pip3 install -r requirements.txt
~/spiderfoot$ python3 ./sf.py -l 127.0.0.1:5001

Check out the documentation and our asciinema videos for more tutorials.
COMMUNITY Whether you're a contributor, user or just curious about SpiderFoot and OSINT in general, we'd love to have you join our community! SpiderFoot now has a Discord server for chat, and a Discourse server to serve as a more permanent knowledge base.
psq
Spider is a Web spidering library for Ruby. It handles the robots.txt, scraping, collecting, and looping so that you can just handle the data.
nuhmanpk
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
GabriellBP
Just a simple WhatsApp Web scraper to collect data from conversations.
Gabryxx7
What to do when you end up single during a pandemic in 2020? Create an AI to deal with dating apps for you! Analyse bios, messages, pictures and more! Or just use this as a desktop client for Tinder (and Bumble) or to scrape some data for research purposes!
yatin94
JustDial scraper to scrape all the requested data, including business name, address, email address and phone number.
yassineyahyaoui
This list is just an updated version of the list from the "gayanvoice/top-github-users" repo. I just scraped it, updated the total contribution counts, and sorted it.
ianramzy
📖 Using deep learning and scraping to analyze/summarize articles! Just drop in any URL!
GilangSan
just save my scrape results
amruthpillai
An automation script written in Node.js, powered by Puppeteer, that scrapes multiple pages of Justdial (an Indian Yellow Pages website) and exports specific information in CSV format.
kal1gh0st
Scrape emails, phone numbers and social media accounts from a website. You can use the found information to gather more information or just find ways to contact the site.
clasense4
Scraping bhinneka.com, just for fun
VladDBA
Just some neat little tricks to mess with silly little content-scraping, copyright-infringing bots.
omersayak
ProxyHarvester is a bash script designed to scrape, deduplicate, and test SOCKS5 proxies from a variety of sources. Whether you're a developer, researcher, or just someone in need of reliable proxy services, ProxyHarvester makes it easy to gather and verify proxies for your needs.
theperu
A simple script that takes all the ETFs listed on the JustETF website and parses them into an easy-to-navigate format.
CarlosUlisesOchoa
An automated tool that generates Notion pages from bookmarks or just a list of URLs, using BeautifulSoup for web scraping, OpenAI's GPT model for refining page content, and the Notion API for page creation.
SajawalFareedi
A very simple but useful bot for scraping users from Instagram. It scrapes every single follower, following, and post. Not just that, it also scrapes all the comments and their likes, tagged users, and comment replies.