Found 166 repositories (showing 30)
berettavexee
Web scraping Python script to convert a list of Facebook event pages into an iCal calendar.
zabeloliver
Exports Bosch SHC events to be scraped by Prometheus.
MalcmanIsHere
**Note for returning users:** the format of the config file has changed, and you will need to re-download everything due to the new update mechanism. I apologize for the inconvenience.

Features:

- The program is multi-threaded; the default number of threads is CPU cores × 3. You can change the number temporarily via the command line interface, or permanently in the source code (`lib/deviantart.py`, line 13).
- Each artwork filename has its artwork ID appended for update-validation purposes.
- The program downloads a user's artworks from newest to oldest until an existing file is found on disk.
- Downloaded artworks are categorized by user and by ranking mode.
- The modification time of each artwork is set according to upload order, so you can sort files by modified date.
- Ranking mode overwrites existing files.

## Instructions

1. Install Python 3.6+.
2. Install the requests library: `pip install --user requests`
3. Edit the `config.json` file in the `data` folder, manually or via the command line interface:
   - save directory: the save directory path
   - users: the username shown on the website or in the URL

## Usage

Display the help message:

```
$ python main.py -h
usage: main.py [-h] [-f FILE] [-l] [-s SAVE_DIR] [-t THREADS] {artwork,ranking} ...

positional arguments:
  {artwork,ranking}
    artwork          download artworks from user IDs specified in "users" field
    ranking          download top N ranking artworks based on given conditions

optional arguments:
  -h, --help         show this help message and exit
  -f FILE            load file for this instance (default: data\config.json)
  -l                 list current settings
  -s SAVE_DIR        set save directory path
  -t THREADS         set number of threads for this instance
```

Display the artwork help message:

```
$ python main.py artwork -h
usage: main.py artwork [-h] [-a [ID ...]] [-d all [ID ...]] [-c all [ID ...]]

optional arguments:
  -h, --help       show this help message and exit
  -a [ID ...]      add user IDs
  -d all [ID ...]  delete user IDs and their directories
  -c all [ID ...]  clear directories of user IDs
```

Display the ranking help message:

```
usage: main.py ranking [-h] [-order ORDER] [-type TYPE] [-content CONTENT] [-category CATEGORY] [-n N]

optional arguments:
  -h, --help          show this help message and exit
  -order ORDER        orders: {whats-hot, undiscovered, most-recent, popular-24-hours,
                      popular-1-week, popular-1-month, popular-all-time}
                      (default: popular-1-week)
  -type TYPE          types: {visual-art, video, literature} (default: visual-art)
  -content CONTENT    contents: {all, original-work, fan-art, resource, tutorial,
                      da-related} (default: all)
  -category CATEGORY  categories: {all, animation, artisan-crafts, tattoo-and-body-art,
                      design, digital-art, traditional, photography, sculpture,
                      street-art, mixed-media, poetry, prose, screenplays-and-scripts,
                      characters-and-settings, action, adventure, abstract, comedy,
                      drama, documentary, horror, science-fiction, stock-and-effects,
                      fantasy, adoptables, events, memes, meta} (default: all)
  -n N                get top N artworks (default: 30)
```

Download artworks from user IDs stored in the config file; update users' artworks if their directories already exist:

```
python main.py artwork
```

Download the top 30 (default) artworks that are popular-1-month, of type visual-art (default), of content original-work, and of category digital-art:

```
python main.py ranking -order popular-1-month -content original-work -category digital-art
```

Delete user IDs and their directories (IDs in the users field + artwork directories), then download / update artworks for the remaining IDs in the config file:

```
python main.py artwork -d wlop trungbui42
```

Add user IDs, then download / update artworks for the newly added IDs + the IDs already in the config file:

```
python main.py artwork -a wlop trungbui42
```

Use the temp.json file in the data folder as the config file (only for this instance), add user IDs to that file, then download / update artworks to the directory specified in that file:

```
python main.py artwork -f data/temp.json -a wlop trungbui42
```

Clear directories for all user IDs in the config file, set threads to 24, then download artworks (i.e. re-download everything):

```
python main.py artwork -c all -t 24
```

## Challenges

There are two ways to download an image: (1) the download button URL, or (2) the direct image URL. The former is preferred because it grabs the highest image quality and other file formats, including gif, swf, abr, and zip. However, it has a small problem: the URL contains a token that becomes invalid when certain actions are performed, such as refreshing the page, reopening the browser, or exceeding a certain time limit.

Solution: use a session to GET or POST all URLs.

For direct image URLs, the image quality is much lower than the original upload (the resolution and size of the original upload can be found in the right sidebar). This was not the case a few years ago, when the original image was accessible through right click; but in 2017, Wix acquired DeviantArt and has been migrating images from the original DeviantArt system to its own image hosting system. Most direct image links now point to a stripped-down version of the original image, hence the poor image quality. Below are the three different formats of direct image URLs I found:

- URL with `/v1/fill` inside: the image went through Wix's encoding system and is modified to a specific size and quality. There are two cases for this format:
  - Old uploads: remove `?token=` and the following values, add `/intermediary` in front of `/f/` in the URL, and change the image settings right after `/v1/fill/` to `w_{width},h_{height},q_100`. The width and height used to have a maximum limit of 5100, where (1) the system returned 400 Bad Request if the value was exceeded, and (2) the original size was returned if the requested image was larger than the original. This changed recently: there is now no limit on the requested size, so you can request any dimensions, which may produce a disproportional image if the given dimensions are incorrect. In this case, I use the original resolution specified by the artist as the width and height.
  - New uploads: the width and height of the image cannot be changed, but the quality can still be improved by replacing `(q_\d+,strp|strp)` with `q_100`. Example: original URL vs incorrect dimension URL vs modified URL; the original URL has a file size of 153 KB at 1024x1280 resolution, while the modified URL has a file size of 4.64 MB at 2700×3375 resolution.
- URL with `/f/` but not `/v1/fill`: this is the original image, so just download it.
- URL with `https://img\d{2}` or `https://pre\d{2}`: the image went through DeviantArt's own system and is modified to a specific size. I could not figure out how to get the original image from these links, i.e. find `https://orig\d{2}` from them, so I just download the image as is.

DeviantArt randomizes the div and class elements in its HTML in an attempt to prevent scraping, so parsing plain HTML will not work.

Solution: DeviantArt now uses XHR requests to send data between client and server, meaning one can simulate the requests to extract and parse data from the JSON response. The XHR requests and responses can be found in the browser's developer tools under the Network tab. You can simply open the request URL to see the response object.

Age-restricted content blocks the page behind an age check.

Solution: I found that DeviantArt uses cookies to save the age-check result, so by setting session.cookies to the appropriate value there will be no age check.

Sometimes the requests module closes the program with the errors "An existing connection was forcibly closed by the remote host" or "Max retries exceeded with url: (image url)". I am not sure of the exact cause, but it is most likely the high number of requests sent from the same IP address in a short period of time, causing the server to refuse the connection.

Solution: use HTTPAdapter and Retry to retry session.get in case of a ConnectionError exception.
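A minimal sketch of that retry setup, using the standard `requests`/`urllib3` APIs. The retry count, backoff factor, and the commented-out age-gate cookie name/value are illustrative placeholders, not the program's actual settings:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries: int = 5, backoff: float = 1.0) -> requests.Session:
    """Build a requests.Session that transparently retries failed connections."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # sleep ~1s, 2s, 4s, ... between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # also retry these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)  # adapter handles every https:// URL
    session.mount("http://", adapter)
    # Age check is cookie-based; a placeholder example of pre-setting it:
    # session.cookies.set("agegate_state", "1", domain=".deviantart.com")
    return session
```

Because the adapter is mounted on the session, every `session.get(...)` in the downloader benefits from the retry policy without per-call error handling.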
artkostm
Endpoints for scraping city events from different platforms
shaz13
A machine learning Twitter scraping tool for analysing users, pages and streaming events
minmaxmean
VA reposted events scraper
sagi778
Web scrapers for extracting UFC event & fighter statistics to .csv data files
Reshadaliyevr
Basic code to scrape the FOMO club's website to see if there are any available New Year's event tickets, with updates via a Telegram bot
markk300
Scraper for event data from sarajevo.ba
jbozas
Events scraper with a Telegram integration.
bingeboy
Scraping and mapping events
spectrum-team
No description available
Event scraper for https://swingrevolution.pl/
jperaltar
Scraper of sports websites to obtain statistics from sports events
riteshsahu
Scripts for scraping car event data from a few websites using Puppeteer, then transforming that data into the required format and outputting it to a CSV file
intiMRA
No description available
voiddp
Web scraper of event rewards from wikis
lirlia
Scrape event API doc
CamiloCamachoV
Web-scraping for Scrap Arts Music events
noobnewbier
Event Aggregator extracted from Caliburn.Micro
etienne-lb
Scraping your Facebook friends' birthdays and creating Google Calendar events
vpapakir
Set of Python scripts to scrape sports events from various sites
KaueSabinoSRV17
Web scraping project that will populate and update a database of MMA events
cs-chandu
CLI - A machine learning Twitter scraping tool for analysing users, pages and streaming events
dhanasekaranweb
Scrapes all clickable events, with their requests and responses, on any website using Selenium.
zubayer-hossain
This is a demo test project for event scraping validation.
Pranjal-bisht
GIS-based event detection using geo-tagged tweets, web scraping, and machine learning clustering techniques
blocto
This project demonstrates how to use FCL to scrape events from Flow blocks.
fcoterroba
Simple web scraping of Andalucia Lab to automate obtaining existing events, outputting the data to JSON. (Project in Spanish 🇪🇸)
MaryBethBaker
Uses some open data, scrapes some sites, and parses for event dates and locations