Found 126 repositories (showing 30)
ptyadana
Collection of Python projects including machine learning projects, image and PDF processing, password checkers, sending emails and SMS, web scraping, a Flask web app, Selenium automation testing, etc.
ultranet1
Project Description: A music streaming company wants to introduce more automation and monitoring to their data warehouse ETL pipelines, and they have concluded that the best tool to achieve this is Apache Airflow. As their Data Engineer, I was tasked with creating a reusable, production-grade data pipeline that incorporates data quality checks and allows for easy backfills. Several analysts and data scientists rely on the output generated by this pipeline, and it is expected that the pipeline runs daily on a schedule, pulling new data from the source and storing the results at the destination.

Data Description: The source data resides in S3 and needs to be processed in a data warehouse in Amazon Redshift. The source datasets consist of JSON logs describing user activity in the application and JSON metadata about the songs the users listen to.

Data Pipeline design: At a high level, the pipeline performs the following tasks:
- Extract data from multiple S3 locations.
- Load the data into a Redshift cluster.
- Transform the data into a star schema.
- Perform data validation and data quality checks.
- Calculate the most played songs for the specified time interval.
- Load the result back into S3.

Design Goals: Based on the requirements of our data consumers, our pipeline is required to adhere to the following guidelines:
- The DAG should not have any dependencies on past runs.
- On failure, a task is retried 3 times.
- Retries happen every 5 minutes.
- Catchup is turned off.
- Do not email on retry.

Pipeline Implementation: Apache Airflow is a Python framework for programmatically creating workflows as DAGs, e.g. ETL processes, generating reports, and retraining models on a daily basis. The Airflow UI automatically parses our DAG and creates a natural representation of the movement and transformation of data. A DAG is simply a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
A DAG describes how you want to carry out your workflow, and Operators determine what actually gets done. By default, Airflow comes with some simple built-in operators like PythonOperator, BashOperator, and DummyOperator; however, Airflow also lets you extend BaseOperator to create custom operators. For this project, I developed several custom operators:
- StageToRedshiftOperator: Stages data to a specific Redshift cluster from a specified S3 location. The operator uses templated fields to handle partitioned S3 locations.
- LoadFactOperator: Loads data into the given fact table by running the provided SQL statement. Supports delete-insert and append style loads.
- LoadDimensionOperator: Loads data into the given dimension table by running the provided SQL statement. Supports delete-insert and append style loads.
- SubDagOperator: Two or more operators can be grouped into one task using the SubDagOperator. Here, I group the tasks of checking that the given table has rows and then running a series of data quality SQL commands.
- HasRowsOperator: Data quality check to ensure that the specified table has rows.
- DataQualityOperator: Performs data quality checks by running SQL statements to validate the data.
- SongPopularityOperator: Calculates the top ten most popular songs for a given interval. The interval is dictated by the DAG schedule.
- UnloadToS3Operator: Stores the analysis result back to the given S3 location.

Code for each of these operators is located in the plugins/operators directory.

Pipeline Schedule and Data Partitioning: The events data residing on S3 is partitioned by year (2018) and month (11). Our task is to incrementally load the event JSON files and run them through the entire pipeline to calculate song popularity and store the result back into S3. In this manner, we can obtain the top songs per day in an automated fashion using the pipeline.
Please note, this is a trivial analysis, but you can imagine other complex queries that follow a similar structure.

S3 input events data:
s3://<bucket>/log_data/2018/11/
  2018-11-01-events.json
  2018-11-02-events.json
  2018-11-03-events.json
  ...
  2018-11-28-events.json
  2018-11-29-events.json
  2018-11-30-events.json

S3 output song popularity data:
s3://skuchkula-topsongs/
  songpopularity_2018-11-01
  songpopularity_2018-11-02
  songpopularity_2018-11-03
  ...
  songpopularity_2018-11-28
  songpopularity_2018-11-29
  songpopularity_2018-11-30

The DAG can be configured by giving it some default_args which specify the start_date, end_date and the other design choices mentioned above:

default_args = {
    'owner': 'shravan',
    'start_date': datetime(2018, 11, 1),
    'end_date': datetime(2018, 11, 30),
    'depends_on_past': False,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    'catchup_by_default': False,
    'provide_context': True,
}

How to run this project?

Step 1: Create an AWS Redshift cluster using either the console or the notebook provided in create-redshift-cluster. Run the notebook to create the cluster and make a note of:
DWH_ENDPOINT :: dwhcluster.c4m4dhrmsdov.us-west-2.redshift.amazonaws.com
DWH_ROLE_ARN :: arn:aws:iam::506140549518:role/dwhRole

Step 2: Start Apache Airflow. Run docker-compose up from the directory containing docker-compose.yml. Ensure that you have mapped the volume to point to the location of your DAGs. NOTE: You can find details of how to manage Apache Airflow on Mac here: https://gist.github.com/shravan-kuchkula/a3f357ff34cf5e3b862f3132fb599cf3

Step 3: Configure Apache Airflow hooks. On the left is the S3 connection; the login and password are the access key and secret key of the IAM user you created. Using these credentials, we are able to read data from S3. On the right is the Redshift connection.
These values can easily be gathered from your Redshift cluster's connection details.

Step 4: Execute the create-tables-dag. This DAG creates the staging, fact and dimension tables. The reason we trigger this manually is that we want to keep it out of the main DAG. Normally, table creation can be handled by simply triggering a script, but for the sake of illustration I created a DAG for this and had Airflow trigger it. You can turn off the DAG once it has completed. After running this DAG, you should see all the tables created in AWS Redshift.

Step 5: Turn on the load_and_transform_data_in_redshift DAG. As the execution start date is 2018-11-01 with a schedule interval of @daily and the execution end date is 2018-11-30, Airflow will automatically trigger and schedule the DAG runs once per day, 30 times in total. The 30 DAG runs, ranging from start_date to end_date, are each triggered by Airflow once per day.
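The daily schedule described in Step 5 can be sanity-checked with plain Python: given the start_date and end_date from default_args above, a @daily interval yields exactly one run per day, 30 in total. This is an illustrative helper, not part of the project's code:

```python
from datetime import datetime, timedelta

def daily_run_dates(start, end):
    """Return the execution dates a @daily schedule would produce, inclusive."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=1)
    return dates

# start_date and end_date taken from the default_args shown above
runs = daily_run_dates(datetime(2018, 11, 1), datetime(2018, 11, 30))
print(len(runs))  # 30 daily DAG runs, one per daily partition of the events data
```

Each of those 30 execution dates corresponds to one `songpopularity_YYYY-MM-DD` output object in S3.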
501Commons
Data Import Automation is for anyone who uses the Salesforce Data Import Wizard or Data Loader on a regular basis to upload the same set of external data to Salesforce. If you have ever saved a field map file (.sdl file) from Data Loader, then automation is probably a good option. Common tasks that are ideal for Data Import Automation are donation records coming from periodic financial reports, business cards/contacts from marketing events, external databases not integrated into Salesforce, or any other data with the same fields that you regularly upload. Setting up the Data Import Automation takes an investment of a couple of hours, but once it is set up the end-user experience is:
- Copy external data sources (e.g., CSV, Excel files) to a specific location.
- Run the import automation program.
- Review the import emails (the import takes about 10-15 minutes to finish), which contain a log of the import process with the Salesforce Data Loader success and error files attached.

501 Commons Salesforce Importer is an open-source project on GitHub: https://github.com/501Commons/Salesforce-Importer. The technology used: Salesforce Data Loader Command Line, Microsoft Excel, Python. The bulk of the setup work is building the Microsoft Excel file by adding new data queries for your external sources and a Salesforce object query, then using Excel Power Query to merge data between your external data sources and Salesforce objects, so that you have one list of data to insert (not already in Salesforce) and one list of data to update. The project is open source, so feel free to suggest and make contributions. Built and tested on Windows, but there is no reason it shouldn't work on macOS.
exemartinez
Automation tool that ingests LinkedIn contact exports and sends personalized outreach emails from a single template. Python, CSV processing, SMTP.
mazen-salah
The Trovo Account Registration Bot is a Python automation script that automates the process of creating new Trovo accounts for various purposes. Trovo is a live streaming platform, and this script streamlines the account creation process by filling out the registration form, solving CAPTCHA challenges, and verifying email addresses.
Taresh-oss
For Gmail Automation I used web scraping (Python Selenium library). With this automation, the user can send a job application immediately or schedule it for a specific date and time, and the script stops once all the mails have been sent to the respective email IDs. Each email has its subject, body message, and cover letter updated according to the company name and job profile the user specifies in the Excel sheet. All the user has to do is run the script; the rest is handled by the automation process.
Berryakin2010
You work for an online fruit store, and you need to develop a system that will update the catalog information with data provided by your suppliers. The suppliers send the data as large images with an associated description of the products in two files (.TIF for the image and .txt for the description). The images need to be converted to smaller JPEG images, and the text needs to be turned into an HTML file that shows the image and the product description. The contents of the HTML file need to be uploaded to a web service that is already running using Django. You also need to gather the name and weight of all fruits from the .txt files and use a Python request to upload them to your Django server. You will create a Python script that processes the images and descriptions and then updates your company's online website to add the new products. Once the task is complete, the supplier should be notified with an email that indicates the total weight of fruit (in lbs) that was uploaded. The email should have a PDF attached with the name of each fruit and its total weight (in lbs). Finally, in parallel to the automation running, we want to check the health of the system and send an email if something goes wrong.
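The name-and-weight gathering step described above can be sketched with the standard library. The description-file layout (fruit name on the first line, a "weight: N lbs" line below) and the Django endpoint in the comment are illustrative assumptions, not taken from the task itself:

```python
import re

def parse_description(text):
    """Extract fruit name and weight (in lbs) from a supplier .txt description."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = lines[0]                      # assumed layout: name on the first line
    weight = None
    for line in lines[1:]:
        match = re.search(r"weight:\s*(\d+)\s*lbs", line, re.IGNORECASE)
        if match:
            weight = int(match.group(1))
    return {"name": name, "weight": weight}

# Sample descriptions standing in for the supplier .txt files
descriptions = ["Apple\nweight: 500 lbs\nSweet and crisp", "Banana\nweight: 200 lbs\nRipe"]
fruits = [parse_description(d) for d in descriptions]
total = sum(f["weight"] for f in fruits)
print(total)  # 700 -> the total weight to report in the supplier email
# Uploading each fruit would then be something like (hypothetical endpoint):
#   requests.post("http://<django-server>/fruits/", json=fruit)
```

The same parsed dictionaries can feed both the Django upload and the PDF table attached to the notification email.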
KHolodilin
Secure and idempotent Python tool for automated email processing: download attachments via IMAP, send files via SMTP with automatic tracking, organize files by subject rules, archive processed emails, and manage passwords via keyring.
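The attachment-download half of such a tool can be sketched with the standard library alone. This illustrative snippet (not the repository's actual code) parses a raw RFC 822 message and extracts its attachments, which is what you would do with each message fetched via imaplib:

```python
from email import message_from_bytes
from email.message import EmailMessage

def extract_attachments(raw_bytes):
    """Return (filename, payload) pairs for every attachment in a raw message."""
    msg = message_from_bytes(raw_bytes)
    found = []
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found

# Build a sample message to demonstrate; in practice raw_bytes comes from an
# imaplib FETCH of the mailbox being processed.
sample = EmailMessage()
sample["Subject"] = "Invoice"
sample.set_content("See attached.")
sample.add_attachment(b"PDFDATA", maintype="application",
                      subtype="pdf", filename="invoice.pdf")
print(extract_attachments(sample.as_bytes()))  # [('invoice.pdf', b'PDFDATA')]
```

Organizing files by subject rules would then just dispatch on `msg["Subject"]` before writing each payload to disk.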
zainmz
A Robotic Process Automation tool created to generate a summary report and email it, using Python
cameron-coding-projects
Python automation tool that analyses in-memory running processes and sends email alerts when CPU usage thresholds are exceeded or deny-listed processes are detected.
AniketJoshi-ready
A collection of Python automation projects including a Duplicate File Cleaner with log & email automation and a System Process Logger with scheduling. Features checksum-based duplicate detection, automated logging, email notifications, and periodic execution using Python libraries.
Nandani-Rejoice
Gmail Automation – A Python pipeline that monitors Gmail for new emails, processes attachments (images, PDFs, CSVs, Excel), integrates with Supabase, and supports LLM-based processing with automated escalation.
tejveer77
AutoMetrics: Python automation tool for IT—processes Excel/CSV/TXT, calculates metrics, generates reports, and emails them. Dynamic SQLite storage, Tkinter GUI, and SMTP integration.
BernardWambua
This project is a Python-based automation tool designed to streamline the process of forwarding emails from an underwriting team to the appropriate insurance team members.
anilchaudhary449
End-to-end test automation for web application testing using Selenium WebDriver and Python. The Authorized Partner portal project demonstrates end-to-end automation of a multi-step registration process with email verification.
ChandanHegde07
The Email Automation Tool is a Python-based solution for automating the process of sending customized bulk emails. It is designed to work seamlessly with a list of recipients stored in a CSV file and supports key features like SMTP email sending, HTML email templates, attachments, and scheduled email dispatch.
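The core of such a tool can be sketched with the standard library: render a template per CSV row and build one EmailMessage per recipient. The column names, addresses, and SMTP host below are illustrative assumptions, not taken from the repository:

```python
import csv
import io
from email.message import EmailMessage

TEMPLATE = "Hello {name},\n\nYour order {order_id} has shipped."

def build_messages(csv_text, sender, subject):
    """Yield a personalized EmailMessage for each row of the recipients CSV."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = row["email"]
        msg["Subject"] = subject
        msg.set_content(TEMPLATE.format(**row))  # fill template from CSV columns
        yield msg

rows = "email,name,order_id\nann@example.com,Ann,42\nbob@example.com,Bob,43\n"
msgs = list(build_messages(rows, "me@example.com", "Shipping update"))
print(msgs[0]["To"])  # ann@example.com
# Actual dispatch would use smtplib, e.g.:
#   with smtplib.SMTP("smtp.example.com", 587) as s:
#       s.starttls(); s.login(user, password); s.send_message(msg)
```

HTML templates and attachments would use `msg.add_alternative(...)` and `msg.add_attachment(...)` on the same EmailMessage objects.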
ArviiSoft
🕵🏻♀️ A lightweight, script-based email checker written in Python. This utility allows you to verify and process email data from a file or service. Ideal for small-scale verification tasks or automation scenarios.
ShiwaniKadu
Developed an automation script in Python to log information about all running processes at periodic intervals, with an additional feature of automatically sending the log report through email.
Fazlullahmamond
The Email Automation tool is a Python-powered solution that streamlines email marketing by automating the entire outreach process. Simply enter your target keywords, and the script searches Google for top-ranking websites, extracts valid email addresses from them, and sends your pre-written email templates directly to those contacts.
firetofficial
This repository contains a Python tool that automates the process of creating an ExitLag account using a temporary email address. It leverages Selenium for web automation and interacts with the Mail.GW service to fetch temporary email addresses.
intelligent-shahid
Automates the login process for Yatra using Selenium WebDriver in Python. Supports email input and clicking the continue button with smart waits for elements. Ideal for beginners learning browser automation and Python-based testing frameworks.
stefagnone
A Python-based Dynamic Email Generator designed to automate and personalize email creation for marketing, outreach, and professional communication. This project demonstrates the integration of data-driven logic with automation to streamline the process of crafting customized email templates. With features like dynamic content handling, real-time pl
sriharsha024
This repository contains structured Python practice programs and mini projects covering fundamentals to advanced topics, including OOP, web scraping, image processing, CSV and PDF handling, and email automation using SMTP and IMAP.
the-silversurver
This script automates the process of logging into a Mailman admin interface, scraping email addresses, and saving them into a CSV file. The tool is built with Python and utilizes Selenium for web automation.
mubbashirulislam
The CV Automation Tool is a Python script that simplifies job applications by allowing users to send their CVs to multiple email addresses simultaneously. With customizable email content and attachment support, this tool streamlines the application process, saving time and effort for job seekers.
Abhaypratap73
Developed a Python-based Virtual AI Assistant using Speech Recognition and Natural Language Processing (NLP) for voice-controlled task automation. It performs tasks like web searches, sending emails, and managing reminders, with real-time feedback.
DE-KHALED
Automates an online fruit store workflow: processes images and descriptions, uploads data, generates PDF reports, sends email notifications, and performs system health checks. Built as part of the Google IT Automation with Python Certificate and refined for production use.
Ta-bot-pixel
This repository showcases an end-to-end automated solution for financial data processing, analysis, and deployment. Built for Regal Finance Solutions, the project integrates Outlook email automation, Google Cloud Platform (GCP) workflows, Python scripting, Power BI dashboards, machine learning model deployment with Flask & Dockerized application.
RohitKankhedia
Data Miner (ver 1.0) - Converts and consolidates unstructured data into structured data. These days most companies are working on the data lake concept to consolidate their existing databases, which reside in their sub-units, departments or processes. For consolidation, most techniques involve SAS, macros, automation, or RPA, but SAS and RPA are not used by every company, so in many cases developers now write automation macros to consolidate all the data, which is again time-consuming. To simplify all of this, I have built a solution that eliminates the time-consuming macro writing (automation) for consolidating and refining all unstructured data, at no cost, because it is absolutely free... *Here, unstructured data means a worksheet or any type of database where you don't know where the header is, but you do know the column names or data schema. Today, on 10th Dec 2018, I am launching this tool to the public. I built it in Python, which is platform independent, so go ahead and try it, and let me know if you require any amendments for your requirements. For any updates or queries, please email or comment... Believe in work, not in words. Regards, Rohit. How to use: Step 1: Browse to the folder. Step 2: Start Mining.
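The header-detection idea described above (you don't know where the header is, but you do know the column names) can be sketched in a few lines: scan rows until one contains the known schema, then read everything below it. This is a simplified illustration, not the tool's actual code:

```python
import csv
import io

def extract_table(csv_text, known_columns):
    """Find the row containing the known column names, return the rows below it."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    for i, row in enumerate(rows):
        # The header is the first row that contains every known column name
        if set(known_columns) <= {cell.strip() for cell in row}:
            header = [cell.strip() for cell in row]
            return [dict(zip(header, r)) for r in rows[i + 1:] if any(r)]
    return []

# A "messy" worksheet: title and blank rows before the real header
messy = "Monthly report,,\n,,\nnotes,Name,Amount\n,Ann,10\n,Bob,20\n"
print(extract_table(messy, ["Name", "Amount"]))
```

The same scan works per-sheet for Excel workbooks once each sheet is exported to rows, which is how many such consolidation tools normalize inputs before merging them.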