Found 1,373 repositories (showing 30)
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
Projects completed in the Data Engineering Nanodegree by Udacity.com
immu0001
Classwork projects and homework completed through the Udacity Data Engineering Nanodegree
manuel-lang
Solutions to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift, Data Lake with Spark, and Data Pipelines with Airflow.
Udacity Data Engineering Nanodegree Capstone Project
My solutions for the Udacity Data Engineering Nanodegree
Lal4Tech
Resources and projects from the Udacity Data Engineering with AWS Nanodegree program
AhmadChaiban
Udacity's five-month Data Engineering Nanodegree program; this repo includes all completed projects.
anefischer
Projects I implemented to complete Udacity Nanodegree programs, from Data Engineering to Machine Learning Engineering.
kenhanscombe
Udacity Data Engineering Nanodegree project
amalphonse
This repo contains my projects from the Udacity Data Engineering Nanodegree
scurtis94
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
bondxue
🍄 Udacity Data Engineering Nanodegree Project 3
patrickbrus
A Machine Learning project for retail data analytics as part of the Machine Learning Engineering Nanodegree Capstone Project from Udacity
Udacity Data Engineering Nanodegree Projects
Project Overview

Welcome to the Convolutional Neural Networks (CNN) project in the AI Nanodegree! In this project, you will learn how to build a pipeline that can be used within a web or mobile app to process real-world, user-supplied images. Given an image of a dog, your algorithm will identify an estimate of the canine's breed. If supplied an image of a human, the code will identify the resembling dog breed. (Sample output image omitted.)

Along with exploring state-of-the-art CNN models for classification, you will make important design decisions about the user experience for your app. Our goal is that by completing this lab, you understand the challenges involved in piecing together a series of models designed to perform various tasks in a data processing pipeline. Each model has its strengths and weaknesses, and engineering a real-world application often involves solving many problems without a perfect answer. Your imperfect solution will nonetheless create a fun user experience!

Project Instructions

1. Clone the repository and navigate to the downloaded folder.

       git clone https://github.com/udacity/dog-project.git
       cd dog-project

2. Download the dog dataset. Unzip the folder and place it in the repo, at location path/to/dog-project/dogImages.

3. Download the human dataset. Unzip the folder and place it in the repo, at location path/to/dog-project/lfw. If you are using a Windows machine, you are encouraged to use 7zip to extract the folder.

4. Download the VGG-16 bottleneck features for the dog dataset. Place them in the repo, at location path/to/dog-project/bottleneck_features.

5. (Optional) If you plan to install TensorFlow with GPU support on your local machine, follow the guide to install the necessary NVIDIA software on your system. If you are using an EC2 GPU instance, you can skip this step.

6. (Optional) If you are running the project on your local machine (and not using AWS), create (and activate) a new environment.

   Linux (to install with GPU support, change requirements/dog-linux.yml to requirements/dog-linux-gpu.yml):

       conda env create -f requirements/dog-linux.yml
       source activate dog-project

   Mac (to install with GPU support, change requirements/dog-mac.yml to requirements/dog-mac-gpu.yml):

       conda env create -f requirements/dog-mac.yml
       source activate dog-project

   NOTE: Some Mac users may need to install a different version of OpenCV:

       conda install --channel https://conda.anaconda.org/menpo opencv3

   Windows (to install with GPU support, change requirements/dog-windows.yml to requirements/dog-windows-gpu.yml):

       conda env create -f requirements/dog-windows.yml
       activate dog-project

7. (Optional) If you are running the project on your local machine (and not using AWS) and Step 6 throws errors, try this alternative step to create your environment.

   Linux or Mac (to install with GPU support, change requirements/requirements.txt to requirements/requirements-gpu.txt):

       conda create --name dog-project python=3.5
       source activate dog-project
       pip install -r requirements/requirements.txt

   NOTE: Some Mac users may need to install a different version of OpenCV:

       conda install --channel https://conda.anaconda.org/menpo opencv3

   Windows (to install with GPU support, change requirements/requirements.txt to requirements/requirements-gpu.txt):

       conda create --name dog-project python=3.5
       activate dog-project
       pip install -r requirements/requirements.txt

8. (Optional) If you are using AWS, install TensorFlow:

       sudo python3 -m pip install -r requirements/requirements-gpu.txt

9. Switch the Keras backend to TensorFlow.

   Linux or Mac:

       KERAS_BACKEND=tensorflow python -c "from keras import backend"

   Windows:

       set KERAS_BACKEND=tensorflow
       python -c "from keras import backend"

10. (Optional) If you are running the project on your local machine (and not using AWS), create an IPython kernel for the dog-project environment:

        python -m ipykernel install --user --name dog-project --display-name "dog-project"

11. Open the notebook:

        jupyter notebook dog_app.ipynb

12. (Optional) If you are running the project on your local machine (and not using AWS), before running code, change the kernel to match the dog-project environment by using the drop-down menu (Kernel > Change kernel > dog-project). Then, follow the instructions in the notebook.

NOTE: While some code has already been implemented to get you started, you will need to implement additional functionality to successfully answer all of the questions included in the notebook. Unless requested, do not modify code that has already been included.

Evaluation

Your project will be reviewed by a Udacity reviewer against the CNN project rubric. Review this rubric thoroughly, and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Project Submission

When you are ready to submit your project, collect the following files and compress them into a single archive for upload:

* The dog_app.ipynb file with fully functional code, all code cells executed and displaying output, and all questions answered.
* An HTML or PDF export of the project notebook with the name report.html or report.pdf.
* Any additional images used for the project that were not supplied to you for the project.

Please do not include the project datasets in the dogImages/ or lfw/ folders. Likewise, please do not include the bottleneck_features/ folder.
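The bottleneck-feature step above lends itself to a short illustration. Below is a minimal sketch of the transfer-learning approach the notebook builds toward: training a small classifier head on precomputed VGG-16 features. The file name `DogVGG16Data.npz`, the `train`/`valid` keys, and the 133-breed output size are assumptions based on the Udacity dog dataset, not details confirmed by this listing.

```python
# Hedged sketch: fit a small classifier head on precomputed VGG-16 bottleneck
# features, as the project instructions describe. File name, array keys, and
# the 133-class output size are assumptions about the Udacity dataset.
import numpy as np
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

bottleneck = np.load('bottleneck_features/DogVGG16Data.npz')  # assumed file name
train_vgg16, valid_vgg16 = bottleneck['train'], bottleneck['valid']

model = Sequential([
    GlobalAveragePooling2D(input_shape=train_vgg16.shape[1:]),
    Dense(133, activation='softmax'),  # assumed: 133 dog breeds in the dataset
])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])

# train_targets / valid_targets would be one-hot breed labels loaded elsewhere:
# model.fit(train_vgg16, train_targets,
#           validation_data=(valid_vgg16, valid_targets),
#           epochs=20, batch_size=32)
```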
Udacity Data Engineering Nanodegree project: data modeling for fact and dimension tables, and an ETL pipeline that transfers data from files in two local directories into these tables in Postgres using Python and SQL (see the sketch below).
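As a hedged illustration of the kind of ETL step this description names, here is a minimal sketch that reads one local JSON file and upserts a record into a Postgres dimension table. The database name, table, columns, and file path are placeholders, not taken from the repo.

```python
# Minimal sketch of a file-to-Postgres ETL step, assuming placeholder names
# (sparkifydb database, songs table, sample file path).
import json
import psycopg2

conn = psycopg2.connect(
    "host=127.0.0.1 dbname=sparkifydb user=student password=student"
)
cur = conn.cursor()

with open('data/song_data/sample_song.json') as f:  # hypothetical path
    record = json.load(f)

# Insert one row into a dimension table; skip duplicates on the primary key.
cur.execute(
    """
    INSERT INTO songs (song_id, title, artist_id, year, duration)
    VALUES (%s, %s, %s, %s, %s)
    ON CONFLICT (song_id) DO NOTHING;
    """,
    (record['song_id'], record['title'], record['artist_id'],
     record['year'], record['duration']),
)
conn.commit()
conn.close()
```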
lucaskjaero
Projects submitted as part of working through Udacity's Data Engineering Nanodegree.
Capstone Project for Udacity Data Engineering Nanodegree
BarbaraJoebstl
Projects of the Udacity Data Engineering Nanodegree Program.
Udacity Data Engineering Nanodegree Capstone Project
write4alive
Udacity Data Engineering Nanodegree program, Project 5: Data Pipelines with Apache Airflow (a minimal DAG sketch follows).
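To make the "Data Pipelines with Apache Airflow" description concrete, here is a minimal, self-contained DAG sketch of the stage-then-verify pattern such pipelines typically use. It assumes Airflow 2.x import paths; the DAG id, task names, and schedule are illustrative only.

```python
# Hedged sketch of a two-task Airflow pipeline: stage data, then run quality
# checks. Assumes Airflow 2.x; all names here are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def stage_data():
    # In a real pipeline this would copy source files into staging tables.
    print("staging source files")

def run_quality_checks():
    # In a real pipeline this would verify row counts and null constraints.
    print("running data quality checks")

with DAG(
    dag_id="example_etl_pipeline",      # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    stage = PythonOperator(task_id="stage_data", python_callable=stage_data)
    check = PythonOperator(task_id="quality_checks",
                           python_callable=run_quality_checks)
    stage >> check  # quality checks run only after staging succeeds
```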
polo2444172276
Completed Udacity's Data Engineering Nanodegree: a series of exercises and projects to learn and practice popular big data management tools.
MariamGado0
# Starbucks Promotions Project

This project is the Capstone Project of Udacity's Machine Learning Engineering Nanodegree program.

## Problem Statement

This dataset contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Starbucks collects customer data to understand behavior around the rewards and offers sent via the app. Once every few days, Starbucks sends out a personalized offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one, get one free). Some users might not receive any offer during certain weeks, customers can respond positively, negatively, or neutrally, and not all users receive the same offer; that is the challenge to solve with this dataset. The task of this project is to combine past transaction, demographic, and offer data (which is already provided) to determine which demographic groups respond best to which offer types. This dataset is a simplified version of the real Starbucks app, because the underlying simulator only has one product whereas Starbucks actually sells dozens of products.

In order to develop this project, we needed to use some tools, packages, systems, and services that could help us achieve our goals.

#### Libraries

First of all, we used **Python** (version 3.6) to write our scripts, not only for algorithm training and serving but also for the orchestration of the whole process. You will need to install the following libraries in order to run the code:

* `pandas` so we could work with tabular data in dataframes;
* `plotly` and `matplotlib` for dataset visualization;
* `numpy` so we could easily manipulate arrays and data structures;
* `seaborn` and `matplotlib` so we could generate insightful visualizations;
* `sklearn` so we could build and develop our model pipeline;
* `imblearn` so we could apply SMOTE to our training data;
* `xgboost` so we could have our main classifier;
* `sagemaker` so we could easily interact with AWS;
* `json` for reading our dataset files;
* `boto3`.

Finally, we used the AWS environment to launch training jobs, deploy our model, and serve predictions. The main services used are:

* __AWS SageMaker__: training, hyperparameter tuning, and endpoint serving;
* __Amazon S3__: saving our data and model artifacts.

## Files Descriptions

This project is structured as follows:

#### 01. Proposal
Project proposal documentation.

#### 02. Data_Cleaning_[Dataset]
Folder for data preparation and dataset cleaning, producing the final data for further use in the model algorithms.

#### 03. Pre-processing Dataset Visualization
Folder for final pre-processing of the dataset to be used in visualization and exploration.

#### 04. Dataset_Visualization
Folder containing visualizations of the pre-processed dataset.

#### 06. ORG_Starbucks_Capstone_Project.ipynb
Jupyter notebook that deploys the final model, creates an endpoint, and orchestrates the end-to-end process in AWS SageMaker, interacting with the other services.
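Given the libraries this README names (`imblearn` for SMOTE, `xgboost` as the main classifier), here is a hedged sketch of the modeling core, leaving the SageMaker deployment aside. The feature matrix and target are random placeholders; the real notebook's columns and preprocessing will differ.

```python
# Hedged sketch of the SMOTE + XGBoost approach the README's library list
# implies: oversample the minority class inside an imblearn pipeline, then fit
# the classifier. Data here is synthetic placeholder input.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X = np.random.rand(500, 8)             # placeholder demographic/offer features
y = np.random.randint(0, 2, 500)       # placeholder "responded to offer" label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = Pipeline([
    ("smote", SMOTE(random_state=42)),         # applied to training folds only
    ("xgb", XGBClassifier(eval_metric="logloss")),
])
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

Using `imblearn.pipeline.Pipeline` (rather than sklearn's) matters here: it applies SMOTE only during `fit`, so the test split is never resampled.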
cheuklau
Udacity Data Engineering Nanodegree Airflow project
danielmt
Udacity Data Engineering Nanodegree Project 1 - Data Modeling with Postgres
naderAsadi
Data Engineering Nanodegree projects and exercises, including Data Modeling, Data Warehousing, Data Lake development, and Pipeline Management.
Wathon
Udacity Data Engineering Nanodegree project: ETL for a data warehouse using S3 and Amazon Redshift (see the sketch below).
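The S3-to-Redshift ETL this description names usually centers on a SQL `COPY` from S3 into a staging table, run over a standard Postgres-protocol connection. Below is a hedged sketch; the cluster endpoint, bucket, table, IAM role ARN, and credentials are all placeholders.

```python
# Hedged sketch of the core Redshift load step: COPY JSON logs from S3 into a
# staging table via psycopg2. Every identifier below is a placeholder.
import psycopg2

COPY_STAGING_EVENTS = """
    COPY staging_events
    FROM 's3://example-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    FORMAT AS JSON 'auto'
    REGION 'us-west-2';
"""

conn = psycopg2.connect(
    "host=example-cluster.abc123.us-west-2.redshift.amazonaws.com "
    "dbname=dev user=awsuser password=change-me port=5439"  # placeholders
)
cur = conn.cursor()
cur.execute(COPY_STAGING_EVENTS)  # Redshift loads directly from S3 in parallel
conn.commit()
conn.close()
```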
Federico-abss
My projects for the Udacity Data Engineering Nanodegree
Project 4: Udacity Nanodegree Program - Data Engineering with Microsoft Azure