Found 1,313 repositories(showing 30)
LangChain & Prompt Engineering tutorials on Large Language Models (LLMs) such as ChatGPT with custom data. Jupyter notebooks on loading and indexing data, creating prompt templates, CSV agents, and using retrieval QA chains to query the custom data. Projects for using a private LLM (Llama 2) for chat with PDF files, tweets sentiment analysis.
jupyter
Creates temporary Jupyter Notebook servers using Docker containers. [DEPRECATED - See BinderHub project]
gmihaila
This is where I put things I find useful that speed up my work with Machine Learning. Ever looked in your old projects to reuse those cool functions you created before? Well, this repo is designed to be a Python Library of functions I created in my previous project that can be reused. I also share some Notebooks Tutorials and Python Code Snippets.
HeyThatsViv
Project using machine learning to predict depression using health care data from the CDC NHANES website. A companion dashboard for users to explore the data in this project was created using Streamlit. Written with python using jupyter notebook for the main project flow/analysis and visual studio code for writing custom functions and creating the dashboard.
snowch
This project walks through how you can create recommendations using Apache Spark machine learning. There are a number of jupyter notebooks that you can run on IBM Data Science Experience, and there a live demo of a movie recommendation web application you can interact with. The demo also uses IBM Message Hub (kafka) to push application events to topic where they are consumed by a spark streaming job running on IBM BigInsights (hadoop).
mudigosa
Image Classifier Going forward, AI algorithms will be incorporated into more and more everyday applications. For example, you might want to include an image classifier in a smartphone app. To do this, you'd use a deep learning model trained on hundreds of thousands of images as part of the overall application architecture. A large part of software development in the future will be using these types of models as common parts of applications. In this project, you'll train an image classifier to recognize different species of flowers. You can imagine using something like this in a phone app that tells you the name of the flower your camera is looking at. In practice, you'd train this classifier, then export it for use in your application. We'll be using this dataset of 102 flower categories. When you've completed this project, you'll have an application that can be trained on any set of labelled images. Here your network will be learning about flowers and end up as a command line application. But, what you do with your new skills depends on your imagination and effort in building a dataset. This is the final Project of the Udacity AI with Python Nanodegree Prerequisites The Code is written in Python 3.6.5 . If you don't have Python installed you can find it here. If you are using a lower version of Python you can upgrade using the pip package, ensuring you have the latest version of pip. To install pip run in the command Line python -m ensurepip -- default-pip to upgrade it python -m pip install -- upgrade pip setuptools wheel to upgrade Python pip install python -- upgrade Additional Packages that are required are: Numpy, Pandas, MatplotLib, Pytorch, PIL and json. You can donwload them using pip pip install numpy pandas matplotlib pil or conda conda install numpy pandas matplotlib pil In order to intall Pytorch head over to the Pytorch site select your specs and follow the instructions given. Viewing the Jyputer Notebook In order to better view and work on the jupyter Notebook I encourage you to use nbviewer . You can simply copy and paste the link to this website and you will be able to edit it without any problem. Alternatively you can clone the repository using git clone https://github.com/fotisk07/Image-Classifier/ then in the command Line type, after you have downloaded jupyter notebook type jupyter notebook locate the notebook and run it. Command Line Application Train a new network on a data set with train.py Basic Usage : python train.py data_directory Prints out current epoch, training loss, validation loss, and validation accuracy as the netowrk trains Options: Set direcotry to save checkpoints: python train.py data_dor --save_dir save_directory Choose arcitecture (alexnet, densenet121 or vgg16 available): pytnon train.py data_dir --arch "vgg16" Set hyperparameters: python train.py data_dir --learning_rate 0.001 --hidden_layer1 120 --epochs 20 Use GPU for training: python train.py data_dir --gpu gpu Predict flower name from an image with predict.py along with the probability of that name. That is you'll pass in a single image /path/to/image and return the flower name and class probability Basic usage: python predict.py /path/to/image checkpoint Options: Return top K most likely classes: python predict.py input checkpoint ---top_k 3 Use a mapping of categories to real names: python predict.py input checkpoint --category_names cat_To_name.json Use GPU for inference: python predict.py input checkpoint --gpu Json file In order for the network to print out the name of the flower a .json file is required. If you aren't familiar with json you can find information here. By using a .json file the data can be sorted into folders with numbers and those numbers will correspond to specific names specified in the .json file. Data and the json file The data used specifically for this assignemnt are a flower database are not provided in the repository as it's larger than what github allows. Nevertheless, feel free to create your own databases and train the model on them to use with your own projects. The structure of your data should be the following: The data need to comprised of 3 folders, test, train and validate. Generally the proportions should be 70% training 10% validate and 20% test. Inside the train, test and validate folders there should be folders bearing a specific number which corresponds to a specific category, clarified in the json file. For example if we have the image a.jpj and it is a rose it could be in a path like this /test/5/a.jpg and json file would be like this {...5:"rose",...}. Make sure to include a lot of photos of your catagories (more than 10) with different angles and different lighting conditions in order for the network to generalize better. GPU As the network makes use of a sophisticated deep convolutional neural network the training process is impossible to be done by a common laptop. In order to train your models to your local machine you have three options Cuda -- If you have an NVIDIA GPU then you can install CUDA from here. With Cuda you will be able to train your model however the process will still be time consuming Cloud Services -- There are many paid cloud services that let you train your models like AWS or Google Cloud Coogle Colab -- Google Colab gives you free access to a tesla K80 GPU for 12 hours at a time. Once 12 hours have ellapsed you can just reload and continue! The only limitation is that you have to upload the data to Google Drive and if the dataset is massive you may run out of space. However, once a model is trained then a normal CPU can be used for the predict.py file and you will have an answer within some seconds. Hyperparameters As you can see you have a wide selection of hyperparameters available and you can get even more by making small modifications to the code. Thus it may seem overly complicated to choose the right ones especially if the training needs at least 15 minutes to be completed. So here are some hints: By increasing the number of epochs the accuracy of the network on the training set gets better and better however be careful because if you pick a large number of epochs the network won't generalize well, that is to say it will have high accuracy on the training image and low accuracy on the test images. Eg: training for 12 epochs training accuracy: 85% Test accuracy: 82%. Training for 30 epochs training accuracy 95% test accuracy 50%. A big learning rate guarantees that the network will converge fast to a small error but it will constantly overshot A small learning rate guarantees that the network will reach greater accuracies but the learning process will take longer Densenet121 works best for images but the training process takes significantly longer than alexnet or vgg16 *My settings were lr=0.001, dropoup=0.5, epochs= 15 and my test accuracy was 86% with densenet121 as my feature extraction model. Pre-Trained Network The checkpoint.pth file contains the information of a network trained to recognise 102 different species of flowers. I has been trained with specific hyperparameters thus if you don't set them right the network will fail. In order to have a prediction for an image located in the path /path/to/image using my pretrained model you can simply type python predict.py /path/to/image checkpoint.pth Contributing Please read CONTRIBUTING.md for the process for submitting pull requests. Authors Shanmukha Mudigonda - Initial work Udacity - Final Project of the AI with Python Nanodegree
HemanthJoseph
An end-to-end Machine Learning project from writing a Jupyter notebook to check the viability of the solution, to breaking down the same into modular code, creating a Flask web app integrated with a HTML template to make a website interface, and deploying on AWS and Azure.
Devtown-India
Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day. Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used. In this project, you will use data provided by Motivate, a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. You will compare the system usage between three large cities: Chicago, New York City, and Washington, DC. Day:1 In this project, Students will make use of Python to explore data related to bike share systems for three major cities in the United States—Chicago, New York City, and Washington. You will write code to import the data and answer interesting questions about it by computing descriptive statistics. They will also write a script that takes in raw input to create an interactive experience in the terminal to present these statistics. Technologies that will be covered are Numpy, Pandas, Matplotlib, Seaborn, Jupyter notebook. We will be giving the students a deep dive into the Data Analytical process Day:2 We will be giving the students an insight into one of the major fields of Machine Learning ie. Time Series forcasting we will be taking them through the relevant theory and make them understand of the importance and different techniques that are available to deal with it. After that we will be working hands on the bike share data set implementing different algorithms and understanding them to the core We aim to provide students an insight into what exactly is the job of a data analyst and get them familiarise to how does the entire data analysis process work. The session will be hosted by Shaurya Sinha a data analyst at Jio and Parag Mittal Software engineer at Microsoft.
tripl-ai
A starter project to create Arc jobs using the Jupyter Notebook interface
sohilsshah91
This project gives an overview of crime time analysis in New York City . We have created Python Jupyter notebooks for spatial analysis of different crime types in the city using Pandas, Numpy, Plotly and Leaflet packages. As a second part to this analysis, we worked on ARIMA model on R for predicting the crime counts across various localities in the city based on correlations of various demographics correlation in each locality.
ultranet1
Project Description: A music streaming company wants to introduce more automation and monitoring to their data warehouse ETL pipelines and they have come to the conclusion that the best tool to achieve this is Apache Airflow. As their Data Engineer, I was tasked to create a reusable production-grade data pipeline that incorporates data quality checks and allows for easy backfills. Several analysts and Data Scientists rely on the output generated by this pipeline and it is expected that the pipeline runs daily on a schedule by pulling new data from the source and store the results to the destination. Data Description: The source data resides in S3 and needs to be processed in a data warehouse in Amazon Redshift. The source datasets consist of JSON logs that tell about user activity in the application and JSON metadata about the songs the users listen to. Data Pipeline design: At a high-level the pipeline does the following tasks. Extract data from multiple S3 locations. Load the data into Redshift cluster. Transform the data into a star schema. Perform data validation and data quality checks. Calculate the most played songs for the specified time interval. Load the result back into S3. dag Structure of the Airflow DAG Design Goals: Based on the requirements of our data consumers, our pipeline is required to adhere to the following guidelines: The DAG should not have any dependencies on past runs. On failure, the task is retried for 3 times. Retries happen every 5 minutes. Catchup is turned off. Do not email on retry. Pipeline Implementation: Apache Airflow is a Python framework for programmatically creating workflows in DAGs, e.g. ETL processes, generating reports, and retraining models on a daily basis. The Airflow UI automatically parses our DAG and creates a natural representation for the movement and transformation of data. A DAG simply is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies. A DAG describes how you want to carry out your workflow, and Operators determine what actually gets done. By default, airflow comes with some simple built-in operators like PythonOperator, BashOperator, DummyOperator etc., however, airflow lets you extend the features of a BaseOperator and create custom operators. For this project, I developed several custom operators. operators The description of each of these operators follows: StageToRedshiftOperator: Stages data to a specific redshift cluster from a specified S3 location. Operator uses templated fields to handle partitioned S3 locations. LoadFactOperator: Loads data to the given fact table by running the provided sql statement. Supports delete-insert and append style loads. LoadDimensionOperator: Loads data to the given dimension table by running the provided sql statement. Supports delete-insert and append style loads. SubDagOperator: Two or more operators can be grouped into one task using the SubDagOperator. Here, I am grouping the tasks of checking if the given table has rows and then run a series of data quality sql commands. HasRowsOperator: Data quality check to ensure that the specified table has rows. DataQualityOperator: Performs data quality checks by running sql statements to validate the data. SongPopularityOperator: Calculates the top ten most popular songs for a given interval. The interval is dictated by the DAG schedule. UnloadToS3Operator: Stores the analysis result back to the given S3 location. Code for each of these operators is located in the plugins/operators directory. Pipeline Schedule and Data Partitioning: The events data residing on S3 is partitioned by year (2018) and month (11). Our task is to incrementally load the event json files, and run it through the entire pipeline to calculate song popularity and store the result back into S3. In this manner, we can obtain the top songs per day in an automated fashion using the pipeline. Please note, this is a trivial analyis, but you can imagine other complex queries that follow similar structure. S3 Input events data: s3://<bucket>/log_data/2018/11/ 2018-11-01-events.json 2018-11-02-events.json 2018-11-03-events.json .. 2018-11-28-events.json 2018-11-29-events.json 2018-11-30-events.json S3 Output song popularity data: s3://skuchkula-topsongs/ songpopularity_2018-11-01 songpopularity_2018-11-02 songpopularity_2018-11-03 ... songpopularity_2018-11-28 songpopularity_2018-11-29 songpopularity_2018-11-30 The DAG can be configured by giving it some default_args which specify the start_date, end_date and other design choices which I have mentioned above. default_args = { 'owner': 'shravan', 'start_date': datetime(2018, 11, 1), 'end_date': datetime(2018, 11, 30), 'depends_on_past': False, 'email_on_retry': False, 'retries': 3, 'retry_delay': timedelta(minutes=5), 'catchup_by_default': False, 'provide_context': True, } How to run this project? Step 1: Create AWS Redshift Cluster using either the console or through the notebook provided in create-redshift-cluster Run the notebook to create AWS Redshift Cluster. Make a note of: DWN_ENDPOINT :: dwhcluster.c4m4dhrmsdov.us-west-2.redshift.amazonaws.com DWH_ROLE_ARN :: arn:aws:iam::506140549518:role/dwhRole Step 2: Start Apache Airflow Run docker-compose up from the directory containing docker-compose.yml. Ensure that you have mapped the volume to point to the location where you have your DAGs. NOTE: You can find details of how to manage Apache Airflow on mac here: https://gist.github.com/shravan-kuchkula/a3f357ff34cf5e3b862f3132fb599cf3 start_airflow Step 3: Configure Apache Airflow Hooks On the left is the S3 connection. The Login and password are the IAM user's access key and secret key that you created. Basically, by using these credentials, we are able to read data from S3. On the right is the redshift connection. These values can be easily gathered from your Redshift cluster connections Step 4: Execute the create-tables-dag This dag will create the staging, fact and dimension tables. The reason we need to trigger this manually is because, we want to keep this out of main dag. Normally, creation of tables can be handled by just triggering a script. But for the sake of illustration, I created a DAG for this and had Airflow trigger the DAG. You can turn off the DAG once it is completed. After running this DAG, you should see all the tables created in the AWS Redshift. Step 5: Turn on the load_and_transform_data_in_redshift dag As the execution start date is 2018-11-1 with a schedule interval @daily and the execution end date is 2018-11-30, Airflow will automatically trigger and schedule the dag runs once per day for 30 times. Shown below are the 30 DAG runs ranging from start_date till end_date, that are trigged by airflow once per day. schedule
Sudhanshu1st
In this Data science project I tried to create Jupyter notebooks for EDA and feature engineering of Advanced House Price Prediction Dataset from Kaggle Competition.
Rushikesh8983
Language Translation In this project, you’re going to take a peek into the realm of neural network machine translation. You’ll be training a sequence to sequence model on a dataset of English and French sentences that can translate new sentences from English to French. Get the Data Since translating the whole language of English to French will take lots of time to train, we have provided you with a small portion of the English corpus. """ DON'T MODIFY ANYTHING IN THIS CELL """ import helper import problem_unittests as tests source_path = 'data/small_vocab_en' target_path = 'data/small_vocab_fr' source_text = helper.load_data(source_path) target_text = helper.load_data(target_path) Explore the Data Play around with view_sentence_range to view different parts of the data. view_sentence_range = (0, 10) """ DON'T MODIFY ANYTHING IN THIS CELL """ import numpy as np print('Dataset Stats') print('Roughly the number of unique words: {}'.format(len({word: None for word in source_text.split()}))) sentences = source_text.split('\n') word_counts = [len(sentence.split()) for sentence in sentences] print('Number of sentences: {}'.format(len(sentences))) print('Average number of words in a sentence: {}'.format(np.average(word_counts))) print() print('English sentences {} to {}:'.format(*view_sentence_range)) print('\n'.join(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]])) print() print('French sentences {} to {}:'.format(*view_sentence_range)) print('\n'.join(target_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]])) Dataset Stats Roughly the number of unique words: 227 Number of sentences: 137861 Average number of words in a sentence: 13.225277634719028 English sentences 0 to 10: new jersey is sometimes quiet during autumn , and it is snowy in april . the united states is usually chilly during july , and it is usually freezing in november . california is usually quiet during march , and it is usually hot in june . the united states is sometimes mild during june , and it is cold in september . your least liked fruit is the grape , but my least liked is the apple . his favorite fruit is the orange , but my favorite is the grape . paris is relaxing during december , but it is usually chilly in july . new jersey is busy during spring , and it is never hot in march . our least liked fruit is the lemon , but my least liked is the grape . the united states is sometimes busy during january , and it is sometimes warm in november . French sentences 0 to 10: new jersey est parfois calme pendant l' automne , et il est neigeux en avril . les états-unis est généralement froid en juillet , et il gèle habituellement en novembre . california est généralement calme en mars , et il est généralement chaud en juin . les états-unis est parfois légère en juin , et il fait froid en septembre . votre moins aimé fruit est le raisin , mais mon moins aimé est la pomme . son fruit préféré est l'orange , mais mon préféré est le raisin . paris est relaxant en décembre , mais il est généralement froid en juillet . new jersey est occupé au printemps , et il est jamais chaude en mars . notre fruit est moins aimé le citron , mais mon moins aimé est le raisin . les états-unis est parfois occupé en janvier , et il est parfois chaud en novembre . Implement Preprocessing Function Text to Word Ids As you did with other RNNs, you must turn the text into a number so the computer can understand it. In the function text_to_ids(), you'll turn source_text and target_text from words to ids. However, you need to add the <EOS> word id at the end of target_text. This will help the neural network predict when the sentence should end. You can get the <EOS> word id by doing: target_vocab_to_int['<EOS>'] You can get other word ids using source_vocab_to_int and target_vocab_to_int. def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int): """ Convert source and target text to proper word ids :param source_text: String that contains all the source text. :param target_text: String that contains all the target text. :param source_vocab_to_int: Dictionary to go from the source words to an id :param target_vocab_to_int: Dictionary to go from the target words to an id :return: A tuple of lists (source_id_text, target_id_text) """ # TODO: Implement Function source_id_text = [[source_vocab_to_int[word] for word in sentence.split()] \ for sentence in source_text.split('\n')] target_id_text = [[target_vocab_to_int[word] for word in sentence.split()] + [target_vocab_to_int['<EOS>']] \ for sentence in target_text.split('\n')] return source_id_text, target_id_text """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_text_to_ids(text_to_ids) Tests Passed Preprocess all the data and save it Running the code cell below will preprocess all the data and save it to file. """ DON'T MODIFY ANYTHING IN THIS CELL """ helper.preprocess_and_save_data(source_path, target_path, text_to_ids) Check Point This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk. import problem_unittests as tests """ DON'T MODIFY ANYTHING IN THIS CELL """ import numpy as np import helper (source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess() Check the Version of TensorFlow and Access to GPU This will check to make sure you have the correct version of TensorFlow and access to a GPU """ DON'T MODIFY ANYTHING IN THIS CELL """ from distutils.version import LooseVersion import warnings import tensorflow as tf from tensorflow.python.layers.core import Dense # Check TensorFlow Version assert LooseVersion(tf.__version__) >= LooseVersion('1.1'), 'Please use TensorFlow version 1.1 or newer' print('TensorFlow Version: {}'.format(tf.__version__)) # Check for a GPU if not tf.test.gpu_device_name(): warnings.warn('No GPU found. Please use a GPU to train your neural network.') else: print('Default GPU Device: {}'.format(tf.test.gpu_device_name())) TensorFlow Version: 1.1.0 Default GPU Device: /gpu:0 Build the Neural Network You'll build the components necessary to build a Sequence-to-Sequence model by implementing the following functions below: model_inputs process_decoder_input encoding_layer decoding_layer_train decoding_layer_infer decoding_layer seq2seq_model Input Implement the model_inputs() function to create TF Placeholders for the Neural Network. It should create the following placeholders: Input text placeholder named "input" using the TF Placeholder name parameter with rank 2. Targets placeholder with rank 2. Learning rate placeholder with rank 0. Keep probability placeholder named "keep_prob" using the TF Placeholder name parameter with rank 0. Target sequence length placeholder named "target_sequence_length" with rank 1 Max target sequence length tensor named "max_target_len" getting its value from applying tf.reduce_max on the target_sequence_length placeholder. Rank 0. Source sequence length placeholder named "source_sequence_length" with rank 1 Return the placeholders in the following the tuple (input, targets, learning rate, keep probability, target sequence length, max target sequence length, source sequence length) def model_inputs(): """ Create TF Placeholders for input, targets, learning rate, and lengths of source and target sequences. :return: Tuple (input, targets, learning rate, keep probability, target sequence length, max target sequence length, source sequence length) """ # TODO: Implement Function inputs = tf.placeholder(tf.int32, [None, None], 'input') targets = tf.placeholder(tf.int32, [None, None]) learning_rate = tf.placeholder(tf.float32, []) keep_prob = tf.placeholder(tf.float32, [], 'keep_prob') target_sequence_length = tf.placeholder(tf.int32, [None], 'target_sequence_length') max_target_len = tf.reduce_max(target_sequence_length) source_sequence_length = tf.placeholder(tf.int32, [None], 'source_sequence_length') return inputs, targets, learning_rate, keep_prob, target_sequence_length, max_target_len, source_sequence_length """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_model_inputs(model_inputs) Tests Passed Process Decoder Input Implement process_decoder_input by removing the last word id from each batch in target_data and concat the GO ID to the begining of each batch. def process_decoder_input(target_data, target_vocab_to_int, batch_size): """ Preprocess target data for encoding :param target_data: Target Placehoder :param target_vocab_to_int: Dictionary to go from the target words to an id :param batch_size: Batch Size :return: Preprocessed target data """ # TODO: Implement Function go = tf.constant([[target_vocab_to_int['<GO>']]]*batch_size) # end = tf.slice(target_data, [0, 0], [-1, batch_size]) end = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1]) return tf.concat([go, end], 1) """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_process_encoding_input(process_decoder_input) Tests Passed Encoding Implement encoding_layer() to create a Encoder RNN layer: Embed the encoder input using tf.contrib.layers.embed_sequence Construct a stacked tf.contrib.rnn.LSTMCell wrapped in a tf.contrib.rnn.DropoutWrapper Pass cell and embedded input to tf.nn.dynamic_rnn() from imp import reload reload(tests) def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob, source_sequence_length, source_vocab_size, encoding_embedding_size): """ Create encoding layer :param rnn_inputs: Inputs for the RNN :param rnn_size: RNN Size :param num_layers: Number of layers :param keep_prob: Dropout keep probability :param source_sequence_length: a list of the lengths of each sequence in the batch :param source_vocab_size: vocabulary size of source data :param encoding_embedding_size: embedding size of source data :return: tuple (RNN output, RNN state) """ # TODO: Implement Function embed = tf.contrib.layers.embed_sequence(rnn_inputs, source_vocab_size, encoding_embedding_size) def lstm_cell(): lstm = tf.contrib.rnn.BasicLSTMCell(rnn_size) return tf.contrib.rnn.DropoutWrapper(lstm, keep_prob) stacked_lstm = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(num_layers)]) # initial_state = stacked_lstm.zero_state(source_sequence_length, tf.float32) return tf.nn.dynamic_rnn(stacked_lstm, embed, source_sequence_length, dtype=tf.float32) # initial_state=initial_state) """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_encoding_layer(encoding_layer) Tests Passed Decoding - Training Create a training decoding layer: Create a tf.contrib.seq2seq.TrainingHelper Create a tf.contrib.seq2seq.BasicDecoder Obtain the decoder outputs from tf.contrib.seq2seq.dynamic_decode def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, max_summary_length, output_layer, keep_prob): """ Create a decoding layer for training :param encoder_state: Encoder State :param dec_cell: Decoder RNN Cell :param dec_embed_input: Decoder embedded input :param target_sequence_length: The lengths of each sequence in the target batch :param max_summary_length: The length of the longest sequence in the batch :param output_layer: Function to apply the output layer :param keep_prob: Dropout keep probability :return: BasicDecoderOutput containing training logits and sample_id """ # TODO: Implement Function helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, target_sequence_length) decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, helper, encoder_state, output_layer) dec_train_logits, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_summary_length) # for tensorflow 1.2: # dec_train_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_summary_length) return dec_train_logits # keep_prob/dropout not used? """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_decoding_layer_train(decoding_layer_train) Tests Passed Decoding - Inference Create inference decoder: Create a tf.contrib.seq2seq.GreedyEmbeddingHelper Create a tf.contrib.seq2seq.BasicDecoder Obtain the decoder outputs from tf.contrib.seq2seq.dynamic_decode def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, max_target_sequence_length, vocab_size, output_layer, batch_size, keep_prob): """ Create a decoding layer for inference :param encoder_state: Encoder state :param dec_cell: Decoder RNN Cell :param dec_embeddings: Decoder embeddings :param start_of_sequence_id: GO ID :param end_of_sequence_id: EOS Id :param max_target_sequence_length: Maximum length of target sequences :param vocab_size: Size of decoder/target vocabulary :param decoding_scope: TenorFlow Variable Scope for decoding :param output_layer: Function to apply the output layer :param batch_size: Batch size :param keep_prob: Dropout keep probability :return: BasicDecoderOutput containing inference logits and sample_id """ # TODO: Implement Function start_tokens = tf.constant([start_of_sequence_id]*batch_size) helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings, start_tokens, end_of_sequence_id) decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, helper, encoder_state, output_layer) dec_infer_logits, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_target_sequence_length) # for tensorflow 1.2: # dec_infer_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_target_sequence_length) return dec_infer_logits """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_decoding_layer_infer(decoding_layer_infer)
NhanPhamThanh-IT
🦠 Covid19 Visualization & Analysis is a community-driven Python project for exploring, visualizing, and analyzing global Covid-19 data. It features interactive notebooks, scripts, and guides to help users uncover trends, create impactful visualizations, and share insights for research, education, and public awareness.
jabedalimollah
This is my notebook project and I am creating it using MERN stack
dindatisi
notebook created for MSING055 final project
Project Overview Welcome to the Convolutional Neural Networks (CNN) project in the AI Nanodegree! In this project, you will learn how to build a pipeline that can be used within a web or mobile app to process real-world, user-supplied images. Given an image of a dog, your algorithm will identify an estimate of the canine’s breed. If supplied an image of a human, the code will identify the resembling dog breed. Sample Output Along with exploring state-of-the-art CNN models for classification, you will make important design decisions about the user experience for your app. Our goal is that by completing this lab, you understand the challenges involved in piecing together a series of models designed to perform various tasks in a data processing pipeline. Each model has its strengths and weaknesses, and engineering a real-world application often involves solving many problems without a perfect answer. Your imperfect solution will nonetheless create a fun user experience! Project Instructions Instructions Clone the repository and navigate to the downloaded folder. git clone https://github.com/udacity/dog-project.git cd dog-project Download the dog dataset. Unzip the folder and place it in the repo, at location path/to/dog-project/dogImages. Download the human dataset. Unzip the folder and place it in the repo, at location path/to/dog-project/lfw. If you are using a Windows machine, you are encouraged to use 7zip to extract the folder. Download the VGG-16 bottleneck features for the dog dataset. Place it in the repo, at location path/to/dog-project/bottleneck_features. (Optional) If you plan to install TensorFlow with GPU support on your local machine, follow the guide to install the necessary NVIDIA software on your system. If you are using an EC2 GPU instance, you can skip this step. (Optional) If you are running the project on your local machine (and not using AWS), create (and activate) a new environment. Linux (to install with GPU support, change requirements/dog-linux.yml to requirements/dog-linux-gpu.yml): conda env create -f requirements/dog-linux.yml source activate dog-project Mac (to install with GPU support, change requirements/dog-mac.yml to requirements/dog-mac-gpu.yml): conda env create -f requirements/dog-mac.yml source activate dog-project NOTE: Some Mac users may need to install a different version of OpenCV conda install --channel https://conda.anaconda.org/menpo opencv3 Windows (to install with GPU support, change requirements/dog-windows.yml to requirements/dog-windows-gpu.yml): conda env create -f requirements/dog-windows.yml activate dog-project (Optional) If you are running the project on your local machine (and not using AWS) and Step 6 throws errors, try this alternative step to create your environment. Linux or Mac (to install with GPU support, change requirements/requirements.txt to requirements/requirements-gpu.txt): conda create --name dog-project python=3.5 source activate dog-project pip install -r requirements/requirements.txt NOTE: Some Mac users may need to install a different version of OpenCV conda install --channel https://conda.anaconda.org/menpo opencv3 Windows (to install with GPU support, change requirements/requirements.txt to requirements/requirements-gpu.txt): conda create --name dog-project python=3.5 activate dog-project pip install -r requirements/requirements.txt (Optional) If you are using AWS, install Tensorflow. sudo python3 -m pip install -r requirements/requirements-gpu.txt Switch Keras backend to TensorFlow. Linux or Mac: KERAS_BACKEND=tensorflow python -c "from keras import backend" Windows: set KERAS_BACKEND=tensorflow python -c "from keras import backend" (Optional) If you are running the project on your local machine (and not using AWS), create an IPython kernel for the dog-project environment. python -m ipykernel install --user --name dog-project --display-name "dog-project" Open the notebook. jupyter notebook dog_app.ipynb (Optional) If you are running the project on your local machine (and not using AWS), before running code, change the kernel to match the dog-project environment by using the drop-down menu (Kernel > Change kernel > dog-project). Then, follow the instructions in the notebook. NOTE: While some code has already been implemented to get you started, you will need to implement additional functionality to successfully answer all of the questions included in the notebook. Unless requested, do not modify code that has already been included. Evaluation Your project will be reviewed by a Udacity reviewer against the CNN project rubric. Review this rubric thoroughly, and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass. Project Submission When you are ready to submit your project, collect the following files and compress them into a single archive for upload: The dog_app.ipynb file with fully functional code, all code cells executed and displaying output, and all questions answered. An HTML or PDF export of the project notebook with the name report.html or report.pdf. Any additional images used for the project that were not supplied to you for the project. Please do not include the project data sets in the dogImages/ or lfw/ folders. Likewise, please do not include the bottleneck_features/ folder.
noahjonesx
Markov Text Generation Problem Description The Infinite Monkey Theorem1 (IFT) says that if a monkey hits keys at random on a typewriter it will almost surely, given an infinite amount of time, produce a chosen text (like the Declaration of Independence, Hamlet, or a script for ... Planet of the Apes). The probability of this actually happening is, of course, very small but the IFT claims that it is still possible. Some people have tested this hypotheis in software and, after billions and billions of simulated years, one virtual monkey was able to type out a sequence of 19 letters that can be found in Shakespeare’s The Two Gentlemen of Verona. (See the April 9, 2007 edition of The New Yorker if you’re interested; but, hypothesis testing with real monkeys2 is far more entertaining.) The IFT might lead to some interesting conversations with Rust Cohle, but the practical applications are few. It does, however, bring up the idea of automated text generation, and there the ideas and applications are not only interesting but also important. Claude Shannon essentially founded the field of information theory with the publication of his landmark paper A Mathematical Theory of Computation3 in 1948. Shannon described a method for using Markov chains to produce a reasonable imitation of a known text with sometimes startling results. For example, here is a sample of text generated from a Markov model of the script for the 1967 movie Planet of the Apes. "PLANET OF THE APES" Screenplay by Michael Wilson Based on Novel By Pierre Boulle DISSOLVE TO: 138 EXT. GROVE OF FRUIT TREES - ESTABLISHING SHOT - DAY Zira run back to the front of Taylor. The President, I believe the prosecutor's charge of this man. ZIRA Well, whoever owned them was in pretty bad shape. He picks up two of the strain. You got what you wanted, kid. How does it taste? Silence. Taylor and cuffs him. Over this we HEAR from a distance is a crude horse-drawn wagon is silhouetted-against the trunks and branches of great trees and bushes on the horse's rump. Taylor lifts his right arm to ward off the blow, and the room and lands at the feet of Cornelius and Lucius are sorting out equipment falls to his knees, buries his head silently at the Ranch). DISSOLVE TO: 197 INT. CAGES - CLOSE SHOT - FEATURING LANDON - FROM TAYLOR'S VOICE (o.s.) I've got a fine veternary surgeons under my direction? ZIRA Taylor! ZIRA There is a small lake, looking like a politician. TAYLOR Dodge takes a pen and notebook from the half-open door of a guard room. Taylor bursts suddenly confronted by his 1https://en.wikipedia.org/wiki/Infinite_monkey_theorem2https://web.archive.org/web/20130120215600/http://www.vivaria.net/experiments/notes/publication/NOTES_ EN.pdf3http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6773024 1 original pursuer (the dismounted cop coming up with a cigar butt and places it in the drawer beside them. TAYLOR What's the best there is a. loud RAP at the doll was found beside the building. Zira waits at the third table. TAYLOR Good question. Is he a man? CORNELIUS (impatiently. DODGE Blessed are the vegetation. These SHOTS are INTERCUT with: 94 WHAT THE ASTRONAUTS They examine the remnants of the cage. ZIRA (plunging on) Their speech organs are adequate. The flaw lies not in anatomy but in the back of his left sleeve. TAYLOR (taking off his shirt. 80 DODGE AND LANDON You don't sound happy in your work. GALEN (defensively) Gorilla hunter stands over a dead man, one fo Besides a few spelling errors and some rather odd things that make you wonder about the author, this passage is surprisingly human-like. This is a simple example of natural language generation, a sub-area of natural language processing—a very active area of research in computer science. The particular approach we’re using in this assignment was famously implemented as the fictitious Mark V. Shaney4 and the Emacs command Disassociated Press5. Approach So, here’s the basic idea: Imagine taking a book (say, Tom Sawyer) and determining the probability with which each character occurs. You would probably find that spaces are the most common, that the character ‘e’ is fairly common, and that the character ‘q’ is rather uncommon. After completing this “level 0” analysis, you would be able to produce random Tom Sawyer text based on character probabilities. It wouldn’t have much in common with the real thing, but at least the characters would tend to occur in the proper propor- tion. In fact, here’s an example of what you might produce: Level 0 rla bsht eS ststofo hhfosdsdewno oe wee h .mr ae irii ela iad o r te u t mnyto onmalysnce, ifu en c fDwn oee iteo Now imagine doing a slightly more sophisticated level 1 analysis by determining the probability with which each character follows every other character. You would probably discover that ‘h’ follows ‘t’ more frequently than ‘x’ does, and you would probably discover that a space follows ‘.’ more frequently than ‘,’ does. You could now produce some randomly generated Tom Sawyer text by picking a character to begin with and then always choosing the next character based on the previous one and the probabilities revealed by the analysis. Here’s an example: Level 1 "Shand tucthiney m?" le ollds mind Theybooure He, he s whit Pereg lenigabo Jodind alllld ashanthe ainofevids tre lin-p asto oun theanthadomoere Now imagine doing a level k analysis by determining the probability with which each character follows every possible sequence of characters of length k (kgrams). A level 5 analysis of Tom Sawyer for example, would reveal that ‘r’ follows “Sawye” more frequently than any other character. After a level k analysis, you would be able to produce random Tom Sawyer by always choosing the next character based on the previous k characters (a kgram) and the probabilities revealed by the analysis. 4https://en.wikipedia.org/wiki/Mark_V._Shaney5https://en.wikipedia.org/wiki/Dissociated_press Page 2 of 5 At only a moderate level of analysis (say, levels 5-7), the randomly generated text begins to take on many of the characteristics of the source text. It probably won’t make complete sense, but you’ll be able to tell that it was derived from Tom Sawyer as opposed to, say, The Sound and the Fury. Here are some more examples of text that is generated from increasing levels of analysis of Tom Sawyer. (These “levels of analysis” are called order K Markov models.) K = 2 "Yess been." for gothin, Tome oso; ing, in to weliss of an’te cle - armit. Papper a comeasione, and smomenty, fropeck hinticer, sid, a was Tom, be suck tied. He sis tred a youck to themen K = 4 en themself, Mr. Welshman, but him awoke, the balmy shore. I’ll give him that he couple overy because in the slated snufflindeed structure’s kind was rath. She said that the wound the door a fever eyes that WITH him. K = 6 people had eaten, leaving. Come - didn’t stand it better judgment; His hands and bury it again, tramped herself! She’d never would be. He found her spite of anything the one was a prime feature sunset, and hit upon that of the forever. K = 8 look-a-here - I told you before, Joe. I’ve heard a pin drop. The stillness was complete, how- ever, this is awful crime, beyond the village was sufficient. He would be a good enough to get that night, Tom and Becky. K = 10 you understanding that they don’t come around in the cave should get the word "beauteous" was over-fondled, and that together" and decided that he might as we used to do - it’s nobby fun. I’ll learn you." To create an order K Markov model of a given source text, you would need to identify all kgrams in the source text and associate with each kgram all the individual characters that follow it. This association or mapping must also capture the frequency with which a given character follows a given kgram. For example, suppose that k = 2 and the sample text is: agggcagcgggcg The Markov model would have to represent all the character strings of length two (2-grams) in the source text, and associate with them the characters that follow them, and in the correct proportion. The following table shows one way of representing this information. kgram Characters that follow ag gc gg gcgc gc agg ca g cg g Once you have created an order K Markov model of a given source text, you can generate new text based on this model as follows. Page 3 of 5 1. Randomly pick k consecutive characters that appear in the sample text and use them as the initial kgram. 2. Append the kgram to the output text being generated. 3. Repeat the following steps until the output text is sufficiently long. (a) Select a character c that appears in the sample text based on the probability of that character following the current kgram. (b) Append this character to the output text. (c) Update the kgram by removing its first character and adding the character just chosen (c) as its last character. If this process encounters a situation in which there are no characters to choose from (which can happen if the only occurrence of the current kgram is at the exact end of the source), simply pick a new kgram at random and continue. As an example, suppose that k = 2 and the sample text is that from above: agggcagcgggcg Here are four different output text strings of length 10 that could have been the result of the process described above, using the first two characters (’ag’) as the initial kgram. agcggcagcg aggcaggcgg agggcaggcg agcggcggca For another example, suppose that k = 2 and the sample text is: the three pirates charted that course the other day Here is how the first three characters of new text might be generated: •A two-character sequence is chosen at random to become the initial kgram. Let’s suppose that “th” is chosen. So, kgram = th and output = th. •The first character must be chosen based on the probability that it follows the kgram (currently “th”) in the source. The source contains five occurrences of “th”. Three times it is followed by ’e’, once it is followed by ’r’, and once it is followed by ’a’. Thus, the next character must be chosen so that there is a 3/5 chance that an ’e’ will be chosen, a 1/5 chance that an ’r’ will be chosen, and a 1/5 chance that an ’a’ will be chosen. Let’s suppose that we choose an ’e’ this time. So, kgram = he and output = the. •The next character must be chosen based on the probability that it follows the kgram (currently “he”) in the source. The source contains three occurrences of “he”. Twice it is followed by a space and once it is followed by ’r’. Thus, the next character must be chosen so that there is a 2/3 chance that a space will be chosen and a 1/3 chance that an ’r’ will be chosen. Let’s suppose that we choose an ’r’ this time. So, kgram = er and output = ther. •The next character must be chosen based on the probability that it follows the kgram (currently “er”) in the source. The source contains only one occurrence of “er”, and it is followed by a space. Thus, the next character must be a space. So, kgram = r_ and output = ther_, where ’_’ represents a blank space. Page 4 of 5 Implementation Details You are provided with two Java files that you must use to develop your solution: MarkovModel.java and TextGenerator.java. The constructors of MarkovModel build the order-k model of the source text. You are required to represent the model with the provided HashMap field. The main method of TextGenerator must process the following three command line arguments (in the args array): •A non-negative integer k •A non-negative integer length. •The name of an input file source that contains more than k characters. Your program must validate the command line arguments by making sure that k and length are non- negative and that source contains at least k characters and can be opened for reading. If any of the command line arguments are invalid, your program must write an informative error message to System.out and terminate. If there are not enough command line arguments, your program must write an informative error message to System.out and terminate. With valid command line arguments, your program must use the methods of the MarkovModel class to create an order k Markov model of the sample text, select the initial kgram, and make each character selection. You must implement the MarkovModel methods according to description of the Markov modeling process in the section above. A few sample texts have been provided, but Project Gutenberg (http://www.gutenberg.org) maintains a large collection of public domain literary works that you can use as source texts for fun and practice. Acknowledgments This assignment is based on the ideas of many people, Jon Bentley and Owen Astrachan in particular.
LeoZorzoli
:notebook: Project 1: Wiki for CS50 Web. Create a wikipedia-style page with a markup language for entries
CoderAvi
it is a simple GUI based python project. i have created a simple interface where user can book lpg gas , track his booking records and cancel the bookings. this project is based on python language and using sql query storing the records of customers activity in the database. i have created 2 .py file , one is for booking interface and another is for appointment. you can download this zip file and simply run in jupyter notebook or spyder or anyother python interpreter.
abid-mahdi
A collection of notebooks for my final year project. The notebooks are used to create a virtual personal trainer to check bicep curls, squats and overhead presses.
AmberLee2427
The goal of this project is to create an all-encompassing collection of Jupyter notebooks—your trusty companions for engaging exercises related to microlensing. Through these notebooks, the insights and experiences of microlensing elders can light your path as you embark on your journey of discovery and exploration through scientific research.
dgrubis
Jupyter notebook that outlines the process of creating a machine learning predictive model. Predicts the peak "Wins Shared" by the current draft prospects based on numerous features such as college stats, projected draft pick, physical profile and age. I try out multiple models and pick the best performing one for the data from my judgement.
shaadclt
This project provides a collection of Jupyter Notebook exercises for practicing Matplotlib plots, including bar plots, histograms, pie charts, and scatter plots. Matplotlib is a powerful data visualization library in Python that allows for creating a wide range of plots and visualizations.
pnguenda
# Pandas Homework - Pandas, Pandas, Pandas ## Background The data dive continues! Now, it's time to take what you've learned about Python Pandas and apply it to new situations. For this assignment, you'll need to complete **one of two** (not both) Data Challenges. Once again, which challenge you take on is your choice. Just be sure to give it your all -- as the skills you hone will become powerful tools in your data analytics tool belt. ### Before You Begin 1. Create a new repository for this project called `pandas-challenge`. **Do not add this homework to an existing repository**. 2. Clone the new repository to your computer. 3. Inside your local git repository, create a directory for the Pandas Challenge you choose. Use folder names corresponding to the challenges: **HeroesOfPymoli** or **PyCitySchools**. 4. Add your Jupyter notebook to this folder. This will be the main script to run for analysis. 5. Push the above changes to GitHub or GitLab. ## Option 1: Heroes of Pymoli  Congratulations! After a lot of hard work in the data munging mines, you've landed a job as Lead Analyst for an independent gaming company. You've been assigned the task of analyzing the data for their most recent fantasy game Heroes of Pymoli. Like many others in its genre, the game is free-to-play, but players are encouraged to purchase optional items that enhance their playing experience. As a first task, the company would like you to generate a report that breaks down the game's purchasing data into meaningful insights. Your final report should include each of the following: ### Player Count * Total Number of Players ### Purchasing Analysis (Total) * Number of Unique Items * Average Purchase Price * Total Number of Purchases * Total Revenue ### Gender Demographics * Percentage and Count of Male Players * Percentage and Count of Female Players * Percentage and Count of Other / Non-Disclosed ### Purchasing Analysis (Gender) * The below each broken by gender * Purchase Count * Average Purchase Price * Total Purchase Value * Average Purchase Total per Person by Gender ### Age Demographics * The below each broken into bins of 4 years (i.e. <10, 10-14, 15-19, etc.) * Purchase Count * Average Purchase Price * Total Purchase Value * Average Purchase Total per Person by Age Group ### Top Spenders * Identify the the top 5 spenders in the game by total purchase value, then list (in a table): * SN * Purchase Count * Average Purchase Price * Total Purchase Value ### Most Popular Items * Identify the 5 most popular items by purchase count, then list (in a table): * Item ID * Item Name * Purchase Count * Item Price * Total Purchase Value ### Most Profitable Items * Identify the 5 most profitable items by total purchase value, then list (in a table): * Item ID * Item Name * Purchase Count * Item Price * Total Purchase Value As final considerations: * You must use the Pandas Library and the Jupyter Notebook. * You must submit a link to your Jupyter Notebook with the viewable Data Frames. * You must include a written description of three observable trends based on the data. * See [Example Solution](HeroesOfPymoli/HeroesOfPymoli_starter.ipynb) for a reference on expected format. ## Option 2: PyCitySchools  Well done! Having spent years analyzing financial records for big banks, you've finally scratched your idealistic itch and joined the education sector. In your latest role, you've become the Chief Data Scientist for your city's school district. In this capacity, you'll be helping the school board and mayor make strategic decisions regarding future school budgets and priorities. As a first task, you've been asked to analyze the district-wide standardized test results. You'll be given access to every student's math and reading scores, as well as various information on the schools they attend. Your responsibility is to aggregate the data to and showcase obvious trends in school performance. Your final report should include each of the following: ### District Summary * Create a high level snapshot (in table form) of the district's key metrics, including: * Total Schools * Total Students * Total Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### School Summary * Create an overview table that summarizes key metrics about each school, including: * School Name * School Type * Total Students * Total School Budget * Per Student Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Top Performing Schools (By % Overall Passing) * Create a table that highlights the top 5 performing schools based on % Overall Passing. Include: * School Name * School Type * Total Students * Total School Budget * Per Student Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Bottom Performing Schools (By % Overall Passing) * Create a table that highlights the bottom 5 performing schools based on % Overall Passing. Include all of the same metrics as above. ### Math Scores by Grade\*\* * Create a table that lists the average Math Score for students of each grade level (9th, 10th, 11th, 12th) at each school. ### Reading Scores by Grade * Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school. ### Scores by School Spending * Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following: * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Scores by School Size * Repeat the above breakdown, but this time group schools based on a reasonable approximation of school size (Small, Medium, Large). ### Scores by School Type * Repeat the above breakdown, but this time group schools based on school type (Charter vs. District). As final considerations: * Use the pandas library and Jupyter Notebook. * You must submit a link to your Jupyter Notebook with the viewable Data Frames. * You must include a written description of at least two observable trends based on the data. * See [Example Solution](PyCitySchools/PyCitySchools_starter.ipynb) for a reference on the expected format. ## Hints and Considerations * These are challenging activities for a number of reasons. For one, these activities will require you to analyze thousands of records. Hacking through the data to look for obvious trends in Excel is just not a feasible option. The size of the data may seem daunting, but pandas will allow you to efficiently parse through it. * Second, these activities will also challenge you by requiring you to learn on your feet. Don't fool yourself into thinking: "I need to study pandas more closely before diving in." Get the basic gist of the library and then _immediately_ get to work. When facing a daunting task, it's easy to think: "I'm just not ready to tackle it yet." But that's the surest way to never succeed. Learning to program requires one to constantly tinker, experiment, and learn on the fly. You are doing exactly the _right_ thing, if you find yourself constantly practicing Google-Fu and diving into documentation. There is just no way (or reason) to try and memorize it all. Online references are available for you to use when you need them. So use them! * Take each of these tasks one at a time. Begin your work, answering the basic questions: "How do I import the data?" "How do I convert the data into a DataFrame?" "How do I build the first table?" Don't get intimidated by the number of asks. Many of them are repetitive in nature with just a few tweaks. Be persistent and creative! * Expect these exercises to take time! Don't get discouraged if you find yourself spending hours initially with little progress. Force yourself to deal with the discomfort of not knowing and forge ahead. Consider these hours an investment in your future! * As always, feel encouraged to work in groups and get help from your TAs and Instructor. Just remember, true success comes from mastery and _not_ a completed homework assignment. So challenge yourself to truly succeed! ### Copyright Trilogy Education Services © 2019. All Rights Reserved.
smith-jj
# Employee Database: A Mystery in Two Parts  ## Background It is a beautiful spring day, and it is two weeks since you have been hired as a new data engineer at Pewlett Hackard. Your first major task is a research project on employees of the corporation from the 1980s and 1990s. All that remain of the database of employees from that period are six CSV files. In this assignment, you will design the tables to hold data in the CSVs, import the CSVs into a SQL database, and answer questions about the data. In other words, you will perform: 1. Data Modeling 2. Data Engineering 3. Data Analysis ## Instructions #### Data Modeling Inspect the CSVs and sketch out an ERD of the tables. Feel free to use a tool like [http://www.quickdatabasediagrams.com](http://www.quickdatabasediagrams.com). #### Data Engineering * Use the information you have to create a table schema for each of the six CSV files. Remember to specify data types, primary keys, foreign keys, and other constraints. * Import each CSV file into the corresponding SQL table. #### Data Analysis Once you have a complete database, do the following: 1. List the following details of each employee: employee number, last name, first name, gender, and salary. 2. List employees who were hired in 1986. 3. List the manager of each department with the following information: department number, department name, the manager's employee number, last name, first name, and start and end employment dates. 4. List the department of each employee with the following information: employee number, last name, first name, and department name. 5. List all employees whose first name is "Hercules" and last names begin with "B." 6. List all employees in the Sales department, including their employee number, last name, first name, and department name. 7. List all employees in the Sales and Development departments, including their employee number, last name, first name, and department name. 8. In descending order, list the frequency count of employee last names, i.e., how many employees share each last name. ## Bonus (Optional) As you examine the data, you are overcome with a creeping suspicion that the dataset is fake. You surmise that your boss handed you spurious data in order to test the data engineering skills of a new employee. To confirm your hunch, you decide to take the following steps to generate a visualization of the data, with which you will confront your boss: 1. Import the SQL database into Pandas. (Yes, you could read the CSVs directly in Pandas, but you are, after all, trying to prove your technical mettle.) This step may require some research. Feel free to use the code below to get started. Be sure to make any necessary modifications for your username, password, host, port, and database name: ```sql from sqlalchemy import create_engine engine = create_engine('postgresql://localhost:5432/<your_db_name>') connection = engine.connect() ``` * Consult [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/latest/core/engines.html#postgresql) for more information. * If using a password, do not upload your password to your GitHub repository. See [https://www.youtube.com/watch?v=2uaTPmNvH0I](https://www.youtube.com/watch?v=2uaTPmNvH0I) and [https://martin-thoma.com/configuration-files-in-python/](https://martin-thoma.com/configuration-files-in-python/) for more information. 2. Create a bar chart of average salary by title. 3. You may also include a technical report in markdown format, in which you outline the data engineering steps taken in the homework assignment. ## Epilogue Evidence in hand, you march into your boss's office and present the visualization. With a sly grin, your boss thanks you for your work. On your way out of the office, you hear the words, "Search your ID number." You look down at your badge to see that your employee ID number is 499942. ## Submission * Create an image file of your ERD. * Create a `.sql` file of your table schemata. * Create a `.sql` file of your queries. * (Optional) Create a Jupyter Notebook of the bonus analysis. * Create and upload a repository with the above files to GitHub and post a link on BootCamp Spot.
Nitinguptadu
Introduction The original code of Keras version of Faster R-CNN I used was written by yhenon (resource link: GitHub .) He used the PASCAL VOC 2007, 2012, and MS COCO datasets. For me, I just extracted three classes, “Person”, “Car” and “Mobile phone”, from Google’s Open Images Dataset V4. I applied configs different from his work to fit my dataset and I removed unuseful code. Btw, to run this on Google Colab (for free GPU computing up to 12hrs), I compressed all the code into three .ipynb notebooks. Sorry for the messy structure. I wrote my exploring and experiment results for Faster R-CNN in this article in Medium. If you are in China, you cannot directly access Medium. So I make a copy in here. Project Structure Object_Detection_DataPreprocessing.ipynb is the file to extract subdata from Open Images Dataset V4 which includes downloading the images and creating the annotation files for our training. I run this part by my own computer because of no need for GPU computation. frcnn_train_vgg.ipynb is the file to train the model. The configuration and model saved path are inside this file. frcnn_test_vgg.ipynb is the file to test the model with test images and calculate the mAP (mean average precision) for the model.
narekye
🔥 There is included projects via JavaScript, HTML, CSS, AngularJS, AJAX requests. created and tested with Visual Studio 2017 Release Candidate edition. :notebook:
Bibhuti5
Potato Disease Classification Setup for Python: Install Python (Setup instructions) Install Python packages pip3 install -r training/requirements.txt pip3 install -r api/requirements.txt Install Tensorflow Serving (Setup instructions) Setup for ReactJS Install Nodejs (Setup instructions) Install NPM (Setup instructions) Install dependencies cd frontend npm install --from-lock-json npm audit fix Copy .env.example as .env. Change API url in .env. Setup for React-Native app Initial setup for React-Native app(Setup instructions) Install dependencies cd mobile-app yarn install cd ios && pod install && cd ../ Copy .env.example as .env. Change API url in .env. Training the Model Download the data from kaggle. Only keep folders related to Potatoes. Run Jupyter Notebook in Browser. jupyter notebook Open training/potato-disease-training.ipynb in Jupyter Notebook. In cell #2, update the path to dataset. Run all the Cells one by one. Copy the model generated and save it with the version number in the models folder. Running the API Using FastAPI Get inside api folder cd api Run the FastAPI Server using uvicorn uvicorn main:app --reload --host 0.0.0.0 Your API is now running at 0.0.0.0:8000 Using FastAPI & TF Serve Get inside api folder cd api Copy the models.config.example as models.config and update the paths in file. Run the TF Serve (Update config file path below) docker run -t --rm -p 8501:8501 -v C:/Code/potato-disease-classification:/potato-disease-classification tensorflow/serving --rest_api_port=8501 --model_config_file=/potato-disease-classification/models.config Run the FastAPI Server using uvicorn For this you can directly run it from your main.py or main-tf-serving.py using pycharm run option (as shown in the video tutorial) OR you can run it from command prompt as shown below, uvicorn main-tf-serving:app --reload --host 0.0.0.0 Your API is now running at 0.0.0.0:8000 Running the Frontend Get inside api folder cd frontend Copy the .env.example as .env and update REACT_APP_API_URL to API URL if needed. Run the frontend npm run start Running the app Get inside mobile-app folder cd mobile-app Copy the .env.example as .env and update URL to API URL if needed. Run the app (android/iOS) npm run android or npm run ios Creating the TF Lite Model Run Jupyter Notebook in Browser. jupyter notebook Open training/tf-lite-converter.ipynb in Jupyter Notebook. In cell #2, update the path to dataset. Run all the Cells one by one. Model would be saved in tf-lite-models folder. Deploying the TF Lite on GCP Create a GCP account. Create a Project on GCP (Keep note of the project id). Create a GCP bucket. Upload the tf-lite model generate in the bucket in the path models/potato-model.tflite. Install Google Cloud SDK (Setup instructions). Authenticate with Google Cloud SDK. gcloud auth login Run the deployment script. cd gcp gcloud functions deploy predict_lite --runtime python38 --trigger-http --memory 512 --project project_id Your model is now deployed. Use Postman to test the GCF using the Trigger URL. Inspiration: https://cloud.google.com/blog/products/ai-machine-learning/how-to-serve-deep-learning-models-using-tensorflow-2-0-with-cloud-functions Deploying the TF Model (.h5) on GCP Create a GCP account. Create a Project on GCP (Keep note of the project id). Create a GCP bucket. Upload the tf .h5 model generate in the bucket in the path models/potato-model.h5. Install Google Cloud SDK (Setup instructions). Authenticate with Google Cloud SDK. gcloud auth login Run the deployment script. cd gcp gcloud functions deploy predict --runtime python38 --trigger-http --memory 512 --project project_id Your model is now deployed. Use Postman to test the GCF using the Trigger URL. Inspiration: https://cloud.google.com/blog/products/ai-machine-learning/how-to-serve-deep-learning-models-using-tensorflow-2-0-with-cloud-functions
gitofanandaghosh
Projecting a society’s long-term electricity demand helps determine what capacity is needed for future energy generation. Accurate forecasts of electricity demand inform investment decisions about power generation and supporting network infrastructure. Of major interest to energy policymakers, power utilities, and private investors alike, forecasts are also essential for development professionals. Inaccurate forecasts, whether they over-or under predict demand, can have dire social and economic consequences. Underestimating demand results in supply shortages and forced power outages, with serious consequences for productivity and economic growth. Overestimating demand can lead to overinvestment in generation capacity, possible financial distress, and, ultimately, higher electricity prices. Several techniques have been developed over the last few decades to accurately predict the future of energy consumption. The main objective of this notebook is to create an energy demand forecasting system using Facebook Prophet. Facebook Prophet is an open-source library released by Facebook’s Core Data Science team. The data I am going to use for this particular task is the new york electricity demand dataset.