Found 2,926 repositories(showing 30)
sayantann11
Classification - Machine Learning This is ‘Classification’ tutorial which is a part of the Machine Learning course offered by Simplilearn. We will learn Classification algorithms, types of classification algorithms, support vector machines(SVM), Naive Bayes, Decision Tree and Random Forest Classifier in this tutorial. Objectives Let us look at some of the objectives covered under this section of Machine Learning tutorial. Define Classification and list its algorithms Describe Logistic Regression and Sigmoid Probability Explain K-Nearest Neighbors and KNN classification Understand Support Vector Machines, Polynomial Kernel, and Kernel Trick Analyze Kernel Support Vector Machines with an example Implement the Naïve Bayes Classifier Demonstrate Decision Tree Classifier Describe Random Forest Classifier Classification: Meaning Classification is a type of supervised learning. It specifies the class to which data elements belong to and is best used when the output has finite and discrete values. It predicts a class for an input variable as well. There are 2 types of Classification: Binomial Multi-Class Classification: Use Cases Some of the key areas where classification cases are being used: To find whether an email received is a spam or ham To identify customer segments To find if a bank loan is granted To identify if a kid will pass or fail in an examination Classification: Example Social media sentiment analysis has two potential outcomes, positive or negative, as displayed by the chart given below. https://www.simplilearn.com/ice9/free_resources_article_thumb/classification-example-machine-learning.JPG This chart shows the classification of the Iris flower dataset into its three sub-species indicated by codes 0, 1, and 2. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-flower-dataset-graph.JPG The test set dots represent the assignment of new test data points to one class or the other based on the trained classifier model. Types of Classification Algorithms Let’s have a quick look into the types of Classification Algorithm below. Linear Models Logistic Regression Support Vector Machines Nonlinear models K-nearest Neighbors (KNN) Kernel Support Vector Machines (SVM) Naïve Bayes Decision Tree Classification Random Forest Classification Logistic Regression: Meaning Let us understand the Logistic Regression model below. This refers to a regression model that is used for classification. This method is widely used for binary classification problems. It can also be extended to multi-class classification problems. Here, the dependent variable is categorical: y ϵ {0, 1} A binary dependent variable can have only two values, like 0 or 1, win or lose, pass or fail, healthy or sick, etc In this case, you model the probability distribution of output y as 1 or 0. This is called the sigmoid probability (σ). If σ(θ Tx) > 0.5, set y = 1, else set y = 0 Unlike Linear Regression (and its Normal Equation solution), there is no closed form solution for finding optimal weights of Logistic Regression. Instead, you must solve this with maximum likelihood estimation (a probability model to detect the maximum likelihood of something happening). It can be used to calculate the probability of a given outcome in a binary model, like the probability of being classified as sick or passing an exam. https://www.simplilearn.com/ice9/free_resources_article_thumb/logistic-regression-example-graph.JPG Sigmoid Probability The probability in the logistic regression is often represented by the Sigmoid function (also called the logistic function or the S-curve): https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-function-machine-learning.JPG In this equation, t represents data values * the number of hours studied and S(t) represents the probability of passing the exam. Assume sigmoid function: https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-probability-machine-learning.JPG g(z) tends toward 1 as z -> infinity , and g(z) tends toward 0 as z -> infinity K-nearest Neighbors (KNN) K-nearest Neighbors algorithm is used to assign a data point to clusters based on similarity measurement. It uses a supervised method for classification. The steps to writing a k-means algorithm are as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-distribution-graph-machine-learning.JPG Choose the number of k and a distance metric. (k = 5 is common) Find k-nearest neighbors of the sample that you want to classify Assign the class label by majority vote. KNN Classification A new input point is classified in the category such that it has the most number of neighbors from that category. For example: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-classification-machine-learning.JPG Classify a patient as high risk or low risk. Mark email as spam or ham. Keen on learning about Classification Algorithms in Machine Learning? Click here! Support Vector Machine (SVM) Let us understand Support Vector Machine (SVM) in detail below. SVMs are classification algorithms used to assign data to various classes. They involve detecting hyperplanes which segregate data into classes. SVMs are very versatile and are also capable of performing linear or nonlinear classification, regression, and outlier detection. Once ideal hyperplanes are discovered, new data points can be easily classified. https://www.simplilearn.com/ice9/free_resources_article_thumb/support-vector-machines-graph-machine-learning.JPG The optimization objective is to find “maximum margin hyperplane” that is farthest from the closest points in the two classes (these points are called support vectors). In the given figure, the middle line represents the hyperplane. SVM Example Let’s look at this image below and have an idea about SVM in general. Hyperplanes with larger margins have lower generalization error. The positive and negative hyperplanes are represented by: https://www.simplilearn.com/ice9/free_resources_article_thumb/positive-negative-hyperplanes-machine-learning.JPG Classification of any new input sample xtest : If w0 + wTxtest > 1, the sample xtest is said to be in the class toward the right of the positive hyperplane. If w0 + wTxtest < -1, the sample xtest is said to be in the class toward the left of the negative hyperplane. When you subtract the two equations, you get: https://www.simplilearn.com/ice9/free_resources_article_thumb/equation-subtraction-machine-learning.JPG Length of vector w is (L2 norm length): https://www.simplilearn.com/ice9/free_resources_article_thumb/length-of-vector-machine-learning.JPG You normalize with the length of w to arrive at: https://www.simplilearn.com/ice9/free_resources_article_thumb/normalize-equation-machine-learning.JPG SVM: Hard Margin Classification Given below are some points to understand Hard Margin Classification. The left side of equation SVM-1 given above can be interpreted as the distance between the positive (+ve) and negative (-ve) hyperplanes; in other words, it is the margin that can be maximized. Hence the objective of the function is to maximize with the constraint that the samples are classified correctly, which is represented as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-machine-learning.JPG This means that you are minimizing ‖w‖. This also means that all positive samples are on one side of the positive hyperplane and all negative samples are on the other side of the negative hyperplane. This can be written concisely as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-formula.JPG Minimizing ‖w‖ is the same as minimizing. This figure is better as it is differentiable even at w = 0. The approach listed above is called “hard margin linear SVM classifier.” SVM: Soft Margin Classification Given below are some points to understand Soft Margin Classification. To allow for linear constraints to be relaxed for nonlinearly separable data, a slack variable is introduced. (i) measures how much ith instance is allowed to violate the margin. The slack variable is simply added to the linear constraints. https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-machine-learning.JPG Subject to the above constraints, the new objective to be minimized becomes: https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-formula.JPG You have two conflicting objectives now—minimizing slack variable to reduce margin violations and minimizing to increase the margin. The hyperparameter C allows us to define this trade-off. Large values of C correspond to larger error penalties (so smaller margins), whereas smaller values of C allow for higher misclassification errors and larger margins. https://www.simplilearn.com/ice9/free_resources_article_thumb/machine-learning-certification-video-preview.jpg SVM: Regularization The concept of C is the reverse of regularization. Higher C means lower regularization, which increases bias and lowers the variance (causing overfitting). https://www.simplilearn.com/ice9/free_resources_article_thumb/concept-of-c-graph-machine-learning.JPG IRIS Data Set The Iris dataset contains measurements of 150 IRIS flowers from three different species: Setosa Versicolor Viriginica Each row represents one sample. Flower measurements in centimeters are stored as columns. These are called features. IRIS Data Set: SVM Let’s train an SVM model using sci-kit-learn for the Iris dataset: https://www.simplilearn.com/ice9/free_resources_article_thumb/svm-model-graph-machine-learning.JPG Nonlinear SVM Classification There are two ways to solve nonlinear SVMs: by adding polynomial features by adding similarity features Polynomial features can be added to datasets; in some cases, this can create a linearly separable dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/nonlinear-classification-svm-machine-learning.JPG In the figure on the left, there is only 1 feature x1. This dataset is not linearly separable. If you add x2 = (x1)2 (figure on the right), the data becomes linearly separable. Polynomial Kernel In sci-kit-learn, one can use a Pipeline class for creating polynomial features. Classification results for the Moons dataset are shown in the figure. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-machine-learning.JPG Polynomial Kernel with Kernel Trick Let us look at the image below and understand Kernel Trick in detail. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-with-kernel-trick.JPG For large dimensional datasets, adding too many polynomial features can slow down the model. You can apply a kernel trick with the effect of polynomial features without actually adding them. The code is shown (SVC class) below trains an SVM classifier using a 3rd-degree polynomial kernel but with a kernel trick. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-equation-machine-learning.JPG The hyperparameter coefθ controls the influence of high-degree polynomials. Kernel SVM Let us understand in detail about Kernel SVM. Kernel SVMs are used for classification of nonlinear data. In the chart, nonlinear data is projected into a higher dimensional space via a mapping function where it becomes linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-machine-learning.JPG In the higher dimension, a linear separating hyperplane can be derived and used for classification. A reverse projection of the higher dimension back to original feature space takes it back to nonlinear shape. As mentioned previously, SVMs can be kernelized to solve nonlinear classification problems. You can create a sample dataset for XOR gate (nonlinear problem) from NumPy. 100 samples will be assigned the class sample 1, and 100 samples will be assigned the class label -1. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-graph-machine-learning.JPG As you can see, this data is not linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-non-separable.JPG You now use the kernel trick to classify XOR dataset created earlier. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-xor-machine-learning.JPG Naïve Bayes Classifier What is Naive Bayes Classifier? Have you ever wondered how your mail provider implements spam filtering or how online news channels perform news text classification or even how companies perform sentiment analysis of their audience on social media? All of this and more are done through a machine learning algorithm called Naive Bayes Classifier. Naive Bayes Named after Thomas Bayes from the 1700s who first coined this in the Western literature. Naive Bayes classifier works on the principle of conditional probability as given by the Bayes theorem. Advantages of Naive Bayes Classifier Listed below are six benefits of Naive Bayes Classifier. Very simple and easy to implement Needs less training data Handles both continuous and discrete data Highly scalable with the number of predictors and data points As it is fast, it can be used in real-time predictions Not sensitive to irrelevant features Bayes Theorem We will understand Bayes Theorem in detail from the points mentioned below. According to the Bayes model, the conditional probability P(Y|X) can be calculated as: P(Y|X) = P(X|Y)P(Y) / P(X) This means you have to estimate a very large number of P(X|Y) probabilities for a relatively small vector space X. For example, for a Boolean Y and 30 possible Boolean attributes in the X vector, you will have to estimate 3 billion probabilities P(X|Y). To make it practical, a Naïve Bayes classifier is used, which assumes conditional independence of P(X) to each other, with a given value of Y. This reduces the number of probability estimates to 2*30=60 in the above example. Naïve Bayes Classifier for SMS Spam Detection Consider a labeled SMS database having 5574 messages. It has messages as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-machine-learning.JPG Each message is marked as spam or ham in the data set. Let’s train a model with Naïve Bayes algorithm to detect spam from ham. The message lengths and their frequency (in the training dataset) are as shown below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-spam-detection.JPG Analyze the logic you use to train an algorithm to detect spam: Split each message into individual words/tokens (bag of words). Lemmatize the data (each word takes its base form, like “walking” or “walked” is replaced with “walk”). Convert data to vectors using scikit-learn module CountVectorizer. Run TFIDF to remove common words like “is,” “are,” “and.” Now apply scikit-learn module for Naïve Bayes MultinomialNB to get the Spam Detector. This spam detector can then be used to classify a random new message as spam or ham. Next, the accuracy of the spam detector is checked using the Confusion Matrix. For the SMS spam example above, the confusion matrix is shown on the right. Accuracy Rate = Correct / Total = (4827 + 592)/5574 = 97.21% Error Rate = Wrong / Total = (155 + 0)/5574 = 2.78% https://www.simplilearn.com/ice9/free_resources_article_thumb/confusion-matrix-machine-learning.JPG Although confusion Matrix is useful, some more precise metrics are provided by Precision and Recall. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-recall-matrix-machine-learning.JPG Precision refers to the accuracy of positive predictions. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-formula-machine-learning.JPG Recall refers to the ratio of positive instances that are correctly detected by the classifier (also known as True positive rate or TPR). https://www.simplilearn.com/ice9/free_resources_article_thumb/recall-formula-machine-learning.JPG Precision/Recall Trade-off To detect age-appropriate videos for kids, you need high precision (low recall) to ensure that only safe videos make the cut (even though a few safe videos may be left out). The high recall is needed (low precision is acceptable) in-store surveillance to catch shoplifters; a few false alarms are acceptable, but all shoplifters must be caught. Learn about Naive Bayes in detail. Click here! Decision Tree Classifier Some aspects of the Decision Tree Classifier mentioned below are. Decision Trees (DT) can be used both for classification and regression. The advantage of decision trees is that they require very little data preparation. They do not require feature scaling or centering at all. They are also the fundamental components of Random Forests, one of the most powerful ML algorithms. Unlike Random Forests and Neural Networks (which do black-box modeling), Decision Trees are white box models, which means that inner workings of these models are clearly understood. In the case of classification, the data is segregated based on a series of questions. Any new data point is assigned to the selected leaf node. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-machine-learning.JPG Start at the tree root and split the data on the feature using the decision algorithm, resulting in the largest information gain (IG). This splitting procedure is then repeated in an iterative process at each child node until the leaves are pure. This means that the samples at each node belonging to the same class. In practice, you can set a limit on the depth of the tree to prevent overfitting. The purity is compromised here as the final leaves may still have some impurity. The figure shows the classification of the Iris dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-graph.JPG IRIS Decision Tree Let’s build a Decision Tree using scikit-learn for the Iris flower dataset and also visualize it using export_graphviz API. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-machine-learning.JPG The output of export_graphviz can be converted into png format: https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-output.JPG Sample attribute stands for the number of training instances the node applies to. Value attribute stands for the number of training instances of each class the node applies to. Gini impurity measures the node’s impurity. A node is “pure” (gini=0) if all training instances it applies to belong to the same class. https://www.simplilearn.com/ice9/free_resources_article_thumb/impurity-formula-machine-learning.JPG For example, for Versicolor (green color node), the Gini is 1-(0/54)2 -(49/54)2 -(5/54) 2 ≈ 0.168 https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-sample.JPG Decision Boundaries Let us learn to create decision boundaries below. For the first node (depth 0), the solid line splits the data (Iris-Setosa on left). Gini is 0 for Setosa node, so no further split is possible. The second node (depth 1) splits the data into Versicolor and Virginica. If max_depth were set as 3, a third split would happen (vertical dotted line). https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-boundaries.JPG For a sample with petal length 5 cm and petal width 1.5 cm, the tree traverses to depth 2 left node, so the probability predictions for this sample are 0% for Iris-Setosa (0/54), 90.7% for Iris-Versicolor (49/54), and 9.3% for Iris-Virginica (5/54) CART Training Algorithm Scikit-learn uses Classification and Regression Trees (CART) algorithm to train Decision Trees. CART algorithm: Split the data into two subsets using a single feature k and threshold tk (example, petal length < “2.45 cm”). This is done recursively for each node. k and tk are chosen such that they produce the purest subsets (weighted by their size). The objective is to minimize the cost function as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/cart-training-algorithm-machine-learning.JPG The algorithm stops executing if one of the following situations occurs: max_depth is reached No further splits are found for each node Other hyperparameters may be used to stop the tree: min_samples_split min_samples_leaf min_weight_fraction_leaf max_leaf_nodes Gini Impurity or Entropy Entropy is one more measure of impurity and can be used in place of Gini. https://www.simplilearn.com/ice9/free_resources_article_thumb/gini-impurity-entrophy.JPG It is a degree of uncertainty, and Information Gain is the reduction that occurs in entropy as one traverses down the tree. Entropy is zero for a DT node when the node contains instances of only one class. Entropy for depth 2 left node in the example given above is: https://www.simplilearn.com/ice9/free_resources_article_thumb/entrophy-for-depth-2.JPG Gini and Entropy both lead to similar trees. DT: Regularization The following figure shows two decision trees on the moons dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/dt-regularization-machine-learning.JPG The decision tree on the right is restricted by min_samples_leaf = 4. The model on the left is overfitting, while the model on the right generalizes better. Random Forest Classifier Let us have an understanding of Random Forest Classifier below. A random forest can be considered an ensemble of decision trees (Ensemble learning). Random Forest algorithm: Draw a random bootstrap sample of size n (randomly choose n samples from the training set). Grow a decision tree from the bootstrap sample. At each node, randomly select d features. Split the node using the feature that provides the best split according to the objective function, for instance by maximizing the information gain. Repeat the steps 1 to 2 k times. (k is the number of trees you want to create, using a subset of samples) Aggregate the prediction by each tree for a new data point to assign the class label by majority vote (pick the group selected by the most number of trees and assign new data point to that group). Random Forests are opaque, which means it is difficult to visualize their inner workings. https://www.simplilearn.com/ice9/free_resources_article_thumb/random-forest-classifier-graph.JPG However, the advantages outweigh their limitations since you do not have to worry about hyperparameters except k, which stands for the number of decision trees to be created from a subset of samples. RF is quite robust to noise from the individual decision trees. Hence, you need not prune individual decision trees. The larger the number of decision trees, the more accurate the Random Forest prediction is. (This, however, comes with higher computation cost). Key Takeaways Let us quickly run through what we have learned so far in this Classification tutorial. Classification algorithms are supervised learning methods to split data into classes. They can work on Linear Data as well as Nonlinear Data. Logistic Regression can classify data based on weighted parameters and sigmoid conversion to calculate the probability of classes. K-nearest Neighbors (KNN) algorithm uses similar features to classify data. Support Vector Machines (SVMs) classify data by detecting the maximum margin hyperplane between data classes. Naïve Bayes, a simplified Bayes Model, can help classify data using conditional probability models. Decision Trees are powerful classifiers and use tree splitting logic until pure or somewhat pure leaf node classes are attained. Random Forests apply Ensemble Learning to Decision Trees for more accurate classification predictions. Conclusion This completes ‘Classification’ tutorial. In the next tutorial, we will learn 'Unsupervised Learning with Clustering.'
xploitspeeds
* READ THE README FOR INFO!! * Incoming Tags- z score statistics,find mean median mode statistics in ms excel,variance,standard deviation,linear regression,data processing,confidence intervals,average value,probability theory,binomial distribution,matrix,random numbers,error propagation,t statistics analysis,hypothesis testing,theorem,chi square,time series,data collection,sampling,p value,scatterplots,statistics lectures,statistics tutorials,business mathematics statistics,share stock market statistics in calculator,business analytics,GTA,continuous frequency distribution,statistics mathematics in real life,modal class,n is even,n is odd,median mean of series of numbers,math help,Sujoy Krishna Das,n+1/2 element,measurement of variation,measurement of central tendency,range of numbers,interquartile range,casio fx991,casio fx82,casio fx570,casio fx115es,casio 9860,casio 9750,casio 83gt,TI BAII+ financial,casio piano,casio calculator tricks and hacks,how to cheat in exam and not get caught,grouped interval data,equation of triangle rectangle curve parabola hyperbola,graph theory,operation research(OR),numerical methods,decision making,pie chart,bar graph,computer data analysis,histogram,statistics formula,matlab tutorial,find arithmetic mean geometric mean,find population standard deviation,find sample standard deviation,how to use a graphic calculator,pre algebra,pre calculus,absolute deviation,TI Nspire,TI 84 TI83 calculator tutorial,texas instruments calculator,grouped data,set theory,IIT JEE,AIEEE,GCSE,CAT,MAT,SAT,GMAT,MBBS,JELET,JEXPO,VOCLET,Indiastudychannel,IAS,IPS,IFS,GATE,B-Tech,M-Tech,AMIE,MBA,BBA,BCA,MCA,XAT,TOEFL,CBSE,ICSE,HS,WBUT,SSC,IUPAC,Narendra Modi,Sachin Tendulkar Farewell Speech,Dhoom 3,Arvind Kejriwal,maths revision,how to score good marks in exams,how to pass math exams easily,JEE 12th physics chemistry maths PCM,JEE maths shortcut techniques,quadratic equations,competition exams tips and ticks,competition maths,govt job,JEE KOTA,college math,mean value theorem,L hospital rule,tech guru awaaz,derivation,cryptography,iphone 5 fingerprint hack,crash course,CCNA,converting fractions,solve word problem,cipher,game theory,GDP,how to earn money online on youtube,demand curve,computer science,prime factorization,LCM & GCF,gauss elimination,vector,complex numbers,number systems,vector algebra,logarithm,trigonometry,organic chemistry,electrical math problem,eigen value eigen vectors,runge kutta,gauss jordan,simpson 1/3 3/8 trapezoidal rule,solved problem example,newton raphson,interpolation,integration,differentiation,regula falsi,programming,algorithm,gauss seidal,gauss jacobi,taylor series,iteration,binary arithmetic,logic gates,matrix inverse,determinant of matrix,matrix calculator program,sex in ranchi,sex in kolkata,vogel approximation VAM optimization problem,North west NWCR,Matrix minima,Modi method,assignment problem,transportation problem,simplex,k map,boolean algebra,android,casio FC 200v 100v financial,management mathematics tutorials,net present value NPV,time value of money TVM,internal rate of return IRR Bond price,present value PV and future value FV of annuity casio,simple interest SI & compound interest CI casio,break even point,amortization calculation,HP 10b financial calculator,banking and money,income tax e filing,economics,finance,profit & loss,yield of investment bond,Sharp EL 735S,cash flow casio,re finance,insurance and financial planning,investment appraisal,shortcut keys,depreciation,discounting
The aim of this assignment is to have you do UDP socket client / server programming with a focus on two broad aspects : Setting up the exchange between the client and server in a secure way despite the lack of a formal connection (as in TCP) between the two, so that ‘outsider’ UDP datagrams (broadcast, multicast, unicast - fortuitously or maliciously) cannot intrude on the communication. Introducing application-layer protocol data-transmission reliability, flow control and congestion control in the client and server using TCP-like ARQ sliding window mechanisms. The second item above is much more of a challenge to implement than the first, though neither is particularly trivial. But they are not tightly interdependent; each can be worked on separately at first and then integrated together at a later stage. Apart from the material in Chapters 8, 14 & 22 (especially Sections 22.5 - 22.7), and the experience you gained from the preceding assignment, you will also need to refer to the following : ioctl function (Chapter 17). get_ifi_info function (Section 17.6, Chapter 17). This function will be used by the server code to discover its node’s network interfaces so that it can bind all its interface IP addresses (see Section 22.6). ‘Race’ conditions (Section 20.5, Chapter 20) You also need a thorough understanding of how the TCP protocol implements reliable data transfer, flow control and congestion control. Chapters 17- 24 of TCP/IP Illustrated, Volume 1 by W. Richard Stevens gives a good overview of TCP. Though somewhat dated for some things (it was published in 1994), it remains, overall, a good basic reference. Overview This assignment asks you to implement a primitive file transfer protocol for Unix platforms, based on UDP, and with TCP-like reliability added to the transfer operation using timeouts and sliding-window mechanisms, and implementing flow and congestion control. The server is a concurrent server which can handle multiple clients simultaneously. A client gives the server the name of a file. The server forks off a child which reads directly from the file and transfers the contents over to the client using UDP datagrams. The client prints out the file contents as they come in, in order, with nothing missing and with no duplication of content, directly on to stdout (via the receiver sliding window, of course, but with no other intermediate buffering). The file to be transferred can be of arbitrary length, but its contents are always straightforward ascii text. As an aside let me mention that assuming the file contents ascii is not as restrictive as it sounds. We can always pretend, for example, that binary files are base64 encoded (“ASCII armor”). A real file transfer protocol would, of course, have to worry about transferring files between heterogeneous platforms with different file structure conventions and semantics. The sender would first have to transform the file into a platform-independent, protocol-defined, format (using, say, ASN.1, or some such standard), and the receiver would have to transform the received file into its platform’s native file format. This kind of thing can be fairly time consuming, and is certainly very tedious, to implement, with little educational value - it is not part of this assignment. Arguments for the server You should provide the server with an input file server.in from which it reads the following information, in the order shown, one item per line : Well-known port number for server. Maximum sending sliding-window size (in datagram units). You will not be handing in your server.in file. We shall create our own when we come to test your code. So it is important that you stick strictly to the file name and content conventions specified above. The same applies to the client.in input file below. Arguments for the client The client is to be provided with an input file client.in from which it reads the following information, in the order shown, one item per line : IP address of server (not the hostname). Well-known port number of server. filename to be transferred. Receiving sliding-window size (in datagram units). Random generator seed value. Probability p of datagram loss. This should be a real number in the range [ 0.0 , 1.0 ] (value 0.0 means no loss occurs; value 1.0 means all datagrams all lost). The mean µ, in milliseconds, for an exponential distribution controlling the rate at which the client reads received datagram payloads from its receive buffer. Operation Server starts up and reads its arguments from file server.in. As we shall see, when a client communicates with the server, the server will want to know what IP address that client is using to identify the server (i.e. , the destination IP address in the incoming datagram). Normally, this can be done relatively straightforwardly using the IP_RECVDESTADDR socket option, and picking up the information using the ancillary data (‘control information’) capability of the recvmsg function. Unfortunately, Solaris 2.10 does not support the IP_RECVDESTADDR option (nor, incidentally, does it support the msg_flags option in msghdr - see p.390). This considerably complicates things. In the absence of IP_RECVDESTADDR, what the server has to do as part of its initialization phase is to bind each IP address it has (and, simultaneously, its well-known port number, which it has read in from server.in) to a separate UDP socket. The code in Section 22.6, which uses the get_ifi_info function, shows you how to do that. However, there are important differences between that code and the version you want to implement. The code of Section 22.6 binds the IP addresses and forks off a child for each address that is bound to. We do not want to do that. Instead you should have an array of socket descriptors. For each IP address, create a new socket and bind the address (and well-known port number) to the socket without forking off child processes. Creating child processes comes later, when clients arrive. The code of Section 22.6 also attempts to bind broadcast addresses. We do not want to do this. It binds a wildcard IP address, which we certainly do not want to do either. We should bind strictly only unicast addresses (including the loopback address). The get_ifi_info function (which the code in Section 22.6 uses) has to be modified so that it also gets the network masks for the IP addresses of the node, and adds these to the information stored in the linked list of ifi_info structures (see Figure 17.5, p.471) it produces. As you go binding each IP address to a distinct socket, it will be useful for later processing to build your own array of structures, where a structure element records the following information for each socket : sockfd IP address bound to the socket network mask for the IP address subnet address (obtained by doing a bit-wise and between the IP address and its network mask) Report, in a ReadMe file which you hand in with your code, on the modifications you had to introduce to ensure that only unicast addresses are bound, and on your implementation of the array of structures described above. You should print out on stdout, with an appropriate message and appropriately formatted in dotted decimal notation, the IP address, network mask, and subnet address for each socket in your array of structures (you do not need to print the sockfd). The server now uses select to monitor the sockets it has created for incoming datagrams. When it returns from select, it must use recvfrom or recvmsg to read the incoming datagram (see 6. below). When a client starts, it first reads its arguments from the file client.in. The client checks if the server host is ‘local’ to its (extended) Ethernet. If so, all its communication to the server is to occur as MSG_DONTROUTE (or SO_DONTROUTE socket option). It determines if the server host is ‘local’ as follows. The first thing the client should do is to use the modified get_ifi_info function to obtain all of its IP addresses and associated network masks. Print out on stdout, in dotted decimal notation and with an appropriate message, the IP addresses and network masks obtained. In the following, IPserver designates the IP address the client will use to identify the server, and IPclient designates the IP address the client will choose to identify itself. The client checks whether the server is on the same host. If so, it should use the loopback address 127.0.0.1 for the server (i.e. , IPserver = 127.0.0.1). IPclient should also be set to the loopback address. Otherwise it proceeds as follows: IPserver is set to the IP address for the server in the client.in file. Given IPserver and the (unicast) IP addresses and network masks for the client returned by get_ifi_info in the linked list of ifi_info structures, you should be able to figure out if the server node is ‘local’ or not. This will be discussed in class; but let me just remind you here that you should use ‘longest prefix matching’ where applicable. If there are multiple client addresses, and the server host is ‘local’, the client chooses an IP address for itself, IPclient, which matches up as ‘local’ according to your examination above. If the server host is not ‘local’, then IPclient can be chosen arbitrarily. Print out on stdout the results of your examination, as to whether the server host is ‘local’ or not, as well as the IPclient and IPserver addresses selected. Note that this manner of determining whether the server is local or not is somewhat clumsy and ‘over-engineered’, and, as such, should be viewed more in the nature of a pedagogical exercise. Ideally, we would like to look up the server IP address(es) in the routing table (see Section 18.3). This requires that a routing socket be created, for which we need superuser privilege. Alternatively, we might want to dump out the routing table, using the sysctl function for example (see Section 18.4), and examine it directly. Unfortunately, Solaris 2.10 does not support sysctl. Furthermore, note that there is a slight problem with the address 130.245.1.123/24 assigned to compserv3 (see rightmost column of file hosts, and note that this particular compserv3 address “overlaps” with the 130.245.1.x/28 addresses in that same column assigned to compserv1, compserv2 & comserv4). In particular, if the client is running on compserv3 and the server on any of the other three compservs, and if that server node is also being identified to the client by its /28 (rather than its /24) address, then the client will get a “false positive” when it tests as to whether the server node is local or not. In other words, the client will deem the server node to be local, whereas in fact it should not be considered local. Because of this, it is perhaps best simply not to use compserv3 to run the client (but it is o.k. to use it to run the server). Finally, using MSG_DONTROUTE where possible would seem to gain us efficiency, in as much as the kernel does not need to consult the routing table for every datagram sent. But, in fact, that is not so. Recall that one effect of connect with UDP sockets is that routing information is obtained by the kernel at the time the connect is issued. That information is cached and used for subsequent sends from the connected socket (see p.255). The client now creates a UDP socket and calls bind on IPclient, with 0 as the port number. This will cause the kernel to bind an ephemeral port to the socket. After the bind, use the getsockname function (Section 4.10) to obtain IPclient and the ephemeral port number that has been assigned to the socket, and print that information out on stdout, with an appropriate message and appropriately formatted. The client connects its socket to IPserver and the well-known port number of the server. After the connect, use the getpeername function (Section 4.10) to obtain IPserver and the well-known port number of the server, and print that information out on stdout, with an appropriate message and appropriately formatted. The client sends a datagram to the server giving the filename for the transfer. This send needs to be backed up by a timeout in case the datagram is lost. Note that the incoming datagram from the client will be delivered to the server at the socket to which the destination IP address that the datagram is carrying has been bound. Thus, the server can obtain that address (it is, of course, IPserver) and thereby achieve what IP_RECVDESTADDR would have given us had it been available. Furthermore, the server process can obtain the IP address (this will, of course, be IPclient) and ephemeral port number of the client through the recvfrom or recvmsg functions. The server forks off a child process to handle the client. The server parent process goes back to the select to listen for new clients. Hereafter, and unless otherwise stated, whenever we refer to the ‘server’, we mean the server child process handling the client’s file transfer, not the server parent process. Typically, the first thing the server child would be expected to do is to close all sockets it ‘inherits’ from its parent. However, this is not the case with us. The server child does indeed close the sockets it inherited, but not the socket on which the client request arrived. It leaves that socket open for now. Call this socket the ‘listening’ socket. The server (child) then checks if the client host is local to its (extended) Ethernet. If so, all its communication to the client is to occur as MSG_DONTROUTE (or SO_DONTROUTE socket option). If IPserver (obtained in 5. above) is the loopback address, then we are done. Otherwise, the server has to proceed with the following step. Use the array of structures you built in 1. above, together with the addresses IPserver and IPclient to determine if the client is ‘local’. Print out on stdout the results of your examination, as to whether the client host is ‘local’ or not. The server (child) creates a UDP socket to handle file transfer to the client. Call this socket the ‘connection’ socket. It binds the socket to IPserver, with port number 0 so that its kernel assigns an ephemeral port. After the bind, use the getsockname function (Section 4.10) to obtain IPserver and the ephemeral port number that has been assigned to the socket, and print that information out on stdout, with an appropriate message and appropriately formatted. The server then connects this ‘connection’ socket to the client’s IPclient and ephemeral port number. The server now sends the client a datagram, in which it passes it the ephemeral port number of its ‘connection’ socket as the data payload of the datagram. This datagram is sent using the ‘listening’ socket inherited from its parent, otherwise the client (whose socket is connected to the server’s ‘listening’ socket at the latter’s well-known port number) will reject it. This datagram must be backed up by the ARQ mechanism, and retransmitted in the event of loss. Note that if this datagram is indeed lost, the client might well time out and retransmit its original request message (the one carrying the file name). In this event, you must somehow ensure that the parent server does not mistake this retransmitted request for a new client coming in, and spawn off yet another child to handle it. How do you do that? It is potentially more involved than it might seem. I will be discussing this in class, as well as ‘race’ conditions that could potentially arise, depending on how you code the mechanisms I present. When the client receives the datagram carrying the ephemeral port number of the server’s ‘connection’ socket, it reconnects its socket to the server’s ‘connection’ socket, using IPserver and the ephemeral port number received in the datagram (see p.254). It now uses this reconnected socket to send the server an acknowledgment. Note that this implies that, in the event of the server timing out, it should retransmit two copies of its ‘ephemeral port number’ message, one on its ‘listening’ socket and the other on its ‘connection’ socket (why?). When the server receives the acknowledgment, it closes the ‘listening’ socket it inherited from its parent. The server can now commence the file transfer through its ‘connection’ socket. The net effect of all these binds and connects at server and client is that no ‘outsider’ UDP datagram (broadcast, multicast, unicast - fortuitously or maliciously) can now intrude on the communication between server and client. Starting with the first datagram sent out, the client behaves as follows. Whenever a datagram arrives, or an ACK is about to be sent out (or, indeed, the initial datagram to the server giving the filename for the transfer), the client uses some random number generator function random() (initialized by the client.in argument value seed) to decide with probability p (another client.in argument value) if the datagram or ACK should be discarded by way of simulating transmission loss across the network. (I will briefly discuss in class how you do this.) Adding reliability to UDP The mechanisms you are to implement are based on TCP Reno. These include : Reliable data transmission using ARQ sliding-windows, with Fast Retransmit. Flow control via receiver window advertisements. Congestion control that implements : SlowStart Congestion Avoidance (‘Additive-Increase/Multiplicative Decrease’ – AIMD) Fast Recovery (but without the window-inflation aspect of Fast Recovery) Only some, and by no means all, of the details for these are covered below. The rest will be presented in class, especially those concerning flow control and TCP Reno’s congestion control mechanisms in general : Slow Start, Congestion Avoidance, Fast Retransmit and Fast Recovery. Implement a timeout mechanism on the sender (server) side. This is available to you from Stevens, Section 22.5 . Note, however, that you will need to modify the basic driving mechanism of Figure 22.7 appropriately since the situation at the sender side is not a repetitive cycle of send-receive, but rather a straightforward progression of send-send-send-send- . . . . . . . . . . . Also, modify the RTT and RTO mechanisms of Section 22.5 as specified below. I will be discussing the details of these modifications and the reasons for them in class. Modify function rtt_stop (Fig. 22.13) so that it uses integer arithmetic rather than floating point. This will entail your also having to modify some of the variable and function parameter declarations throughout Section 22.5 from float to int, as appropriate. In the unprrt.h header file (Fig. 22.10) set : RTT_RXTMIN to 1000 msec. (1 sec. instead of the current value 3 sec.) RTT_RXTMAX to 3000 msec. (3 sec. instead of the current value 60 sec.) RTT_MAXNREXMT to 12 (instead of the current value 3) In function rtt_timeout (Fig. 22.14), after doubling the RTO in line 86, pass its value through the function rtt_minmax of Fig. 22.11 (somewhat along the lines of what is done in line 77 of rtt_stop, Fig. 22.13). Finally, note that with the modification to integer calculation of the smoothed RTT and its variation, and given the small RTT values you will experience on the cs / sbpub network, these calculations should probably now be done on a millisecond or even microsecond scale (rather than in seconds, as is the case with Stevens’ code). Otherwise, small measured RTTs could show up as 0 on a scale of seconds, yielding a negative result when we subtract the smoothed RTT from the measured RTT (line 72 of rtt_stop, Fig. 22.13). Report the details of your modifications to the code of Section 22.5 in the ReadMe file which you hand in with your code. We need to have a sender sliding window mechanism for the retransmission of lost datagrams; and a receiver sliding window in order to ensure correct sequencing of received file contents, and some measure of flow control. You should implement something based on TCP Reno’s mechanisms, with cumulative acknowledgments, receiver window advertisements, and a congestion control mechanism I will explain in detail in class. For a reference on TCP’s mechanisms generally, see W. Richard Stevens, TCP/IP Illustrated, Volume 1 , especially Sections 20.2 - 20.4 of Chapter 20 , and Sections 21.1 - 21.8 of Chapter 21 . Bear in mind that our sequence numbers should count datagrams, not bytes as in TCP. Remember that the sender and receiver window sizes have to be set according to the argument values in client.in and server.in, respectively. Whenever the sender window becomes full and so ‘locks’, the server should print out a message to that effect on stdout. Similarly, whenever the receiver window ‘locks’, the client should print out a message on stdout. Be aware of the potential for deadlock when the receiver window ‘locks’. This situation is handled by having the receiver process send a duplicate ACK which acts as a window update when its window opens again (see Figure 20.3 and the discussion about it in TCP/IP Illustrated). However, this is not enough, because ACKs are not backed up by a timeout mechanism in the event they are lost. So we will also need to implement a persist timer driving window probes in the sender process (see Sections 22.1 & 22.2 in Chapter 22 of TCP/IP Illustrated). Note that you do not have to worry about the Silly Window Syndrome discussed in Section 22.3 of TCP/IP Illustrated since the receiver process consumes ‘full sized’ 512-byte messages from the receiver buffer (see 3. below). Report on the details of the ARQ mechanism you implemented in the ReadMe file you hand in. Indeed, you should report on all the TCP mechanisms you implemented in the ReadMe file, both the ones discussed here, and the ones I will be discussing in class. Make your datagram payload a fixed 512 bytes, inclusive of the file transfer protocol header (which must, at the very least, carry: the sequence number of the datagram; ACKs; and advertised window notifications). The client reads the file contents in its receive buffer and prints them out on stdout using a separate thread. This thread sits in a repetitive loop till all the file contents have been printed out, doing the following. It samples from an exponential distribution with mean µ milliseconds (read from the client.in file), sleeps for that number of milliseconds; wakes up to read and print all in-order file contents available in the receive buffer at that point; samples again from the exponential distribution; sleeps; and so on. The formula -1 × µ × ln( random( ) ) , where ln is the natural logarithm, yields variates from an exponential distribution with mean µ, based on the uniformly-distributed variates over ( 0 , 1 ) returned by random(). Note that you will need to implement some sort of mutual exclusion/semaphore mechanism on the client side so that the thread that sleeps and wakes up to consume from the receive buffer is not updating the state variables of the buffer at the same time as the main thread reading from the socket and depositing into the buffer is doing the same. Furthermore, we need to ensure that the main thread does not effectively monopolize the semaphore (and thus lock out for prolonged periods of time) the sleeping thread when the latter wakes up. See the textbook, Section 26.7, ‘Mutexes: Mutual Exclusion’, pp.697-701. You might also find Section 26.8, ‘Condition Variables’, pp.701-705, useful. You will need to devise some way by which the sender can notify the receiver when it has sent the last datagram of the file transfer, without the receiver mistaking that EOF marker as part of the file contents. (Also, note that the last data segment could be a “short” segment of less than 512 bytes – your client needs to be able to handle this correctly somehow.) When the sender receives an ACK for the last datagram of the transfer, the (child) server terminates. The parent server has to take care of cleaning up zombie children. Note that if we want a clean closing, the client process cannot simply terminate when the receiver ACKs the last datagram. This ACK could be lost, which would leave the (child) server process ‘hanging’, timing out, and retransmitting the last datagram. TCP attempts to deal with this problem by means of the TIME_WAIT state. You should have your receiver process behave similarly, sticking around in something akin to a TIME_WAIT state in case in case it needs to retransmit the ACK. In the ReadMe file you hand in, report on how you dealt with the issues raised here: sender notifying receiver of the last datagram, clean closing, and so on. Output Some of the output required from your program has been described in the section Operation above. I expect you to provide further output – clear, well-structured, well-laid-out, concise but sufficient and helpful – in the client and server windows by means of which we can trace the correct evolution of your TCP’s behaviour in all its intricacies : information (e.g., sequence number) on datagrams and acks sent and dropped, window advertisements, datagram retransmissions (and why : dup acks or RTO); entering/exiting Slow Start and Congestion Avoidance, ssthresh and cwnd values; sender and receiver windows locking/unlocking; etc., etc. . . . . The onus is on you to convince us that the TCP mechanisms you implemented are working correctly. Too many students do not put sufficient thought, creative imagination, time or effort into this. It is not the TA’s nor my responsibility to sit staring at an essentially blank screen, trying to summon up our paranormal psychology skills to figure out if your TCP implementation is really working correctly in all its very intricate aspects, simply because the transferred file seems to be printing o.k. in the client window. Nor is it our responsibility to strain our eyes and our patience wading through a mountain of obscure, ill-structured, hyper-messy, debugging-style output because, for example, your effort-conserving concept of what is ‘suitable’ is to dump your debugging output on us, relevant, irrelevant, and everything in between.
nyaundid
SEIS 665 Assignment 2: Linux & Git Overview This week we will focus on becoming familiar with launching a Linux server and working with some basic Linux and Git commands. We will use AWS to launch and host the Linux server. AWS might seem a little confusing at this point. Don’t worry, we will gain much more hands-on experience with AWS throughout the course. The goal is to get you comfortable working with the technology and not overwhelm you with all the details. Requirements You need to have a personal AWS account and GitHub account for this assignment. You should also read the Git Hands-on Guide and Linux Hands-on Guide before beginning this exercise. A word about grading One of the key DevOps practices we learn about in this class is the use of automation to increase the speed and repeatability of processes. Automation is utilized during the assignment grading process to review and assess your work. It’s important that you follow the instructions in each assignment and type in required files and resources with the proper names. All names are case sensitive, so a name like "Web1" is not the same as "web1". If you misspell a name, use the wrong case, or put a file in the wrong directory location you will lose points on your assignment. This is the easiest way to lose points, and also the most preventable. You should always double-check your work to make sure it accurately reflects the requirements specified in the assignment. You should always carefully review the content of your files before submitting your assignment. The assignment Let’s get started! Create GitHub repository The first step in the assignment is to setup a Git repository on GitHub. We will use a special solution called GitHub Classroom for this course which automates the process of setting up student assignment repositories. Here are the basic steps: Click on the following link to open Assignment 2 on the GitHub Classroom site: https://classroom.github.com/a/K4zcVmX- (Links to an external site.)Links to an external site. Click on the Accept this assignment button. GitHub Classroom will provide you with a URL (https) to access the assignment repository. Either copy this address to your clipboard or write it down somewhere. You will need to use this address to set up the repository on a Linux server. Example: https://github.com/UST-SEIS665/hw2-seis665-02-spring2019-<your github id>.git At this point your new repository to ready to use. The repository is currently empty. We will put some content in there soon! Launch Linux server The second step in the assignment is to launch a Linux server using AWS EC2. The server should have the following characteristics: Amazon Linux 2 AMI 64-bit (usually the first option listed) Located in a U.S. region (us-east-1) t2.micro instance type All default instance settings (storage, vpm, security group, etc.) I’ve shown you how to launch EC2 instances in class. You can review it on Canvas. Once you launch the new server, it may take a few minutes to provision. Log into server The next step is to log into the Linux server using a terminal program with a secure shell (SSH) support. You can use iTerm2 (Links to an external site.)Links to an external site. on a Mac and GitBash/PuTTY (Links to an external site.)Links to an external site. on a PC. You will need to have the private server key and the public IP address before attempting to log into the server. The server key is basically your password. If you lose it, you will need to terminate the existing instance and launch a new server. I recommend reusing the same key when launching new servers throughout the class. Note, I make this recommendation to make the learning process easier and not because it is a common security practice. I’ve shown you how to use a terminal application to log into the instance using a Windows desktop. Your personal computer or lab computer may be running a different OS version, but the process is still very similar. You can review the videos on the Canvas. Working with Linux If you’ve made it this far, congratulations! You’ve made it over the toughest hurdle. By the end of this course, I promise you will be able to launch and log into servers in your sleep. You should be looking at a login screen that looks something like this: Last login: Mon Mar 21 21:17:54 2016 from 174-20-199-194.mpls.qwest.net __| __|_ ) _| ( / Amazon Linux AMI ___|\___|___| https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/ 8 package(s) needed for security, out of 17 available Run "sudo yum update" to apply all updates. ec2-user@ip-172-31-15-26 ~]$ Your terminal cursor is sitting at the shell prompt, waiting for you to type in your first command. Remember the shell? It is a really cool program that lets you start other programs and manage services on the Linux system. The rest of this assignment will be spent working with the shell. Note, when you are asked to type in a command in the steps below, don’t type in the dollar-sign ($) character. This is just meant to represent the command prompt. The actual commands are represented by the characters to the right of the command prompt. Let’s start by asking the shell for some help. Type in: $ help The shell provides you with a list of commands you can run along with possible command options. Next, check out one of the pages in the built-in manual: $ man ls A man page will appear with information on how to use the ls command. This command is used to list the contents of file directories. Either space through the contents of the man page or hit q to exit. Most of the core Linux commands have man pages available. But honestly, some of these man pages are a bit hard to understand. Sometimes your best bet is to search on Google if you are trying to figure out how to use a specific command. When you initially log into Linux, the system places you in your home directory. Each user on the system has a separate home directory. Let’s see where your home directory is located: $ pwd The response should be /home/ec2-user. The pwd command is handy to remember if you ever forget what file directory you are currently located in. If you recall from the Linux Hands-on Guide, this directory is also your current working directory. Type in: $ cd / The cd command let’s you change to a new working directory on the server. In this case, we changed to the root (/) directory. This is the parent of all the other directories on the file system. Type in: $ ls The ls command lists the contents of the current directory. As you can see, root directory contains many other directories. You will become familiar with these directories over time. The ls command provides a very basic directory listing. You need to supply the command with some options if you want to see more detailed information. Type in: $ ls -la See how this command provides you with much more detailed information about the files and directories? You can use this detailed listing to see the owner, group, and access control list settings for each file or directory. Do you see any files listed? Remember, the first character in the access control list column denotes whether a listed item is a file or a directory. You probably see a couple files with names like .autofsck. How come you didn’t see this file when you typed in the lscommand without any options? (Try to run this command again to convince yourself.) Files names that start with a period are called hidden files. These files won’t appear on normal directory listings. Type in: $ cd /var Then, type in: $ ls You will see a directory listing for the /var directory. Next, type in: $ ls .. Huh. This directory listing looks the same as the earlier root directory listing. When you use two periods (..) in a directory path that means you are referring to the parent directory of the current directory. Just think of the two dots as meaning the directory above the current directory. Now, type in: $ cd ~ $ pwd Whoa. We’re back at our home directory again. The tilde character (~) is another one of those handy little directory path shortcuts. It always refers to our personal home directory. Keep in mind that since every user has their own home directory, the tilde shortcut will refer to a unique directory for each logged-in user. Most students are used to navigating a file system by clicking a mouse in nested graphical folders. When they start using a command-line to navigate a file system, they sometimes get confused and lose track of their current position in the file system. Remember, you can always use the pwd command to quickly figure out what directory you are currently working in. Let’s make some changes to the file system. We can easily make our own directories on the file system. Type: mkdir test Now type: ls Cool, there’s our new test directory. Let’s pretend we don’t like that directory name and delete it. Type: rmdir test Now it’s gone. How can you be sure? You should know how to check to see if the directory still exists at this point. Go ahead and check. Let’s create another directory. Type in: $ mkdir documents Next, change to the new directory: $ cd documents Did you notice that your command prompt displays the name of the current directory? Something like: [ec2-user@ip-172-31-15-26 documents]$. Pretty handy, huh? Okay, let’s create our first file in the documents directory. This is just an empty file for training purposes. Type in: $ touch paper.txt Check to see that the new file is in the directory. Now, go back to the previous directory. Remember the double dot shortcut? $ cd .. Okay, we don’t like our documents directory any more. Let’s blow it away. Type in: $ rmdir documents Uh oh. The shell didn’t like that command because the directory isn’t empty. Let’s change back into the documents directory. But this time don’t type in the full name of the directory. You can let shell auto-completion do the typing for you. Type in the first couple characters of the directory name and then hit the tab key: $ cd doc<tab> You should use the tab auto-completion feature often. It saves typing and makes working with the Linux file system much much easier. Tab is your friend. Now, remove the file by typing: $ rm paper.txt Did you try to use the tab key instead of typing in the whole file name? Check to make sure the file was deleted from the directory. Next, create a new file: $ touch file1 We like file1 so much that we want to make a backup copy. Type: $ cp file1 file1-backup Check to make sure the new backup copy was created. We don’t really like the name of that new file, so let’s rename it. Type: $ mv file1-backup backup Moving a file to the same directory and giving it a new name is basically the same thing as renaming it. We could have moved it to a different directory if we wanted. Let’s list all of the files in the current directory that start with the letter f: $ ls f* Using wildcard pattern matching in file commands is really useful if you want the command to impact or filter a group of files. Now, go up one directory to the parent directory (remember the double dot shortcut?) We tried to remove the documents directory earlier when it had files in it. Obviously that won’t work again. However, we can use a more powerful command to destroy the directory and vanquish its contents. Behold, the all powerful remove command: $ rm -fr documents Did you remember to use auto-completion when typing in documents? This command and set of options forcibly removes the directory and its contents. It’s a dangerous command wielded by the mightiest Linux wizards. Okay, maybe that’s a bit of an exaggeration. Just be careful with it. Check to make sure the documents directory is gone before proceeding. Let’s continue. Change to the directory /var and make a directory called test. Ugh. Permission denied. We created this darn Linux server and we paid for it. Shouldn’t we be able to do anything we want on it? You logged into the system as a user called ec2-user. While this user can create and manage files in its home directory, it cannot change files all across the system. At least it can’t as a normal user. The ec2-user is a member of the root group, so it can escalate its privileges to super-user status when necessary. Let’s try it: $ sudo mkdir test Check to make sure the directory exists now. Using sudo we can execute commands as a super-user. We can do anything we want now that we know this powerful new command. Go ahead and delete the test directory. Did you remember to use sudo before the rmdir command? Check to make sure the directory is gone. You might be asking yourself the question: why can we list the contents of the /var directory but not make changes? That’s because all users have read access to the /var directory and the ls command is a read function. Only the root users or those acting as a super-user can write changes to the directory. Let’s go back to our home directory: $ cd ~ Editing text files is a really common task on Linux systems because many of the application configuration files are text files. We can create a text file by using a text editor. Type in: $ nano myfile.conf The shell starts up the nano text editor and places your terminal cursor in the editing screen. Nano is a simple text-based word processor. Type in a few lines of text. When you’re done writing your novel, hit ctrl-x and answer y to the prompt to save your work. Finally, hit enter to save the text to the filename you specified. Check to see that your file was saved in the directory. You can take a look at the contents of your file by typing: $ cat myfile.conf The cat command displays your text file content on the terminal screen. This command works fine for displaying small text files. But if your file is hundreds of lines long, the content will scroll down your terminal screen so fast that you won’t be able to easily read it. There’s a better way to view larger text files. Type in: $ less myfile.conf The less command will page the display of a text file, allowing you to page through the contents of the file using the space bar. Your text file is probably too short to see the paging in action though. Hit q to quit out of the less text viewer. Hit the up-arrow key on your keyboard a few times until the commmand nano myfile.conf appears next to your command prompt. Cool, huh? The up-arrow key allows you to replay a previously run command. Linux maintains a list of all the commands you have run since you logged into the server. This is called the command history. It’s a really useful feature if you have to re-run a complex command again. Now, hit ctrl-c. This cancels whatever command is displayed on the command line. Type in the following command to create a couple empty files in the directory: $ touch file1 file2 file3 Confirm that the files were created. Some commands, like touch. allow you to specify multiple files as arguments. You will find that Linux commands have all kinds of ways to make tasks more efficient like this. Throughout this assignment, we have been running commands and viewing results on the terminal screen. The screen is the standard place for commands to output results. It’s known as the standard out (stdout). However, it’s really useful to output results to the file system sometimes. Type in: $ ls > listing.txt Take a look at the directory listing now. You just created a new file. View the contents of the listing.txt file. What do you see? Instead of sending the output from the ls command to the screen we sent it to a text file. Let’s try another one. Type: $ cat myfile.conf > listing.txt Take a look at the contents of the listing.txt file again. It looks like your myfile.conf file now. It’s like you made a copy of it. But what happened to the previous content in the listing.txt file? When you redirect the output of a command using the right angle-bracket character (>), the output overwrites the existing file. Type this command in: $ cat myfile.conf >> listing.txt Now look at the contents of the listing.txt file. You should see your original content displayed twice. When you use two angle-bracket characters in the commmand the output appends (or adds to) the file instead of overwriting it. We redirected the output from a command to a text file. It’s also possible to redirect the input to a command. Typically we use a keyboard to provide input, but sometimes it makes more sense to input a file to a command. For example, how many words are in your new listing.txt file? Let’s find out. Type in: $ wc -w < listing.txt Did you get a number? This command inputs the listing.txt file into a word count program called wc. Type in the command: $ ls /usr/bin The terminal screen probably scrolled quickly as filenames flashed by. The /usr/bin directory holds quite a few files. It would be nice if we could page through the contents of this directory. Well, we can. We can use a special shell feature called pipes. In previous steps, we redirected I/O using the file system. Pipes allow us to redirect I/O between programs. We can redirect the output from one program into another. Type in: $ ls /usr/bin | less Now the directory listing is paged. Hit the spacebar to page through the listing. The pipe, represented by a vertical bar character (|), takes the output from the ls command and redirects it to the less command where the resulting output is paged. Pipes are super powerful and used all the time by savvy Linux operators. Hit the q key to quit the paginated directory listing command. Working with shell scripts Now things are going to get interesting. We’ve been manually typing in commands throughout this exercise. If we were running a set of repetitive tasks, we would want to automate the process as much as possible. The shell makes it really easy to automate tasks using shell scripts. The shell provides many of the same features as a basic procedural programming language. Let’s write some code. Type in this command: $ j=123 $ echo $j We just created a variable named j referencing the string 123. The echo command printed out the value of the variable. We had to use a dollar sign ($) when referencing the variable in another command. Next, type in: $ j=1+1 $ echo $j Is that what you expected? The shell just interprets the variable value as a string. It’s not going to do any sort of computation. Typing in shell script commands on the command line is sort of pointless. We want to be able to create scripts that we can run over-and-over. Let’s create our first shell script. Use the nano editor to create a file named myscript. When the file is open in the editor, type in the following lines of code: #!/bin/bash echo Hello $1 Now quit the editor and save your file. We can run our script by typing: $ ./myscript World Er, what happened? Permission denied. Didn’t we create this file? Why can’t we run it? We can’t run the script file because we haven’t set the execute permission on the file. Type in: $ chmod u+x myscript This modifies the file access control list to allow the owner of the file to execute it. Let’s try to run the command again. Hit the up-arrow key a couple times until the ./myscript World command is displayed and hit enter. Hooray! Our first shell script. It’s probably a bit underwhelming. No problem, we’ll make it a little more complex. The script took a single argument called World. Any arguments provided to a shell script are represented as consecutively numbered variables inside the script ($1, $2, etc). Pretty simple. You might be wondering why we had to type the ./ characters before the name of our script file. Try to type in the command without them: $ myscript World Command not found. That seems a little weird. Aren’t we currently in the directory where the shell script is located? Well, that’s just not how the shell works. When you enter a command into the shell, it looks for the command in a predefined set of directories on the server called your PATH. Since your script file isn’t in your special path, the shell reports it as not found. By typing in the ./ characters before the command name you are basically forcing the shell to look for your script in the current directory instead of the default path. Create another file called cleanup using nano. In the file editor window type: #!/bin/bash # My cleanup script mkdir archive mv file* archive Exit the editor window and save the file. Change the permissions on the script file so that you can execute it. Now run the command: $ ./cleanup Take a look at the file directory listing. Notice the archive directory? List the contents of that directory. The script automatically created a new directory and moved three files into it. Anything you can do manually at a command prompt can be automated using a shell script. Let’s create one more shell script. Use nano to create a script called namelist. Here is the content of the script: #!/bin/bash # for-loop test script names='Jason John Jane' for i in $names do echo Hello $i done Change the permissions on the script file so that you can execute it. Run the command: $ ./namelist The script will loop through a set of names stored in a variable displaying each one. Scripts support several programming constructs like for-loops, do-while loops, and if-then-else. These building blocks allow you to create fairly complex scripts for automating tasks. Installing packages and services We’re nearing the end of this assignment. But before we finish, let’s install some new software packages on our server. The first thing we should do is make sure all the current packages installed on our Linux server are up-to-date. Type in: $ sudo yum update -y This is one of those really powerful commands that requires sudo access. The system will review the currently installed packages and go out to the Internet and download appropriate updates. Next, let’s install an Apache web server on our system. Type in: $ sudo yum install httpd -y Bam! You probably never knew that installing a web server was so easy. We’re not going to actually use the web server in this exercise, but we will in future assignments. We installed the web server, but is it actually running? Let’s check. Type in: $ sudo service httpd status Nope. Let’s start it. Type: $ sudo service httpd start We can use the service command to control the services running on the system. Let’s setup the service so that it automatically starts when the system boots up. Type in: $ sudo chkconfig httpd on Cool. We installed the Apache web server on our system, but what other programs are currently running? We can use the pscommand to find out. Type in: $ ps -ax Lots of processes are running on our system. We can even look at the overall performance of our system using the topcommand. Let’s try that now. Type in: $ top The display might seem a little overwhelming at first. You should see lots of performance information displayed including the cpu usage, free memory, and a list of running tasks. We’re almost across the finish line. Let’s make sure all of our valuable work is stored in a git repository. First, we need to install git. Type in the command: $ sudo yum install git -y Check your work It’s very important to check your work before submitting it for grading. A misspelled, misplaced or missing file will cost you points. This may seem harsh, but the reality is that these sorts of mistakes have consequences in the real world. For example, a server instance could fail to launch properly and impact customers because a single required file is missing. Here is what the contents of your git repository should look like before final submission: ┣archive ┃ ┣ file1 ┃ ┣ file2 ┃ ┗ file3 ┣ namelist ┗ myfile.conf Saving our work in the git repository Next, make sure you are still in your home directory (/home/ec2-user). We will install the git repository you created at the beginning of this exercise. You will need to modify this command by typing in the GitHub repository URL you copied earlier. $ git clone <your GitHub URL here>.git Example: git clone https://github.com/UST-SEIS665/hw2-seis665-02-spring2019-<your github id>.git The git application will ask you for your GitHub username and password. Note, if you have multi-factor authentication enabled on your GitHub account you will need to provide a personal token instead of your password. Git will clone (copy) the repository from GitHub to your Linux server. Since the repository is empty the clone happens almost instantly. Check to make sure that a sub-directory called "hw2-seis665-02-spring2019-<username>" exists in the current directory (where <username> is your GitHub account name). Git automatically created this directory as part of the cloning process. Change to the hw2-seis665-02-spring2019-<username> directory and type: $ ls -la Notice the .git hidden directory? This is where git actually stores all of the file changes in your repository. Nothing is actually in your repository yet. Change back to the parent directory (cd ..). Next, let’s move some of our files into the repository. Type: $ mv archive hw2-seis665-02-spring2019-<username> $ mv namelist hw2-seis665-02-spring2019-<username> $ mv myfile.conf hw2-seis665-02-spring2019-<username> Hopefully, you remembered to use the auto-complete function to reduce some of that typing. Change to the hw2-seis665-02-spring2019-<username> directory and list the directory contents. Your files are in the working directory, but are not actually stored in the repository because they haven’t been committed yet. Type in: $ git status You should see a list of untracked files. Let’s tell git that we want these files tracked. Type in: $ git add * Now type in the git status command again. Notice how all the files are now being tracked and are ready to be committed. These files are in the git staging area. We’ll commit them to the repository next. Type: $ git commit -m 'assignment 2 files' Next, take a look at the commit log. Type: $ git log You should see your commit listed along with an assigned hash (long string of random-looking characters). Finally, let’s save the repository to our GitHub account. Type in: $ git push origin master The git client will ask you for your GitHub username and password before pushing the repository. Go back to the GitHub.com website and login if you have been logged out. Click on the repository link for the assignment. Do you see your files listed there? Congratulations, you completed the exercise! Terminate server The last step is to terminate your Linux instance. AWS will bill you for every hour the instance is running. The cost is nominal, but there’s no need to rack up unnecessary charges. Here are the steps to terminate your instance: Log into your AWS account and click on the EC2 dashboard. Click the Instances menu item. Select your server in the instances table. Click on the Actions drop down menu above the instances table. Select the Instance State menu option Click on the Terminate action. Your Linux instance will shutdown and disappear in a few minutes. The EC2 dashboard will continue to display the instance on your instance listing for another day or so. However, the state of the instance will be terminated. Submitting your assignment — IMPORTANT! If you haven’t already, please e-mail me your GitHub username in order to receive credit for this assignment. There is no need to email me to tell me that you have committed your work to GitHub or to ask me if your GitHub submission worked. If you can see your work in your GitHub repository, I can see your work.
DeclareDesign
randomizr: Easy-to-Use Tools for Common Forms of Random Assignment and Sampling
Introduction The context is the 2016 public use NH medical claims files obtained from NH CHIS (Comprehensive Health Care Information System). The dataset contains Commercial Insurance claims, and a small fraction of Medicaid and Medicare payments for dually eligible people. The primary purpose of this assignment is to test machine learning (ML) skills in a real case analysis setting. You are expected to clean and process data and then apply various ML techniques like Linear and no linear models like regularized regression, MARS, and Partitioning methods. You are expected to use at least two of R, Python and JMP software. Data details: Medical claims file for 2016 contains ~17 millions rows and ~60 columns of data, containing ~6.5 million individual medical claims. These claims are all commercial claims that were filed by healthcare providers in 2016 in the state of NH. These claims were ~88% for residents of NH and the remaining for out of state visitors who sought care in NH. Each claim consists of one or more line items, each indicating a procedure done during the doctor’s visit. Two columns indicating Billed amount and the Paid amount for the care provided, are of primary interest. The main objective is to predict “Paid amount per procedure” by mapping a plethora of features available in the dataset. It is also an expectation that you would create new features using the existing ones or external data sources. Objectives: Step 1: Take a random sample of 1 million unique claims, such that all line items related to each claim are included in the sample. This will result in a little less than 3 million rows of data. Step 2: Clean up the data, understand the distributions, and create new features if necessary. Step 3: Run predictive models using validation method of your choice. Step 4: Write a descriptive report (less than 10 pages) describing the process and your findings.
cneuralnetwork
idk
kevinhughes27
ELEC 824 Assignment #2 - c++ implementation of circle detection using RANSAC based off of "An Efficient Randomized Algorithm for Detecting Circles" by Chen and Chung
terracotta-education
Terracotta (a portmanteau of Tool for Education Research with RAndomized COnTrolled TriAls) is a plug-in to the learning management system that allows the contents of online assignments to be differentiated for experimental treatment variations, and to be assigned randomly to different groups of students. Terracotta also enables privacy protections for student participants, such as informed consent that is hidden from the teacher, filtering of non-consenting participants from result summaries and data exports, and removal of student identifiers from these exports. Terracotta's goal is to lower the technical and methodological barriers to conducting more rigorous and responsible education research.
shubham3121
This repository consists of various programming assignments to solve the motion planning problem using different approaches including graph-based methods, randomized planners and artificial potential fields.
richarddmorey
Use rstudio flexdashboards to create random stats assignments
manmartgarc
This is a Python tool to employ stratified sampling or treatment randomization with uneven numbers in some strata using pandas. Mainly thought with RCTs in mind, it also works for any other scenario in where you would like to randomly allocate treatment within blocks or strata. The tool also supports having multiple treatments with different probability of assignment within each block or stratum.
Aryia-Behroziuan
Quickstart tutorial Prerequisites Before reading this tutorial you should know a bit of Python. If you would like to refresh your memory, take a look at the Python tutorial. If you wish to work the examples in this tutorial, you must also have some software installed on your computer. Please see https://scipy.org/install.html for instructions. Learner profile This tutorial is intended as a quick overview of algebra and arrays in NumPy and want to understand how n-dimensional (n>=2) arrays are represented and can be manipulated. In particular, if you don’t know how to apply common functions to n-dimensional arrays (without using for-loops), or if you want to understand axis and shape properties for n-dimensional arrays, this tutorial might be of help. Learning Objectives After this tutorial, you should be able to: Understand the difference between one-, two- and n-dimensional arrays in NumPy; Understand how to apply some linear algebra operations to n-dimensional arrays without using for-loops; Understand axis and shape properties for n-dimensional arrays. The Basics NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes. For example, the coordinates of a point in 3D space [1, 2, 1] has one axis. That axis has 3 elements in it, so we say it has a length of 3. In the example pictured below, the array has 2 axes. The first axis has a length of 2, the second axis has a length of 3. [[ 1., 0., 0.], [ 0., 1., 2.]] NumPy’s array class is called ndarray. It is also known by the alias array. Note that numpy.array is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less functionality. The more important attributes of an ndarray object are: ndarray.ndim the number of axes (dimensions) of the array. ndarray.shape the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim. ndarray.size the total number of elements of the array. This is equal to the product of the elements of shape. ndarray.dtype an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples. ndarray.itemsize the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type complex32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize. ndarray.data the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities. An example >>> import numpy as np a = np.arange(15).reshape(3, 5) a array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]) a.shape (3, 5) a.ndim 2 a.dtype.name 'int64' a.itemsize 8 a.size 15 type(a) <class 'numpy.ndarray'> b = np.array([6, 7, 8]) b array([6, 7, 8]) type(b) <class 'numpy.ndarray'> Array Creation There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences. >>> >>> import numpy as np >>> a = np.array([2,3,4]) >>> a array([2, 3, 4]) >>> a.dtype dtype('int64') >>> b = np.array([1.2, 3.5, 5.1]) >>> b.dtype dtype('float64') A frequent error consists in calling array with multiple arguments, rather than providing a single sequence as an argument. >>> >>> a = np.array(1,2,3,4) # WRONG Traceback (most recent call last): ... TypeError: array() takes from 1 to 2 positional arguments but 4 were given >>> a = np.array([1,2,3,4]) # RIGHT array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on. >>> >>> b = np.array([(1.5,2,3), (4,5,6)]) >>> b array([[1.5, 2. , 3. ], [4. , 5. , 6. ]]) The type of the array can also be explicitly specified at creation time: >>> >>> c = np.array( [ [1,2], [3,4] ], dtype=complex ) >>> c array([[1.+0.j, 2.+0.j], [3.+0.j, 4.+0.j]]) Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation. The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array whose initial content is random and depends on the state of the memory. By default, the dtype of the created array is float64. >>> >>> np.zeros((3, 4)) array([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]]) >>> np.ones( (2,3,4), dtype=np.int16 ) # dtype can also be specified array([[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]], [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]], dtype=int16) >>> np.empty( (2,3) ) # uninitialized array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260], # may vary [ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]]) To create sequences of numbers, NumPy provides the arange function which is analogous to the Python built-in range, but returns an array. >>> >>> np.arange( 10, 30, 5 ) array([10, 15, 20, 25]) >>> np.arange( 0, 2, 0.3 ) # it accepts float arguments array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8]) When arange is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to the finite floating point precision. For this reason, it is usually better to use the function linspace that receives as an argument the number of elements that we want, instead of the step: >>> >>> from numpy import pi >>> np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2 array([0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ]) >>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function at lots of points >>> f = np.sin(x) See also array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, numpy.random.Generator.rand, numpy.random.Generator.randn, fromfunction, fromfile Printing Arrays When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout: the last axis is printed from left to right, the second-to-last is printed from top to bottom, the rest are also printed from top to bottom, with each slice separated from the next by an empty line. One-dimensional arrays are then printed as rows, bidimensionals as matrices and tridimensionals as lists of matrices. >>> >>> a = np.arange(6) # 1d array >>> print(a) [0 1 2 3 4 5] >>> >>> b = np.arange(12).reshape(4,3) # 2d array >>> print(b) [[ 0 1 2] [ 3 4 5] [ 6 7 8] [ 9 10 11]] >>> >>> c = np.arange(24).reshape(2,3,4) # 3d array >>> print(c) [[[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11]] [[12 13 14 15] [16 17 18 19] [20 21 22 23]]] See below to get more details on reshape. If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners: >>> >>> print(np.arange(10000)) [ 0 1 2 ... 9997 9998 9999] >>> >>> print(np.arange(10000).reshape(100,100)) [[ 0 1 2 ... 97 98 99] [ 100 101 102 ... 197 198 199] [ 200 201 202 ... 297 298 299] ... [9700 9701 9702 ... 9797 9798 9799] [9800 9801 9802 ... 9897 9898 9899] [9900 9901 9902 ... 9997 9998 9999]] To disable this behaviour and force NumPy to print the entire array, you can change the printing options using set_printoptions. >>> >>> np.set_printoptions(threshold=sys.maxsize) # sys module should be imported Basic Operations Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result. >>> >>> a = np.array( [20,30,40,50] ) >>> b = np.arange( 4 ) >>> b array([0, 1, 2, 3]) >>> c = a-b >>> c array([20, 29, 38, 47]) >>> b**2 array([0, 1, 4, 9]) >>> 10*np.sin(a) array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854]) >>> a<35 array([ True, True, False, False]) Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the @ operator (in python >=3.5) or the dot function or method: >>> >>> A = np.array( [[1,1], ... [0,1]] ) >>> B = np.array( [[2,0], ... [3,4]] ) >>> A * B # elementwise product array([[2, 0], [0, 4]]) >>> A @ B # matrix product array([[5, 4], [3, 4]]) >>> A.dot(B) # another matrix product array([[5, 4], [3, 4]]) Some operations, such as += and *=, act in place to modify an existing array rather than create a new one. >>> >>> rg = np.random.default_rng(1) # create instance of default random number generator >>> a = np.ones((2,3), dtype=int) >>> b = rg.random((2,3)) >>> a *= 3 >>> a array([[3, 3, 3], [3, 3, 3]]) >>> b += a >>> b array([[3.51182162, 3.9504637 , 3.14415961], [3.94864945, 3.31183145, 3.42332645]]) >>> a += b # b is not automatically converted to integer type Traceback (most recent call last): ... numpy.core._exceptions.UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind' When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behavior known as upcasting). >>> >>> a = np.ones(3, dtype=np.int32) >>> b = np.linspace(0,pi,3) >>> b.dtype.name 'float64' >>> c = a+b >>> c array([1. , 2.57079633, 4.14159265]) >>> c.dtype.name 'float64' >>> d = np.exp(c*1j) >>> d array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j, -0.54030231-0.84147098j]) >>> d.dtype.name 'complex128' Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class. >>> >>> a = rg.random((2,3)) >>> a array([[0.82770259, 0.40919914, 0.54959369], [0.02755911, 0.75351311, 0.53814331]]) >>> a.sum() 3.1057109529998157 >>> a.min() 0.027559113243068367 >>> a.max() 0.8277025938204418 By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array: >>> >>> b = np.arange(12).reshape(3,4) >>> b array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> >>> b.sum(axis=0) # sum of each column array([12, 15, 18, 21]) >>> >>> b.min(axis=1) # min of each row array([0, 4, 8]) >>> >>> b.cumsum(axis=1) # cumulative sum along each row array([[ 0, 1, 3, 6], [ 4, 9, 15, 22], [ 8, 17, 27, 38]]) Universal Functions NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal functions”(ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output. >>> >>> B = np.arange(3) >>> B array([0, 1, 2]) >>> np.exp(B) array([1. , 2.71828183, 7.3890561 ]) >>> np.sqrt(B) array([0. , 1. , 1.41421356]) >>> C = np.array([2., -1., 4.]) >>> np.add(B, C) array([2., 0., 6.]) See also all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, invert, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where Indexing, Slicing and Iterating One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences. >>> >>> a = np.arange(10)**3 >>> a array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729]) >>> a[2] 8 >>> a[2:5] array([ 8, 27, 64]) # equivalent to a[0:6:2] = 1000; # from start to position 6, exclusive, set every 2nd element to 1000 >>> a[:6:2] = 1000 >>> a array([1000, 1, 1000, 27, 1000, 125, 216, 343, 512, 729]) >>> a[ : :-1] # reversed a array([ 729, 512, 343, 216, 125, 1000, 27, 1000, 1, 1000]) >>> for i in a: ... print(i**(1/3.)) ... 9.999999999999998 1.0 9.999999999999998 3.0 9.999999999999998 4.999999999999999 5.999999999999999 6.999999999999999 7.999999999999999 8.999999999999998 Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas: >>> >>> def f(x,y): ... return 10*x+y ... >>> b = np.fromfunction(f,(5,4),dtype=int) >>> b array([[ 0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43]]) >>> b[2,3] 23 >>> b[0:5, 1] # each row in the second column of b array([ 1, 11, 21, 31, 41]) >>> b[ : ,1] # equivalent to the previous example array([ 1, 11, 21, 31, 41]) >>> b[1:3, : ] # each column in the second and third row of b array([[10, 11, 12, 13], [20, 21, 22, 23]]) When fewer indices are provided than the number of axes, the missing indices are considered complete slices: >>> >>> b[-1] # the last row. Equivalent to b[-1,:] array([40, 41, 42, 43]) The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the remaining axes. NumPy also allows you to write this using dots as b[i,...]. The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array with 5 axes, then x[1,2,...] is equivalent to x[1,2,:,:,:], x[...,3] to x[:,:,:,:,3] and x[4,...,5,:] to x[4,:,:,5,:]. >>> >>> c = np.array( [[[ 0, 1, 2], # a 3D array (two stacked 2D arrays) ... [ 10, 12, 13]], ... [[100,101,102], ... [110,112,113]]]) >>> c.shape (2, 2, 3) >>> c[1,...] # same as c[1,:,:] or c[1] array([[100, 101, 102], [110, 112, 113]]) >>> c[...,2] # same as c[:,:,2] array([[ 2, 13], [102, 113]]) Iterating over multidimensional arrays is done with respect to the first axis: >>> >>> for row in b: ... print(row) ... [0 1 2 3] [10 11 12 13] [20 21 22 23] [30 31 32 33] [40 41 42 43] However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is an iterator over all the elements of the array: >>> >>> for element in b.flat: ... print(element) ... 0 1 2 3 10 11 12 13 20 21 22 23 30 31 32 33 40 41 42 43 See also Indexing, Indexing (reference), newaxis, ndenumerate, indices Shape Manipulation Changing the shape of an array An array has a shape given by the number of elements along each axis: >>> >>> a = np.floor(10*rg.random((3,4))) >>> a array([[3., 7., 3., 4.], [1., 4., 2., 2.], [7., 2., 4., 9.]]) >>> a.shape (3, 4) The shape of an array can be changed with various commands. Note that the following three commands all return a modified array, but do not change the original array: >>> >>> a.ravel() # returns the array, flattened array([3., 7., 3., 4., 1., 4., 2., 2., 7., 2., 4., 9.]) >>> a.reshape(6,2) # returns the array with a modified shape array([[3., 7.], [3., 4.], [1., 4.], [2., 2.], [7., 2.], [4., 9.]]) >>> a.T # returns the array, transposed array([[3., 1., 7.], [7., 4., 2.], [3., 2., 4.], [4., 2., 9.]]) >>> a.T.shape (4, 3) >>> a.shape (3, 4) The order of the elements in the array resulting from ravel() is normally “C-style”, that is, the rightmost index “changes the fastest”, so the element after a[0,0] is a[0,1]. If the array is reshaped to some other shape, again the array is treated as “C-style”. NumPy normally creates arrays stored in this order, so ravel() will usually not need to copy its argument, but if the array was made by taking slices of another array or created with unusual options, it may need to be copied. The functions ravel() and reshape() can also be instructed, using an optional argument, to use FORTRAN-style arrays, in which the leftmost index changes the fastest. The reshape function returns its argument with a modified shape, whereas the ndarray.resize method modifies the array itself: >>> >>> a array([[3., 7., 3., 4.], [1., 4., 2., 2.], [7., 2., 4., 9.]]) >>> a.resize((2,6)) >>> a array([[3., 7., 3., 4., 1., 4.], [2., 2., 7., 2., 4., 9.]]) If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated: >>> >>> a.reshape(3,-1) array([[3., 7., 3., 4.], [1., 4., 2., 2.], [7., 2., 4., 9.]]) See also ndarray.shape, reshape, resize, ravel Stacking together different arrays Several arrays can be stacked together along different axes: >>> >>> a = np.floor(10*rg.random((2,2))) >>> a array([[9., 7.], [5., 2.]]) >>> b = np.floor(10*rg.random((2,2))) >>> b array([[1., 9.], [5., 1.]]) >>> np.vstack((a,b)) array([[9., 7.], [5., 2.], [1., 9.], [5., 1.]]) >>> np.hstack((a,b)) array([[9., 7., 1., 9.], [5., 2., 5., 1.]]) The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D arrays: >>> >>> from numpy import newaxis >>> np.column_stack((a,b)) # with 2D arrays array([[9., 7., 1., 9.], [5., 2., 5., 1.]]) >>> a = np.array([4.,2.]) >>> b = np.array([3.,8.]) >>> np.column_stack((a,b)) # returns a 2D array array([[4., 3.], [2., 8.]]) >>> np.hstack((a,b)) # the result is different array([4., 2., 3., 8.]) >>> a[:,newaxis] # view `a` as a 2D column vector array([[4.], [2.]]) >>> np.column_stack((a[:,newaxis],b[:,newaxis])) array([[4., 3.], [2., 8.]]) >>> np.hstack((a[:,newaxis],b[:,newaxis])) # the result is the same array([[4., 3.], [2., 8.]]) On the other hand, the function row_stack is equivalent to vstack for any input arrays. In fact, row_stack is an alias for vstack: >>> >>> np.column_stack is np.hstack False >>> np.row_stack is np.vstack True In general, for arrays with more than two dimensions, hstack stacks along their second axes, vstack stacks along their first axes, and concatenate allows for an optional arguments giving the number of the axis along which the concatenation should happen. Note In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They allow the use of range literals (“:”) >>> >>> np.r_[1:4,0,4] array([1, 2, 3, 0, 4]) When used with arrays as arguments, r_ and c_ are similar to vstack and hstack in their default behavior, but allow for an optional argument giving the number of the axis along which to concatenate. See also hstack, vstack, column_stack, concatenate, c_, r_ Splitting one array into several smaller ones Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur: >>> >>> a = np.floor(10*rg.random((2,12))) >>> a array([[6., 7., 6., 9., 0., 5., 4., 0., 6., 8., 5., 2.], [8., 5., 5., 7., 1., 8., 6., 7., 1., 8., 1., 0.]]) # Split a into 3 >>> np.hsplit(a,3) [array([[6., 7., 6., 9.], [8., 5., 5., 7.]]), array([[0., 5., 4., 0.], [1., 8., 6., 7.]]), array([[6., 8., 5., 2.], [1., 8., 1., 0.]])] # Split a after the third and the fourth column >>> np.hsplit(a,(3,4)) [array([[6., 7., 6.], [8., 5., 5.]]), array([[9.], [7.]]), array([[0., 5., 4., 0., 6., 8., 5., 2.], [1., 8., 6., 7., 1., 8., 1., 0.]])] vsplit splits along the vertical axis, and array_split allows one to specify along which axis to split. Copies and Views When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases: No Copy at All Simple assignments make no copy of objects or their data. >>> >>> a = np.array([[ 0, 1, 2, 3], ... [ 4, 5, 6, 7], ... [ 8, 9, 10, 11]]) >>> b = a # no new object is created >>> b is a # a and b are two names for the same ndarray object True Python passes mutable objects as references, so function calls make no copy. >>> >>> def f(x): ... print(id(x)) ... >>> id(a) # id is a unique identifier of an object 148293216 # may vary >>> f(a) 148293216 # may vary View or Shallow Copy Different array objects can share the same data. The view method creates a new array object that looks at the same data. >>> >>> c = a.view() >>> c is a False >>> c.base is a # c is a view of the data owned by a True >>> c.flags.owndata False >>> >>> c = c.reshape((2, 6)) # a's shape doesn't change >>> a.shape (3, 4) >>> c[0, 4] = 1234 # a's data changes >>> a array([[ 0, 1, 2, 3], [1234, 5, 6, 7], [ 8, 9, 10, 11]]) Slicing an array returns a view of it: >>> >>> s = a[ : , 1:3] # spaces added for clarity; could also be written "s = a[:, 1:3]" >>> s[:] = 10 # s[:] is a view of s. Note the difference between s = 10 and s[:] = 10 >>> a array([[ 0, 10, 10, 3], [1234, 10, 10, 7], [ 8, 10, 10, 11]]) Deep Copy The copy method makes a complete copy of the array and its data. >>> >>> d = a.copy() # a new array object with new data is created >>> d is a False >>> d.base is a # d doesn't share anything with a False >>> d[0,0] = 9999 >>> a array([[ 0, 10, 10, 3], [1234, 10, 10, 7], [ 8, 10, 10, 11]]) Sometimes copy should be called after slicing if the original array is not required anymore. For example, suppose a is a huge intermediate result and the final result b only contains a small fraction of a, a deep copy should be made when constructing b with slicing: >>> >>> a = np.arange(int(1e8)) >>> b = a[:100].copy() >>> del a # the memory of ``a`` can be released. If b = a[:100] is used instead, a is referenced by b and will persist in memory even if del a is executed. Functions and Methods Overview Here is a list of some useful NumPy functions and methods names ordered in categories. See Routines for the full list. Array Creation arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like Conversions ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat Manipulations array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit, hstack, ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take, transpose, vsplit, vstack Questions all, any, nonzero, where Ordering argmax, argmin, argsort, max, min, ptp, searchsorted, sort Operations choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask, real, sum Basic Statistics cov, mean, std, var Basic Linear Algebra cross, dot, outer, linalg.svd, vdot Less Basic Broadcasting rules Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape. The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions. The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the “broadcast” array. After application of the broadcasting rules, the sizes of all arrays must match. More details can be found in Broadcasting. Advanced indexing and index tricks NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as we saw before, arrays can be indexed by arrays of integers and arrays of booleans. Indexing with Arrays of Indices >>> >>> a = np.arange(12)**2 # the first 12 square numbers >>> i = np.array([1, 1, 3, 8, 5]) # an array of indices >>> a[i] # the elements of a at the positions i array([ 1, 1, 9, 64, 25]) >>> >>> j = np.array([[3, 4], [9, 7]]) # a bidimensional array of indices >>> a[j] # the same shape as j array([[ 9, 16], [81, 49]]) When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. The following example shows this behavior by converting an image of labels into a color image using a palette. >>> >>> palette = np.array([[0, 0, 0], # black ... [255, 0, 0], # red ... [0, 255, 0], # green ... [0, 0, 255], # blue ... [255, 255, 255]]) # white >>> image = np.array([[0, 1, 2, 0], # each value corresponds to a color in the palette ... [0, 3, 4, 0]]) >>> palette[image] # the (2, 4, 3) color image array([[[ 0, 0, 0], [255, 0, 0], [ 0, 255, 0], [ 0, 0, 0]], [[ 0, 0, 0], [ 0, 0, 255], [255, 255, 255], [ 0, 0, 0]]]) We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape. >>> >>> a = np.arange(12).reshape(3,4) >>> a array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> i = np.array([[0, 1], # indices for the first dim of a ... [1, 2]]) >>> j = np.array([[2, 1], # indices for the second dim ... [3, 3]]) >>> >>> a[i, j] # i and j must have equal shape array([[ 2, 5], [ 7, 11]]) >>> >>> a[i, 2] array([[ 2, 6], [ 6, 10]]) >>> >>> a[:, j] # i.e., a[ : , j] array([[[ 2, 1], [ 3, 3]], [[ 6, 5], [ 7, 7]], [[10, 9], [11, 11]]]) In Python, arr[i, j] is exactly the same as arr[(i, j)]—so we can put i and j in a tuple and then do the indexing with that. >>> >>> l = (i, j) # equivalent to a[i, j] >>> a[l] array([[ 2, 5], [ 7, 11]]) However, we can not do this by putting i and j into an array, because this array will be interpreted as indexing the first dimension of a. >>> >>> s = np.array([i, j]) # not what we want >>> a[s] Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: index 3 is out of bounds for axis 0 with size 3 # same as a[i, j] >>> a[tuple(s)] array([[ 2, 5], [ 7, 11]]) Another common use of indexing with arrays is the search of the maximum value of time-dependent series: >>> >>> time = np.linspace(20, 145, 5) # time scale >>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series >>> time array([ 20. , 51.25, 82.5 , 113.75, 145. ]) >>> data array([[ 0. , 0.84147098, 0.90929743, 0.14112001], [-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ], [ 0.98935825, 0.41211849, -0.54402111, -0.99999021], [-0.53657292, 0.42016704, 0.99060736, 0.65028784], [-0.28790332, -0.96139749, -0.75098725, 0.14987721]]) # index of the maxima for each series >>> ind = data.argmax(axis=0) >>> ind array([2, 0, 3, 1]) # times corresponding to the maxima >>> time_max = time[ind] >>> >>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]... >>> time_max array([ 82.5 , 20. , 113.75, 51.25]) >>> data_max array([0.98935825, 0.84147098, 0.99060736, 0.6569866 ]) >>> np.all(data_max == data.max(axis=0)) True You can also use indexing with arrays as a target to assign to: >>> >>> a = np.arange(5) >>> a array([0, 1, 2, 3, 4]) >>> a[[1,3,4]] = 0 >>> a array([0, 0, 2, 0, 0]) However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value: >>> >>> a = np.arange(5) >>> a[[0,0,2]]=[1,2,3] >>> a array([2, 1, 3, 3, 4]) This is reasonable enough, but watch out if you want to use Python’s += construct, as it may not do what you expect: >>> >>> a = np.arange(5) >>> a[[0,0,2]]+=1 >>> a array([1, 1, 3, 3, 4]) Even though 0 occurs twice in the list of indices, the 0th element is only incremented once. This is because Python requires “a+=1” to be equivalent to “a = a + 1”. Indexing with Boolean Arrays When we index arrays with arrays of (integer) indices we are providing the list of indices to pick. With boolean indices the approach is different; we explicitly choose which items in the array we want and which ones we don’t. The most natural way one can think of for boolean indexing is to use boolean arrays that have the same shape as the original array: >>> >>> a = np.arange(12).reshape(3,4) >>> b = a > 4 >>> b # b is a boolean with a's shape array([[False, False, False, False], [False, True, True, True], [ True, True, True, True]]) >>> a[b] # 1d array with the selected elements array([ 5, 6, 7, 8, 9, 10, 11]) This property can be very useful in assignments: >>> >>> a[b] = 0 # All elements of 'a' higher than 4 become 0 >>> a array([[0, 1, 2, 3], [4, 0, 0, 0], [0, 0, 0, 0]]) You can look at the following example to see how to use boolean indexing to generate an image of the Mandelbrot set: >>> import numpy as np import matplotlib.pyplot as plt def mandelbrot( h,w, maxit=20 ): """Returns an image of the Mandelbrot fractal of size (h,w).""" y,x = np.ogrid[ -1.4:1.4:h*1j, -2:0.8:w*1j ] c = x+y*1j z = c divtime = maxit + np.zeros(z.shape, dtype=int) for i in range(maxit): z = z**2 + c diverge = z*np.conj(z) > 2**2 # who is diverging div_now = diverge & (divtime==maxit) # who is diverging now divtime[div_now] = i # note when z[diverge] = 2 # avoid diverging too much return divtime plt.imshow(mandelbrot(400,400)) ../_images/quickstart-1.png The second way of indexing with booleans is more similar to integer indexing; for each dimension of the array we give a 1D boolean array selecting the slices we want: >>> >>> a = np.arange(12).reshape(3,4) >>> b1 = np.array([False,True,True]) # first dim selection >>> b2 = np.array([True,False,True,False]) # second dim selection >>> >>> a[b1,:] # selecting rows array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> >>> a[b1] # same thing array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> >>> a[:,b2] # selecting columns array([[ 0, 2], [ 4, 6], [ 8, 10]]) >>> >>> a[b1,b2] # a weird thing to do array([ 4, 10]) Note that the length of the 1D boolean array must coincide with the length of the dimension (or axis) you want to slice. In the previous example, b1 has length 3 (the number of rows in a), and b2 (of length 4) is suitable to index the 2nd axis (columns) of a. The ix_() function The ix_ function can be used to combine different vectors so as to obtain the result for each n-uplet. For example, if you want to compute all the a+b*c for all the triplets taken from each of the vectors a, b and c: >>> >>> a = np.array([2,3,4,5]) >>> b = np.array([8,5,4]) >>> c = np.array([5,4,6,8,3]) >>> ax,bx,cx = np.ix_(a,b,c) >>> ax array([[[2]], [[3]], [[4]], [[5]]]) >>> bx array([[[8], [5], [4]]]) >>> cx array([[[5, 4, 6, 8, 3]]]) >>> ax.shape, bx.shape, cx.shape ((4, 1, 1), (1, 3, 1), (1, 1, 5)) >>> result = ax+bx*cx >>> result array([[[42, 34, 50, 66, 26], [27, 22, 32, 42, 17], [22, 18, 26, 34, 14]], [[43, 35, 51, 67, 27], [28, 23, 33, 43, 18], [23, 19, 27, 35, 15]], [[44, 36, 52, 68, 28], [29, 24, 34, 44, 19], [24, 20, 28, 36, 16]], [[45, 37, 53, 69, 29], [30, 25, 35, 45, 20], [25, 21, 29, 37, 17]]]) >>> result[3,2,4] 17 >>> a[3]+b[2]*c[4] 17 You could also implement the reduce as follows: >>> >>> def ufunc_reduce(ufct, *vectors): ... vs = np.ix_(*vectors) ... r = ufct.identity ... for v in vs: ... r = ufct(r,v) ... return r and then use it as: >>> >>> ufunc_reduce(np.add,a,b,c) array([[[15, 14, 16, 18, 13], [12, 11, 13, 15, 10], [11, 10, 12, 14, 9]], [[16, 15, 17, 19, 14], [13, 12, 14, 16, 11], [12, 11, 13, 15, 10]], [[17, 16, 18, 20, 15], [14, 13, 15, 17, 12], [13, 12, 14, 16, 11]], [[18, 17, 19, 21, 16], [15, 14, 16, 18, 13], [14, 13, 15, 17, 12]]]) The advantage of this version of reduce compared to the normal ufunc.reduce is that it makes use of the Broadcasting Rules in order to avoid creating an argument array the size of the output times the number of vectors. Indexing with strings See Structured arrays. Linear Algebra Work in progress. Basic linear algebra to be included here. Simple Array Operations See linalg.py in numpy folder for more. >>> >>> import numpy as np >>> a = np.array([[1.0, 2.0], [3.0, 4.0]]) >>> print(a) [[1. 2.] [3. 4.]] >>> a.transpose() array([[1., 3.], [2., 4.]]) >>> np.linalg.inv(a) array([[-2. , 1. ], [ 1.5, -0.5]]) >>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I" >>> u array([[1., 0.], [0., 1.]]) >>> j = np.array([[0.0, -1.0], [1.0, 0.0]]) >>> j @ j # matrix product array([[-1., 0.], [ 0., -1.]]) >>> np.trace(u) # trace 2.0 >>> y = np.array([[5.], [7.]]) >>> np.linalg.solve(a, y) array([[-3.], [ 4.]]) >>> np.linalg.eig(j) (array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j , 0.70710678-0.j ], [0. -0.70710678j, 0. +0.70710678j]])) Parameters: square matrix Returns The eigenvalues, each repeated according to its multiplicity. The normalized (unit "length") eigenvectors, such that the column ``v[:,i]`` is the eigenvector corresponding to the eigenvalue ``w[i]`` . Tricks and Tips Here we give a list of short and useful tips. “Automatic” Reshaping To change the dimensions of an array, you can omit one of the sizes which will then be deduced automatically: >>> >>> a = np.arange(30) >>> b = a.reshape((2, -1, 3)) # -1 means "whatever is needed" >>> b.shape (2, 5, 3) >>> b array([[[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11], [12, 13, 14]], [[15, 16, 17], [18, 19, 20], [21, 22, 23], [24, 25, 26], [27, 28, 29]]]) Vector Stacking How do we construct a 2D array from a list of equally-sized row vectors? In MATLAB this is quite easy: if x and y are two vectors of the same length you only need do m=[x;y]. In NumPy this works via the functions column_stack, dstack, hstack and vstack, depending on the dimension in which the stacking is to be done. For example: >>> >>> x = np.arange(0,10,2) >>> y = np.arange(5) >>> m = np.vstack([x,y]) >>> m array([[0, 2, 4, 6, 8], [0, 1, 2, 3, 4]]) >>> xy = np.hstack([x,y]) >>> xy array([0, 2, 4, 6, 8, 0, 1, 2, 3, 4]) The logic behind those functions in more than two dimensions can be strange. See also NumPy for Matlab users Histograms The NumPy histogram function applied to an array returns a pair of vectors: the histogram of the array and a vector of the bin edges. Beware: matplotlib also has a function to build histograms (called hist, as in Matlab) that differs from the one in NumPy. The main difference is that pylab.hist plots the histogram automatically, while numpy.histogram only generates the data. >>> import numpy as np rg = np.random.default_rng(1) import matplotlib.pyplot as plt # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2 mu, sigma = 2, 0.5 v = rg.normal(mu,sigma,10000) # Plot a normalized histogram with 50 bins plt.hist(v, bins=50, density=1) # matplotlib version (plot) # Compute the histogram with numpy and then plot it (n, bins) = np.histogram(v, bins=50, density=True) # NumPy version (no plot) plt.plot(.5*(bins[1:]+bins[:-1]), n) ../_images/quickstart-2.png Further reading The Python tutorial NumPy Reference SciPy Tutorial SciPy Lecture Notes A matlab, R, IDL, NumPy/SciPy dictionary © Copyright 2008-2020, The SciPy community. Last updated on Jun 29, 2020. Created using Sphinx 2.4.4.
Technocolabs100
The goal for the data scientist is to determine in this non-random assignment whether the incremental amount the bank earns exceeded the additional cost of assigning customers to a higher recovery strategy.
This course introduces methods for harnessing data to answer questions of cultural, social, economic, and policy interest. We will start with essential notions of probability and statistics. We will proceed to cover techniques in modern data analysis: regression and econometrics, design of experiments, randomized control trials (and A/B testing), machine learning, data visualization. We will illustrate these concepts with applications drawn from real world examples and frontier research. Finally, we will provide instruction on the use of the statistical package R, and opportunities for students to perform self-directed empirical analyses. Students taking the graduate version will complete additional assignments. No prior preparation in probability and statistics is required, but familiarity with basic algebra and calculus is assumed.
mrc1234
# LIRI Bot ### Overview In this assignment, you will make LIRI. LIRI is like iPhone's SIRI. However, while SIRI is a Speech Interpretation and Recognition Interface, LIRI is a _Language_ Interpretation and Recognition Interface. LIRI will be a command line node app that takes in parameters and gives you back data. ### Before You Begin 1. LIRI will search Spotify for songs, Bands in Town for concerts, and OMDB for movies. 2. Make a new GitHub repository called liri-node-app and clone it to your computer. 3. To retrieve the data that will power this app, you'll need to send requests to the Bands in Town, Spotify and OMDB APIs. You'll find these Node packages crucial for your assignment. * [Node-Spotify-API](https://www.npmjs.com/package/node-spotify-api) * [Request](https://www.npmjs.com/package/request) * You'll use Request to grab data from the [OMDB API](http://www.omdbapi.com) and the [Bands In Town API](http://www.artists.bandsintown.com/bandsintown-api) * [Moment](https://www.npmjs.com/package/moment) * [DotEnv](https://www.npmjs.com/package/dotenv) ## Submission Guide Make sure you use the normal GitHub. Because this is a CLI App, there will be no need to deploy it to Heroku. This time, though, you need to include screenshots, a gif, and/or a video showing us that you got the app working with no bugs. You can include these screenshots or a link to a video in a `README.md` file. * Include screenshots (or a video) of typical user flows through your application (for the customer and if relevant the manager/supervisor). This includes views of the prompts and the responses after their selection (for the different selection options). * Include any other screenshots you deem necessary to help someone who has never been introduced to your application understand the purpose and function of it. This is how you will communicate to potential employers/other developers in the future what you built and why, and to show how it works. * Because screenshots (and well-written READMEs) are extremely important in the context of GitHub, this will be part of the grading. If you haven't written a markdown file yet, [click here for a rundown](https://guides.github.com/features/mastering-markdown/), or just take a look at the raw file of these instructions. ### Submission on BCS * Please submit the link to the Github Repository! ### Instructions 1. Navigate to the root of your project and run `npm init -y` — this will initialize a `package.json` file for your project. The `package.json` file is required for installing third party npm packages and saving their version numbers. If you fail to initialize a `package.json` file, it will be troublesome, and at times almost impossible for anyone else to run your code after cloning your project. 2. Make a `.gitignore` file and add the following lines to it. This will tell git not to track these files, and thus they won't be committed to Github. ``` node_modules .DS_Store .env ``` 3. Make a JavaScript file named `keys.js`. * Inside keys.js your file will look like this: ```js console.log('this is loaded'); exports.spotify = { id: process.env.SPOTIFY_ID, secret: process.env.SPOTIFY_SECRET }; ``` 4. Next, create a file named `.env`, add the following to it, replacing the values with your API keys (no quotes) once you have them: ```js # Spotify API keys SPOTIFY_ID=your-spotify-id SPOTIFY_SECRET=your-spotify-secret ``` * This file will be used by the `dotenv` package to set what are known as environment variables to the global `process.env` object in node. These are values that are meant to be specific to the computer that node is running on, and since we are gitignoring this file, they won't be pushed to github — keeping our API key information private. * If someone wanted to clone your app from github and run it themselves, they would need to supply their own `.env` file for it to work. 5. Make a file called `random.txt`. * Inside of `random.txt` put the following in with no extra characters or white space: * spotify-this-song,"I Want it That Way" 6. Make a JavaScript file named `liri.js`. 7. At the top of the `liri.js` file, add code to read and set any environment variables with the dotenv package: ```js require("dotenv").config(); ``` 8. Add the code required to import the `keys.js` file and store it in a variable. * You should then be able to access your keys information like so ```js var spotify = new Spotify(keys.spotify); ``` 9. Make it so liri.js can take in one of the following commands: * `concert-this` * `spotify-this-song` * `movie-this` * `do-what-it-says` ### What Each Command Should Do 1. `node liri.js concert-this <artist/band name here>` * This will search the Bands in Town Artist Events API (`"https://rest.bandsintown.com/artists/" + artist + "/events?app_id=codingbootcamp"`) for an artist and render the following information about each event to the terminal: * Name of the venue * Venue location * Date of the Event (use moment to format this as "MM/DD/YYYY") 2. `node liri.js spotify-this-song '<song name here>'` * This will show the following information about the song in your terminal/bash window * Artist(s) * The song's name * A preview link of the song from Spotify * The album that the song is from * If no song is provided then your program will default to "The Sign" by Ace of Base. * You will utilize the [node-spotify-api](https://www.npmjs.com/package/node-spotify-api) package in order to retrieve song information from the Spotify API. * The Spotify API requires you sign up as a developer to generate the necessary credentials. You can follow these steps in order to generate a **client id** and **client secret**: * Step One: Visit <https://developer.spotify.com/my-applications/#!/> * Step Two: Either login to your existing Spotify account or create a new one (a free account is fine) and log in. * Step Three: Once logged in, navigate to <https://developer.spotify.com/my-applications/#!/applications/create> to register a new application to be used with the Spotify API. You can fill in whatever you'd like for these fields. When finished, click the "complete" button. * Step Four: On the next screen, scroll down to where you see your client id and client secret. Copy these values down somewhere, you'll need them to use the Spotify API and the [node-spotify-api package](https://www.npmjs.com/package/node-spotify-api). 3. `node liri.js movie-this '<movie name here>'` * This will output the following information to your terminal/bash window: ``` * Title of the movie. * Year the movie came out. * IMDB Rating of the movie. * Rotten Tomatoes Rating of the movie. * Country where the movie was produced. * Language of the movie. * Plot of the movie. * Actors in the movie. ``` * If the user doesn't type a movie in, the program will output data for the movie 'Mr. Nobody.' * If you haven't watched "Mr. Nobody," then you should: <http://www.imdb.com/title/tt0485947/> * It's on Netflix! * You'll use the request package to retrieve data from the OMDB API. Like all of the in-class activities, the OMDB API requires an API key. You may use `trilogy`. 4. `node liri.js do-what-it-says` * Using the `fs` Node package, LIRI will take the text inside of random.txt and then use it to call one of LIRI's commands. * It should run `spotify-this-song` for "I Want it That Way," as follows the text in `random.txt`. * Edit the text in random.txt to test out the feature for movie-this and concert-this. ### BONUS * In addition to logging the data to your terminal/bash window, output the data to a .txt file called `log.txt`. * Make sure you append each command you run to the `log.txt` file. * Do not overwrite your file each time you run a command. ### Reminder: Submission on BCS * Please submit the link to the Github Repository! - - - ### Minimum Requirements Attempt to complete homework assignment as described in instructions. If unable to complete certain portions, please pseudocode these portions to describe what remains to be completed. Adding a README.md as well as adding this homework to your portfolio are required as well and more information can be found below. - - - ### Create a README.md Add a `README.md` to your repository describing the project. Here are some resources for creating your `README.md`. Here are some resources to help you along the way: * [About READMEs](https://help.github.com/articles/about-readmes/) * [Mastering Markdown](https://guides.github.com/features/mastering-markdown/) - - - ### Add To Your Portfolio After completing the homework please add the piece to your portfolio. Make sure to add a link to your updated portfolio in the comments section of your homework so the TAs can easily ensure you completed this step when they are grading the assignment. To receive an 'A' on any assignment, you must link to it from your portfolio. - - - ### One More Thing If you have any questions about this project or the material we have covered, please post them in the community channels in slack so that your fellow developers can help you! If you're still having trouble, you can come to office hours for assistance from your instructor and TAs. **Good Luck!**
CaptainEFFF
# LIRI Bot ### Overview In this assignment, you will make LIRI. LIRI is like iPhone's SIRI. However, while SIRI is a Speech Interpretation and Recognition Interface, LIRI is a _Language_ Interpretation and Recognition Interface. LIRI will be a command line node app that takes in parameters and gives you back data. ### Before You Begin 1. LIRI will search Spotify for songs, Bands in Town for concerts, and OMDB for movies. 2. Make a new GitHub repository called liri-node-app and clone it to your computer. 3. To retrieve the data that will power this app, you'll need to send requests using the `axios` package to the Bands in Town, Spotify and OMDB APIs. You'll find these Node packages crucial for your assignment. * [Node-Spotify-API](https://www.npmjs.com/package/node-spotify-api) * [Axios](https://www.npmjs.com/package/axios) * You'll use Axios to grab data from the [OMDB API](http://www.omdbapi.com) and the [Bands In Town API](http://www.artists.bandsintown.com/bandsintown-api) * [Moment](https://www.npmjs.com/package/moment) * [DotEnv](https://www.npmjs.com/package/dotenv) ## Submission Guide Create and use a standard GitHub repository. As this is a CLI App, it cannot be deployed to GitHub pages or Heroku. This time you'll need to include screenshots, a GIF, and/or a video showing us that you have the app working with no bugs. You can include these screenshots/GIFs or a link to a video in a `README.md` file. In order to meet the Employer Competitive standards and be ready to show your application to employers, the `README.md` file should meet the following criteria: 1. Clearly state the problem the app is trying to solve (i.e. what is it doing and why) 2. Give a high-level overview of how the app is organized 3. Give start-to-finish instructions on how to run the app 4. Include screenshots, gifs or videos of the app functioning 5. Contain a link to a deployed version of the app 6. Clearly list the technologies used in the app 7. State your role in the app development Because screenshots (and well-written READMEs) are extremely important in the context of GitHub, this will be part of the grading in this assignment. If you haven't written a markdown file yet, [click here for a rundown](https://guides.github.com/features/mastering-markdown/), or just take a look at the raw file of these instructions. ### Commits Having an active and healthy commit history on GitHub is important for your future job search. It is also extremely important for making sure your work is saved in your repository. If something breaks, committing often ensures you are able to go back to a working version of your code. * Committing often is a signal to employers that you are actively working on your code and learning. * We use the mantra “commit early and often.” This means that when you write code that works, add it and commit it! * Numerous commits allow you to see how your app is progressing and give you a point to revert to if anything goes wrong. * Be clear and descriptive in your commit messaging. * When writing a commit message, avoid vague messages like "fixed." Be descriptive so that you and anyone else looking at your repository knows what happened with each commit. * We would like you to have well over 200 commits by graduation, so commit early and often! ### Submission on BCS * Please submit the link to the Github Repository! ### Instructions 1. Navigate to the root of your project and run `npm init -y` — this will initialize a `package.json` file for your project. The `package.json` file is required for installing third party npm packages and saving their version numbers. If you fail to initialize a `package.json` file, it will be troublesome, and at times almost impossible for anyone else to run your code after cloning your project. 2. Make a `.gitignore` file and add the following lines to it. This will tell git not to track these files, and thus they won't be committed to Github. ``` node_modules .DS_Store .env ``` 3. Make a JavaScript file named `keys.js`. * Inside keys.js your file will look like this: ```js console.log('this is loaded'); exports.spotify = { id: process.env.SPOTIFY_ID, secret: process.env.SPOTIFY_SECRET }; ``` 4. Next, create a file named `.env`, add the following to it, replacing the values with your API keys (no quotes) once you have them: ```js # Spotify API keys SPOTIFY_ID=your-spotify-id SPOTIFY_SECRET=your-spotify-secret ``` * This file will be used by the `dotenv` package to set what are known as environment variables to the global `process.env` object in node. These are values that are meant to be specific to the computer that node is running on, and since we are gitignoring this file, they won't be pushed to github — keeping our API key information private. * If someone wanted to clone your app from github and run it themselves, they would need to supply their own `.env` file for it to work. 5. Make a file called `random.txt`. * Inside of `random.txt` put the following in with no extra characters or white space: * spotify-this-song,"I Want it That Way" 6. Make a JavaScript file named `liri.js`. 7. At the top of the `liri.js` file, add code to read and set any environment variables with the dotenv package: ```js require("dotenv").config(); ``` 8. Add the code required to import the `keys.js` file and store it in a variable. ```js var keys = require("./keys.js"); ``` * You should then be able to access your keys information like so ```js var spotify = new Spotify(keys.spotify); ``` 9. Make it so liri.js can take in one of the following commands: * `concert-this` * `spotify-this-song` * `movie-this` * `do-what-it-says` ### What Each Command Should Do 1. `node liri.js concert-this <artist/band name here>` * This will search the Bands in Town Artist Events API (`"https://rest.bandsintown.com/artists/" + artist + "/events?app_id=codingbootcamp"`) for an artist and render the following information about each event to the terminal: * Name of the venue * Venue location * Date of the Event (use moment to format this as "MM/DD/YYYY") 2. `node liri.js spotify-this-song '<song name here>'` * This will show the following information about the song in your terminal/bash window * Artist(s) * The song's name * A preview link of the song from Spotify * The album that the song is from * If no song is provided then your program will default to "The Sign" by Ace of Base. * You will utilize the [node-spotify-api](https://www.npmjs.com/package/node-spotify-api) package in order to retrieve song information from the Spotify API. * The Spotify API requires you sign up as a developer to generate the necessary credentials. You can follow these steps in order to generate a **client id** and **client secret**: * Step One: Visit <https://developer.spotify.com/my-applications/#!/> * Step Two: Either login to your existing Spotify account or create a new one (a free account is fine) and log in. * Step Three: Once logged in, navigate to <https://developer.spotify.com/my-applications/#!/applications/create> to register a new application to be used with the Spotify API. You can fill in whatever you'd like for these fields. When finished, click the "complete" button. * Step Four: On the next screen, scroll down to where you see your client id and client secret. Copy these values down somewhere, you'll need them to use the Spotify API and the [node-spotify-api package](https://www.npmjs.com/package/node-spotify-api). 3. `node liri.js movie-this '<movie name here>'` * This will output the following information to your terminal/bash window: ``` * Title of the movie. * Year the movie came out. * IMDB Rating of the movie. * Rotten Tomatoes Rating of the movie. * Country where the movie was produced. * Language of the movie. * Plot of the movie. * Actors in the movie. ``` * If the user doesn't type a movie in, the program will output data for the movie 'Mr. Nobody.' * If you haven't watched "Mr. Nobody," then you should: <http://www.imdb.com/title/tt0485947/> * It's on Netflix! * You'll use the `axios` package to retrieve data from the OMDB API. Like all of the in-class activities, the OMDB API requires an API key. You may use `trilogy`. 4. `node liri.js do-what-it-says` * Using the `fs` Node package, LIRI will take the text inside of random.txt and then use it to call one of LIRI's commands. * It should run `spotify-this-song` for "I Want it That Way," as follows the text in `random.txt`. * Edit the text in random.txt to test out the feature for movie-this and concert-this. ### BONUS * In addition to logging the data to your terminal/bash window, output the data to a .txt file called `log.txt`. * Make sure you append each command you run to the `log.txt` file. * Do not overwrite your file each time you run a command. ### Reminder: Submission on BCS * Please submit the link to the Github Repository! - - - ### Minimum Requirements Attempt to complete homework assignment as described in instructions. If unable to complete certain portions, please pseudocode these portions to describe what remains to be completed. Adding a README.md as well as adding this homework to your portfolio are required as well and more information can be found below. - - - ### Create a README.md Add a `README.md` to your repository describing the project. Here are some resources for creating your `README.md`. Here are some resources to help you along the way: * [About READMEs](https://help.github.com/articles/about-readmes/) * [Mastering Markdown](https://guides.github.com/features/mastering-markdown/) - - - ### Add To Your Portfolio After completing the homework please add the piece to your portfolio. Make sure to add a link to your updated portfolio in the comments section of your homework so the TAs can easily ensure you completed this step when they are grading the assignment. To receive an 'A' on any assignment, you must link to it from your portfolio. - - - ### One More Thing If you have any questions about this project or the material we have covered, please post them in the community channels in slack so that your fellow developers can help you! If you're still having trouble, you can come to office hours for assistance from your instructor and TAs. **Good Luck!**
noahjonesx
Markov Text Generation Problem Description The Infinite Monkey Theorem1 (IFT) says that if a monkey hits keys at random on a typewriter it will almost surely, given an infinite amount of time, produce a chosen text (like the Declaration of Independence, Hamlet, or a script for ... Planet of the Apes). The probability of this actually happening is, of course, very small but the IFT claims that it is still possible. Some people have tested this hypotheis in software and, after billions and billions of simulated years, one virtual monkey was able to type out a sequence of 19 letters that can be found in Shakespeare’s The Two Gentlemen of Verona. (See the April 9, 2007 edition of The New Yorker if you’re interested; but, hypothesis testing with real monkeys2 is far more entertaining.) The IFT might lead to some interesting conversations with Rust Cohle, but the practical applications are few. It does, however, bring up the idea of automated text generation, and there the ideas and applications are not only interesting but also important. Claude Shannon essentially founded the field of information theory with the publication of his landmark paper A Mathematical Theory of Computation3 in 1948. Shannon described a method for using Markov chains to produce a reasonable imitation of a known text with sometimes startling results. For example, here is a sample of text generated from a Markov model of the script for the 1967 movie Planet of the Apes. "PLANET OF THE APES" Screenplay by Michael Wilson Based on Novel By Pierre Boulle DISSOLVE TO: 138 EXT. GROVE OF FRUIT TREES - ESTABLISHING SHOT - DAY Zira run back to the front of Taylor. The President, I believe the prosecutor's charge of this man. ZIRA Well, whoever owned them was in pretty bad shape. He picks up two of the strain. You got what you wanted, kid. How does it taste? Silence. Taylor and cuffs him. Over this we HEAR from a distance is a crude horse-drawn wagon is silhouetted-against the trunks and branches of great trees and bushes on the horse's rump. Taylor lifts his right arm to ward off the blow, and the room and lands at the feet of Cornelius and Lucius are sorting out equipment falls to his knees, buries his head silently at the Ranch). DISSOLVE TO: 197 INT. CAGES - CLOSE SHOT - FEATURING LANDON - FROM TAYLOR'S VOICE (o.s.) I've got a fine veternary surgeons under my direction? ZIRA Taylor! ZIRA There is a small lake, looking like a politician. TAYLOR Dodge takes a pen and notebook from the half-open door of a guard room. Taylor bursts suddenly confronted by his 1https://en.wikipedia.org/wiki/Infinite_monkey_theorem2https://web.archive.org/web/20130120215600/http://www.vivaria.net/experiments/notes/publication/NOTES_ EN.pdf3http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6773024 1 original pursuer (the dismounted cop coming up with a cigar butt and places it in the drawer beside them. TAYLOR What's the best there is a. loud RAP at the doll was found beside the building. Zira waits at the third table. TAYLOR Good question. Is he a man? CORNELIUS (impatiently. DODGE Blessed are the vegetation. These SHOTS are INTERCUT with: 94 WHAT THE ASTRONAUTS They examine the remnants of the cage. ZIRA (plunging on) Their speech organs are adequate. The flaw lies not in anatomy but in the back of his left sleeve. TAYLOR (taking off his shirt. 80 DODGE AND LANDON You don't sound happy in your work. GALEN (defensively) Gorilla hunter stands over a dead man, one fo Besides a few spelling errors and some rather odd things that make you wonder about the author, this passage is surprisingly human-like. This is a simple example of natural language generation, a sub-area of natural language processing—a very active area of research in computer science. The particular approach we’re using in this assignment was famously implemented as the fictitious Mark V. Shaney4 and the Emacs command Disassociated Press5. Approach So, here’s the basic idea: Imagine taking a book (say, Tom Sawyer) and determining the probability with which each character occurs. You would probably find that spaces are the most common, that the character ‘e’ is fairly common, and that the character ‘q’ is rather uncommon. After completing this “level 0” analysis, you would be able to produce random Tom Sawyer text based on character probabilities. It wouldn’t have much in common with the real thing, but at least the characters would tend to occur in the proper propor- tion. In fact, here’s an example of what you might produce: Level 0 rla bsht eS ststofo hhfosdsdewno oe wee h .mr ae irii ela iad o r te u t mnyto onmalysnce, ifu en c fDwn oee iteo Now imagine doing a slightly more sophisticated level 1 analysis by determining the probability with which each character follows every other character. You would probably discover that ‘h’ follows ‘t’ more frequently than ‘x’ does, and you would probably discover that a space follows ‘.’ more frequently than ‘,’ does. You could now produce some randomly generated Tom Sawyer text by picking a character to begin with and then always choosing the next character based on the previous one and the probabilities revealed by the analysis. Here’s an example: Level 1 "Shand tucthiney m?" le ollds mind Theybooure He, he s whit Pereg lenigabo Jodind alllld ashanthe ainofevids tre lin-p asto oun theanthadomoere Now imagine doing a level k analysis by determining the probability with which each character follows every possible sequence of characters of length k (kgrams). A level 5 analysis of Tom Sawyer for example, would reveal that ‘r’ follows “Sawye” more frequently than any other character. After a level k analysis, you would be able to produce random Tom Sawyer by always choosing the next character based on the previous k characters (a kgram) and the probabilities revealed by the analysis. 4https://en.wikipedia.org/wiki/Mark_V._Shaney5https://en.wikipedia.org/wiki/Dissociated_press Page 2 of 5 At only a moderate level of analysis (say, levels 5-7), the randomly generated text begins to take on many of the characteristics of the source text. It probably won’t make complete sense, but you’ll be able to tell that it was derived from Tom Sawyer as opposed to, say, The Sound and the Fury. Here are some more examples of text that is generated from increasing levels of analysis of Tom Sawyer. (These “levels of analysis” are called order K Markov models.) K = 2 "Yess been." for gothin, Tome oso; ing, in to weliss of an’te cle - armit. Papper a comeasione, and smomenty, fropeck hinticer, sid, a was Tom, be suck tied. He sis tred a youck to themen K = 4 en themself, Mr. Welshman, but him awoke, the balmy shore. I’ll give him that he couple overy because in the slated snufflindeed structure’s kind was rath. She said that the wound the door a fever eyes that WITH him. K = 6 people had eaten, leaving. Come - didn’t stand it better judgment; His hands and bury it again, tramped herself! She’d never would be. He found her spite of anything the one was a prime feature sunset, and hit upon that of the forever. K = 8 look-a-here - I told you before, Joe. I’ve heard a pin drop. The stillness was complete, how- ever, this is awful crime, beyond the village was sufficient. He would be a good enough to get that night, Tom and Becky. K = 10 you understanding that they don’t come around in the cave should get the word "beauteous" was over-fondled, and that together" and decided that he might as we used to do - it’s nobby fun. I’ll learn you." To create an order K Markov model of a given source text, you would need to identify all kgrams in the source text and associate with each kgram all the individual characters that follow it. This association or mapping must also capture the frequency with which a given character follows a given kgram. For example, suppose that k = 2 and the sample text is: agggcagcgggcg The Markov model would have to represent all the character strings of length two (2-grams) in the source text, and associate with them the characters that follow them, and in the correct proportion. The following table shows one way of representing this information. kgram Characters that follow ag gc gg gcgc gc agg ca g cg g Once you have created an order K Markov model of a given source text, you can generate new text based on this model as follows. Page 3 of 5 1. Randomly pick k consecutive characters that appear in the sample text and use them as the initial kgram. 2. Append the kgram to the output text being generated. 3. Repeat the following steps until the output text is sufficiently long. (a) Select a character c that appears in the sample text based on the probability of that character following the current kgram. (b) Append this character to the output text. (c) Update the kgram by removing its first character and adding the character just chosen (c) as its last character. If this process encounters a situation in which there are no characters to choose from (which can happen if the only occurrence of the current kgram is at the exact end of the source), simply pick a new kgram at random and continue. As an example, suppose that k = 2 and the sample text is that from above: agggcagcgggcg Here are four different output text strings of length 10 that could have been the result of the process described above, using the first two characters (’ag’) as the initial kgram. agcggcagcg aggcaggcgg agggcaggcg agcggcggca For another example, suppose that k = 2 and the sample text is: the three pirates charted that course the other day Here is how the first three characters of new text might be generated: •A two-character sequence is chosen at random to become the initial kgram. Let’s suppose that “th” is chosen. So, kgram = th and output = th. •The first character must be chosen based on the probability that it follows the kgram (currently “th”) in the source. The source contains five occurrences of “th”. Three times it is followed by ’e’, once it is followed by ’r’, and once it is followed by ’a’. Thus, the next character must be chosen so that there is a 3/5 chance that an ’e’ will be chosen, a 1/5 chance that an ’r’ will be chosen, and a 1/5 chance that an ’a’ will be chosen. Let’s suppose that we choose an ’e’ this time. So, kgram = he and output = the. •The next character must be chosen based on the probability that it follows the kgram (currently “he”) in the source. The source contains three occurrences of “he”. Twice it is followed by a space and once it is followed by ’r’. Thus, the next character must be chosen so that there is a 2/3 chance that a space will be chosen and a 1/3 chance that an ’r’ will be chosen. Let’s suppose that we choose an ’r’ this time. So, kgram = er and output = ther. •The next character must be chosen based on the probability that it follows the kgram (currently “er”) in the source. The source contains only one occurrence of “er”, and it is followed by a space. Thus, the next character must be a space. So, kgram = r_ and output = ther_, where ’_’ represents a blank space. Page 4 of 5 Implementation Details You are provided with two Java files that you must use to develop your solution: MarkovModel.java and TextGenerator.java. The constructors of MarkovModel build the order-k model of the source text. You are required to represent the model with the provided HashMap field. The main method of TextGenerator must process the following three command line arguments (in the args array): •A non-negative integer k •A non-negative integer length. •The name of an input file source that contains more than k characters. Your program must validate the command line arguments by making sure that k and length are non- negative and that source contains at least k characters and can be opened for reading. If any of the command line arguments are invalid, your program must write an informative error message to System.out and terminate. If there are not enough command line arguments, your program must write an informative error message to System.out and terminate. With valid command line arguments, your program must use the methods of the MarkovModel class to create an order k Markov model of the sample text, select the initial kgram, and make each character selection. You must implement the MarkovModel methods according to description of the Markov modeling process in the section above. A few sample texts have been provided, but Project Gutenberg (http://www.gutenberg.org) maintains a large collection of public domain literary works that you can use as source texts for fun and practice. Acknowledgments This assignment is based on the ideas of many people, Jon Bentley and Owen Astrachan in particular.
Static configurations, especially IP addresses, give the adversaries great opportunities to discover network targets and then launch attacks. Frequently changing hosts’ IP addresses to hide their real ones is a novel method to solve this problem. A recent paper OpenFlow Random Host Mutation: Transparent Moving Target Defense using Software Defined Networking specified this new proactive moving target defense. In this assignment, I implement a simple model to replicate the result of this paper based on Mininet network emulation environment, OpenFlow architecture and protocol, and SDN technique. This report includes three major parts: the design of the controller, the implementation of my method, and discussion about the advantages and disadvantages of my approach.
ck37
Stata module for random assignment, including blocking, balance checking, and automated rerandomization.
j3yang
A model checker for the Dynamic Logic of Propositional Assignments (DL-PA) with solving and parameterized random formula generation functionalities.
mrzamaniiii
Comparison of ILP and heuristic solutions for the Routing and Wavelength Assignment problem using NSFNET topology and random traffic matrices.
flybunctious
Introduction In this project, you will develop a simulator and multiple strategies for the dice game Hog. You will need to use control statements and higher-order functions together, as described in Sections 1.2 through 1.6 of Composing Programs. In Hog, two players alternate turns trying to be the first to end a turn with at least 100 total points. On each turn, the current player chooses some number of dice to roll, up to 10. That player's score for the turn is the sum of the dice outcomes. To spice up the game, we will play with some special rules: Pig Out. If any of the dice outcomes is a 1, the current player's score for the turn is 1. Example 1: The current player rolls 7 dice, 5 of which are 1's. They score 1 point for the turn. Example 2: The current player rolls 4 dice, all of which are 3's. Since Pig Out did not occur, they score 12 points for the turn. Free Bacon. A player who chooses to roll zero dice scores one more than the largest digit in the opponent's total score. Example 1: If the opponent has 42 points, the current player gains 1 + max(4, 2) = 5 points by rolling zero dice. Example 2: If the opponent has 48 points, the current player gains 1 + max(4, 8) = 9 points by rolling zero dice. Example 3: If the opponent has 7 points, the current player gains 1 + max(0, 7) = 8 points by rolling zero dice. Swine Swap. After points for the turn are added to the current player's score, if both scores are larger than 1 and either one of the scores is a positive integer multiple of the other, then the two scores are swapped. Example 1: The current player has a total score of 37 and the opponent has 92. The current player rolls two dice that total 9. The opponent's score (92) is exactly twice the player's new total score (46). These scores are swapped! The current player now has 92 points and the opponent has 46. The turn ends. Example 2: The current player has 91 and the opponent has 37. The current player rolls five dice that total 20. The current player has 111, which is 3 times 37, so the scores are swapped. The opponent ends the turn with 111 and wins the game. Download starter files To get started, download all of the project code as a zip archive. You only have to make changes to hog.py. hog.py: A starter implementation of Hog dice.py: Functions for rolling dice hog_gui.py: A graphical user interface for Hog ucb.py: Utility functions for CS 61A ok: CS 61A autograder tests: A directory of tests used by ok images: A directory of images used by hog_gui.py Logistics This is a 2-week project. This is a solo project, so you will complete this project without a partner. You should not share your code with any other students, or copy from anyone else's solutions. Remember that you can earn an additional bonus point by submitting the project at least 24 hours before the deadline. The project is worth 20 points. 18 points are assigned for correctness, and 2 points for the overall composition of your program. You will turn in the following files: hog.py You do not need to modify or turn in any other files to complete the project. To submit the project, run the following command: python3 ok --submit You will be able to view your submissions on the Ok dashboard. For the functions that we ask you to complete, there may be some initial code that we provide. If you would rather not use that code, feel free to delete it and start from scratch. You may also add new function definitions as you see fit. However, please do not modify any other functions. Doing so may result in your code failing our autograder tests. Also, please do not change any function signatures (names, argument order, or number of arguments). Testing Throughout this project, you should be testing the correctness of your code. It is good practice to test often, so that it is easy to isolate any problems. However, you should not be testing too often, to allow yourself time to think through problems. We have provided an autograder called ok to help you with testing your code and tracking your progress. The first time you run the autograder, you will be asked to log in with your Ok account using your web browser. Please do so. Each time you run ok, it will back up your work and progress on our servers. The primary purpose of ok is to test your implementations, but there are two things you should be aware of. First, some of the test cases are locked. To unlock tests, run the following command from your terminal: python3 ok -u This command will start an interactive prompt that looks like: ===================================================================== Assignment: The Game of Hog Ok, version ... ===================================================================== ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Unlocking tests At each "? ", type what you would expect the output to be. Type exit() to quit --------------------------------------------------------------------- Question 0 > Suite 1 > Case 1 (cases remaining: 1) >>> Code here ? At the ?, you can type what you expect the output to be. If you are correct, then this test case will be available the next time you run the autograder. The idea is to understand conceptually what your program should do first, before you start writing any code. Once you have unlocked some tests and written some code, you can check the correctness of your program using the tests that you have unlocked: python3 ok Most of the time, you will want to focus on a particular question. Use the -q option as directed in the problems below. We recommend that you submit after you finish each problem. Only your last submission will be graded. It is also useful for us to have more backups of your code in case you run into a submission issue. The tests folder is used to store autograder tests, so do not modify it. You may lose all your unlocking progress if you do. If you need to get a fresh copy, you can download the zip archive and copy it over, but you will need to start unlocking from scratch. If you do not want us to record a backup of your work or information about your progress, use the --local option when invoking ok. With this option, no information will be sent to our course servers. Graphical User Interface A graphical user interface (GUI, for short) is provided for you. At the moment, it doesn't work because you haven't implemented the game logic. Once you complete the play function, you will be able to play a fully interactive version of Hog! In order to render the graphics, make sure you have Tkinter, Python's main graphics library, installed on your computer. Once you've done that, you can run the GUI from your terminal: python3 hog_gui.py Once you complete the project, you can play against the final strategy that you've created! python3 hog_gui.py -f Phase 1: Simulator In the first phase, you will develop a simulator for the game of Hog. Problem 0 (0 pt) The dice.py file represents dice using non-pure zero-argument functions. These functions are non-pure because they may have different return values each time they are called. The documentation of dice.py describes the two different types of dice used in the project: Dice can be fair, meaning that they produce each possible outcome with equal probability. Example: six_sided. For testing functions that use dice, deterministic test dice always cycle through a fixed sequence of values that are passed as arguments to the make_test_dice function. Before we start writing any code, let's understand the make_test_dice function by unlocking its tests. python3 ok -q 00 -u This should display a prompt that looks like this: ===================================================================== Assignment: Project 1: Hog Ok, version v1.5.2 ===================================================================== ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Unlocking tests At each "? ", type what you would expect the output to be. Type exit() to quit --------------------------------------------------------------------- Question 0 > Suite 1 > Case 1 (cases remaining: 1) >>> test_dice = make_test_dice(4, 1, 2) >>> test_dice() ? You should type in what you expect the output to be. To do so, you need to first figure out what test_dice will do, based on the description above. You can exit the unlocker by typing exit() (without quotes). Typing Ctrl-C on Windows to exit out of the unlocker has been known to cause problems, so avoid doing so. Problem 1 (2 pt) Implement the roll_dice function in hog.py. It takes two arguments: a positive integer called num_rolls giving the number of dice to roll and a dice function. It returns the number of points scored by rolling the dice that number of times in a turn: either the sum of the outcomes or 1 (Pig Out). To obtain a single outcome of a dice roll, call dice(). You should call dice() exactly num_rolls times in the body of roll_dice. Remember to call dice() exactly num_rolls times even if Pig Out happens in the middle of rolling. In this way, we correctly simulate rolling all the dice together. Checking Your Work: Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 01 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 01 If the tests don't pass, it's time to debug. You can observe the behavior of your function using Python directly. First, start the Python interpreter and load the hog.py file. python3 -i hog.py Then, you can call your roll_dice function on any number of dice you want, such as 4. >>> roll_dice(4) In most systems, you can evaluate the same expression again by pressing the up arrow or Control-P, then pressing enter or return. You should find that evaluating this call expression gives a different answer each time, since dice rolls are random. The roll_dice function has a default argument value for dice that is a random six-sided dice function. You can also use test dice that fix the outcomes of the dice in advance. For example, rolling twice when you know that the dice will come up 3 and 4 should give a total outcome of 7. >>> fixed_dice = make_test_dice(3, 4) >>> roll_dice(2, fixed_dice) 7 If you find a problem, you need to change your hog.py file, save it, quit Python, start it again, and then start evaluating expressions. Pressing the up arrow should give you access to your previous expressions, even after restarting Python. Once you think that your roll_dice function is correct, run the ok tests again. Tests like these don't prove that your program is exactly correct, but they help you build confidence that this part of your program does what you expect, so that you can trust the abstraction it defines as you proceed. Problem 2 (1 pt) Implement the free_bacon helper function that returns the number of points scored by rolling 0 dice, based on the opponent's current score. You can assume that score is less than 100. For a score less than 10, assume that the first of the two digits is 0. Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 02 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 02 You can also test free_bacon interactively by entering python3 -i hog.py in the terminal and then calling free_bacon with various inputs. Problem 3 (1 pt) Implement the take_turn function, which returns the number of points scored for a turn by the current player. Your implementation should call roll_dice when possible. You will need to implement the Free Bacon rule. You can assume that opponent_score is less than 100. Call free_bacon in your implementation of take_turn. Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 03 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 03 Problem 4 (1 pt) Implement is_swap, which returns whether or not the scores should be swapped because one is an integer multiple of the other. The is_swap function takes two arguments: the player scores. It returns a boolean value to indicate whether the Swine Swap condition is met. Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 04 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 04 Problem 5 (3 pt) Implement the play function, which simulates a full game of Hog. Players alternate turns, each using their respective strategy function (Player 0 uses strategy0, etc.), until one of the players reaches the goal score. When the game ends, play returns the final total scores of both players, with Player 0's score first, and Player 1's score second. Here are some hints: You should use the functions you have already written! You will need to call take_turn with all three arguments. Only call take_turn once per turn. Enforce all the special rules. You can get the number of the other player (either 0 or 1) by calling the provided function other. You can ignore the say argument to the play function for now. You will use it in Phase 2 of the project. A strategy is a function that, given a player's score and their opponent's score, returns how many dice the player wants to roll. A strategy function (such as strategy0 and strategy1) takes two arguments: scores for the current player and opposing player, which both must be non-negative integers. A strategy function returns the number of dice that the current player wants to roll in the turn. Each strategy function should be called only once per turn. Don't worry about the details of implementing strategies yet. You will develop them in Phase 3. Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 05 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 05 The last test for Question 5 is a fuzz test, which checks that your play function works for a large number of different inputs. Failing this test means something is wrong, but you should look at other tests to see where the problem might be. Once you are finished, you will be able to play a graphical version of the game. We have provided a file called hog_gui.py that you can run from the terminal: python3 hog_gui.py If you don't already have Tkinter (Python's graphics library) installed, you'll need to install it first before you can run the GUI. The GUI relies on your implementation, so if you have any bugs in your code, they will be reflected in the GUI. This means you can also use the GUI as a debugging tool; however, it's better to run the tests first. Congratulations! You have finished Phase 1 of this project! Phase 2: Commentary In the second phase, you will implement commentary functions that print remarks about the game, such as, "22 points! That's the biggest gain yet for Player 1." A commentary function takes two arguments, the current score for Player 0 and the current score for Player 1. It returns another commentary function to be called on the next turn. It may also print some output as a side effect of being called. The function say_scores in hog.py is an example of a commentary function. The function announce_lead_changes is an example of a higher-order function that returns a commentary function. def say_scores(score0, score1): """A commentary function that announces the score for each player.""" print("Player 0 now has", score0, "and Player 1 now has", score1) return say_scores def announce_lead_changes(previous_leader=None): """Return a commentary function that announces lead changes. >>> f0 = announce_lead_changes() >>> f1 = f0(5, 0) Player 0 takes the lead by 5 >>> f2 = f1(5, 12) Player 1 takes the lead by 7 >>> f3 = f2(8, 12) >>> f4 = f3(8, 13) >>> f5 = f4(15, 13) Player 0 takes the lead by 2 """ def say(score0, score1): if score0 > score1: leader = 0 elif score1 > score0: leader = 1 else: leader = None if leader != None and leader != previous_leader: print('Player', leader, 'takes the lead by', abs(score0 - score1)) return announce_lead_changes(leader) return say Problem 6 (2 pt) Update your play function so that a commentary function is called at the end of each turn. say(score0, score1) should be called at the end of the first turn. Its return value (another commentary function) should be called at the end of the second turn. Each turn, a new commentary function should be called that is the return value of the previous call to a commentary function. Also implement both, a function that takes two commentary functions (f and g) and returns a new commentary function. This new commentary function returns another commentary function which calls the functions returned by calling f and g, in that order. Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 06 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 06 Problem 7 (2 pt) Implement the announce_highest function, which is a higher-order function that returns a commentary function. This commentary function announces whenever a particular player gains more points in a turn than ever before. To compute the gain, it must compare the score from last turn to the score from this turn for the player of interest, which is designated by the who argument. This function must also keep track of the highest gain for the player so far. The way in which announce_highest announces is very specific, and your implementation should match the doctests provided. Notice in particular that if the gain is only 1 point, then the message includes "point" in singular form. If the gain is larger, then the message includes "points" in plural form. Use Ok to test your code: python3 ok -q 07 Hint. The announce_lead_changes function provided to you is an example of how to keep track of information using commentary functions. If you are stuck, first make sure you understand how announce_lead_changes works. When you are done, if play the game again, you will see the commentary. python3 hog_gui.py The commentary in the GUI is generated by passing the following function as the say argument to play. both(announce_highest(0), both(announce_highest(1), announce_lead_changes())) Great work! You just finished Phase 2 of the project! Phase 3: Strategies In the third phase, you will experiment with ways to improve upon the basic strategy of always rolling a fixed number of dice. First, you need to develop some tools to evaluate strategies. Problem 8 (2 pt) Implement the make_averaged function, which is a higher-order function that takes a function fn as an argument. It returns another function that takes the same number of arguments as fn (the function originally passed into make_averaged). This returned function differs from the input function in that it returns the average value of repeatedly calling fn on the same arguments. This function should call fn a total of num_samples times and return the average of the results. To implement this function, you need a new piece of Python syntax! You must write a function that accepts an arbitrary number of arguments, then calls another function using exactly those arguments. Here's how it works. Instead of listing formal parameters for a function, we write *args. To call another function using exactly those arguments, we call it again with *args. For example, >>> def printed(fn): ... def print_and_return(*args): ... result = fn(*args) ... print('Result:', result) ... return result ... return print_and_return >>> printed_pow = printed(pow) >>> printed_pow(2, 8) Result: 256 256 >>> printed_abs = printed(abs) >>> printed_abs(-10) Result: 10 10 Read the docstring for make_averaged carefully to understand how it is meant to work. Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 08 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 08 Problem 9 (1 pt) Implement the max_scoring_num_rolls function, which runs an experiment to determine the number of rolls (from 1 to 10) that gives the maximum average score for a turn. Your implementation should use make_averaged and roll_dice. If two numbers of rolls are tied for the maximum average score, return the lower number. For example, if both 3 and 6 achieve a maximum average score, return 3. Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 09 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 09 To run this experiment on randomized dice, call run_experiments using the -r option: python3 hog.py -r Running experiments For the remainder of this project, you can change the implementation of run_experiments as you wish. By calling average_win_rate, you can evaluate various Hog strategies. For example, change the first if False: to if True: in order to evaluate always_roll(8) against the baseline strategy of always_roll(4). You should find that it wins slightly more often than it loses, giving a win rate around 0.5. Some of the experiments may take up to a minute to run. You can always reduce the number of samples in make_averaged to speed up experiments. Problem 10 (1 pt) A strategy can take advantage of the Free Bacon rule by rolling 0 when it is most beneficial to do so. Implement bacon_strategy, which returns 0 whenever rolling 0 would give at least margin points and returns num_rolls otherwise. Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 10 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 10 Once you have implemented this strategy, change run_experiments to evaluate your new strategy against the baseline. You should find that it wins more than half of the time. Problem 11 (2 pt) A strategy can also take advantage of the Swine Swap rule. The swap_strategy rolls 0 if it would cause a beneficial swap. It also returns 0 if rolling 0 would give at least margin points and would not cause a swap. Otherwise, the strategy rolls num_rolls. Before writing any code, unlock the tests to verify your understanding of the question. python3 ok -q 11 -u Once you are done unlocking, begin implementing your solution. You can check your correctness with: python3 ok -q 11 Once you have implemented this strategy, update run_experiments to evaluate your new strategy against the baseline. You should find that it gives a significant edge over always_roll(4). Optional: Problem 12 (0 pt) Implement final_strategy, which combines these ideas and any other ideas you have to achieve a high win rate against the always_roll(4) strategy. Some suggestions: swap_strategy is a good default strategy to start with. There's no point in scoring more than 100. Check whether you can win by rolling 0, 1 or 2 dice. If you are in the lead, you might take fewer risks. Try to force a beneficial swap. Choose the num_rolls and margin arguments carefully. You can check that your final strategy is valid by running Ok. python3 ok -q 12 You can also check your exact final winrate by running python3 calc.py At this point, run the entire autograder to see if there are any tests that don't pass. python3 ok Once you are satisfied, submit to Ok to complete the project. python3 ok --submit You can also play against your final strategy with the graphical user interface: python3 hog_gui.py -f The GUI will alternate which player is controlled by you. Congratulations, you have reached the end of your first CS 61A project! If you haven't already, relax and enjoy a few games of Hog with a friend.
gpoore
Create randomized assignments with solutions/keys using Python and LaTeX.
(Internship Project) Use Unsupervised Learning, Randomized Optimization, Task Assignment algorithms with UI to visualize the dispatch jobs on Singapore maps
nirav1997
Network Properties in Spark GraphFrames In this project, you will implement various network properties using pySpark and GraphFrames. You may need to look into the documentation for GraphFrames and networkx to complete this project. In addition to implementing these network metrics, you will be required to answer some questions when applying these metrics on real world and synthetic networks. Submission Requirements Please create a zip file containing ONLY the following for this project : 0. Do NOT include the data or any extra files. 1. degree.py 2. centrality.py 3. articulations.py 4. A pdf document containing answers to the questions asked in the project description. [Power law questions for Stanford and the 4 random graphs provided in degree.py, answers for the two questions on centrality, answer for the closeness question). Name this as <your_unity_id>.pdf 5. [OPTIONAL] Script for checking power law (if you have written something). Instructions to run in the pdf document which contains the answers to the questions asked in project description. 6. Output for centrality.py saved in a csv file called centrality_out.csv 7. Output for articulations.py saved in a csv file called articulations_out.csv For articulations problem, you may just run it with networkx approach (5 mins run time) and save the output in the file articulations_out.csv. We will be evaluating the project using a script. So please make sure you follow the instructions, or your assignment might not get graded. Degree Distribution The degree distribution is a measure of the frequency of nodes that have a certain degree. Implement a function degreedist, which takes a GraphFrame object as input and computes the degree distribution of the graph. The function should return a DataFrame with two columns: degree and count. For the graphs provided to you, test and report which graphs are scalefree, namely whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as P(k) ~ k−γ where γ is a parameter whose value is typically in the range 2 < γ < 3, although occasionally it may lie outside these bounds. Answer the following questions: 1. Generate a few random graphs. You can do this using networkx’s random graph generators. Do the random graphs you tested appear to be scale free? (Include degree distribution with your answer). 2. Do the Stanford graphs provided to you appear to be scale free? Note that GraphFrames represents all graphs as directed, so every edge in the undirected graphs provided or generated must be added twice, once for each direction. When finding if the degrees follow a power law you can then use either the indegree or outdegree, as they should be equal. Centrality Centrality measures are a way to determine nodes that are important based on the structure of the graph. Closeness centrality measures the distance of a node to all other nodes. We will define the closeness centrality as CC(v) =1/ ∑ d(u,v) u∈V where d(u, v) is the shortestpath distance between u and v. Implement the function closeness, which takes a GraphFrame object as input and computes the closeness centrality of every node in the graph. The function should return a DataFrame with two columns: id and closeness. Consider a small network of 10 computers, illustrated below, with nodes representing computers and edges representing direct connections of two machines. If two computers are not connected directly, then the information must flow through other connected machines. Answer the following questions about the graph: 1. Rank the nodes from highest to lowest closeness centrality. 2. Suppose we had some centralized data that would sit on one machine but would be shared with all computers on the network. Which two machines would be the best candidates to hold this data based on other machines having few hops to access this data? Articulation Points Articulation points, or cut vertices, are vertices in the graph that, when removed, create more components than there were originally. For example, in the simple chain 1-2-3, there is a single component. However, if vertex 2 were removed, there would be 2 components. Thus, vertex 2 is an articulation point. Implement the function articulations, which takes a GraphFrame object as input and finds all the articulation points of a graph. The function should return a DataFrame with two columns, id and articulation, where articulation is a 1 if the node is an articulation point, otherwise a 0. Suppose we had the terrorist communication network given in the file 9_11_edgelist.txt . If our goal was to disrupt the flow of communication between different groups, isolating the articulation points would be a good way to do this. Answer the following questions: 1. In this example, which members should have been targeted to best disrupt communication in the organization?
btskinner
Python script to randomize experimental treatment assignment and produce labels
sthalles
Official PyTorch implementation of Representation Learning via Consistent Assignment of Views over Random Partitions (CARP)
maronin
This is an L-system generator plug-in I created in Autodesk 3ds Max for the Procedural Modelling for Games course. Its function is to generate random trees/plants based on the seed that it is given and the options that are selected. You can change the texture, min/max branch bending, number of iterations, branch thickness, and the type of leaf you want. Every time you generate a tree, it’s always unique. This can make some really interesting and weird looking trees. By far my favorite assignment I got work on while at school.
Groverkss
Random/Course Notes, Assignments i make