Found 130 repositories(showing 30)
curiousily
Jupyter Notebook tutorials on solving real-world problems with Machine Learning & Deep Learning using PyTorch. Topics: Face detection with Detectron 2, Time Series anomaly detection with LSTM Autoencoders, Object Detection with YOLO v5, Build your first Neural Network, Time Series forecasting for Coronavirus daily cases, Sentiment Analysis with BER
sayantann11
Classification - Machine Learning This is ‘Classification’ tutorial which is a part of the Machine Learning course offered by Simplilearn. We will learn Classification algorithms, types of classification algorithms, support vector machines(SVM), Naive Bayes, Decision Tree and Random Forest Classifier in this tutorial. Objectives Let us look at some of the objectives covered under this section of Machine Learning tutorial. Define Classification and list its algorithms Describe Logistic Regression and Sigmoid Probability Explain K-Nearest Neighbors and KNN classification Understand Support Vector Machines, Polynomial Kernel, and Kernel Trick Analyze Kernel Support Vector Machines with an example Implement the Naïve Bayes Classifier Demonstrate Decision Tree Classifier Describe Random Forest Classifier Classification: Meaning Classification is a type of supervised learning. It specifies the class to which data elements belong to and is best used when the output has finite and discrete values. It predicts a class for an input variable as well. There are 2 types of Classification: Binomial Multi-Class Classification: Use Cases Some of the key areas where classification cases are being used: To find whether an email received is a spam or ham To identify customer segments To find if a bank loan is granted To identify if a kid will pass or fail in an examination Classification: Example Social media sentiment analysis has two potential outcomes, positive or negative, as displayed by the chart given below. https://www.simplilearn.com/ice9/free_resources_article_thumb/classification-example-machine-learning.JPG This chart shows the classification of the Iris flower dataset into its three sub-species indicated by codes 0, 1, and 2. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-flower-dataset-graph.JPG The test set dots represent the assignment of new test data points to one class or the other based on the trained classifier model. Types of Classification Algorithms Let’s have a quick look into the types of Classification Algorithm below. Linear Models Logistic Regression Support Vector Machines Nonlinear models K-nearest Neighbors (KNN) Kernel Support Vector Machines (SVM) Naïve Bayes Decision Tree Classification Random Forest Classification Logistic Regression: Meaning Let us understand the Logistic Regression model below. This refers to a regression model that is used for classification. This method is widely used for binary classification problems. It can also be extended to multi-class classification problems. Here, the dependent variable is categorical: y ϵ {0, 1} A binary dependent variable can have only two values, like 0 or 1, win or lose, pass or fail, healthy or sick, etc In this case, you model the probability distribution of output y as 1 or 0. This is called the sigmoid probability (σ). If σ(θ Tx) > 0.5, set y = 1, else set y = 0 Unlike Linear Regression (and its Normal Equation solution), there is no closed form solution for finding optimal weights of Logistic Regression. Instead, you must solve this with maximum likelihood estimation (a probability model to detect the maximum likelihood of something happening). It can be used to calculate the probability of a given outcome in a binary model, like the probability of being classified as sick or passing an exam. https://www.simplilearn.com/ice9/free_resources_article_thumb/logistic-regression-example-graph.JPG Sigmoid Probability The probability in the logistic regression is often represented by the Sigmoid function (also called the logistic function or the S-curve): https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-function-machine-learning.JPG In this equation, t represents data values * the number of hours studied and S(t) represents the probability of passing the exam. Assume sigmoid function: https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-probability-machine-learning.JPG g(z) tends toward 1 as z -> infinity , and g(z) tends toward 0 as z -> infinity K-nearest Neighbors (KNN) K-nearest Neighbors algorithm is used to assign a data point to clusters based on similarity measurement. It uses a supervised method for classification. The steps to writing a k-means algorithm are as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-distribution-graph-machine-learning.JPG Choose the number of k and a distance metric. (k = 5 is common) Find k-nearest neighbors of the sample that you want to classify Assign the class label by majority vote. KNN Classification A new input point is classified in the category such that it has the most number of neighbors from that category. For example: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-classification-machine-learning.JPG Classify a patient as high risk or low risk. Mark email as spam or ham. Keen on learning about Classification Algorithms in Machine Learning? Click here! Support Vector Machine (SVM) Let us understand Support Vector Machine (SVM) in detail below. SVMs are classification algorithms used to assign data to various classes. They involve detecting hyperplanes which segregate data into classes. SVMs are very versatile and are also capable of performing linear or nonlinear classification, regression, and outlier detection. Once ideal hyperplanes are discovered, new data points can be easily classified. https://www.simplilearn.com/ice9/free_resources_article_thumb/support-vector-machines-graph-machine-learning.JPG The optimization objective is to find “maximum margin hyperplane” that is farthest from the closest points in the two classes (these points are called support vectors). In the given figure, the middle line represents the hyperplane. SVM Example Let’s look at this image below and have an idea about SVM in general. Hyperplanes with larger margins have lower generalization error. The positive and negative hyperplanes are represented by: https://www.simplilearn.com/ice9/free_resources_article_thumb/positive-negative-hyperplanes-machine-learning.JPG Classification of any new input sample xtest : If w0 + wTxtest > 1, the sample xtest is said to be in the class toward the right of the positive hyperplane. If w0 + wTxtest < -1, the sample xtest is said to be in the class toward the left of the negative hyperplane. When you subtract the two equations, you get: https://www.simplilearn.com/ice9/free_resources_article_thumb/equation-subtraction-machine-learning.JPG Length of vector w is (L2 norm length): https://www.simplilearn.com/ice9/free_resources_article_thumb/length-of-vector-machine-learning.JPG You normalize with the length of w to arrive at: https://www.simplilearn.com/ice9/free_resources_article_thumb/normalize-equation-machine-learning.JPG SVM: Hard Margin Classification Given below are some points to understand Hard Margin Classification. The left side of equation SVM-1 given above can be interpreted as the distance between the positive (+ve) and negative (-ve) hyperplanes; in other words, it is the margin that can be maximized. Hence the objective of the function is to maximize with the constraint that the samples are classified correctly, which is represented as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-machine-learning.JPG This means that you are minimizing ‖w‖. This also means that all positive samples are on one side of the positive hyperplane and all negative samples are on the other side of the negative hyperplane. This can be written concisely as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-formula.JPG Minimizing ‖w‖ is the same as minimizing. This figure is better as it is differentiable even at w = 0. The approach listed above is called “hard margin linear SVM classifier.” SVM: Soft Margin Classification Given below are some points to understand Soft Margin Classification. To allow for linear constraints to be relaxed for nonlinearly separable data, a slack variable is introduced. (i) measures how much ith instance is allowed to violate the margin. The slack variable is simply added to the linear constraints. https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-machine-learning.JPG Subject to the above constraints, the new objective to be minimized becomes: https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-formula.JPG You have two conflicting objectives now—minimizing slack variable to reduce margin violations and minimizing to increase the margin. The hyperparameter C allows us to define this trade-off. Large values of C correspond to larger error penalties (so smaller margins), whereas smaller values of C allow for higher misclassification errors and larger margins. https://www.simplilearn.com/ice9/free_resources_article_thumb/machine-learning-certification-video-preview.jpg SVM: Regularization The concept of C is the reverse of regularization. Higher C means lower regularization, which increases bias and lowers the variance (causing overfitting). https://www.simplilearn.com/ice9/free_resources_article_thumb/concept-of-c-graph-machine-learning.JPG IRIS Data Set The Iris dataset contains measurements of 150 IRIS flowers from three different species: Setosa Versicolor Viriginica Each row represents one sample. Flower measurements in centimeters are stored as columns. These are called features. IRIS Data Set: SVM Let’s train an SVM model using sci-kit-learn for the Iris dataset: https://www.simplilearn.com/ice9/free_resources_article_thumb/svm-model-graph-machine-learning.JPG Nonlinear SVM Classification There are two ways to solve nonlinear SVMs: by adding polynomial features by adding similarity features Polynomial features can be added to datasets; in some cases, this can create a linearly separable dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/nonlinear-classification-svm-machine-learning.JPG In the figure on the left, there is only 1 feature x1. This dataset is not linearly separable. If you add x2 = (x1)2 (figure on the right), the data becomes linearly separable. Polynomial Kernel In sci-kit-learn, one can use a Pipeline class for creating polynomial features. Classification results for the Moons dataset are shown in the figure. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-machine-learning.JPG Polynomial Kernel with Kernel Trick Let us look at the image below and understand Kernel Trick in detail. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-with-kernel-trick.JPG For large dimensional datasets, adding too many polynomial features can slow down the model. You can apply a kernel trick with the effect of polynomial features without actually adding them. The code is shown (SVC class) below trains an SVM classifier using a 3rd-degree polynomial kernel but with a kernel trick. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-equation-machine-learning.JPG The hyperparameter coefθ controls the influence of high-degree polynomials. Kernel SVM Let us understand in detail about Kernel SVM. Kernel SVMs are used for classification of nonlinear data. In the chart, nonlinear data is projected into a higher dimensional space via a mapping function where it becomes linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-machine-learning.JPG In the higher dimension, a linear separating hyperplane can be derived and used for classification. A reverse projection of the higher dimension back to original feature space takes it back to nonlinear shape. As mentioned previously, SVMs can be kernelized to solve nonlinear classification problems. You can create a sample dataset for XOR gate (nonlinear problem) from NumPy. 100 samples will be assigned the class sample 1, and 100 samples will be assigned the class label -1. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-graph-machine-learning.JPG As you can see, this data is not linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-non-separable.JPG You now use the kernel trick to classify XOR dataset created earlier. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-xor-machine-learning.JPG Naïve Bayes Classifier What is Naive Bayes Classifier? Have you ever wondered how your mail provider implements spam filtering or how online news channels perform news text classification or even how companies perform sentiment analysis of their audience on social media? All of this and more are done through a machine learning algorithm called Naive Bayes Classifier. Naive Bayes Named after Thomas Bayes from the 1700s who first coined this in the Western literature. Naive Bayes classifier works on the principle of conditional probability as given by the Bayes theorem. Advantages of Naive Bayes Classifier Listed below are six benefits of Naive Bayes Classifier. Very simple and easy to implement Needs less training data Handles both continuous and discrete data Highly scalable with the number of predictors and data points As it is fast, it can be used in real-time predictions Not sensitive to irrelevant features Bayes Theorem We will understand Bayes Theorem in detail from the points mentioned below. According to the Bayes model, the conditional probability P(Y|X) can be calculated as: P(Y|X) = P(X|Y)P(Y) / P(X) This means you have to estimate a very large number of P(X|Y) probabilities for a relatively small vector space X. For example, for a Boolean Y and 30 possible Boolean attributes in the X vector, you will have to estimate 3 billion probabilities P(X|Y). To make it practical, a Naïve Bayes classifier is used, which assumes conditional independence of P(X) to each other, with a given value of Y. This reduces the number of probability estimates to 2*30=60 in the above example. Naïve Bayes Classifier for SMS Spam Detection Consider a labeled SMS database having 5574 messages. It has messages as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-machine-learning.JPG Each message is marked as spam or ham in the data set. Let’s train a model with Naïve Bayes algorithm to detect spam from ham. The message lengths and their frequency (in the training dataset) are as shown below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-spam-detection.JPG Analyze the logic you use to train an algorithm to detect spam: Split each message into individual words/tokens (bag of words). Lemmatize the data (each word takes its base form, like “walking” or “walked” is replaced with “walk”). Convert data to vectors using scikit-learn module CountVectorizer. Run TFIDF to remove common words like “is,” “are,” “and.” Now apply scikit-learn module for Naïve Bayes MultinomialNB to get the Spam Detector. This spam detector can then be used to classify a random new message as spam or ham. Next, the accuracy of the spam detector is checked using the Confusion Matrix. For the SMS spam example above, the confusion matrix is shown on the right. Accuracy Rate = Correct / Total = (4827 + 592)/5574 = 97.21% Error Rate = Wrong / Total = (155 + 0)/5574 = 2.78% https://www.simplilearn.com/ice9/free_resources_article_thumb/confusion-matrix-machine-learning.JPG Although confusion Matrix is useful, some more precise metrics are provided by Precision and Recall. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-recall-matrix-machine-learning.JPG Precision refers to the accuracy of positive predictions. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-formula-machine-learning.JPG Recall refers to the ratio of positive instances that are correctly detected by the classifier (also known as True positive rate or TPR). https://www.simplilearn.com/ice9/free_resources_article_thumb/recall-formula-machine-learning.JPG Precision/Recall Trade-off To detect age-appropriate videos for kids, you need high precision (low recall) to ensure that only safe videos make the cut (even though a few safe videos may be left out). The high recall is needed (low precision is acceptable) in-store surveillance to catch shoplifters; a few false alarms are acceptable, but all shoplifters must be caught. Learn about Naive Bayes in detail. Click here! Decision Tree Classifier Some aspects of the Decision Tree Classifier mentioned below are. Decision Trees (DT) can be used both for classification and regression. The advantage of decision trees is that they require very little data preparation. They do not require feature scaling or centering at all. They are also the fundamental components of Random Forests, one of the most powerful ML algorithms. Unlike Random Forests and Neural Networks (which do black-box modeling), Decision Trees are white box models, which means that inner workings of these models are clearly understood. In the case of classification, the data is segregated based on a series of questions. Any new data point is assigned to the selected leaf node. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-machine-learning.JPG Start at the tree root and split the data on the feature using the decision algorithm, resulting in the largest information gain (IG). This splitting procedure is then repeated in an iterative process at each child node until the leaves are pure. This means that the samples at each node belonging to the same class. In practice, you can set a limit on the depth of the tree to prevent overfitting. The purity is compromised here as the final leaves may still have some impurity. The figure shows the classification of the Iris dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-graph.JPG IRIS Decision Tree Let’s build a Decision Tree using scikit-learn for the Iris flower dataset and also visualize it using export_graphviz API. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-machine-learning.JPG The output of export_graphviz can be converted into png format: https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-output.JPG Sample attribute stands for the number of training instances the node applies to. Value attribute stands for the number of training instances of each class the node applies to. Gini impurity measures the node’s impurity. A node is “pure” (gini=0) if all training instances it applies to belong to the same class. https://www.simplilearn.com/ice9/free_resources_article_thumb/impurity-formula-machine-learning.JPG For example, for Versicolor (green color node), the Gini is 1-(0/54)2 -(49/54)2 -(5/54) 2 ≈ 0.168 https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-sample.JPG Decision Boundaries Let us learn to create decision boundaries below. For the first node (depth 0), the solid line splits the data (Iris-Setosa on left). Gini is 0 for Setosa node, so no further split is possible. The second node (depth 1) splits the data into Versicolor and Virginica. If max_depth were set as 3, a third split would happen (vertical dotted line). https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-boundaries.JPG For a sample with petal length 5 cm and petal width 1.5 cm, the tree traverses to depth 2 left node, so the probability predictions for this sample are 0% for Iris-Setosa (0/54), 90.7% for Iris-Versicolor (49/54), and 9.3% for Iris-Virginica (5/54) CART Training Algorithm Scikit-learn uses Classification and Regression Trees (CART) algorithm to train Decision Trees. CART algorithm: Split the data into two subsets using a single feature k and threshold tk (example, petal length < “2.45 cm”). This is done recursively for each node. k and tk are chosen such that they produce the purest subsets (weighted by their size). The objective is to minimize the cost function as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/cart-training-algorithm-machine-learning.JPG The algorithm stops executing if one of the following situations occurs: max_depth is reached No further splits are found for each node Other hyperparameters may be used to stop the tree: min_samples_split min_samples_leaf min_weight_fraction_leaf max_leaf_nodes Gini Impurity or Entropy Entropy is one more measure of impurity and can be used in place of Gini. https://www.simplilearn.com/ice9/free_resources_article_thumb/gini-impurity-entrophy.JPG It is a degree of uncertainty, and Information Gain is the reduction that occurs in entropy as one traverses down the tree. Entropy is zero for a DT node when the node contains instances of only one class. Entropy for depth 2 left node in the example given above is: https://www.simplilearn.com/ice9/free_resources_article_thumb/entrophy-for-depth-2.JPG Gini and Entropy both lead to similar trees. DT: Regularization The following figure shows two decision trees on the moons dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/dt-regularization-machine-learning.JPG The decision tree on the right is restricted by min_samples_leaf = 4. The model on the left is overfitting, while the model on the right generalizes better. Random Forest Classifier Let us have an understanding of Random Forest Classifier below. A random forest can be considered an ensemble of decision trees (Ensemble learning). Random Forest algorithm: Draw a random bootstrap sample of size n (randomly choose n samples from the training set). Grow a decision tree from the bootstrap sample. At each node, randomly select d features. Split the node using the feature that provides the best split according to the objective function, for instance by maximizing the information gain. Repeat the steps 1 to 2 k times. (k is the number of trees you want to create, using a subset of samples) Aggregate the prediction by each tree for a new data point to assign the class label by majority vote (pick the group selected by the most number of trees and assign new data point to that group). Random Forests are opaque, which means it is difficult to visualize their inner workings. https://www.simplilearn.com/ice9/free_resources_article_thumb/random-forest-classifier-graph.JPG However, the advantages outweigh their limitations since you do not have to worry about hyperparameters except k, which stands for the number of decision trees to be created from a subset of samples. RF is quite robust to noise from the individual decision trees. Hence, you need not prune individual decision trees. The larger the number of decision trees, the more accurate the Random Forest prediction is. (This, however, comes with higher computation cost). Key Takeaways Let us quickly run through what we have learned so far in this Classification tutorial. Classification algorithms are supervised learning methods to split data into classes. They can work on Linear Data as well as Nonlinear Data. Logistic Regression can classify data based on weighted parameters and sigmoid conversion to calculate the probability of classes. K-nearest Neighbors (KNN) algorithm uses similar features to classify data. Support Vector Machines (SVMs) classify data by detecting the maximum margin hyperplane between data classes. Naïve Bayes, a simplified Bayes Model, can help classify data using conditional probability models. Decision Trees are powerful classifiers and use tree splitting logic until pure or somewhat pure leaf node classes are attained. Random Forests apply Ensemble Learning to Decision Trees for more accurate classification predictions. Conclusion This completes ‘Classification’ tutorial. In the next tutorial, we will learn 'Unsupervised Learning with Clustering.'
molyswu
using Neural Networks (SSD) on Tensorflow. This repo documents steps and scripts used to train a hand detector using Tensorflow (Object Detection API). As with any DNN based task, the most expensive (and riskiest) part of the process has to do with finding or creating the right (annotated) dataset. I was interested mainly in detecting hands on a table (egocentric view point). I experimented first with the [Oxford Hands Dataset](http://www.robots.ox.ac.uk/~vgg/data/hands/) (the results were not good). I then tried the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/) which was a much better fit to my requirements. The goal of this repo/post is to demonstrate how neural networks can be applied to the (hard) problem of tracking hands (egocentric and other views). Better still, provide code that can be adapted to other uses cases. If you use this tutorial or models in your research or project, please cite [this](#citing-this-tutorial). Here is the detector in action. <img src="images/hand1.gif" width="33.3%"><img src="images/hand2.gif" width="33.3%"><img src="images/hand3.gif" width="33.3%"> Realtime detection on video stream from a webcam . <img src="images/chess1.gif" width="33.3%"><img src="images/chess2.gif" width="33.3%"><img src="images/chess3.gif" width="33.3%"> Detection on a Youtube video. Both examples above were run on a macbook pro **CPU** (i7, 2.5GHz, 16GB). Some fps numbers are: | FPS | Image Size | Device| Comments| | ------------- | ------------- | ------------- | ------------- | | 21 | 320 * 240 | Macbook pro (i7, 2.5GHz, 16GB) | Run without visualizing results| | 16 | 320 * 240 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) | | 11 | 640 * 480 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) | > Note: The code in this repo is written and tested with Tensorflow `1.4.0-rc0`. Using a different version may result in [some errors](https://github.com/tensorflow/models/issues/1581). You may need to [generate your own frozen model](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/) graph using the [model checkpoints](model-checkpoint) in the repo to fit your TF version. **Content of this document** - Motivation - Why Track/Detect hands with Neural Networks - Data preparation and network training in Tensorflow (Dataset, Import, Training) - Training the hand detection Model - Using the Detector to Detect/Track hands - Thoughts on Optimizations. > P.S if you are using or have used the models provided here, feel free to reach out on twitter ([@vykthur](https://twitter.com/vykthur)) and share your work! ## Motivation - Why Track/Detect hands with Neural Networks? There are several existing approaches to tracking hands in the computer vision domain. Incidentally, many of these approaches are rule based (e.g extracting background based on texture and boundary features, distinguishing between hands and background using color histograms and HOG classifiers,) making them not very robust. For example, these algorithms might get confused if the background is unusual or in situations where sharp changes in lighting conditions cause sharp changes in skin color or the tracked object becomes occluded.(see [here for a review](https://www.cse.unr.edu/~bebis/handposerev.pdf) paper on hand pose estimation from the HCI perspective) With sufficiently large datasets, neural networks provide opportunity to train models that perform well and address challenges of existing object tracking/detection algorithms - varied/poor lighting, noisy environments, diverse viewpoints and even occlusion. The main drawbacks to usage for real-time tracking/detection is that they can be complex, are relatively slow compared to tracking-only algorithms and it can be quite expensive to assemble a good dataset. But things are changing with advances in fast neural networks. Furthermore, this entire area of work has been made more approachable by deep learning frameworks (such as the tensorflow object detection api) that simplify the process of training a model for custom object detection. More importantly, the advent of fast neural network models like ssd, faster r-cnn, rfcn (see [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models) ) etc make neural networks an attractive candidate for real-time detection (and tracking) applications. Hopefully, this repo demonstrates this. > If you are not interested in the process of training the detector, you can skip straight to applying the [pretrained model I provide in detecting hands](#detecting-hands). Training a model is a multi-stage process (assembling dataset, cleaning, splitting into training/test partitions and generating an inference graph). While I lightly touch on the details of these parts, there are a few other tutorials cover training a custom object detector using the tensorflow object detection api in more detail[ see [here](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) and [here](https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9) ]. I recommend you walk through those if interested in training a custom object detector from scratch. ## Data preparation and network training in Tensorflow (Dataset, Import, Training) **The Egohands Dataset** The hand detector model is built using data from the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/) dataset. This dataset works well for several reasons. It contains high quality, pixel level annotations (>15000 ground truth labels) where hands are located across 4800 images. All images are captured from an egocentric view (Google glass) across 48 different environments (indoor, outdoor) and activities (playing cards, chess, jenga, solving puzzles etc). <img src="images/egohandstrain.jpg" width="100%"> If you will be using the Egohands dataset, you can cite them as follows: > Bambach, Sven, et al. "Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions." Proceedings of the IEEE International Conference on Computer Vision. 2015. The Egohands dataset (zip file with labelled data) contains 48 folders of locations where video data was collected (100 images per folder). ``` -- LOCATION_X -- frame_1.jpg -- frame_2.jpg ... -- frame_100.jpg -- polygons.mat // contains annotations for all 100 images in current folder -- LOCATION_Y -- frame_1.jpg -- frame_2.jpg ... -- frame_100.jpg -- polygons.mat // contains annotations for all 100 images in current folder ``` **Converting data to Tensorflow Format** Some initial work needs to be done to the Egohands dataset to transform it into the format (`tfrecord`) which Tensorflow needs to train a model. This repo contains `egohands_dataset_clean.py` a script that will help you generate these csv files. - Downloads the egohands datasets - Renames all files to include their directory names to ensure each filename is unique - Splits the dataset into train (80%), test (10%) and eval (10%) folders. - Reads in `polygons.mat` for each folder, generates bounding boxes and visualizes them to ensure correctness (see image above). - Once the script is done running, you should have an images folder containing three folders - train, test and eval. Each of these folders should also contain a csv label document each - `train_labels.csv`, `test_labels.csv` that can be used to generate `tfrecords` Note: While the egohands dataset provides four separate labels for hands (own left, own right, other left, and other right), for my purpose, I am only interested in the general `hand` class and label all training data as `hand`. You can modify the data prep script to generate `tfrecords` that support 4 labels. Next: convert your dataset + csv files to tfrecords. A helpful guide on this can be found [here](https://pythonprogramming.net/creating-tfrecord-files-tensorflow-object-detection-api-tutorial/).For each folder, you should be able to generate `train.record`, `test.record` required in the training process. ## Training the hand detection Model Now that the dataset has been assembled (and your tfrecords), the next task is to train a model based on this. With neural networks, it is possible to use a process called [transfer learning](https://www.tensorflow.org/tutorials/image_retraining) to shorten the amount of time needed to train the entire model. This means we can take an existing model (that has been trained well on a related domain (here image classification) and retrain its final layer(s) to detect hands for us. Sweet!. Given that neural networks sometimes have thousands or millions of parameters that can take weeks or months to train, transfer learning helps shorten training time to possibly hours. Tensorflow does offer a few models (in the tensorflow [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models)) and I chose to use the `ssd_mobilenet_v1_coco` model as my start point given it is currently (one of) the fastest models (read the SSD research [paper here](https://arxiv.org/pdf/1512.02325.pdf)). The training process can be done locally on your CPU machine which may take a while or better on a (cloud) GPU machine (which is what I did). For reference, training on my macbook pro (tensorflow compiled from source to take advantage of the mac's cpu architecture) the maximum speed I got was 5 seconds per step as opposed to the ~0.5 seconds per step I got with a GPU. For reference it would take about 12 days to run 200k steps on my mac (i7, 2.5GHz, 16GB) compared to ~5hrs on a GPU. > **Training on your own images**: Please use the [guide provided by Harrison from pythonprogramming](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) on how to generate tfrecords given your label csv files and your images. The guide also covers how to start the training process if training locally. [see [here] (https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/)]. If training in the cloud using a service like GCP, see the [guide here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md). As the training process progresses, the expectation is that total loss (errors) gets reduced to its possible minimum (about a value of 1 or thereabout). By observing the tensorboard graphs for total loss(see image below), it should be possible to get an idea of when the training process is complete (total loss does not decrease with further iterations/steps). I ran my training job for 200k steps (took about 5 hours) and stopped at a total Loss (errors) value of 2.575.(In retrospect, I could have stopped the training at about 50k steps and gotten a similar total loss value). With tensorflow, you can also run an evaluation concurrently that assesses your model to see how well it performs on the test data. A commonly used metric for performance is mean average precision (mAP) which is single number used to summarize the area under the precision-recall curve. mAP is a measure of how well the model generates a bounding box that has at least a 50% overlap with the ground truth bounding box in our test dataset. For the hand detector trained here, the mAP value was **0.9686@0.5IOU**. mAP values range from 0-1, the higher the better. <img src="images/accuracy.jpg" width="100%"> Once training is completed, the trained inference graph (`frozen_inference_graph.pb`) is then exported (see the earlier referenced guides for how to do this) and saved in the `hand_inference_graph` folder. Now its time to do some interesting detection. ## Using the Detector to Detect/Track hands If you have not done this yet, please following the guide on installing [Tensorflow and the Tensorflow object detection api](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md). This will walk you through setting up the tensorflow framework, cloning the tensorflow github repo and a guide on - Load the `frozen_inference_graph.pb` trained on the hands dataset as well as the corresponding label map. In this repo, this is done in the `utils/detector_utils.py` script by the `load_inference_graph` method. ```python detection_graph = tf.Graph() with detection_graph.as_default(): od_graph_def = tf.GraphDef() with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid: serialized_graph = fid.read() od_graph_def.ParseFromString(serialized_graph) tf.import_graph_def(od_graph_def, name='') sess = tf.Session(graph=detection_graph) print("> ====== Hand Inference graph loaded.") ``` - Detect hands. In this repo, this is done in the `utils/detector_utils.py` script by the `detect_objects` method. ```python (boxes, scores, classes, num) = sess.run( [detection_boxes, detection_scores, detection_classes, num_detections], feed_dict={image_tensor: image_np_expanded}) ``` - Visualize detected bounding detection_boxes. In this repo, this is done in the `utils/detector_utils.py` script by the `draw_box_on_image` method. This repo contains two scripts that tie all these steps together. - detect_multi_threaded.py : A threaded implementation for reading camera video input detection and detecting. Takes a set of command line flags to set parameters such as `--display` (visualize detections), image parameters `--width` and `--height`, videe `--source` (0 for camera) etc. - detect_single_threaded.py : Same as above, but single threaded. This script works for video files by setting the video source parameter videe `--source` (path to a video file). ```cmd # load and run detection on video at path "videos/chess.mov" python detect_single_threaded.py --source videos/chess.mov ``` > Update: If you do have errors loading the frozen inference graph in this repo, feel free to generate a new graph that fits your TF version from the model-checkpoint in this repo. Use the [export_inference_graph.py](https://github.com/tensorflow/models/blob/master/research/object_detection/export_inference_graph.py) script provided in the tensorflow object detection api repo. More guidance on this [here](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/). ## Thoughts on Optimization. A few things that led to noticeable performance increases. - Threading: Turns out that reading images from a webcam is a heavy I/O event and if run on the main application thread can slow down the program. I implemented some good ideas from [Adrian Rosebuck](https://www.pyimagesearch.com/2017/02/06/faster-video-file-fps-with-cv2-videocapture-and-opencv/) on parrallelizing image capture across multiple worker threads. This mostly led to an FPS increase of about 5 points. - For those new to Opencv, images from the `cv2.read()` method return images in [BGR format](https://www.learnopencv.com/why-does-opencv-use-bgr-color-format/). Ensure you convert to RGB before detection (accuracy will be much reduced if you dont). ```python cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB) ``` - Keeping your input image small will increase fps without any significant accuracy drop.(I used about 320 x 240 compared to the 1280 x 720 which my webcam provides). - Model Quantization. Moving from the current 32 bit to 8 bit can achieve up to 4x reduction in memory required to load and store models. One way to further speed up this model is to explore the use of [8-bit fixed point quantization](https://heartbeat.fritz.ai/8-bit-quantization-and-tensorflow-lite-speeding-up-mobile-inference-with-low-precision-a882dfcafbbd). Performance can also be increased by a clever combination of tracking algorithms with the already decent detection and this is something I am still experimenting with. Have ideas for optimizing better, please share! <img src="images/general.jpg" width="100%"> Note: The detector does reflect some limitations associated with the training set. This includes non-egocentric viewpoints, very noisy backgrounds (e.g in a sea of hands) and sometimes skin tone. There is opportunity to improve these with additional data. ## Integrating Multiple DNNs. One way to make things more interesting is to integrate our new knowledge of where "hands" are with other detectors trained to recognize other objects. Unfortunately, while our hand detector can in fact detect hands, it cannot detect other objects (a factor or how it is trained). To create a detector that classifies multiple different objects would mean a long involved process of assembling datasets for each class and a lengthy training process. > Given the above, a potential strategy is to explore structures that allow us **efficiently** interleave output form multiple pretrained models for various object classes and have them detect multiple objects on a single image. An example of this is with my primary use case where I am interested in understanding the position of objects on a table with respect to hands on same table. I am currently doing some work on a threaded application that loads multiple detectors and outputs bounding boxes on a single image. More on this soon.
abusufyanvu
MIT Introduction to Deep Learning (6.S191) Instructors: Alexander Amini and Ava Soleimany Course Information Summary Prerequisites Schedule Lectures Labs, Final Projects, Grading, and Prizes Software labs Gather.Town lab + Office Hour sessions Final project Paper Review Project Proposal Presentation Project Proposal Grading Rubric Past Project Proposal Ideas Awards + Categories Important Links and Emails Course Information Summary MIT's introductory course on deep learning methods with applications to computer vision, natural language processing, biology, and more! Students will gain foundational knowledge of deep learning algorithms and get practical experience in building neural networks in TensorFlow. Course concludes with a project proposal competition with feedback from staff and a panel of industry sponsors. Prerequisites We expect basic knowledge of calculus (e.g., taking derivatives), linear algebra (e.g., matrix multiplication), and probability (e.g., Bayes theorem) -- we'll try to explain everything else along the way! Experience in Python is helpful but not necessary. This class is taught during MIT's IAP term by current MIT PhD researchers. Listeners are welcome! Schedule Monday Jan 18, 2021 Lecture: Introduction to Deep Learning and NNs Lab: Lab 1A Tensorflow and building NNs from scratch Tuesday Jan 19, 2021 Lecture: Deep Sequence Modelling Lab: Lab 1B Music Generation using RNNs Wednesday Jan 20, 2021 Lecture: Deep Computer Vision Lab: Lab 2A Image classification and detection Thursday Jan 21, 2021 Lecture: Deep Generative Modelling Lab: Lab 2B Debiasing facial recognition systems Friday Jan 22, 2021 Lecture: Deep Reinforcement Learning Lab: Lab 3 pixel-to-control planning Monday Jan 25, 2021 Lecture: Limitations and New Frontiers Lab: Lab 3 continued Tuesday Jan 26, 2021 Lecture (part 1): Evidential Deep Learning Lecture (part 2): Bias and Fairness Lab: Work on final assignments Lab competition entries due at 11:59pm ET on Canvas! Lab 1, Lab 2, and Lab 3 Wednesday Jan 27, 2021 Lecture (part 1): Nigel Duffy, Ernst & Young Lecture (part 2): Kate Saenko, Boston University and MIT-IBM Watson AI Lab Lab: Work on final assignments Assignments due: Sign up for Final Project Competition Thursday Jan 28, 2021 Lecture (part 1): Sanja Fidler, U. Toronto, Vector Institute, and NVIDIA Lecture (part 2): Katherine Chou, Google Lab: Work on final assignments Assignments due: 1 page paper review (if applicable) Friday Jan 29, 2021 Lecture: Student project pitch competition Lab: Awards ceremony and prize giveaway Assignments due: Project proposals (if applicable) Lectures Lectures will be held starting at 1:00pm ET from Jan 18 - Jan 29 2021, Monday through Friday, virtually through Zoom. Current MIT students, faculty, postdocs, researchers, staff, etc. will be able to access the lectures during this two week period, synchronously or asynchronously, via the MIT Canvas course webpage (MIT internal only). Lecture recordings will be uploaded to the Canvas as soon as possible; students are not required to attend any lectures synchronously. Please see the Canvas for details on Zoom links. The public edition of the course will only be made available after completion of the MIT course. Labs, Final Projects, Grading, and Prizes Course will be graded during MIT IAP for 6 units under P/D/F grading. Receiving a passing grade requires completion of each software lab project (through honor code, with submission required to enter lab competitions), a final project proposal/presentation or written review of a deep learning paper (submission required), and attendance/lecture viewing (through honor code). Submission of a written report or presentation of a project proposal will ensure a passing grade. MIT students will be eligible for prizes and awards as part of the class competitions. There will be two parts to the competitions: (1) software labs and (2) final projects. More information is provided below. Winners will be announced on the last day of class, with thousands of dollars of prizes being given away! Software labs There are three TensorFlow software lab exercises for the course, designed as iPython notebooks hosted in Google Colab. Software labs can be found on GitHub: https://github.com/aamini/introtodeeplearning. These are self-paced exercises and are designed to help you gain practical experience implementing neural networks in TensorFlow. For registered MIT students, submission of lab materials is not necessary to get credit for the course or to pass the course. At the end of each software lab there will be task-associated materials to submit (along with instructions) for entry into the competitions, open to MIT students and affiliates during the IAP offering. This includes MIT students/affiliates who are taking the class as listeners -- you are eligible! These instructions are provided at the end of each of the labs. Completing these tasks and submitting your materials to Canvas will enter you into a per-lab competition. MIT students and affiliates will be eligible for prizes during the IAP offering; at the end of the course, prize-winners will be awarded with their prizes. All competition submissions are due on January 26 at 11:59pm ET to Canvas. For the software lab competitions, submissions will be judged on the basis of the following criteria: Strength and quality of final results (lab dependent) Soundness of implementation and approach Thoroughness and quality of provided descriptions and figures Gather.Town lab + Office Hour sessions After each day’s lecture, there will be open Office Hours in the class GatherTown, up until 3pm ET. An MIT email is required to log in and join the GatherTown. During these sessions, there will not be a walk through or dictation of the labs; the labs are designed to be self-paced and to be worked on on your own time. The GatherTown sessions will be hosted by course staff and are held so you can: Ask questions on course lectures, labs, logistics, project, or anything else; Work on the labs in the presence of classmates/TAs/instructors; Meet classmates to find groups for the final project; Group work time for the final project; Bring the class community together. Final project To satisfy the final project requirement for this course, students will have two options: (1) write a 1 page paper review (single-spaced) on a recent deep learning paper of your choice or (2) participate and present in the project proposal pitch competition. The 1 page paper review option is straightforward, we propose some papers within this document to help you get started, and you can satisfy a passing grade with this option -- you will not be eligible for the grand prizes. On the other hand, participation in the project proposal pitch competition will equivalently satisfy your course requirements but additionally make you eligible for the grand prizes. See the section below for more details and requirements for each of these options. Paper Review Students may satisfy the final project requirement by reading and reviewing a recent deep learning paper of their choosing. In the written review, students should provide both: 1) a description of the problem, technical approach, and results of the paper; 2) critical analysis and exposition of the limitations of the work and opportunities for future work. Reviews should be submitted on Canvas by Thursday Jan 28, 2021, 11:59:59pm Eastern Time (ET). Just a few paper options to consider... https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf https://papers.nips.cc/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf https://science.sciencemag.org/content/362/6419/1140 https://papers.nips.cc/paper/2018/file/0e64a7b00c83e3d22ce6b3acf2c582b6-Paper.pdf https://arxiv.org/pdf/1906.11829.pdf https://www.nature.com/articles/s42256-020-00237-3 https://pubmed.ncbi.nlm.nih.gov/32084340/ Project Proposal Presentation Keyword: proposal This is a 2 week course so we do not require results or working implementations! However, to win the top prizes, nice, clear results and implementations will demonstrate feasibility of your proposal which is something we look for! Logistics -- please read! You must sign up to present before 11:59:59pm Eastern Time (ET) on Wednesday Jan 27, 2021 Slides must be in a Google Slide before 11:59:59pm Eastern Time (ET) on Thursday Jan 28, 2021 Project groups can be between 1 and 5 people Listeners welcome To be eligible for a prize you must have at least 1 registered MIT student in your group Each participant will only be allowed to be in one group and present one project pitch Synchronous attendance on 1/29/21 is required to make the project pitch! 3 min presentation on your idea (we will be very strict with the time limits) Prizes! (see below) Sign up to Present here: by 11:59pm ET on Wednesday Jan 27 Once you sign up, make your slide in the following Google Slides; submit by midnight on Thursday Jan 28. Please specify the project group # on your slides!!! Things to Consider This doesn’t have to be a new deep learning method. It can just be an interesting application that you apply some existing deep learning method to. What problem are you solving? Are there use cases/applications? Why do you think deep learning methods might be suited to this task? How have people done it before? Is it a new task? If so, what are similar tasks that people have worked on? In what aspects have they succeeded or failed? What is your method of solving this problem? What type of model + architecture would you use? Why? What is the data for this task? Do you need to make a dataset or is there one publicly available? What are the characteristics of the data? Is it sparse, messy, imbalanced? How would you deal with that? Project Proposal Grading Rubric Project proposals will be evaluated by a panel of judges on the basis of the following three criteria: 1) novelty and impact; 2) technical soundness, feasibility, and organization, including quality of any presented results; 3) clarity and presentation. Each judge will award a score from 1 (lowest) to 5 (highest) for each of the criteria; the average score from each judge across these criteria will then be averaged with that of the other judges to provide the final score. The proposals with the highest final scores will be selected for prizes. Here are the guidelines for the criteria: Novelty and impact: encompasses the potential impact of the project idea, its novelty with respect to existing approaches. Why does the proposed work matter? What problem(s) does it solve? Why are these problems important? Technical soundness, feasibility, and organization: encompasses all technical aspects of the proposal. Do the proposed methodology and architecture make sense? Is the architecture the best suited for the proposed problem? Is deep learning the best approach for the problem? How realistic is it to implement the idea? Was there any implementation of the method? If results and data are presented, we will evaluate the strength of the results/data. Clarity and presentation: encompasses the delivery and quality of the presentation itself. Is the talk well organized? Are the slides aesthetically compelling? Is there a clear, well-delivered narrative? Are the problem and proposed method clearly presented? Past Project Proposal Ideas Recipe Generation with RNNs Can we compress videos with CNN + RNN? Music Generation with RNNs Style Transfer Applied to X GAN’s on a new modality Summarizing text/news articles Combining news articles about similar events Code or spec generation Multimodal speech → handwriting Generate handwriting based on keywords (i.e. cursive, slanted, neat) Predicting stock market trends Show language learners articles or videos at their level Transfer of writing style Chemical Synthesis with Recurrent Neural networks Transfer learning to learn something in a domain for which it’s hard or risky to gather data or do training RNNs to model some type of time series data Computer vision to coach sports players Computer vision system for safety brakes or warnings Use IBM Watson API to get the sentiment of your Facebook newsfeed Deep learning webcam to give wifi-access to friends or improve video chat in some way Domain-specific chatbot to help you perform a specific task Detect whether a signature is fraudulent Awards + Categories Final Project Awards: 1x NVIDIA RTX 3080 4x Google Home Max 3x Display Monitors Software Lab Awards: Bose headphones (Lab 1) Display monitor (Lab 2) Bebop drone (Lab 3) Important Links and Emails Course website: http://introtodeeplearning.com Course staff: introtodeeplearning-staff@mit.edu Piazza forum (MIT only): https://piazza.com/mit/spring2021/6s191 Canvas (MIT only): https://canvas.mit.edu/courses/8291 Software lab repository: https://github.com/aamini/introtodeeplearning Lab/office hour sessions (MIT only): https://gather.town/app/56toTnlBrsKCyFgj/MITDeepLearning
himanshub1007
# AD-Prediction Convolutional Neural Networks for Alzheimer's Disease Prediction Using Brain MRI Image ## Abstract Alzheimers disease (AD) is characterized by severe memory loss and cognitive impairment. It associates with significant brain structure changes, which can be measured by magnetic resonance imaging (MRI) scan. The observable preclinical structure changes provides an opportunity for AD early detection using image classification tools, like convolutional neural network (CNN). However, currently most AD related studies were limited by sample size. Finding an efficient way to train image classifier on limited data is critical. In our project, we explored different transfer-learning methods based on CNN for AD prediction brain structure MRI image. We find that both pretrained 2D AlexNet with 2D-representation method and simple neural network with pretrained 3D autoencoder improved the prediction performance comparing to a deep CNN trained from scratch. The pretrained 2D AlexNet performed even better (**86%**) than the 3D CNN with autoencoder (**77%**). ## Method #### 1. Data In this project, we used public brain MRI data from **Alzheimers Disease Neuroimaging Initiative (ADNI)** Study. ADNI is an ongoing, multicenter cohort study, started from 2004. It focuses on understanding the diagnostic and predictive value of Alzheimers disease specific biomarkers. The ADNI study has three phases: ADNI1, ADNI-GO, and ADNI2. Both ADNI1 and ADNI2 recruited new AD patients and normal control as research participants. Our data included a total of 686 structure MRI scans from both ADNI1 and ADNI2 phases, with 310 AD cases and 376 normal controls. We randomly derived the total sample into training dataset (n = 519), validation dataset (n = 100), and testing dataset (n = 67). #### 2. Image preprocessing Image preprocessing were conducted using Statistical Parametric Mapping (SPM) software, version 12. The original MRI scans were first skull-stripped and segmented using segmentation algorithm based on 6-tissue probability mapping and then normalized to the International Consortium for Brain Mapping template of European brains using affine registration. Other configuration includes: bias, noise, and global intensity normalization. The standard preprocessing process output 3D image files with an uniform size of 121x145x121. Skull-stripping and normalization ensured the comparability between images by transforming the original brain image into a standard image space, so that same brain substructures can be aligned at same image coordinates for different participants. Diluted or enhanced intensity was used to compensate the structure changes. the In our project, we used both whole brain (including both grey matter and white matter) and grey matter only. #### 3. AlexNet and Transfer Learning Convolutional Neural Networks (CNN) are very similar to ordinary Neural Networks. A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers are either convolutional, pooling or fully connected. ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the amount of parameters in the network. #### 3.1. AlexNet The net contains eight layers with weights; the first five are convolutional and the remaining three are fully connected. The overall architecture is shown in Figure 1. The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels. AlexNet maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average across training cases of the log-probability of the correct label under the prediction distribution. The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU (as shown in Figure1). The kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully connected layers are connected to all neurons in the previous layer. Response-normalization layers follow the first and second convolutional layers. Max-pooling layers follow both response-normalization layers as well as the fifth convolutional layer. The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.  The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5x5x48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3x3x256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3x3x192 , and the fifth convolutional layer has 256 kernels of size 3x3x192. The fully-connected layers have 4096 neurons each. #### 3.2. Transfer Learning Training an entire Convolutional Network from scratch (with random initialization) is impractical[14] because it is relatively rare to have a dataset of sufficient size. An alternative is to pretrain a Conv-Net on a very large dataset (e.g. ImageNet), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. Typically, there are three major transfer learning scenarios: **ConvNet as fixed feature extractor:** We can take a ConvNet pretrained on ImageNet, and remove the last fully-connected layer, then treat the rest structure as a fixed feature extractor for the target dataset. In AlexNet, this would be a 4096-D vector. Usually, we call these features as CNN codes. Once we get these features, we can train a linear classifier (e.g. linear SVM or Softmax classifier) for our target dataset. **Fine-tuning the ConvNet:** Another idea is not only replace the last fully-connected layer in the classifier, but to also fine-tune the parameters of the pretrained network. Due to overfitting concerns, we can only fine-tune some higher-level part of the network. This suggestion is motivated by the observation that earlier features in a ConvNet contains more generic features (e.g. edge detectors or color blob detectors) that can be useful for many kind of tasks. But the later layer of the network becomes progressively more specific to the details of the classes contained in the original dataset. **Pretrained models:** The released pretrained model is usually the final ConvNet checkpoint. So it is common to see people use the network for fine-tuning. #### 4. 3D Autoencoder and Convolutional Neural Network We take a two-stage approach where we first train a 3D sparse autoencoder to learn filters for convolution operations, and then build a convolutional neural network whose first layer uses the filters learned with the autoencoder.  #### 4.1. Sparse Autoencoder An autoencoder is a 3-layer neural network that is used to extract features from an input such as an image. Sparse representations can provide a simple interpretation of the input data in terms of a small number of \parts by extracting the structure hidden in the data. The autoencoder has an input layer, a hidden layer and an output layer, and the input and output layers have same number of units, while the hidden layer contains more units for a sparse and overcomplete representation. The encoder function maps input x to representation h, and the decoder function maps the representation h to the output x. In our problem, we extract 3D patches from scans as the input to the network. The decoder function aims to reconstruct the input form the hidden representation h. #### 4.2. 3D Convolutional Neural Network Training the 3D convolutional neural network(CNN) is the second stage. The CNN we use in this project has one convolutional layer, one pooling layer, two linear layers, and finally a log softmax layer. After training the sparse autoencoder, we take the weights and biases of the encoder from trained model, and use them a 3D filter of a 3D convolutional layer of the 1-layer convolutional neural network. Figure 2 shows the architecture of the network. #### 5. Tools In this project, we used Nibabel for MRI image processing and PyTorch Neural Networks implementation.
Aryia-Behroziuan
An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. However, over time, attention moved to performing specific tasks, leading to deviations from biology. Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis. Deep learning consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.[68] Decision trees Main article: Decision tree learning Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making. Support vector machines Main article: Support vector machines Support vector machines (SVMs), also known as support vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.[69] An SVM training algorithm is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. Illustration of linear regression on a data set. Regression analysis Main article: Regression analysis Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization (mathematics) methods to mitigate overfitting and bias, as in ridge regression. When dealing with non-linear problems, go-to models include polynomial regression (for example, used for trendline fitting in Microsoft Excel[70]), logistic regression (often used in statistical classification) or even kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to higher-dimensional space. Bayesian networks Main article: Bayesian network A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet. A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams. Genetic algorithms Main article: Genetic algorithm A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.[71][72] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[73] Training models Usually, machine learning models require a lot of data in order for them to perform well. Usually, when training a machine learning model, one needs to collect a large, representative sample of data from a training set. Data from the training set can be as varied as a corpus of text, a collection of images, and data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model. Federated learning Main article: Federated learning Federated learning is an adapted form of distributed artificial intelligence to training machine learning models that decentralizes the training process, allowing for users' privacy to be maintained by not needing to send their data to a centralized server. This also increases efficiency by decentralizing the training process to many devices. For example, Gboard uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to Google.[74] Applications There are many applications for machine learning, including: Agriculture Anatomy Adaptive websites Affective computing Banking Bioinformatics Brain–machine interfaces Cheminformatics Citizen science Computer networks Computer vision Credit-card fraud detection Data quality DNA sequence classification Economics Financial market analysis[75] General game playing Handwriting recognition Information retrieval Insurance Internet fraud detection Linguistics Machine learning control Machine perception Machine translation Marketing Medical diagnosis Natural language processing Natural language understanding Online advertising Optimization Recommender systems Robot locomotion Search engines Sentiment analysis Sequence mining Software engineering Speech recognition Structural health monitoring Syntactic pattern recognition Telecommunication Theorem proving Time series forecasting User behavior analytics In 2006, the media-services provider Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%. A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million.[76] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a recommendation") and they changed their recommendation engine accordingly.[77] In 2010 The Wall Street Journal wrote about the firm Rebellion Research and their use of machine learning to predict the financial crisis.[78] In 2012, co-founder of Sun Microsystems, Vinod Khosla, predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.[79] In 2014, it was reported that a machine learning algorithm had been applied in the field of art history to study fine art paintings and that it may have revealed previously unrecognized influences among artists.[80] In 2019 Springer Nature published the first research book created using machine learning.[81] Limitations Although machine learning has been transformative in some fields, machine-learning programs often fail to deliver expected results.[82][83][84] Reasons for this are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.[85] In 2018, a self-driving car from Uber failed to detect a pedestrian, who was killed after a collision.[86] Attempts to use machine learning in healthcare with the IBM Watson system failed to deliver even after years of time and billions of dollars invested.[87][88] Bias Main article: Algorithmic bias Machine learning approaches in particular can suffer from different data biases. A machine learning system trained on current customers only may not be able to predict the needs of new customer groups that are not represented in the training data. When trained on man-made data, machine learning is likely to pick up the same constitutional and unconscious biases already present in society.[89] Language models learned from data have been shown to contain human-like biases.[90][91] Machine learning systems used for criminal risk assessment have been found to be biased against black people.[92][93] In 2015, Google photos would often tag black people as gorillas,[94] and in 2018 this still was not well resolved, but Google reportedly was still using the workaround to remove all gorillas from the training data, and thus was not able to recognize real gorillas at all.[95] Similar issues with recognizing non-white people have been found in many other systems.[96] In 2016, Microsoft tested a chatbot that learned from Twitter, and it quickly picked up racist and sexist language.[97] Because of such challenges, the effective use of machine learning may take longer to be adopted in other domains.[98] Concern for fairness in machine learning, that is, reducing bias in machine learning and propelling its use for human good is increasingly expressed by artificial intelligence scientists, including Fei-Fei Li, who reminds engineers that "There’s nothing artificial about AI...It’s inspired by people, it’s created by people, and—most importantly—it impacts people. It is a powerful tool we are only just beginning to understand, and that is a profound responsibility.”[99] Model assessments Classification of machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data in a training and test set (conventionally 2/3 training set and 1/3 test set designation) and evaluates the performance of the training model on the test set. In comparison, the K-fold-cross-validation method randomly partitions the data into K subsets and then K experiments are performed each respectively considering 1 subset for evaluation and the remaining K-1 subsets for training the model. In addition to the holdout and cross-validation methods, bootstrap, which samples n instances with replacement from the dataset, can be used to assess model accuracy.[100] In addition to overall accuracy, investigators frequently report sensitivity and specificity meaning True Positive Rate (TPR) and True Negative Rate (TNR) respectively. Similarly, investigators sometimes report the false positive rate (FPR) as well as the false negative rate (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The total operating characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates, thus TOC provides more information than the commonly used receiver operating characteristic (ROC) and ROC's associated area under the curve (AUC).[101] Ethics Machine learning poses a host of ethical questions. Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[102] For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[103][104] Responsible collection of data and documentation of algorithmic rules used by a system thus is a critical part of machine learning. Because human languages contain biases, machines trained on language corpora will necessarily also learn these biases.[105][106] Other forms of ethical challenges, not related to personal biases, are more seen in health care. There are concerns among health care professionals that these systems might not be designed in the public's interest but as income-generating machines. This is especially true in the United States where there is a long-standing ethical dilemma of improving health care, but also increasing profits. For example, the algorithms could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. There is huge potential for machine learning in health care to provide professionals a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these "greed" biases are addressed.[107] Hardware Since the 2010s, advances in both machine learning algorithms and computer hardware have led to more efficient methods for training deep neural networks (a particular narrow subdomain of machine learning) that contain many layers of non-linear hidden units.[108] By 2019, graphic processing units (GPUs), often with AI-specific enhancements, had displaced CPUs as the dominant method of training large-scale commercial cloud AI.[109] OpenAI estimated the hardware compute used in the largest deep learning projects from AlexNet (2012) to AlphaZero (2017), and found a 300,000-fold increase in the amount of compute required, with a doubling-time trendline of 3.4 months.[110][111] Software Software suites containing a variety of machine learning algorithms include the following: Free and open-source so
Aryia-Behroziuan
Poole, Mackworth & Goebel 1998, p. 1. Russell & Norvig 2003, p. 55. Definition of AI as the study of intelligent agents: Poole, Mackworth & Goebel (1998), which provides the version that is used in this article. These authors use the term "computational intelligence" as a synonym for artificial intelligence.[1] Russell & Norvig (2003) (who prefer the term "rational agent") and write "The whole-agent view is now widely accepted in the field".[2] Nilsson 1998 Legg & Hutter 2007 Russell & Norvig 2009, p. 2. McCorduck 2004, p. 204 Maloof, Mark. "Artificial Intelligence: An Introduction, p. 37" (PDF). georgetown.edu. Archived (PDF) from the original on 25 August 2018. "How AI Is Getting Groundbreaking Changes In Talent Management And HR Tech". Hackernoon. Archived from the original on 11 September 2019. Retrieved 14 February 2020. Schank, Roger C. (1991). "Where's the AI". AI magazine. Vol. 12 no. 4. p. 38. Russell & Norvig 2009. "AlphaGo – Google DeepMind". Archived from the original on 10 March 2016. Allen, Gregory (April 2020). "Department of Defense Joint AI Center - Understanding AI Technology" (PDF). AI.mil - The official site of the Department of Defense Joint Artificial Intelligence Center. Archived (PDF) from the original on 21 April 2020. Retrieved 25 April 2020. Optimism of early AI: * Herbert Simon quote: Simon 1965, p. 96 quoted in Crevier 1993, p. 109. * Marvin Minsky quote: Minsky 1967, p. 2 quoted in Crevier 1993, p. 109. Boom of the 1980s: rise of expert systems, Fifth Generation Project, Alvey, MCC, SCI: * McCorduck 2004, pp. 426–441 * Crevier 1993, pp. 161–162,197–203, 211, 240 * Russell & Norvig 2003, p. 24 * NRC 1999, pp. 210–211 * Newquist 1994, pp. 235–248 First AI Winter, Mansfield Amendment, Lighthill report * Crevier 1993, pp. 115–117 * Russell & Norvig 2003, p. 22 * NRC 1999, pp. 212–213 * Howe 1994 * Newquist 1994, pp. 189–201 Second AI winter: * McCorduck 2004, pp. 430–435 * Crevier 1993, pp. 209–210 * NRC 1999, pp. 214–216 * Newquist 1994, pp. 301–318 AI becomes hugely successful in the early 21st century * Clark 2015 Pamela McCorduck (2004, p. 424) writes of "the rough shattering of AI in subfields—vision, natural language, decision theory, genetic algorithms, robotics ... and these with own sub-subfield—that would hardly have anything to say to each other." This list of intelligent traits is based on the topics covered by the major AI textbooks, including: * Russell & Norvig 2003 * Luger & Stubblefield 2004 * Poole, Mackworth & Goebel 1998 * Nilsson 1998 Kolata 1982. Maker 2006. Biological intelligence vs. intelligence in general: Russell & Norvig 2003, pp. 2–3, who make the analogy with aeronautical engineering. McCorduck 2004, pp. 100–101, who writes that there are "two major branches of artificial intelligence: one aimed at producing intelligent behavior regardless of how it was accomplished, and the other aimed at modeling intelligent processes found in nature, particularly human ones." Kolata 1982, a paper in Science, which describes McCarthy's indifference to biological models. Kolata quotes McCarthy as writing: "This is AI, so we don't care if it's psychologically real".[19] McCarthy recently reiterated his position at the AI@50 conference where he said "Artificial intelligence is not, by definition, simulation of human intelligence".[20]. Neats vs. scruffies: * McCorduck 2004, pp. 421–424, 486–489 * Crevier 1993, p. 168 * Nilsson 1983, pp. 10–11 Symbolic vs. sub-symbolic AI: * Nilsson (1998, p. 7), who uses the term "sub-symbolic". General intelligence (strong AI) is discussed in popular introductions to AI: * Kurzweil 1999 and Kurzweil 2005 See the Dartmouth proposal, under Philosophy, below. McCorduck 2004, p. 34. McCorduck 2004, p. xviii. McCorduck 2004, p. 3. McCorduck 2004, pp. 340–400. This is a central idea of Pamela McCorduck's Machines Who Think. She writes: "I like to think of artificial intelligence as the scientific apotheosis of a venerable cultural tradition."[26] "Artificial intelligence in one form or another is an idea that has pervaded Western intellectual history, a dream in urgent need of being realized."[27] "Our history is full of attempts—nutty, eerie, comical, earnest, legendary and real—to make artificial intelligences, to reproduce what is the essential us—bypassing the ordinary means. Back and forth between myth and reality, our imaginations supplying what our workshops couldn't, we have engaged for a long time in this odd form of self-reproduction."[28] She traces the desire back to its Hellenistic roots and calls it the urge to "forge the Gods."[29] "Stephen Hawking believes AI could be mankind's last accomplishment". BetaNews. 21 October 2016. Archived from the original on 28 August 2017. Lombardo P, Boehm I, Nairz K (2020). "RadioComics – Santa Claus and the future of radiology". Eur J Radiol. 122 (1): 108771. doi:10.1016/j.ejrad.2019.108771. PMID 31835078. Ford, Martin; Colvin, Geoff (6 September 2015). "Will robots create more jobs than they destroy?". The Guardian. Archived from the original on 16 June 2018. Retrieved 13 January 2018. AI applications widely used behind the scenes: * Russell & Norvig 2003, p. 28 * Kurzweil 2005, p. 265 * NRC 1999, pp. 216–222 * Newquist 1994, pp. 189–201 AI in myth: * McCorduck 2004, pp. 4–5 * Russell & Norvig 2003, p. 939 AI in early science fiction. * McCorduck 2004, pp. 17–25 Formal reasoning: * Berlinski, David (2000). The Advent of the Algorithm. Harcourt Books. ISBN 978-0-15-601391-8. OCLC 46890682. Archived from the original on 26 July 2020. Retrieved 22 August 2020. Turing, Alan (1948), "Machine Intelligence", in Copeland, B. Jack (ed.), The Essential Turing: The ideas that gave birth to the computer age, Oxford: Oxford University Press, p. 412, ISBN 978-0-19-825080-7 Russell & Norvig 2009, p. 16. Dartmouth conference: * McCorduck 2004, pp. 111–136 * Crevier 1993, pp. 47–49, who writes "the conference is generally recognized as the official birthdate of the new science." * Russell & Norvig 2003, p. 17, who call the conference "the birth of artificial intelligence." * NRC 1999, pp. 200–201 McCarthy, John (1988). "Review of The Question of Artificial Intelligence". Annals of the History of Computing. 10 (3): 224–229., collected in McCarthy, John (1996). "10. Review of The Question of Artificial Intelligence". Defending AI Research: A Collection of Essays and Reviews. CSLI., p. 73, "[O]ne of the reasons for inventing the term "artificial intelligence" was to escape association with "cybernetics". Its concentration on analog feedback seemed misguided, and I wished to avoid having either to accept Norbert (not Robert) Wiener as a guru or having to argue with him." Hegemony of the Dartmouth conference attendees: * Russell & Norvig 2003, p. 17, who write "for the next 20 years the field would be dominated by these people and their students." * McCorduck 2004, pp. 129–130 Russell & Norvig 2003, p. 18. Schaeffer J. (2009) Didn't Samuel Solve That Game?. In: One Jump Ahead. Springer, Boston, MA Samuel, A. L. (July 1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development. 3 (3): 210–229. CiteSeerX 10.1.1.368.2254. doi:10.1147/rd.33.0210. "Golden years" of AI (successful symbolic reasoning programs 1956–1973): * McCorduck 2004, pp. 243–252 * Crevier 1993, pp. 52–107 * Moravec 1988, p. 9 * Russell & Norvig 2003, pp. 18–21 The programs described are Arthur Samuel's checkers program for the IBM 701, Daniel Bobrow's STUDENT, Newell and Simon's Logic Theorist and Terry Winograd's SHRDLU. DARPA pours money into undirected pure research into AI during the 1960s: * McCorduck 2004, p. 131 * Crevier 1993, pp. 51, 64–65 * NRC 1999, pp. 204–205 AI in England: * Howe 1994 Lighthill 1973. Expert systems: * ACM 1998, I.2.1 * Russell & Norvig 2003, pp. 22–24 * Luger & Stubblefield 2004, pp. 227–331 * Nilsson 1998, chpt. 17.4 * McCorduck 2004, pp. 327–335, 434–435 * Crevier 1993, pp. 145–62, 197–203 * Newquist 1994, pp. 155–183 Mead, Carver A.; Ismail, Mohammed (8 May 1989). Analog VLSI Implementation of Neural Systems (PDF). The Kluwer International Series in Engineering and Computer Science. 80. Norwell, MA: Kluwer Academic Publishers. doi:10.1007/978-1-4613-1639-8. ISBN 978-1-4613-1639-8. Archived from the original (PDF) on 6 November 2019. Retrieved 24 January 2020. Formal methods are now preferred ("Victory of the neats"): * Russell & Norvig 2003, pp. 25–26 * McCorduck 2004, pp. 486–487 McCorduck 2004, pp. 480–483. Markoff 2011. "Ask the AI experts: What's driving today's progress in AI?". McKinsey & Company. Archived from the original on 13 April 2018. Retrieved 13 April 2018. Administrator. "Kinect's AI breakthrough explained". i-programmer.info. Archived from the original on 1 February 2016. Rowinski, Dan (15 January 2013). "Virtual Personal Assistants & The Future Of Your Smartphone [Infographic]". ReadWrite. Archived from the original on 22 December 2015. "Artificial intelligence: Google's AlphaGo beats Go master Lee Se-dol". BBC News. 12 March 2016. Archived from the original on 26 August 2016. Retrieved 1 October 2016. Metz, Cade (27 May 2017). "After Win in China, AlphaGo's Designers Explore New AI". Wired. Archived from the original on 2 June 2017. "World's Go Player Ratings". May 2017. Archived from the original on 1 April 2017. "柯洁迎19岁生日 雄踞人类世界排名第一已两年" (in Chinese). May 2017. Archived from the original on 11 August 2017. Clark, Jack (8 December 2015). "Why 2015 Was a Breakthrough Year in Artificial Intelligence". Bloomberg News. Archived from the original on 23 November 2016. Retrieved 23 November 2016. After a half-decade of quiet breakthroughs in artificial intelligence, 2015 has been a landmark year. Computers are smarter and learning faster than ever. "Reshaping Business With Artificial Intelligence". MIT Sloan Management Review. Archived from the original on 19 May 2018. Retrieved 2 May 2018. Lorica, Ben (18 December 2017). "The state of AI adoption". O'Reilly Media. Archived from the original on 2 May 2018. Retrieved 2 May 2018. Allen, Gregory (6 February 2019). "Understanding China's AI Strategy". Center for a New American Security. Archived from the original on 17 March 2019. "Review | How two AI superpowers – the U.S. and China – battle for supremacy in the field". Washington Post. 2 November 2018. Archived from the original on 4 November 2018. Retrieved 4 November 2018. at 10:11, Alistair Dabbs 22 Feb 2019. "Artificial Intelligence: You know it isn't real, yeah?". www.theregister.co.uk. Archived from the original on 21 May 2020. Retrieved 22 August 2020. "Stop Calling it Artificial Intelligence". Archived from the original on 2 December 2019. Retrieved 1 December 2019. "AI isn't taking over the world – it doesn't exist yet". GBG Global website. Archived from the original on 11 August 2020. Retrieved 22 August 2020. Kaplan, Andreas; Haenlein, Michael (1 January 2019). "Siri, Siri, in my hand: Who's the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence". Business Horizons. 62 (1): 15–25. doi:10.1016/j.bushor.2018.08.004. Domingos 2015, Chapter 5. Domingos 2015, Chapter 7. Lindenbaum, M., Markovitch, S., & Rusakov, D. (2004). Selective sampling for nearest neighbor classifiers. Machine learning, 54(2), 125–152. Domingos 2015, Chapter 1. Intractability and efficiency and the combinatorial explosion: * Russell & Norvig 2003, pp. 9, 21–22 Domingos 2015, Chapter 2, Chapter 3. Hart, P. E.; Nilsson, N. J.; Raphael, B. (1972). "Correction to "A Formal Basis for the Heuristic Determination of Minimum Cost Paths"". SIGART Newsletter (37): 28–29. doi:10.1145/1056777.1056779. S2CID 6386648. Domingos 2015, Chapter 2, Chapter 4, Chapter 6. "Can neural network computers learn from experience, and if so, could they ever become what we would call 'smart'?". Scientific American. 2018. Archived from the original on 25 March 2018. Retrieved 24 March 2018. Domingos 2015, Chapter 6, Chapter 7. Domingos 2015, p. 286. "Single pixel change fools AI programs". BBC News. 3 November 2017. Archived from the original on 22 March 2018. Retrieved 12 March 2018. "AI Has a Hallucination Problem That's Proving Tough to Fix". WIRED. 2018. Archived from the original on 12 March 2018. Retrieved 12 March 2018. Matti, D.; Ekenel, H. K.; Thiran, J. P. (2017). Combining LiDAR space clustering and convolutional neural networks for pedestrian detection. 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). pp. 1–6. arXiv:1710.06160. doi:10.1109/AVSS.2017.8078512. ISBN 978-1-5386-2939-0. S2CID 2401976. Ferguson, Sarah; Luders, Brandon; Grande, Robert C.; How, Jonathan P. (2015). Real-Time Predictive Modeling and Robust Avoidance of Pedestrians with Uncertain, Changing Intentions. Algorithmic Foundations of Robotics XI. Springer Tracts in Advanced Robotics. 107. Springer, Cham. pp. 161–177. arXiv:1405.5581. doi:10.1007/978-3-319-16595-0_10. ISBN 978-3-319-16594-3. S2CID 8681101. "Cultivating Common Sense | DiscoverMagazine.com". Discover Magazine. 2017. Archived from the original on 25 March 2018. Retrieved 24 March 2018. Davis, Ernest; Marcus, Gary (24 August 2015). "Commonsense reasoning and commonsense knowledge in artificial intelligence". Communications of the ACM. 58 (9): 92–103. doi:10.1145/2701413. S2CID 13583137. Archived from the original on 22 August 2020. Retrieved 6 April 2020. Winograd, Terry (January 1972). "Understanding natural language". Cognitive Psychology. 3 (1): 1–191. doi:10.1016/0010-0285(72)90002-3. "Don't worry: Autonomous cars aren't coming tomorrow (or next year)". Autoweek. 2016. Archived from the original on 25 March 2018. Retrieved 24 March 2018. Knight, Will (2017). "Boston may be famous for bad drivers, but it's the testing ground for a smarter self-driving car". MIT Technology Review. Archived from the original on 22 August 2020. Retrieved 27 March 2018. Prakken, Henry (31 August 2017). "On the problem of making autonomous vehicles conform to traffic law". Artificial Intelligence and Law. 25 (3): 341–363. doi:10.1007/s10506-017-9210-0. Lieto, Antonio (May 2018). "The knowledge level in cognitive architectures: Current limitations and possible developments". Cognitive Systems Research. 48: 39–55. doi:10.1016/j.cogsys.2017.05.001. hdl:2318/1665207. S2CID 206868967. Problem solving, puzzle solving, game playing and deduction: * Russell & Norvig 2003, chpt. 3–9, * Poole, Mackworth & Goebel 1998, chpt. 2,3,7,9, * Luger & Stubblefield 2004, chpt. 3,4,6,8, * Nilsson 1998, chpt. 7–12 Uncertain reasoning: * Russell & Norvig 2003, pp. 452–644, * Poole, Mackworth & Goebel 1998, pp. 345–395, * Luger & Stubblefield 2004, pp. 333–381, * Nilsson 1998, chpt. 19 Psychological evidence of sub-symbolic reasoning: * Wason & Shapiro (1966) showed that people do poorly on completely abstract problems, but if the problem is restated to allow the use of intuitive social intelligence, performance dramatically improves. (See Wason selection task) * Kahneman, Slovic & Tversky (1982) have shown that people are terrible at elementary problems that involve uncertain reasoning. (See list of cognitive biases for several examples). * Lakoff & Núñez (2000) have controversially argued that even our skills at mathematics depend on knowledge and skills that come from "the body", i.e. sensorimotor and perceptual skills. (See Where Mathematics Comes From) Knowledge representation: * ACM 1998, I.2.4, * Russell & Norvig 2003, pp. 320–363, * Poole, Mackworth & Goebel 1998, pp. 23–46, 69–81, 169–196, 235–277, 281–298, 319–345, * Luger & Stubblefield 2004, pp. 227–243, * Nilsson 1998, chpt. 18 Knowledge engineering: * Russell & Norvig 2003, pp. 260–266, * Poole, Mackworth & Goebel 1998, pp. 199–233, * Nilsson 1998, chpt. ≈17.1–17.4 Representing categories and relations: Semantic networks, description logics, inheritance (including frames and scripts): * Russell & Norvig 2003, pp. 349–354, * Poole, Mackworth & Goebel 1998, pp. 174–177, * Luger & Stubblefield 2004, pp. 248–258, * Nilsson 1998, chpt. 18.3 Representing events and time:Situation calculus, event calculus, fluent calculus (including solving the frame problem): * Russell & Norvig 2003, pp. 328–341, * Poole, Mackworth & Goebel 1998, pp. 281–298, * Nilsson 1998, chpt. 18.2 Causal calculus: * Poole, Mackworth & Goebel 1998, pp. 335–337 Representing knowledge about knowledge: Belief calculus, modal logics: * Russell & Norvig 2003, pp. 341–344, * Poole, Mackworth & Goebel 1998, pp. 275–277 Sikos, Leslie F. (June 2017). Description Logics in Multimedia Reasoning. Cham: Springer. doi:10.1007/978-3-319-54066-5. ISBN 978-3-319-54066-5. S2CID 3180114. Archived from the original on 29 August 2017. Ontology: * Russell & Norvig 2003, pp. 320–328 Smoliar, Stephen W.; Zhang, HongJiang (1994). "Content based video indexing and retrieval". IEEE Multimedia. 1 (2): 62–72. doi:10.1109/93.311653. S2CID 32710913. Neumann, Bernd; Möller, Ralf (January 2008). "On scene interpretation with description logics". Image and Vision Computing. 26 (1): 82–101. doi:10.1016/j.imavis.2007.08.013. Kuperman, G. J.; Reichley, R. M.; Bailey, T. C. (1 July 2006). "Using Commercial Knowledge Bases for Clinical Decision Support: Opportunities, Hurdles, and Recommendations". Journal of the American Medical Informatics Association. 13 (4): 369–371. doi:10.1197/jamia.M2055. PMC 1513681. PMID 16622160. MCGARRY, KEN (1 December 2005). "A survey of interestingness measures for knowledge discovery". The Knowledge Engineering Review. 20 (1): 39–61. doi:10.1017/S0269888905000408. S2CID 14987656. Bertini, M; Del Bimbo, A; Torniai, C (2006). "Automatic annotation and semantic retrieval of video sequences using multimedia ontologies". MM '06 Proceedings of the 14th ACM international conference on Multimedia. 14th ACM international conference on Multimedia. Santa Barbara: ACM. pp. 679–682. Qualification problem: * McCarthy & Hayes 1969 * Russell & Norvig 2003[page needed] While McCarthy was primarily concerned with issues in the logical representation of actions, Russell & Norvig 2003 apply the term to the more general issue of default reasoning in the vast network of assumptions underlying all our commonsense knowledge. Default reasoning and default logic, non-monotonic logics, circumscription, closed world assumption, abduction (Poole et al. places abduction under "default reasoning". Luger et al. places this under "uncertain reasoning"): * Russell & Norvig 2003, pp. 354–360, * Poole, Mackworth & Goebel 1998, pp. 248–256, 323–335, * Luger & Stubblefield 2004, pp. 335–363, * Nilsson 1998, ~18.3.3 Breadth of commonsense knowledge: * Russell & Norvig 2003, p. 21, * Crevier 1993, pp. 113–114, * Moravec 1988, p. 13, * Lenat & Guha 1989 (Introduction) Dreyfus & Dreyfus 1986. Gladwell 2005. Expert knowledge as embodied intuition: * Dreyfus & Dreyfus 1986 (Hubert Dreyfus is a philosopher and critic of AI who was among the first to argue that most useful human knowledge was encoded sub-symbolically. See Dreyfus' critique of AI) * Gladwell 2005 (Gladwell's Blink is a popular introduction to sub-symbolic reasoning and knowledge.) * Hawkins & Blakeslee 2005 (Hawkins argues that sub-symbolic knowledge should be the primary focus of AI research.) Planning: * ACM 1998, ~I.2.8, * Russell & Norvig 2003, pp. 375–459, * Poole, Mackworth & Goebel 1998, pp. 281–316, * Luger & Stubblefield 2004, pp. 314–329, * Nilsson 1998, chpt. 10.1–2, 22 Information value theory: * Russell & Norvig 2003, pp. 600–604 Classical planning: * Russell & Norvig 2003, pp. 375–430, * Poole, Mackworth & Goebel 1998, pp. 281–315, * Luger & Stubblefield 2004, pp. 314–329, * Nilsson 1998, chpt. 10.1–2, 22 Planning and acting in non-deterministic domains: conditional planning, execution monitoring, replanning and continuous planning: * Russell & Norvig 2003, pp. 430–449 Multi-agent planning and emergent behavior: * Russell & Norvig 2003, pp. 449–455 Turing 1950. Solomonoff 1956. Alan Turing discussed the centrality of learning as early as 1950, in his classic paper "Computing Machinery and Intelligence".[120] In 1956, at the original Dartmouth AI summer conference, Ray Solomonoff wrote a report on unsupervised probabilistic machine learning: "An Inductive Inference Machine".[121] This is a form of Tom Mitchell's widely quoted definition of machine learning: "A computer program is set to learn from an experience E with respect to some task T and some performance measure P if its performance on T as measured by P improves with experience E." Learning: * ACM 1998, I.2.6, * Russell & Norvig 2003, pp. 649–788, * Poole, Mackworth & Goebel 1998, pp. 397–438, * Luger & Stubblefield 2004, pp. 385–542, * Nilsson 1998, chpt. 3.3, 10.3, 17.5, 20 Jordan, M. I.; Mitchell, T. M. (16 July 2015). "Machine learning: Trends, perspectives, and prospects". Science. 349 (6245): 255–260. Bibcode:2015Sci...349..255J. doi:10.1126/science.aaa8415. PMID 26185243. S2CID 677218. Reinforcement learning: * Russell & Norvig 2003, pp. 763–788 * Luger & Stubblefield 2004, pp. 442–449 Natural language processing: * ACM 1998, I.2.7 * Russell & Norvig 2003, pp. 790–831 * Poole, Mackworth & Goebel 1998, pp. 91–104 * Luger & Stubblefield 2004, pp. 591–632 "Versatile question answering systems: seeing in synthesis" Archived 1 February 2016 at the Wayback Machine, Mittal et al., IJIIDS, 5(2), 119–142, 2011 Applications of natural language processing, including information retrieval (i.e. text mining) and machine translation: * Russell & Norvig 2003, pp. 840–857, * Luger & Stubblefield 2004, pp. 623–630 Cambria, Erik; White, Bebo (May 2014). "Jumping NLP Curves: A Review of Natural Language Processing Research [Review Article]". IEEE Computational Intelligence Magazine. 9 (2): 48–57. doi:10.1109/MCI.2014.2307227. S2CID 206451986. Vincent, James (7 November 2019). "OpenAI has published the text-generating AI it said was too dangerous to share". The Verge. Archived from the original on 11 June 2020. Retrieved 11 June 2020. Machine perception: * Russell & Norvig 2003, pp. 537–581, 863–898 * Nilsson 1998, ~chpt. 6 Speech recognition: * ACM 1998, ~I.2.7 * Russell & Norvig 2003, pp. 568–578 Object recognition: * Russell & Norvig 2003, pp. 885–892 Computer vision: * ACM 1998, I.2.10 * Russell & Norvig 2003, pp. 863–898 * Nilsson 1998, chpt. 6 Robotics: * ACM 1998, I.2.9, * Russell & Norvig 2003, pp. 901–942, * Poole, Mackworth & Goebel 1998, pp. 443–460 Moving and configuration space: * Russell & Norvig 2003, pp. 916–932 Tecuci 2012. Robotic mapping (localization, etc): * Russell & Norvig 2003, pp. 908–915 Cadena, Cesar; Carlone, Luca; Carrillo, Henry; Latif, Yasir; Scaramuzza, Davide; Neira, Jose; Reid, Ian; Leonard, John J. (December 2016). "Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age". IEEE Transactions on Robotics. 32 (6): 1309–1332. arXiv:1606.05830. Bibcode:2016arXiv160605830C. doi:10.1109/TRO.2016.2624754. S2CID 2596787. Moravec, Hans (1988). Mind Children. Harvard University Press. p. 15. Chan, Szu Ping (15 November 2015). "This is what will happen when robots take over the world". Archived from the original on 24 April 2018. Retrieved 23 April 2018. "IKEA furniture and the limits of AI". The Economist. 2018. Archived from the original on 24 April 2018. Retrieved 24 April 2018. Kismet. Thompson, Derek (2018). "What Jobs Will the Robots Take?". The Atlantic. Archived from the original on 24 April 2018. Retrieved 24 April 2018. Scassellati, Brian (2002). "Theory of mind for a humanoid robot". Autonomous Robots. 12 (1): 13–24. doi:10.1023/A:1013298507114. S2CID 1979315. Cao, Yongcan; Yu, Wenwu; Ren, Wei; Chen, Guanrong (February 2013). "An Overview of Recent Progress in the Study of Distributed Multi-Agent Coordination". IEEE Transactions on Industrial Informatics. 9 (1): 427–438. arXiv:1207.3231. doi:10.1109/TII.2012.2219061. S2CID 9588126. Thro 1993. Edelson 1991. Tao & Tan 2005. Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. Emotion and affective computing: * Minsky 2006 Waddell, Kaveh (2018). "Chatbots Have Entered the Uncanny Valley". The Atlantic. Archived from the original on 24 April 2018. Retrieved 24 April 2018. Pennachin, C.; Goertzel, B. (2007). Contemporary Approaches to Artificial General Intelligence. Artificial General Intelligence. Cognitive Technologies. Cognitive Technologies. Berlin, Heidelberg: Springer. doi:10.1007/978-3-540-68677-4_1. ISBN 978-3-540-23733-4. Roberts, Jacob (2016). "Thinking Machines: The Search for Artificial Intelligence". Distillations. Vol. 2 no. 2. pp. 14–23. Archived from the original on 19 August 2018. Retrieved 20 March 2018. "The superhero of artificial intelligence: can this genius keep it in check?". the Guardian. 16 February 2016. Archived from the original on 23 April 2018. Retrieved 26 April 2018. Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis (26 February 2015). "Human-level control through deep reinforcement learning". Nature. 518 (7540): 529–533. Bibcode:2015Natur.518..529M. doi:10.1038/nature14236. PMID 25719670. S2CID 205242740. Sample, Ian (14 March 2017). "Google's DeepMind makes AI program that can learn like a human". the Guardian. Archived from the original on 26 April 2018. Retrieved 26 April 2018. "From not working to neural networking". The Economist. 2016. Archived from the original on 31 December 2016. Retrieved 26 April 2018. Domingos 2015. Artificial brain arguments: AI requires a simulation of the operation of the human brain * Russell & Norvig 2003, p. 957 * Crevier 1993, pp. 271 and 279 A few of the people who make some form of the argument: * Moravec 1988 * Kurzweil 2005, p. 262 * Hawkins & Blakeslee 2005 The most extreme form of this argument (the brain replacement scenario) was put forward by Clark Glymour in the mid-1970s and was touched on by Zenon Pylyshyn and John Searle in 1980. Goertzel, Ben; Lian, Ruiting; Arel, Itamar; de Garis, Hugo; Chen, Shuo (December 2010). "A world survey of artificial brain projects, Part II: Biologically inspired cognitive architectures". Neurocomputing. 74 (1–3): 30–49. doi:10.1016/j.neucom.2010.08.012. Nilsson 1983, p. 10. Nils Nilsson writes: "Simply put, there is wide disagreement in the field about what AI is all about."[163] AI's immediate precursors: * McCorduck 2004, pp. 51–107 * Crevier 1993, pp. 27–32 * Russell & Norvig 2003, pp. 15, 940 * Moravec 1988, p. 3 Haugeland 1985, pp. 112–117 The most dramatic case of sub-symbolic AI being pushed into the background was the devastating critique of perceptrons by Marvin Minsky and Seymour Papert in 1969. See History of AI, AI winter, or Frank Rosenblatt. Cognitive simulation, Newell and Simon, AI at CMU (then called Carnegie Tech): * McCorduck 2004, pp. 139–179, 245–250, 322–323 (EPAM) * Crevier 1993, pp. 145–149 Soar (history): * McCorduck 2004, pp. 450–451 * Crevier 1993, pp. 258–263 McCarthy and AI research at SAIL and SRI International: * McCorduck 2004, pp. 251–259 * Crevier 1993 AI research at Edinburgh and in France, birth of Prolog: * Crevier 1993, pp. 193–196 * Howe 1994 AI at MIT under Marvin Minsky in the 1960s : * McCorduck 2004, pp. 259–305 * Crevier 1993, pp. 83–102, 163–176 * Russell & Norvig 2003, p. 19 Cyc: * McCorduck 2004, p. 489, who calls it "a determinedly scruffy enterprise" * Crevier 1993, pp. 239–243 * Russell & Norvig 2003, p. 363−365 * Lenat & Guha 1989 Knowledge revolution: * McCorduck 2004, pp. 266–276, 298–300, 314, 421 * Russell & Norvig 2003, pp. 22–23 Frederick, Hayes-Roth; William, Murray; Leonard, Adelman. "Expert systems". AccessScience. doi:10.1036/1097-8542.248550. Embodied approaches to AI: * McCorduck 2004, pp. 454–462 * Brooks 1990 * Moravec 1988 Weng et al. 2001. Lungarella et al. 2003. Asada et al. 2009. Oudeyer 2010. Revival of connectionism: * Crevier 1993, pp. 214–215 * Russell & Norvig 2003, p. 25 Computational intelligence * IEEE Computational Intelligence Society Archived 9 May 2008 at the Wayback Machine Hutson, Matthew (16 February 2018). "Artificial intelligence faces reproducibility crisis". Science. pp. 725–726. Bibcode:2018Sci...359..725H. doi:10.1126/science.359.6377.725. Archived from the original on 29 April 2018. Retrieved 28 April 2018. Norvig 2012. Langley 2011. Katz 2012. The intelligent agent paradigm: * Russell & Norvig 2003, pp. 27, 32–58, 968–972 * Poole, Mackworth & Goebel 1998, pp. 7–21 * Luger & Stubblefield 2004, pp. 235–240 * Hutter 2005, pp. 125–126 The definition used in this article, in terms of goals, actions, perception and environment, is due to Russell & Norvig (2003). Other definitions also include knowledge and learning as additional criteria. Agent architectures, hybrid intelligent systems: * Russell & Norvig (2003, pp. 27, 932, 970–972) * Nilsson (1998, chpt. 25) Hierarchical control system: * Albus 2002 Lieto, Antonio; Lebiere, Christian; Oltramari, Alessandro (May 2018). "The knowledge level in cognitive architectures: Current limitations and possibile developments". Cognitive Systems Research. 48: 39–55. doi:10.1016/j.cogsys.2017.05.001. hdl:2318/1665207. S2CID 206868967. Lieto, Antonio; Bhatt, Mehul; Oltramari, Alessandro; Vernon, David (May 2018). "The role of cognitive architectures in general artificial intelligence". Cognitive Systems Research. 48: 1–3. doi:10.1016/j.cogsys.2017.08.003. hdl:2318/1665249. S2CID 36189683. Russell & Norvig 2009, p. 1. White Paper: On Artificial Intelligence - A European approach to excellence and trust (PDF). Brussels: European Commission. 2020. p. 1. Archived (PDF) from the original on 20 February 2020. Retrieved 20 February 2020. CNN 2006. Using AI to predict flight delays Archived 20 November 2018 at the Wayback Machine, Ishti.org. N. Aletras; D. Tsarapatsanis; D. Preotiuc-Pietro; V. Lampos (2016). "Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective". PeerJ Computer Science. 2: e93. doi:10.7717/peerj-cs.93. "The Economist Explains: Why firms are piling into artificial intelligence". The Economist. 31 March 2016. Archived from the original on 8 May 2016. Retrieved 19 May 2016. Lohr, Steve (28 February 2016). "The Promise of Artificial Intelligence Unfolds in Small Steps". The New York Times. Archived from the original on 29 February 2016. Retrieved 29 February 2016. Frangoul, Anmar (14 June 2019). "A Californian business is using A.I. to change the way we think about energy storage". CNBC. Archived from the original on 25 July 2020. Retrieved 5 November 2019. Wakefield, Jane (15 June 2016). "Social media 'outstrips TV' as news source for young people". BBC News. Archived from the original on 24 June 2016. Smith, Mark (22 July 2016). "So you think you chose to read this article?". BBC News. Archived from the original on 25 July 2016. Brown, Eileen. "Half of Americans do not believe deepfake news could target them online". ZDNet. Archived from the original on 6 November 2019. Retrieved 3 December 2019. The Turing test: Turing's original publication: * Turing 1950 Historical influence and philosophical implications: * Haugeland 1985, pp. 6–9 * Crevier 1993, p. 24 * McCorduck 2004, pp. 70–71 * Russell & Norvig 2003, pp. 2–3 and 948 Dartmouth proposal: * McCarthy et al. 1955 (the original proposal) * Crevier 1993, p. 49 (historical significance) The physical symbol systems hypothesis: * Newell & Simon 1976, p. 116 * McCorduck 2004, p. 153 * Russell & Norvig 2003, p. 18 Dreyfus 1992, p. 156. Dreyfus criticized the necessary condition of the physical symbol system hypothesis, which he called the "psychological assumption": "The mind can be viewed as a device operating on bits of information according to formal rules."[206] Dreyfus' critique of artificial intelligence: * Dreyfus 1972, Dreyfus & Dreyfus 1986 * Crevier 1993, pp. 120–132 * McCorduck 2004, pp. 211–239 * Russell & Norvig 2003, pp. 950–952, Gödel 1951: in this lecture, Kurt Gödel uses the incompleteness theorem to arrive at the following disjunction: (a) the human mind is not a consistent finite machine, or (b) there exist Diophantine equations for which it cannot decide whether solutions exist. Gödel finds (b) implausible, and thus seems to have believed the human mind was not equivalent to a finite machine, i.e., its power exceeded that of any finite machine. He recognized that this was only a conjecture, since one could never disprove (b). Yet he considered the disjunctive conclusion to be a "certain fact". The Mathematical Objection: * Russell & Norvig 2003, p. 949 * McCorduck 2004, pp. 448–449 Making the Mathematical Objection: * Lucas 1961 * Penrose 1989 Refuting Mathematical Objection: * Turing 1950 under "(2) The Mathematical Objection" * Hofstadter 1979 Background: * Gödel 1931, Church 1936, Kleene 1935, Turing 1937 Graham Oppy (20 January 2015). "Gödel's Incompleteness Theorems". Stanford Encyclopedia of Philosophy. Archived from the original on 22 April 2016. Retrieved 27 April 2016. These Gödelian anti-mechanist arguments are, however, problematic, and there is wide consensus that they fail. Stuart J. Russell; Peter Norvig (2010). "26.1.2: Philosophical Foundations/Weak AI: Can Machines Act Intelligently?/The mathematical objection". Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall. ISBN 978-0-13-604259-4. even if we grant that computers have limitations on what they can prove, there is no evidence that humans are immune from those limitations. Mark Colyvan. An introduction to the philosophy of mathematics. Cambridge University Press, 2012. From 2.2.2, 'Philosophical significance of Gödel's incompleteness results': "The accepted wisdom (with which I concur) is that the Lucas-Penrose arguments fail." Iphofen, Ron; Kritikos, Mihalis (3 January 2019). "Regulating artificial intelligence and robotics: ethics by design in a digital society". Contemporary Social Science: 1–15. doi:10.1080/21582041.2018.1563803. ISSN 2158-2041. "Ethical AI Learns Human Rights Framework". Voice of America. Archived from the original on 11 November 2019. Retrieved 10 November 2019. Crevier 1993, pp. 132–144. In the early 1970s, Kenneth Colby presented a version of Weizenbaum's ELIZA known as DOCTOR which he promoted as a serious therapeutic tool.[216] Joseph Weizenbaum's critique of AI: * Weizenbaum 1976 * Crevier 1993, pp. 132–144 * McCorduck 2004, pp. 356–373 * Russell & Norvig 2003, p. 961 Weizenbaum (the AI researcher who developed the first chatterbot program, ELIZA) argued in 1976 that the misuse of artificial intelligence has the potential to devalue human life. Wendell Wallach (2010). Moral Machines, Oxford University Press. Wallach, pp 37–54. Wallach, pp 55–73. Wallach, Introduction chapter. Michael Anderson and Susan Leigh Anderson (2011), Machine Ethics, Cambridge University Press. "Machine Ethics". aaai.org. Archived from the original on 29 November 2014. Rubin, Charles (Spring 2003). "Artificial Intelligence and Human Nature". The New Atlantis. 1: 88–100. Archived from the original on 11 June 2012. Brooks, Rodney (10 November 2014). "artificial intelligence is a tool, not a threat". Archived from the original on 12 November 2014. "Stephen Hawking, Elon Musk, and Bill Gates Warn About Artificial Intelligence". Observer. 19 August 2015. Archived from the original on 30 October 2015. Retrieved 30 October 2015. Chalmers, David (1995). "Facing up to the problem of consciousness". Journal of Consciousness Studies. 2 (3): 200–219. Archived from the original on 8 March 2005. Retrieved 11 October 2018. See also this link Archived 8 April 2011 at the Wayback Machine Horst, Steven, (2005) "The Computational Theory of Mind" Archived 11 September 2018 at the Wayback Machine in The Stanford Encyclopedia of Philosophy Searle 1980, p. 1. This version is from Searle (1999), and is also quoted in Dennett 1991, p. 435. Searle's original formulation was "The appropriately programmed computer really is a mind, in the sense that computers given the right programs can be literally said to understand and have other cognitive states." [230] Strong AI is defined similarly by Russell & Norvig (2003, p. 947): "The assertion that machines could possibly act intelligently
ananya2001gupta
Identify the software project, create business case, arrive at a problem statement. REQUIREMENT: Window XP, Internet, MS Office, etc. Problem Description: - 1. Introduction of AI and Machine Learning: - Artificial Intelligence applies machine learning, deep learning and other techniques to solve actual problems. Artificial intelligence (AI) brings the genuine human-to-machine interaction. Simply, Machine Learning is the algorithm that give computers the ability to learn from data and then make decisions and predictions, AI refers to idea where machines can execute tasks smartly. It is a faster process in learning the risk factors, and profitable opportunities. They have a feature of learning from their mistakes and experiences. When Machine learning is combined with Artificial Intelligence, it can be a large field to gather an immense amount of information and then rectify the errors and learn from further experiences, developing in a smarter, faster and accuracy handling technique. The main difference between Machine Learning and Artificial Intelligence is , If it is written in python then it is probably machine learning, If it is written in power point then it is artificial intelligence. As there are many existing projects that are implemented using AI and Machine Learning , And one of the project i.e., Bitcoin Price Prediction :- Bitcoin (₿ ) (founder - Satoshi Nakamoto , Ledger start: 3 January 2009 ) is a digital currency, a type of electronic money. It is decentralized advanced cash without a national bank or single chairman that can be sent from client to client on the shared Bitcoin arrange without middle people's requirement. Machine learning models can likely give us the insight we need to learn about the future of Cryptocurrency. It will not tell us the future but it might tell us the general trend and direction to expect the prices to move. These machine learning models predict the future of Bitcoin by coding them out in Python. Machine learning and AI-assisted trading have attracted growing interest for the past few years. this approach is to test the hypothesis that the inefficiency of the cryptocurrency market can be exploited to generate abnormal profits. the application of machine learning algorithms to the cryptocurrency market has been limited so far to the analysis of Bitcoin prices, using random forests , Bayesian neural network , long short-term memory neural network , and other algorithms. 2. Applications/Scope of AI and Machine Learning :- a) Sentiment Analysis :- It is the classification of subjective opinions or emotions (positive, negative, and neutral) within text data using natural language processing. b) It is Characterized as a use of computerized reasoning where accessible data is utilized through calculations to process or help the handling of factual information. BITCOIN PRICE PREDICTION USING AI AND MACHINE LEARNING: - The main aim of this is to find the actual Bitcoin price in US dollars can be predicted. The chance to make a model equipped for anticipating digital currencies fundamentally Bitcoin. # It works the prediction by taking the coinMarkup cap. # CoinMarketCap provides with historical data for Bitcoin price changes, keep a record of all the transactions by recording the amount of coins in circulation and the volume of coins traded in the last 24-hours. # Quandl is used to filter the dataset by using the MAT Lab properties. 3. Problem statement: - Some AI and Machine Learning problem statements are: - a) Data Privacy and Security: Once a company has dug up the data, privacy and security is eye-catching aspect that needs to be taken care of. b) Data Scarcity: The data is a very important aspect of AI, and labeled data is used to train machines to learn and make predictions. c) Data acquisition: In the process of machine learning, a large amount of data is used in the process of training and learning. d) High error susceptibility: In the process of artificial intelligence and machine learning, the high amount of data is used. Some problem statements of Bitcoin Price Prediction using AI and Machine Learning: - a) Experimental Phase Risk: It is less experimental than other counterparts. In addition, relative to traditional assets, its level can be assessed as high because this asset is not intended for conservative investors. b) Technology Risks: There is a technological risk to other cryptocurrencies in the form of the potential appearance of a more advanced cryptocurrency. Investors may simply not notice the moment when their virtual assets lose their real value. c) Price Variability: The variability of the value of cryptocurrency are the large volumes of exchange trading, the integration of Bitcoin with various companies, legislative initiatives of regulatory bodies and many other, sometimes disregarded phenomena. d) Consumer Protection: The property of the irreversibility of transactions in itself has little effect on the risks of investing in Bitcoin as an asset. e) Price Fluctuation Prediction: Since many investors care more about whether the sudden rise or fall is worth following. Bitcoin price often fluctuates by more than 10% (or even more than 30%) at some times. f) Lacks Government Regulation: Regulators in traditional financial markets are basically missing in the field of cryptocurrencies. For instance, fake news frequently affects the decisions of individual investors. g) It is difficult to use large interval data (e.g., day-level, and month-level data) . h) The change time of mining difficulties is much longer. Moreover, do not consider the news information since it is hard to determine the authenticity of a news or predict the occurrence of emergencies.
Rushikesh8983
Language Translation In this project, you’re going to take a peek into the realm of neural network machine translation. You’ll be training a sequence to sequence model on a dataset of English and French sentences that can translate new sentences from English to French. Get the Data Since translating the whole language of English to French will take lots of time to train, we have provided you with a small portion of the English corpus. """ DON'T MODIFY ANYTHING IN THIS CELL """ import helper import problem_unittests as tests source_path = 'data/small_vocab_en' target_path = 'data/small_vocab_fr' source_text = helper.load_data(source_path) target_text = helper.load_data(target_path) Explore the Data Play around with view_sentence_range to view different parts of the data. view_sentence_range = (0, 10) """ DON'T MODIFY ANYTHING IN THIS CELL """ import numpy as np print('Dataset Stats') print('Roughly the number of unique words: {}'.format(len({word: None for word in source_text.split()}))) sentences = source_text.split('\n') word_counts = [len(sentence.split()) for sentence in sentences] print('Number of sentences: {}'.format(len(sentences))) print('Average number of words in a sentence: {}'.format(np.average(word_counts))) print() print('English sentences {} to {}:'.format(*view_sentence_range)) print('\n'.join(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]])) print() print('French sentences {} to {}:'.format(*view_sentence_range)) print('\n'.join(target_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]])) Dataset Stats Roughly the number of unique words: 227 Number of sentences: 137861 Average number of words in a sentence: 13.225277634719028 English sentences 0 to 10: new jersey is sometimes quiet during autumn , and it is snowy in april . the united states is usually chilly during july , and it is usually freezing in november . california is usually quiet during march , and it is usually hot in june . the united states is sometimes mild during june , and it is cold in september . your least liked fruit is the grape , but my least liked is the apple . his favorite fruit is the orange , but my favorite is the grape . paris is relaxing during december , but it is usually chilly in july . new jersey is busy during spring , and it is never hot in march . our least liked fruit is the lemon , but my least liked is the grape . the united states is sometimes busy during january , and it is sometimes warm in november . French sentences 0 to 10: new jersey est parfois calme pendant l' automne , et il est neigeux en avril . les états-unis est généralement froid en juillet , et il gèle habituellement en novembre . california est généralement calme en mars , et il est généralement chaud en juin . les états-unis est parfois légère en juin , et il fait froid en septembre . votre moins aimé fruit est le raisin , mais mon moins aimé est la pomme . son fruit préféré est l'orange , mais mon préféré est le raisin . paris est relaxant en décembre , mais il est généralement froid en juillet . new jersey est occupé au printemps , et il est jamais chaude en mars . notre fruit est moins aimé le citron , mais mon moins aimé est le raisin . les états-unis est parfois occupé en janvier , et il est parfois chaud en novembre . Implement Preprocessing Function Text to Word Ids As you did with other RNNs, you must turn the text into a number so the computer can understand it. In the function text_to_ids(), you'll turn source_text and target_text from words to ids. However, you need to add the <EOS> word id at the end of target_text. This will help the neural network predict when the sentence should end. You can get the <EOS> word id by doing: target_vocab_to_int['<EOS>'] You can get other word ids using source_vocab_to_int and target_vocab_to_int. def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int): """ Convert source and target text to proper word ids :param source_text: String that contains all the source text. :param target_text: String that contains all the target text. :param source_vocab_to_int: Dictionary to go from the source words to an id :param target_vocab_to_int: Dictionary to go from the target words to an id :return: A tuple of lists (source_id_text, target_id_text) """ # TODO: Implement Function source_id_text = [[source_vocab_to_int[word] for word in sentence.split()] \ for sentence in source_text.split('\n')] target_id_text = [[target_vocab_to_int[word] for word in sentence.split()] + [target_vocab_to_int['<EOS>']] \ for sentence in target_text.split('\n')] return source_id_text, target_id_text """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_text_to_ids(text_to_ids) Tests Passed Preprocess all the data and save it Running the code cell below will preprocess all the data and save it to file. """ DON'T MODIFY ANYTHING IN THIS CELL """ helper.preprocess_and_save_data(source_path, target_path, text_to_ids) Check Point This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk. import problem_unittests as tests """ DON'T MODIFY ANYTHING IN THIS CELL """ import numpy as np import helper (source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess() Check the Version of TensorFlow and Access to GPU This will check to make sure you have the correct version of TensorFlow and access to a GPU """ DON'T MODIFY ANYTHING IN THIS CELL """ from distutils.version import LooseVersion import warnings import tensorflow as tf from tensorflow.python.layers.core import Dense # Check TensorFlow Version assert LooseVersion(tf.__version__) >= LooseVersion('1.1'), 'Please use TensorFlow version 1.1 or newer' print('TensorFlow Version: {}'.format(tf.__version__)) # Check for a GPU if not tf.test.gpu_device_name(): warnings.warn('No GPU found. Please use a GPU to train your neural network.') else: print('Default GPU Device: {}'.format(tf.test.gpu_device_name())) TensorFlow Version: 1.1.0 Default GPU Device: /gpu:0 Build the Neural Network You'll build the components necessary to build a Sequence-to-Sequence model by implementing the following functions below: model_inputs process_decoder_input encoding_layer decoding_layer_train decoding_layer_infer decoding_layer seq2seq_model Input Implement the model_inputs() function to create TF Placeholders for the Neural Network. It should create the following placeholders: Input text placeholder named "input" using the TF Placeholder name parameter with rank 2. Targets placeholder with rank 2. Learning rate placeholder with rank 0. Keep probability placeholder named "keep_prob" using the TF Placeholder name parameter with rank 0. Target sequence length placeholder named "target_sequence_length" with rank 1 Max target sequence length tensor named "max_target_len" getting its value from applying tf.reduce_max on the target_sequence_length placeholder. Rank 0. Source sequence length placeholder named "source_sequence_length" with rank 1 Return the placeholders in the following the tuple (input, targets, learning rate, keep probability, target sequence length, max target sequence length, source sequence length) def model_inputs(): """ Create TF Placeholders for input, targets, learning rate, and lengths of source and target sequences. :return: Tuple (input, targets, learning rate, keep probability, target sequence length, max target sequence length, source sequence length) """ # TODO: Implement Function inputs = tf.placeholder(tf.int32, [None, None], 'input') targets = tf.placeholder(tf.int32, [None, None]) learning_rate = tf.placeholder(tf.float32, []) keep_prob = tf.placeholder(tf.float32, [], 'keep_prob') target_sequence_length = tf.placeholder(tf.int32, [None], 'target_sequence_length') max_target_len = tf.reduce_max(target_sequence_length) source_sequence_length = tf.placeholder(tf.int32, [None], 'source_sequence_length') return inputs, targets, learning_rate, keep_prob, target_sequence_length, max_target_len, source_sequence_length """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_model_inputs(model_inputs) Tests Passed Process Decoder Input Implement process_decoder_input by removing the last word id from each batch in target_data and concat the GO ID to the begining of each batch. def process_decoder_input(target_data, target_vocab_to_int, batch_size): """ Preprocess target data for encoding :param target_data: Target Placehoder :param target_vocab_to_int: Dictionary to go from the target words to an id :param batch_size: Batch Size :return: Preprocessed target data """ # TODO: Implement Function go = tf.constant([[target_vocab_to_int['<GO>']]]*batch_size) # end = tf.slice(target_data, [0, 0], [-1, batch_size]) end = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1]) return tf.concat([go, end], 1) """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_process_encoding_input(process_decoder_input) Tests Passed Encoding Implement encoding_layer() to create a Encoder RNN layer: Embed the encoder input using tf.contrib.layers.embed_sequence Construct a stacked tf.contrib.rnn.LSTMCell wrapped in a tf.contrib.rnn.DropoutWrapper Pass cell and embedded input to tf.nn.dynamic_rnn() from imp import reload reload(tests) def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob, source_sequence_length, source_vocab_size, encoding_embedding_size): """ Create encoding layer :param rnn_inputs: Inputs for the RNN :param rnn_size: RNN Size :param num_layers: Number of layers :param keep_prob: Dropout keep probability :param source_sequence_length: a list of the lengths of each sequence in the batch :param source_vocab_size: vocabulary size of source data :param encoding_embedding_size: embedding size of source data :return: tuple (RNN output, RNN state) """ # TODO: Implement Function embed = tf.contrib.layers.embed_sequence(rnn_inputs, source_vocab_size, encoding_embedding_size) def lstm_cell(): lstm = tf.contrib.rnn.BasicLSTMCell(rnn_size) return tf.contrib.rnn.DropoutWrapper(lstm, keep_prob) stacked_lstm = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(num_layers)]) # initial_state = stacked_lstm.zero_state(source_sequence_length, tf.float32) return tf.nn.dynamic_rnn(stacked_lstm, embed, source_sequence_length, dtype=tf.float32) # initial_state=initial_state) """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_encoding_layer(encoding_layer) Tests Passed Decoding - Training Create a training decoding layer: Create a tf.contrib.seq2seq.TrainingHelper Create a tf.contrib.seq2seq.BasicDecoder Obtain the decoder outputs from tf.contrib.seq2seq.dynamic_decode def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, max_summary_length, output_layer, keep_prob): """ Create a decoding layer for training :param encoder_state: Encoder State :param dec_cell: Decoder RNN Cell :param dec_embed_input: Decoder embedded input :param target_sequence_length: The lengths of each sequence in the target batch :param max_summary_length: The length of the longest sequence in the batch :param output_layer: Function to apply the output layer :param keep_prob: Dropout keep probability :return: BasicDecoderOutput containing training logits and sample_id """ # TODO: Implement Function helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, target_sequence_length) decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, helper, encoder_state, output_layer) dec_train_logits, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_summary_length) # for tensorflow 1.2: # dec_train_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_summary_length) return dec_train_logits # keep_prob/dropout not used? """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_decoding_layer_train(decoding_layer_train) Tests Passed Decoding - Inference Create inference decoder: Create a tf.contrib.seq2seq.GreedyEmbeddingHelper Create a tf.contrib.seq2seq.BasicDecoder Obtain the decoder outputs from tf.contrib.seq2seq.dynamic_decode def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, max_target_sequence_length, vocab_size, output_layer, batch_size, keep_prob): """ Create a decoding layer for inference :param encoder_state: Encoder state :param dec_cell: Decoder RNN Cell :param dec_embeddings: Decoder embeddings :param start_of_sequence_id: GO ID :param end_of_sequence_id: EOS Id :param max_target_sequence_length: Maximum length of target sequences :param vocab_size: Size of decoder/target vocabulary :param decoding_scope: TenorFlow Variable Scope for decoding :param output_layer: Function to apply the output layer :param batch_size: Batch size :param keep_prob: Dropout keep probability :return: BasicDecoderOutput containing inference logits and sample_id """ # TODO: Implement Function start_tokens = tf.constant([start_of_sequence_id]*batch_size) helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings, start_tokens, end_of_sequence_id) decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, helper, encoder_state, output_layer) dec_infer_logits, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_target_sequence_length) # for tensorflow 1.2: # dec_infer_logits, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=max_target_sequence_length) return dec_infer_logits """ DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE """ tests.test_decoding_layer_infer(decoding_layer_infer)
ajaybhatiya1234
 Read the technical deep dive: https://www.dessa.com/post/deepfake-detection-that-actually-works # Visual DeepFake Detection In our recent [article](https://www.dessa.com/post/deepfake-detection-that-actually-works), we make the following contributions: * We show that the model proposed in current state of the art in video manipulation (FaceForensics++) does not generalize to real-life videos randomly collected from Youtube. * We show the need for the detector to be constantly updated with real-world data, and propose an initial solution in hopes of solving deepfake video detection. Our Pytorch implementation, conducts extensive experiments to demonstrate that the datasets produced by Google and detailed in the FaceForensics++ paper are not sufficient for making neural networks generalize to detect real-life face manipulation techniques. It also provides a current solution for such behavior which relies on adding more data. Our Pytorch model is based on a pre-trained ResNet18 on Imagenet, that we finetune to solve the deepfake detection problem. We also conduct large scale experiments using Dessa's open source scheduler + experiment manger [Atlas](https://github.com/dessa-research/atlas). ## Setup ## Prerequisities To run the code, your system should meet the following requirements: RAM >= 32GB , GPUs >=1 ## Steps 0. Install [nvidia-docker](https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)) 00. Install [ffmpeg](https://www.ffmpeg.org/download.html) or `sudo apt install ffmpeg` 1. Git Clone this repository. 2. If you haven't already, install [Atlas](https://github.com/dessa-research/atlas). 3. Once you've installed Atlas, activate your environment if you haven't already, and navigate to your project folder. That's it, You're ready to go! ## Datasets Half of the dataset used in this project is from the [FaceForensics](https://github.com/ondyari/FaceForensics/tree/master/dataset) deepfake detection dataset. . To download this data, please make sure to fill out the [google form](https://github.com/ondyari/FaceForensics/#access) to request access to the data. For the dataset that we collected from Youtube, it is accessible on [S3](ttps://deepfake-detection.s3.amazonaws.com/augment_deepfake.tar.gz) for download. To automatically download and restructure both datasets, please execute: ``` bash restructure_data.sh faceforensics_download.py ``` Note: You need to have received the download script from FaceForensics++ people before executing the restructure script. Note2: We created the `restructure_data.sh` to do a split that replicates our exact experiments avaiable in the UI above, please feel free to change the splits as you wish. ## Walkthrough Before starting to train/evaluate models, we should first create the docker image that we will be running our experiments with. To do so, we already prepared a dockerfile to do that inside `custom_docker_image`. To create the docker image, execute the following commands in terminal: ``` cd custom_docker_image nvidia-docker build . -t atlas_ff ``` Note: if you change the image name, please make sure you also modify line 16 of `job.config.yaml` to match the docker image name. Inside `job.config.yaml`, please modify the data path on host from `/media/biggie2/FaceForensics/datasets/` to the absolute path of your `datasets` folder. The folder containing your datasets should have the following structure: ``` datasets ├── augment_deepfake (2) │ ├── fake │ │ └── frames │ ├── real │ │ └── frames │ └── val │ ├── fake │ └── real ├── base_deepfake (1) │ ├── fake │ │ └── frames │ ├── real │ │ └── frames │ └── val │ ├── fake │ └── real ├── both_deepfake (3) │ ├── fake │ │ └── frames │ ├── real │ │ └── frames │ └── val │ ├── fake │ └── real ├── precomputed (4) └── T_deepfake (0) ├── manipulated_sequences │ ├── DeepFakeDetection │ ├── Deepfakes │ ├── Face2Face │ ├── FaceSwap │ └── NeuralTextures └── original_sequences ├── actors └── youtube ``` Notes: * (0) is the dataset downloaded using the FaceForensics repo scripts * (1) is a reshaped version of FaceForensics data to match the expected structure by the codebase. subfolders called `frames` contain frames collected using `ffmpeg` * (2) is the augmented dataset, collected from youtube, available on s3. * (3) is the combination of both base and augmented datasets. * (4) precomputed will be automatically created during training. It holds cashed cropped frames. Then, to run all the experiments we will show in the article to come, you can launch the script `hparams_search.py` using: ```bash python hparams_search.py ``` ## Results In the following pictures, the title for each subplot is in the form `real_prob, fake_prob | prediction | label`. #### Model trained on FaceForensics++ dataset For models trained on the paper dataset alone, we notice that the model only learns to detect the manipulation techniques mentioned in the paper and misses all the manipulations in real world data (from data)   #### Model trained on Youtube dataset Models trained on the youtube data alone learn to detect real world deepfakes, but also learn to detect easy deepfakes in the paper dataset as well. These models however fail to detect any other type of manipulation (such as NeuralTextures).   #### Model trained on Paper + Youtube dataset Finally, models trained on the combination of both datasets together, learns to detect both real world manipulation techniques as well as the other methods mentioned in FaceForensics++ paper.   for a more in depth explanation of these results, please refer to the [article](https://www.dessa.com/post/deepfake-detection-that-actually-works) we published. More results can be seen in the [interactive UI](http://deepfake-detection.dessa.com/projects) ## Help improve this technology Please feel free to fork this work and keep pushing on it. If you also want to help improving the deepfake detection datasets, please share your real/forged samples at foundations@dessa.com. ## LICENSE © 2020 Square, Inc. ATLAS, DESSA, the Dessa Logo, and others are trademarks of Square, Inc. All third party names and trademarks are properties of their respective owners and are used for identification purposes only.
*****PROJECT SPECIFICATION: Machine Learning Capstone Analysis Project***** This capstone project involves machine learning modeling and analysis of clinical, demographic, and brain related derived anatomic measures from human MRI (magnetic resonance imaging) tests (http://www.oasis-brains.org/). The objectives of these measurements are to diagnose the level of Dementia in the individuals and the probability that these individuals may have Alzheimer's Disease (AD). In published studies, Machine Learning has been applied to Alzheimer’s/Dementia identification from MRI scans and related data in the academic papers/theses in References 10 and 11 listed in the References Section below. Recently, a close relative of mine had to undergo a sequence of MRI tests for cognition difficulties.The motivation for choosing this topic for the Capstone project arose from the desire to understand and analyze potential for Dementia and AD from MRI related data. Cognitive testing, clinical assessments and demographic data related to these MRI tests are used in this project. This Capstone project does not use the MRI "imaging" data and does not focus on AD, focusses only on Dementia. *****Conclusions, Justification, and Reflections***** [Student adequately summarizes the end-to-end problem solution and discusses one or two particular aspects of the project they found interesting or difficult.] The formulation of OASIS data (Ref 1 and 2) in terms of a dementia classification problem based on demographic and clinical data only (and without directly using the MRI image data), is a simplification that has major advantages and appeal. This means the trained model can classify whether an individual has dementia or not with about 87% accuracy, without having to wait for radiological interpretation of MRI scans. This can provide an early alert for intervention and initiation of treatment for those with onset of dementia. The assumption that the combined cross-sectional and longitudinal datasets would lead to dementia label classification of acceptable accuracy came out to be true. The method required careful data cleaning and data preparation work, converting it to a binary classification problem, as outlined in this notebook. At the outset it was not clear which algorithm(s) would be more appropriate for the binary and multi-label classification problem. The approach of spot checking the algorithms early for accuracy led to the determination of a smaller set of algorithms with higher accuracy (e.g. Gadient Boosting and Random Forest) for a deeper dive examination, e.g. use of a k-fold cross-validation approach in classifying the CDR label. The neural network benchmark model accuracy of 78% for binary classification was exceeded by the classification accuracy of the main output of this study, the trained Gradient Boosting and Random Forest classification models. This builds confidence in the latter model for further training with new data and further classification use for new patients.
In this project, we take into account different approaches like Eigenfaces, Principal Component Analysis(PCA), Support Vector Machines(SVM), Artificial Neural Networks(ANN), Convolutional Neural Networks(CNN), K-Nearest Neighbour(KNN) for the problem of face recognition and compare all the approaches on the basis of different performance metrics such as Accuracy, Number of Iterations and Error Rate to see which technique is more feasible in real life. Then after we also recognize emotions in a face using the Support Vector Machines(SVM) and the Convolutional Neural Networks(CNN). The approaches we have considered treats Face Recognition and Emotions Recognition problem as two-dimensional recognition problem, the advantage being faces can be described by a small set of 2-D characteristics views. The dataset used is collected from the whole class where each student was asked to upload their selfies in six different emotions.
Sahrawat1
Abstract Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications of this task include machine translation, summarization, text generation, question answering, short answer grading, semantic search, dialogue and conversational systems. We developed Support Vector Regression model with various features including the similarity scores calculated using alignment-based methods and semantic composition based methods. We have also trained sentence semantic representations with BiLSTM and Convolutional Neural Networks (CNN). The correlations between our system output the human ratings were above 0.8 in the test dataset. Introduction The goal of this task is to measure semantic textual similarity between a given pair of sentences (what they mean rather than whether they look similar syntactically). While making such an assessment is trivial for humans, constructing algorithms and computational models that mimic human level performance represents a difficult and deep natural language understanding (NLU) problem. Example 1: English: Birdie is washing itself in the water basin. English Paraphrase: The bird is bathing in the sink. Similarity Score: 5 ( The two sentences are completely equivalent, as they mean the same thing.) Example 2: English: The young lady enjoys listening to the guitar. English Paraphrase: The woman is playing the violin. Similarity Score: 1 ( The two sentences are not equivalent, but are on the same topic. ) Semantic Textual Similarity (STS) measures the degree of equivalence in the underlying semantics of paired snippets of text. STS differs from both textual entailment and paraphrase detection in that it captures gradations of meaning overlap rather than making binary classifications of particular relationships. While semantic relatedness expresses a graded semantic relationship as well, it is non-specific about the nature of the relationship with contradictory material still being a candidate for a high score (e.g., “night” and “day” are highly related but not particularly similar). The task involves producing real-valued similarity scores for sentence pairs. Performance is measured by the Pearson correlation of machine scores with human judgments.
muhammadsohaib60
Our project is based on one of the most important application of machine learning i.e. pattern recognition. Optical character recognition or optical character reader is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo or from subtitle text superimposed on an image. We are working on developing an OCR for URDU. We studied a couple of research papers related to our project. So far, we have found that Both Arabic and Urdu are written in Perso-Arabic script; at the written level, therefore, they share similarities. The styles of Arabic and Persian writing have a heavy influence on the Urdu script. There are 6 major styles for writing Arabic, Persian and Pashto as well. Urdu is written in Naskh writing style which is most famous of all. Optical character recognition (OCR) is the process of converting an image of text, such as a scanned paper document or electronic fax file, into computer-editable text [1]. The text in an image is not editable: the letters are made of tiny dots (pixels) that together form a picture of text. During OCR, the software analyzes an image and converts the pictures of the characters to editable text based on the patterns of the pixels in the image. After OCR, the converted text can be exported and used with a variety of word-processing, page layout and spreadsheet applications [2]. One of the main aims of OCR is to emulate the human ability to read at a much faster rate by associating symbolic identities with images of characters. Its potential applications include Screen Readers, Refreshable Braille Displays [3], reading customer filled forms, reading postal address off envelops, archiving and retrieving text etc. OCR’s ultimate goal is to develop a communication interface between the computer and its potential users. Urdu is the national language of Pakistan. It is a language that is understood by over 300 million people belonging to Pakistan, India and Bangladesh. Due to its historical database of literature, there is definitely a need to devise automatic systems for conversion of this literature into electronic form that may be accessible on the worldwide web. Although much work has been done in the field of OCR, Urdu and other languages using the Arabic script like Farsi, Urdu and Arabic, have received least attention. This is due in part to a lack of interest in the field and in part to the intricacies of the Arabic script. Owing to this state of indifference, there remains a huge amount of Urdu and Arabic literature unattended and rotting away on some old shelves. The proposed research aims to develop workable solutions to many of the problems faced in realization of an OCR designed specifically for Urdu Noori Nastaleeq Script, which is widely used in Urdu newspapers, governmental documents and books. The underlying processes first isolate and classify ligatures based on certain carefully chosen special, contour and statistical features and eventually recognize them with the aid of Feed-Forward Back Propagation Neural Networks. The input to the system is a monochrome bitmap image file of Urdu text written in Noori Nastaleeq and the output is the equivalent text converted to an editable text file.
Tirth8038
The main aim of the project is to scan the X-rays of human lungs and classify them into 3 given categories like healthy patients, patients with pre-existing conditions, and serious patients who need immediate attention using Convolutional Neural Network. The provided dataset of Grayscale Human Lungs X-ray is in the form of a numpy array and has dimensions of (13260, 64, 64, 1). Similarly, the corresponding labels of X-ray images are of size (13260, 2) with classes (0) if the patient is healthy, (1) if patient has pre-existing conditions or (2) if patient has Effusion/Mass in the lungs. During data exploration, I found that the class labels are highly imbalanced. Thus, for handling such imbalanced class labels, I used Data augmentation techniques such as horizontal & vertical flips, rotation, altering brightness and height & width shift to increase the number of training images to prevent overfitting problem. After preprocessing the data, the dimension of the dataset is (31574, 64, 64, 1). For Model Selection, I built 4 architectures of CNN Model similar to the architecture of LeNet-5, VGGNet, AlexNet with various Conv2D layers followed by MaxPooling2D layers and fitted them with different epochs, batch size and different optimizer learning rate. Moreover, I also built a custom architecture with comparatively less complex structure than previous models. Further to avoid Overfitting, I also tried regularizing Kernel layer and Dense layer using Absolute Weight Regularizer(L1) and to restrict the bias in classification, I used Bias Regularizer in the Dense layer. In addition to this, I also tried applying Dropout with a 20% dropout rate during training and Early Stopping method for preventing overfitting and evaluated that Early Stopping gave better results than Dropout. For evaluation of models, I split the dataset into training,testing and validation split with (60,20,20) ratio and calculated Macro F1 Score , AUC Score on test data and using the Confusion Matrix, I calculated the accuracy by dividing the sum of diagonal elements by sum of all elements. In addition to this, I plotted training vs. validation loss and accuracy graphs to visualize the performance of models. Interestingly, the CNN model similar to VGGNet with 5 Conv2D and 3 MaxPooling layers and 2 Dense layers performed better than other architecture with Macro F1 score of 0.773 , AUC score of 0.911 and accuracy of 0.777.
Business Problem: Dataset of a bank with 10,000 customers measured lots of attributes of the customer and is seeing unusual churn rates at a high rate. Want to understand what the problem is, address the problem, and give them insights. 10,000 is a sample, millions of customer across Europe. Took a sample of 10,000 measured six months ago lots of factors (name, credit score, grography, age, tenure, balance, numOfProducts, credit card, active member, estimated salary, exited, etc.). For these 10,000 randomly selected customers and track which stayed or left. Goal: create a geographic segmentation model to tell which of the customers are at highest risk of leaving. Valuable to any customer-oriented organisations. Geographic Segmentation Modeling can be applied to millions of scenarios, very valuable. (doesn't have to be for banks, churn rate, etc.). Same scenario works for (e.g. should this person get a loan or not? Should this be approved for credit => binary outcome, model, more likely to be reliable). Fradulant transactions (which is more likely to be fradulant) Binary outcome with lots of independent variables you can build a proper robust model to tell you which factors influence the outcome. alt text Problem: Classification problem with lots of independent variables (credit score, balance, number of products) and based on these variables we're predicting which of these customers will leave the bank. Artificial Neural Networks can do a terrific job with Classification problems and making those kind of predictions. Libraries used: Theano numerical computation library, very efficient for fast numerical computations based on Numpy syntax GPU is much more powerful than CPU, as there are many more cores and run more floating points calculations per second GPU is much more specialized for highly intensive computing tasks and parallel computations, exactly for the case for neural networks When we're forward propogating the activations of the different neurons in the neural network thanks to the activation function well that involves parallel computations When errors are backpropagated to the neural networks that again involves parallel computation GPU is a much better choice for deep neural network than CPU - simple neural networks, CPU is sufficient Created by Machine Learning group at the Univeristy of Montreal Tensorflow Another numerical computation library that runs very fast computations that can run on your CPU or GPU Google Brain, Apache 2.0 license Theano & Tensorflow are used primarily for research and development in the deep learning field Deep Learning neural network from scratch, use the above Great for inventing new deep learning neural networks, deep learning models, lots of line of code Keras Wrapper for Theano + Tensorflow Amazing library to build deep neural networks in a few lines of code Very powerful deep neural networks in few lines of code based on Theano and Tensorflow Sci-kit Learn (Machine Learning models), Keras (Deep Learning models) Installing Theano, Tensorflow in three steps with Anaconda installed: $ pip install theano $ pip install tensorflow $ pip install keras $ conda update --all
Cables are essential components of any device in the electronic waste sector. For several years disassembly processes in this sector got automated step by step. However, most of these automation’s were device-specific and not usable on a wide range 1 . An advantage came about as neural networks improved in image recognition. The recent improvements marked the starting point for many research groups to focus on disassembly lines that can handle multiple devices. This bachelor thesis aims to address a fundamental problem in such disassembly processes: Cable-Detection . Consider the task of disassembling a DVD-Player which still includes usable components. Cables connect most of these components for information and energy exchange. Therefore one main goal before removing individual components is to cut the cables between the components to ensure their safe removal. Inevitably this task needs a precise knowledge about the position of every single cable. For a disassembly robot, a strategy that is worth pursuing is to locate the cables process- ing an RGB image. An RGB image is a lightweight solution with high standards regarding the improvement in camera technology over the last year through mobile devices. This Bachelor thesis presents a model for cable detection. A state-of-the-art instance segmentation model Mask R-CNN 2 , published in 2018, is trained. The presented work is public on GitHub to make this global concerning topic available for the public. Keywords: Deep Convolutional Neuronal Networks, Mask R-CNN, Cable-Detection, Disassembly, E-Waste
maritimebear
Physics-informed neural network for 2-D Poisson problems.
janeminmin
1> Background information Bluebikes is Metro Boston’s public bike share program, with more than 1800 bikes at over 200 stations across Boston and nearby areas. The bikes sharing program launched in 2011. The program aimed for individuals to use it for short-term basis for a price. It allows individuals to borrow a bike from a dock station after using it, which makes it ideal for one-way trips. The City of Boston is committed to providing bike share as a part of the public transportation system. However, to build a transport system that encourages bicycling, it is important to build knowledge about the current bicycle flows, and what factors are involved in the decision-making of potential bicyclists when choosing whether to use the bicycle. It is logical to make hypotheses that age and gender, bicycle infrastructure, safety perception are possible determinants of bicycling. On the short-term perspective, it has been shown that weather plays an important role whether to choose the bicycle. 2> Data collection The Bluebikes collects and provides system data to the public. The datasets used in the project can be download through this link (https://www.bluebikes.com/system-data). Based on this time series dataset (start from 2017-01-01 00:00:00 to 2019-03-31 23:00:00), we could have the information includes: Trip duration, start time and data, stop time and data, start station name and id, end station name and id, bike id, user type (casual or subscribed), birth year, gender. Besides, any trips that were below 60 seconds in length is considered as potentially false starts, which is already removed in the datasets. The number of bicycles used during a particular time period, varies over time based on several factors, including the current weather conditions, time of the day, time of the year and the current interest of the biker to use the bicycle as a transport mode. The current interest is different between subscribed users and casual users, so we should analyze them separately. Factors such as season, day of a week, month, hour, and if a holiday can be extracted from the date and time column in the datasets. Since we would analyze the hourly bicycle rental flow, we need hourly weather conditions data from 2017-01-01 00:00:00 to 2019-03-31 23:00:00 to complete our regression model of prediction. The weather data used in the project is scrapped using python selenium from Logan airport station (42.38 °N, 71.04 °W) webpage (https://www.wunderground.com/history/daily/us/ma/boston/KBOS/date/2019-7-15) maintained by weather underground website. The hourly weather observations include time, temperature, dew point, humidity, wind, wind speed, wind gust, pressure, precipitation, precipitation accumulated, condition. 3> The problem The aims of the project are to gain insight of the factors that could give short-term perspective of bicycle flows in Boston. It also aimed to investigate the how busy each station is, the division of bicycle trip direction and duration of the usage of a busy station and the mean flows variation within a day or during that period. The addition to the factors included in the regression model, there also exist other factors than influence how the bicycle flows vary over longer periods time. For example, general tendency to use the bicycle. Therefore, there is potential to improve the regression model accuracy by incorporating a long-term trend estimate taken over the time series of bicycle usage. Then the result from the machine learning algorithm-based regression model should be compared with the time series forecasting-based models. 4> Possible solutions Data preprocessing/Exploration and variable selection: date approximation manipulation, correlation analysis among variables, merging data, scrubbing for duplicate data, verifying errors, interpolation for missing values, handling outliers and skewness, binning low frequent levels, encoding categorical variables. Data visualization: split number of bike usage by subscribed/casual to build time series; build heatmap to present how busy is each station and locate the busiest station in the busiest period of a busy day; using boxplot and histogram to check outliers and determine appropriate data transformation, using weather condition text to build word cloud. Time series trend curve estimates: two possible way we considered are fitting polynomials of various degrees to the data points in the time series or by using time series decomposition functions and forecast functions to extract and forecast. We would emphasize on the importance to generate trend curve estimates that do not follow the seasonal variations: the seasonal variations should be captured explicitly by the input weather related variables in the regression model. Prediction/regression/time series forecasting: It is possible to build up multilayer perceptron neural network regressor to build up models and give prediction based on all variables of data, time and weather. However, considering the interpretability of model, we prefer to build regression models based on machine learning algorithms (like random forest or SVM) respectively for subscribed/casual users. Then the regressor would be combined with trend curve extracted and forecasted by ARIMA, and then comparing with the result of time series forecasting by STL (Seasonal and Trend decomposition using Loess) with multiple seasonal periods and the result of TBATS (Trigonometric Seasonal, Box-Cox Transformation, ARMA residuals, Trend and Seasonality).
rsundhar247
Text summarization is a very challenging problem that can only be truly realized by understanding meaning of the textual content. Recently, deep recurrent neural networks have been used in the sequence to sequence framework to achieve good results on summarization tasks. In this project we explore the success achieved by a stacked LSTM encoder – attention based decoder architecture on the English Gigaword dataset. We employ the usual best practices of sequence to sequence models with a complete end-to-end training, and have decent results to show on generating summaries of 10 words or less, for 2 line texts with a maximum of 30 words. We also explore on improvements in network training time, computation and refinement of summaries through various experiments.
nmerovingian
Using Physics-Informed Neural Network to solve convective-diffusion problem (both 1-D and 2-D) for rotating disk electrode(RDE). The edge effect of RDE is thus quantified.
The goal of this project is to label images of 10 handwritten digits of “zero”, “one”,...,“nine”. The images are 28 by 28 in size (MNIST dataset), which we will be represented as a vector x of dimension 784 by listing all the pixel values in raster scan order. The labels are 0,1,2,...,9 corresponding to 10 classes as written in the image. There are 3000 training cases, containing 300 examples of each of 10 classes. PROBLEM 1: Here you must read an input file. Each line contains 785 numbers (comma delimited): the first values are between 0.0 and 1.0 correspond to the 784 pixel values (black and white images), and the last number denotes the class label: 0 corresponds to digit 0, 1 corresponds to digit 1, etc. PROBLEM 2: Implement the backpropagation algorithm in a zero hidden layer neural network (weights between input and output nodes). The output layer should be a softmax output over 10 classes corresponding to 10 classes of handwritten digits (e.g. an architecture: 784 > 10). Your backprop code should minimize the cross-entropy entropy function for multi-class classification problem (categorical cross entropy). PROBLEM 3: Extend your code from problem 2 to support a single layer neural network with N hidden units (e.g. an architecture: 784 > 10 > 10). These hidden units should be using sigmoid activations. PROBLEM 4: Extend your code from problem 3 (use cross entropy error) and implement a 2-layer neural network, starting with a simple architecture containing N hidden units in each layer (e.g. with architecture: 784 > 10 > 10 > 10). These hidden units should be using sigmoid activations. Extend your code from problem 4 to implement different activations functions which will be passed as a parameter. In this problem all activations (except the final layer which should remain a softmax) must be changed to the passed activation function. PROBLEM 6: Extend your code from problem 5 to implement momentum with your gradient descent. The momentum value will be passed as a parameter. Your function should perform “epoch” number of epochs and return the resulting weights.
VasundharaSK
How to Develop a Deep CNN for Fashion-MNIST Clothing Classification by Jason Brownlee on May 10, 2019 in Deep Learning for Computer Vision Tweet Share Last Updated on October 3, 2019 The Fashion-MNIST clothing classification problem is a new standard dataset used in computer vision and deep learning. Although the dataset is relatively simple, it can be used as the basis for learning and practicing how to develop, evaluate, and use deep convolutional neural networks for image classification from scratch. This includes how to develop a robust test harness for estimating the performance of the model, how to explore improvements to the model, and how to save the model and later load it to make predictions on new data. In this tutorial, you will discover how to develop a convolutional neural network for clothing classification from scratch. After completing this tutorial, you will know: How to develop a test harness to develop a robust evaluation of a model and establish a baseline of performance for a classification task. How to explore extensions to a baseline model to improve learning and model capacity. How to develop a finalized model, evaluate the performance of the final model, and use it to make predictions on new images. Discover how to build models for photo classification, object detection, face recognition, and more in my new computer vision book, with 30 step-by-step tutorials and full source code. Let’s get started. Update Jun/2019: Fixed minor bug where the model was defined outside of the CV loop. Updated results (thanks Aditya). Updated Oct/2019: Updated for Keras 2.3 and TensorFlow 2.0. How to Develop a Deep Convolutional Neural Network From Scratch for Fashion MNIST Clothing Classification How to Develop a Deep Convolutional Neural Network From Scratch for Fashion MNIST Clothing Classification Photo by Zdrovit Skurcz, some rights reserved. Tutorial Overview This tutorial is divided into five parts; they are: Fashion MNIST Clothing Classification Model Evaluation Methodology How to Develop a Baseline Model How to Develop an Improved Model How to Finalize the Model and Make Predictions Want Results with Deep Learning for Computer Vision? Take my free 7-day email crash course now (with sample code). Click to sign-up and also get a free PDF Ebook version of the course. Download Your FREE Mini-Course Fashion MNIST Clothing Classification The Fashion-MNIST dataset is proposed as a more challenging replacement dataset for the MNIST dataset. It is a dataset comprised of 60,000 small square 28×28 pixel grayscale images of items of 10 types of clothing, such as shoes, t-shirts, dresses, and more. The mapping of all 0-9 integers to class labels is listed below. 0: T-shirt/top 1: Trouser 2: Pullover 3: Dress 4: Coat 5: Sandal 6: Shirt 7: Sneaker 8: Bag 9: Ankle boot It is a more challenging classification problem than MNIST and top results are achieved by deep learning convolutional neural networks with a classification accuracy of about 90% to 95% on the hold out test dataset.
Problem Statement: We are provided with a training set and a test set of images of dogs. Each image has a filename that is its unique id. The dataset comprises 120 breeds of dogs. The goal is to create a classifier capable of determining a dog's breed from a photo. The list of breeds is as follows 2. Task Description Who's a good dog? Who likes ear scratches? Well, it seems those fancy deep neural networks don't have all the answers. However, maybe they can answer that ubiquitous question we all ask when meeting a four-legged stranger: what kind of good pup is that? In this Task, we were provided a strictly canine subset of ImageNet in order to practice fine-grained image categorization. How well we can tell our Norfolk Terriers from our Norwich Terriers? With 120 breeds of dogs and a limited number training images per class, we might find the problem more, err, ruff than we anticipated.
This is a Python implementation of a simple 2-layer neural network. The network is designed to work with binary classification problems. The network consists of an input layer, a hidden layer with ReLU activation function and an output layer with sigmoid activation function.
Project Aims To Identify Similar(QUORA) Question Pairs. Applied Siamese Network a special type of neural network which consists of two identical neural networks, each taking one of the two input question pairs. The last layers of the two networks are then fed to a Manhattan lstm model, which predicts the similarity between the two inputs by assigning a score of relevance. Solved the problem using two approaches for the same problem. 1) One Approach using Flair Embeddings. 2) Another Approach using Glove Embeddings
In this project, we take into account different approaches like Eigenfaces, Principal Component Analysis(PCA), Support Vector Machines(SVM), Artificial Neural Networks(ANN), Convolutional Neural Networks(CNN), K-Nearest Neighbour(KNN) for the problem of face recognition and compare all the approaches on the basis of different performance metrics such as Accuracy, Number of Iterations and Error Rate to see which technique is more feasible in real life. Then after we also recognize emotions in a face using the Support Vector Machines(SVM) and the Convolutional Neural Networks(CNN). The approaches we have considered treats Face Recognition and Emotions Recognition problem as two-dimensional recognition problem, the advantage being faces can be described by a small set of 2-D characteristics views. The dataset used is collected from the whole class where each student was asked to upload their selfies in six different emotions.
FU-CHANG-KAI
The aim of this repository was to predict the outbreak of COVID-19 (SARS-CoV-2) in the United States using historical daily confirmed cases and twitter data. In terms of time-series problems, the predicting capabilities of the well-known autoregressive (AR) and a modified Recurrent Neural Network (RNN) with merge layer and dense residual links were compared. For the proposed RNNCON-Res model, the merge layer combines tweets volume as additional features to capture the temporal changes of the COVID-19 pandemic. Dense residual links can effectively restrain over-fitting problem and make the training process more rapid. In addition, Data figure analysis was implemented to gain insight that tweets volume is highly correlated to historical confirmed cases specifically for states hit by the pandemic in an early timeframe. In summary, the proposed RNNCON-Res model demonstrates dominating capability in country-level prediction.
Microsatelite is a unique type of recurrent genomic circuit that is spread throughout the genome and exhibits a high level of allele polymorphism.The microsatelite state turns unstable due to a mismatch repair deoxyribonucleic acid system error.Identification of microsatelite status can be done using deep learning method, Convolutional Neural Networks (CNN).There have been several studies using CNN in this case, but there are problems with model stability and accuracy resulting in less than the maximum that may be caused by overfitting.Based on the problem, the study proposed CNN method by proposing modification of VGG19 model then adding augmentation and dropout after pooling layer (APL).The data used is an image of colorectal and gastric cancer cells.In colorectal cancer cells, the proposed model obtained an accuracy of 76.2%, the addition of augmentation gained an accuracy of 77.1%, then gave the APL dropout lowering accuracy to 76.3%.While in gastric cancer cells testing was conducted using a proposed model with augmentation and the addition of callbacks obtained accuracy of 76%.The proposed model is still overfitting, but augmentation can slightly address overfitting and improve accuracy.While the APL dropout lowers accuracy but provides a stable development of accuracy charts and no indication of overfitting.
Image Classification for a City Dog Show Project Goal Improving your programming skills using Python In this project you will use a created image classifier to identify dog breeds. We ask you to focus on Python and not on the actual classifier (We will focus on building a classifier ourselves later in the program). Description: Your city is hosting a citywide dog show and you have volunteered to help the organizing committee with contestant registration. Every participant that registers must submit an image of their dog along with biographical information about their dog. The registration system tags the images based upon the biographical information. Some people are planning on registering pets that aren’t actual dogs. You need to use an already developed Python classifier to make sure the participants are dogs. Note, you DO NOT need to create the classifier. It will be provided to you. You will need to apply the Python tools you just learned to USE the classifier. Your Tasks: Using your Python skills, you will determine which image classification algorithm works the "best" on classifying images as "dogs" or "not dogs". Determine how well the "best" classification algorithm works on correctly identifying a dog's breed. If you are confused by the term image classifier look at it simply as a tool that has an input and an output. The Input is an image. The output determines what the image depicts. (for example: a dog). Be mindful of the fact that image classifiers do not always categorize the images correctly. (We will get to all those details much later on the program). Time how long each algorithm takes to solve the classification problem. With computational tasks, there is often a trade-off between accuracy and runtime. The more accurate an algorithm, the higher the likelihood that it will take more time to run and use more computational resources to run. For further clarifications, please check our FAQs here. Important Notes: For this image classification task you will be using an image classification application using a deep learning model called a convolutional neural network (often abbreviated as CNN). CNNs work particularly well for detecting features in images like colors, textures, and edges; then using these features to identify objects in the images. You'll use a CNN that has already learned the features from a giant dataset of 1.2 million images called ImageNet. There are different types of CNNs that have different structures (architectures) that work better or worse depending on your criteria. With this project you'll explore the three different architectures (AlexNet, VGG, and ResNet) and determine which is best for your application. We have provided you with a classifier function in classifier.py that will allow you to use these CNNs to classify your images. The test_classifier.py file contains an example program that demonstrates how to use the classifier function. For this project, you will be focusing on using your Python skills to complete these tasks using the classifier function; in the Neural Networks lesson you will be learning more about how these algorithms work. Remember that certain breeds of dog look very similar. The more images of two similar looking dog breeds that the algorithm has learned from, the more likely the algorithm will be able to distinguish between those two breeds. We have found the following breeds to look very similar: Great Pyrenees and Kuvasz, German Shepherd and Malinois, Beagle and Walker Hound, amongst others.