Found 326 repositories (showing 30)
sayantann11
Classification - Machine Learning

This is the ‘Classification’ tutorial, which is part of the Machine Learning course offered by Simplilearn. In this tutorial we will learn about classification algorithms, the types of classification algorithms, Support Vector Machines (SVM), Naive Bayes, Decision Trees, and the Random Forest classifier.

Objectives
Let us look at some of the objectives covered under this section of the Machine Learning tutorial.
Define Classification and list its algorithms
Describe Logistic Regression and Sigmoid Probability
Explain K-Nearest Neighbors and KNN classification
Understand Support Vector Machines, the Polynomial Kernel, and the Kernel Trick
Analyze Kernel Support Vector Machines with an example
Implement the Naïve Bayes Classifier
Demonstrate the Decision Tree Classifier
Describe the Random Forest Classifier

Classification: Meaning
Classification is a type of supervised learning. It specifies the class to which data elements belong and is best used when the output has finite and discrete values. It predicts a class for an input variable. There are two types of classification:
Binomial
Multi-Class

Classification: Use Cases
Some of the key areas where classification is used:
To find whether an email received is spam or ham
To identify customer segments
To find whether a bank loan should be granted
To identify whether a student will pass or fail an examination

Classification: Example
Social media sentiment analysis has two potential outcomes, positive or negative, as displayed by the chart given below.
https://www.simplilearn.com/ice9/free_resources_article_thumb/classification-example-machine-learning.JPG
This chart shows the classification of the Iris flower dataset into its three sub-species, indicated by codes 0, 1, and 2.
https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-flower-dataset-graph.JPG
The test-set dots represent the assignment of new test data points to one class or the other based on the trained classifier model.

Types of Classification Algorithms
Let's have a quick look at the types of classification algorithms.
Linear models:
Logistic Regression
Support Vector Machines
Nonlinear models:
K-nearest Neighbors (KNN)
Kernel Support Vector Machines (SVM)
Naïve Bayes
Decision Tree Classification
Random Forest Classification

Logistic Regression: Meaning
Logistic Regression is a regression model that is used for classification. It is widely used for binary classification problems and can also be extended to multi-class classification problems. Here, the dependent variable is categorical: y ∈ {0, 1}. A binary dependent variable can have only two values, like 0 or 1, win or lose, pass or fail, healthy or sick.
In this case, you model the probability that the output y is 1 or 0. This is called the sigmoid probability (σ). If σ(θᵀx) > 0.5, set y = 1; otherwise set y = 0.
Unlike Linear Regression (and its Normal Equation solution), there is no closed-form solution for finding the optimal weights of Logistic Regression. Instead, you must solve it with maximum likelihood estimation, a probabilistic approach that finds the parameter values under which the observed data is most likely. It can be used to calculate the probability of a given outcome in a binary model, like the probability of being classified as sick or of passing an exam.
https://www.simplilearn.com/ice9/free_resources_article_thumb/logistic-regression-example-graph.JPG
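As an illustrative sketch of this idea (not part of the original tutorial; the toy data and the threshold usage below are assumptions), scikit-learn's LogisticRegression can be fit and queried for sigmoid probabilities like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy, made-up data: hours studied (x) vs. pass/fail (y).
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns the sigmoid probability sigma(theta^T x) of each class;
# the predicted label is 1 exactly when the probability of class 1 exceeds 0.5.
print(clf.predict_proba([[2.2]])[0, 1], clf.predict([[2.2]]))
```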
Sigmoid Probability
The probability in logistic regression is often represented by the sigmoid function (also called the logistic function or the S-curve):
https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-function-machine-learning.JPG
In this equation, t represents the data value (for example, the number of hours studied), and S(t) represents the probability of passing the exam. Assume the sigmoid function g(z) = 1 / (1 + e^(−z)):
https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-probability-machine-learning.JPG
g(z) tends toward 1 as z → ∞, and g(z) tends toward 0 as z → −∞.

K-nearest Neighbors (KNN)
The K-nearest Neighbors algorithm assigns a data point to a class based on a similarity measure. It is a supervised method for classification. The steps of the KNN algorithm are given below:
https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-distribution-graph-machine-learning.JPG
Choose the number k and a distance metric (k = 5 is common).
Find the k nearest neighbors of the sample that you want to classify.
Assign the class label by majority vote.

KNN Classification
A new input point is classified into the category in which it has the greatest number of neighbors. For example:
https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-classification-machine-learning.JPG
Classify a patient as high risk or low risk.
Mark an email as spam or ham.
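As a hedged illustration of these steps (not the tutorial's own code; the dataset and split are assumptions), scikit-learn's KNeighborsClassifier performs the neighbor search and the majority vote:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: choose k = 5 and a distance metric (the default here is Euclidean).
knn = KNeighborsClassifier(n_neighbors=5)
# Steps 2-3: for each query point, find the 5 nearest training samples
# and assign the majority class label among them.
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```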
Support Vector Machine (SVM)
Let us understand the Support Vector Machine (SVM) in detail below. SVMs are classification algorithms used to assign data to various classes. They work by detecting hyperplanes that segregate the data into classes. SVMs are very versatile and are capable of performing linear or nonlinear classification, regression, and outlier detection. Once the ideal hyperplanes are discovered, new data points can be easily classified.
https://www.simplilearn.com/ice9/free_resources_article_thumb/support-vector-machines-graph-machine-learning.JPG
The optimization objective is to find the “maximum margin hyperplane,” the one farthest from the closest points of the two classes (these closest points are called support vectors). In the given figure, the middle line represents the hyperplane.

SVM Example
Let's look at the image below to get an idea of SVMs in general. Hyperplanes with larger margins have lower generalization error. The positive and negative hyperplanes are represented by:
https://www.simplilearn.com/ice9/free_resources_article_thumb/positive-negative-hyperplanes-machine-learning.JPG
Classification of any new input sample x_test: if w₀ + wᵀx_test > 1, the sample x_test is assigned to the class on the right of the positive hyperplane; if w₀ + wᵀx_test < −1, it is assigned to the class on the left of the negative hyperplane. When you subtract the two equations, you get:
https://www.simplilearn.com/ice9/free_resources_article_thumb/equation-subtraction-machine-learning.JPG
The length of the vector w (its L2 norm) is:
https://www.simplilearn.com/ice9/free_resources_article_thumb/length-of-vector-machine-learning.JPG
You normalize by the length of w to arrive at:
https://www.simplilearn.com/ice9/free_resources_article_thumb/normalize-equation-machine-learning.JPG

SVM: Hard Margin Classification
Given below are some points to understand Hard Margin Classification. The left side of equation SVM-1 given above can be interpreted as the distance between the positive and negative hyperplanes; in other words, it is the margin that is to be maximized. Hence the objective is to maximize the margin under the constraint that the samples are classified correctly, which is represented as:
https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-machine-learning.JPG
This means that you are minimizing ‖w‖. It also means that all positive samples lie on one side of the positive hyperplane and all negative samples on the other side of the negative hyperplane. This can be written concisely as:
https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-formula.JPG
Minimizing ‖w‖ is the same as minimizing ½‖w‖², and the latter form is preferred because it is differentiable even at w = 0. The approach listed above is called the “hard margin linear SVM classifier.”

SVM: Soft Margin Classification
Given below are some points to understand Soft Margin Classification. To allow the linear constraints to be relaxed for nonlinearly separable data, a slack variable ξ(i) is introduced. ξ(i) measures how much the ith instance is allowed to violate the margin. The slack variable is simply added to the linear constraints:
https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-machine-learning.JPG
Subject to the above constraints, the new objective to be minimized becomes:
https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-formula.JPG
You now have two conflicting objectives: minimizing the slack variables to reduce margin violations, and minimizing ‖w‖ to increase the margin. The hyperparameter C defines this trade-off. Large values of C correspond to larger error penalties (and hence smaller margins), whereas smaller values of C allow for more misclassification errors and larger margins.

SVM: Regularization
The hyperparameter C acts as the inverse of regularization strength. Higher C means lower regularization, which lowers the bias and raises the variance (and can cause overfitting); lower C means stronger regularization.
https://www.simplilearn.com/ice9/free_resources_article_thumb/concept-of-c-graph-machine-learning.JPG

IRIS Data Set
The Iris dataset contains measurements of 150 iris flowers from three different species:
Setosa
Versicolor
Virginica
Each row represents one sample, and the flower measurements in centimeters are stored as columns. These are called features.

IRIS Data Set: SVM
Let's train an SVM model using scikit-learn for the Iris dataset:
https://www.simplilearn.com/ice9/free_resources_article_thumb/svm-model-graph-machine-learning.JPG
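The tutorial's training code appears only as an image; a minimal stand-in sketch (the kernel choice and C value are assumptions) looks like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear-kernel SVM; C sets the margin-violation trade-off discussed above.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))
```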
Nonlinear SVM Classification
There are two ways to solve nonlinear SVMs:
by adding polynomial features
by adding similarity features
Polynomial features can be added to a dataset; in some cases, this creates a linearly separable dataset.
https://www.simplilearn.com/ice9/free_resources_article_thumb/nonlinear-classification-svm-machine-learning.JPG
In the figure on the left, there is only one feature, x₁, and the dataset is not linearly separable. If you add x₂ = (x₁)² (figure on the right), the data becomes linearly separable.

Polynomial Kernel
In scikit-learn, one can use the Pipeline class to add polynomial features before training. Classification results for the moons dataset are shown in the figure.
https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-machine-learning.JPG

Polynomial Kernel with Kernel Trick
Let us look at the image below and understand the kernel trick in detail.
https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-with-kernel-trick.JPG
For large-dimensional datasets, adding too many polynomial features can slow the model down. Instead, you can apply the kernel trick, which obtains the effect of polynomial features without actually adding them. The code shown below (using the SVC class) trains an SVM classifier with a 3rd-degree polynomial kernel via the kernel trick.
https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-equation-machine-learning.JPG
The hyperparameter coef0 controls how much the model is influenced by high-degree polynomials.

Kernel SVM
Let us understand Kernel SVMs in detail. Kernel SVMs are used for the classification of nonlinear data. In the chart, nonlinear data is projected into a higher-dimensional space via a mapping function, where it becomes linearly separable.
https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-machine-learning.JPG
In the higher dimension, a linear separating hyperplane can be derived and used for classification. A reverse projection from the higher dimension back to the original feature space turns it back into a nonlinear boundary. As mentioned previously, SVMs can be kernelized to solve nonlinear classification problems. You can create a sample dataset for the XOR gate (a nonlinear problem) with NumPy: 100 samples are assigned the class label 1, and 100 samples are assigned the class label -1.
https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-graph-machine-learning.JPG
As you can see, this data is not linearly separable.
https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-non-separable.JPG
You now use the kernel trick to classify the XOR dataset created earlier.
https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-xor-machine-learning.JPG
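A minimal sketch of the XOR experiment (the tutorial's code is shown only as images, so the gamma and C values here are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
# XOR labeling: class 1 when the two features have opposite signs, else -1
# (roughly 100 samples per class).
y = np.where(np.logical_xor(X[:, 0] > 0, X[:, 1] > 0), 1, -1)

# The RBF kernel implicitly maps the data into a higher-dimensional space
# where the XOR classes become linearly separable (the kernel trick).
svm = SVC(kernel="rbf", gamma=0.1, C=10.0)
svm.fit(X, y)
print(svm.score(X, y))
```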
Naïve Bayes Classifier
What is a Naive Bayes classifier? Have you ever wondered how your mail provider implements spam filtering, how online news channels perform news text classification, or how companies perform sentiment analysis of their audience on social media? All of this and more is done with a machine learning algorithm called the Naive Bayes classifier.

Naive Bayes
The algorithm is named after Thomas Bayes, who in the 1700s first formulated the underlying theorem in Western literature. The Naive Bayes classifier works on the principle of conditional probability as given by Bayes' theorem.

Advantages of the Naive Bayes Classifier
Listed below are six benefits of the Naive Bayes classifier.
Very simple and easy to implement
Needs less training data
Handles both continuous and discrete data
Highly scalable with the number of predictors and data points
Fast, so it can be used for real-time predictions
Not sensitive to irrelevant features

Bayes Theorem
According to the Bayes model, the conditional probability P(Y|X) can be calculated as:
P(Y|X) = P(X|Y)P(Y) / P(X)
Estimating P(X|Y) directly requires a very large number of probability estimates, even for a modestly sized feature vector X. For example, for a Boolean Y and 30 Boolean attributes in the X vector, you would have to estimate about 2 × 2³⁰ ≈ 2 billion probabilities P(X|Y). To make this practical, the Naïve Bayes classifier assumes that the features of X are conditionally independent of each other given the value of Y. This reduces the number of probability estimates to 2 × 30 = 60 in the above example.

Naïve Bayes Classifier for SMS Spam Detection
Consider a labeled SMS database containing 5,574 messages, such as those shown below:
https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-machine-learning.JPG
Each message in the dataset is marked as spam or ham. Let's train a model with the Naïve Bayes algorithm to distinguish spam from ham. The message lengths and their frequencies (in the training dataset) are shown below:
https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-spam-detection.JPG
The logic used to train the spam detector:
Split each message into individual words/tokens (a bag of words).
Lemmatize the data (reduce each word to its base form; for example, “walking” or “walked” becomes “walk”).
Convert the data to vectors using the scikit-learn CountVectorizer module.
Apply TF-IDF weighting to down-weight very common words like “is,” “are,” and “and.”
Now apply the scikit-learn MultinomialNB Naïve Bayes module to get the spam detector (a condensed sketch of this pipeline appears at the end of this section).
The spam detector can then be used to classify any new message as spam or ham. Next, the accuracy of the spam detector is checked using the confusion matrix. For the SMS spam example above, the confusion matrix is shown on the right.
Accuracy Rate = Correct / Total = (4827 + 592) / 5574 = 97.22%
Error Rate = Wrong / Total = (155 + 0) / 5574 = 2.78%
https://www.simplilearn.com/ice9/free_resources_article_thumb/confusion-matrix-machine-learning.JPG
Although the confusion matrix is useful, more precise metrics are provided by precision and recall.
https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-recall-matrix-machine-learning.JPG
Precision refers to the accuracy of positive predictions.
https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-formula-machine-learning.JPG
Recall refers to the ratio of positive instances that are correctly detected by the classifier (also known as the true positive rate, or TPR).
https://www.simplilearn.com/ice9/free_resources_article_thumb/recall-formula-machine-learning.JPG

Precision/Recall Trade-off
To detect age-appropriate videos for kids, you need high precision (low recall is acceptable) to ensure that only safe videos make the cut, even though a few safe videos may be left out. High recall (with low precision being acceptable) is needed for in-store surveillance to catch shoplifters: a few false alarms are acceptable, but all shoplifters must be caught.
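The condensed sketch promised above (the tiny corpus is a stand-in for the real 5,574-message dataset, and the lemmatization step is omitted for brevity):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Stand-in corpus; the tutorial trains on a labeled SMS database.
messages = ["WIN a FREE prize now", "are we meeting for lunch",
            "free entry claim your cash", "see you at home tonight"]
labels = ["spam", "ham", "spam", "ham"]

spam_detector = Pipeline([
    ("bow", CountVectorizer()),     # bag-of-words token counts
    ("tfidf", TfidfTransformer()),  # down-weight very common words
    ("nb", MultinomialNB()),        # Naive Bayes classifier
])
spam_detector.fit(messages, labels)
print(spam_detector.predict(["claim your FREE prize now"]))
```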
Decision Tree Classifier
Some key aspects of the Decision Tree classifier are outlined below. Decision Trees (DT) can be used both for classification and regression. An advantage of decision trees is that they require very little data preparation; they do not require feature scaling or centering at all. They are also the fundamental components of Random Forests, one of the most powerful ML algorithms. Unlike Random Forests and neural networks (which do black-box modeling), Decision Trees are white-box models, which means that the inner workings of these models are clearly understood. In the case of classification, the data is segregated based on a series of questions, and any new data point is assigned to the selected leaf node.
https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-machine-learning.JPG
Start at the tree root and split the data on the feature that results in the largest information gain (IG). This splitting procedure is then repeated in an iterative process at each child node until the leaves are pure, meaning that the samples at each leaf all belong to the same class. In practice, you can set a limit on the depth of the tree to prevent overfitting; purity is then compromised, as the final leaves may still have some impurity. The figure shows the classification of the Iris dataset.
https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-graph.JPG

IRIS Decision Tree
Let's build a Decision Tree using scikit-learn for the Iris flower dataset and visualize it using the export_graphviz API.
https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-machine-learning.JPG
The output of export_graphviz can be converted into PNG format:
https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-output.JPG
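A sketch of that construction, assuming the same setup as the figures (petal features, max_depth = 2); the tutorial's exact code is shown only as an image:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=0)
tree_clf.fit(X, y)

# Writes a Graphviz .dot file; convert it with, e.g.:
#   dot -Tpng iris_tree.dot -o iris_tree.png
export_graphviz(tree_clf, out_file="iris_tree.dot",
                feature_names=iris.feature_names[2:],
                class_names=iris.target_names,
                rounded=True, filled=True)
```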
In this output, the samples attribute counts the number of training instances the node applies to, and the value attribute counts the number of training instances of each class the node applies to. Gini impurity measures the node's impurity: a node is “pure” (gini = 0) if all the training instances it applies to belong to the same class.
https://www.simplilearn.com/ice9/free_resources_article_thumb/impurity-formula-machine-learning.JPG
For example, for the Versicolor node (the green node), the Gini impurity is 1 − (0/54)² − (49/54)² − (5/54)² ≈ 0.168.
https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-sample.JPG

Decision Boundaries
Let us learn about decision boundaries below. For the first node (depth 0), the solid line splits the data (Iris Setosa on the left). Since Gini is 0 for the Setosa node, no further split is possible there. The second node (depth 1) splits the data into Versicolor and Virginica. If max_depth were set to 3, a third split would happen (the vertical dotted line).
https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-boundaries.JPG
For a sample with petal length 5 cm and petal width 1.5 cm, the tree ends at the depth-2 left node, so the probability predictions for this sample are 0% for Iris Setosa (0/54), 90.7% for Iris Versicolor (49/54), and 9.3% for Iris Virginica (5/54).

CART Training Algorithm
Scikit-learn uses the Classification and Regression Trees (CART) algorithm to train Decision Trees. The CART algorithm splits the data into two subsets using a single feature k and a threshold t_k (for example, petal length < 2.45 cm); this is done recursively for each node. k and t_k are chosen such that they produce the purest subsets (weighted by their size). The objective is to minimize the cost function given below:
https://www.simplilearn.com/ice9/free_resources_article_thumb/cart-training-algorithm-machine-learning.JPG
The algorithm stops executing if one of the following situations occurs: max_depth is reached, or no further splits can be found. Other hyperparameters may also be used to stop the tree's growth:
min_samples_split
min_samples_leaf
min_weight_fraction_leaf
max_leaf_nodes

Gini Impurity or Entropy
Entropy is one more measure of impurity and can be used in place of Gini.
https://www.simplilearn.com/ice9/free_resources_article_thumb/gini-impurity-entrophy.JPG
Entropy is a degree of uncertainty, and information gain is the reduction in entropy that occurs as one traverses down the tree. Entropy is zero for a DT node when the node contains instances of only one class. The entropy for the depth-2 left node in the example given above is:
https://www.simplilearn.com/ice9/free_resources_article_thumb/entrophy-for-depth-2.JPG
Gini and entropy both lead to similar trees.

DT: Regularization
The following figure shows two decision trees trained on the moons dataset.
https://www.simplilearn.com/ice9/free_resources_article_thumb/dt-regularization-machine-learning.JPG
The decision tree on the right is restricted by min_samples_leaf = 4. The model on the left is overfitting, while the model on the right generalizes better.

Random Forest Classifier
Let us get an understanding of the Random Forest classifier below. A random forest can be considered an ensemble of decision trees (ensemble learning). The Random Forest algorithm:
1. Draw a random bootstrap sample of size n (randomly choose n samples from the training set).
2. Grow a decision tree from the bootstrap sample. At each node, randomly select d features and split the node using the feature that provides the best split according to the objective function, for instance by maximizing the information gain.
3. Repeat steps 1 and 2 k times (k is the number of trees you want to create, each built from a subset of the samples).
4. Aggregate the predictions of the trees for a new data point and assign the class label by majority vote (pick the class selected by the largest number of trees and assign the new data point to it).
Random Forests are opaque, which means it is difficult to visualize their inner workings.
https://www.simplilearn.com/ice9/free_resources_article_thumb/random-forest-classifier-graph.JPG
However, the advantages outweigh this limitation, since you do not have to worry about hyperparameters except k, the number of decision trees to be created from subsets of the samples. A Random Forest is quite robust to noise from the individual decision trees, so you need not prune the individual trees. The larger the number of decision trees, the more accurate the Random Forest prediction is (this, however, comes with a higher computational cost).

Key Takeaways
Let us quickly run through what we have learned so far in this Classification tutorial.
Classification algorithms are supervised learning methods used to split data into classes. They can work on linear data as well as nonlinear data.
Logistic Regression can classify data based on weighted parameters and a sigmoid conversion to calculate the probability of each class.
The K-nearest Neighbors (KNN) algorithm uses feature similarity to classify data.
Support Vector Machines (SVMs) classify data by detecting the maximum-margin hyperplane between data classes.
Naïve Bayes, a simplified Bayes model, can help classify data using conditional probability models.
Decision Trees are powerful classifiers that use tree-splitting logic until pure or somewhat pure leaf node classes are attained.
Random Forests apply ensemble learning to Decision Trees for more accurate classification predictions.

Conclusion
This completes the ‘Classification’ tutorial. In the next tutorial, we will learn ‘Unsupervised Learning with Clustering.’
msavva
ReVision: Automated Classification, Analysis and Redesign of Chart Images
amankaushik
The information system chosen for the project was a stock investment management website providing live prices, historical data, news articles, etc., as well as basic analysis and recommendations using data mining techniques. 1. Crawling and parsing Yahoo Finance, Reuters, and Twitter data (Java, twitter4j). 2. Web interface using J2EE and the Struts 2 framework, with jQuery (Highstock library) for technical charts. 3. Database integration, data cleaning, and feature selection on the collected data, applying linear regression and classification algorithms (SVM, Naive Bayes) to produce detailed analysis and recommendations.
victoria-lo
A custom image classification Pokedex app built with AutoML Vision API, Google Cloud Storage and Chart.js. Submitted to New Year New Hack Hackathon. 3rd place winner.
suhanitomar888
A data visualization project analyzing the Good Food Purchasing Program dataset to uncover insights on food spending, vendor patterns, agency trends, and product classifications across New York City agencies. This project transforms raw procurement data into clear, interactive charts to support data-driven decisions in public food policy.
The basis of this project is analyzing Amgen's future profitability based on its current business environment and financial performance. Technical analysis, on the other hand, involves reading charts and using statistical figures to identify trends in the stock market. The dataset used for this analysis was downloaded from Yahoo Finance for the years 2009 to 2019. It contains multiple variables: date, open, high, low, volume, and adjusted close. The Open and Close columns represent the starting and final prices at which the stock traded on a given day, while High and Low represent the maximum and minimum prices of the share for the day. Although profit or loss is usually determined from the closing price of a stock for the day, I used the adjusted closing price as the target variable. As independent variables, I downloaded data on the inflation rate, the unemployment rate, the Industrial Production Index, the Consumer Price Index for All Urban Consumers: All Items, Real Gross Domestic Product, the Quarterly Financial Report: U.S. Corporations: Cash Dividends Charged to Retained Earnings, All Manufacturing: All Nondurable Manufacturing: Chemicals: Pharmaceuticals and Medicines Industry, the Producer Price Index by Industry: Pharmaceutical Preparation Manufacturing, the 30-Year Treasury Constant Maturity Rate, and the Producer Price Index by Industry: Pharmaceutical and Medicine Manufacturing. These independent variables are economic parameters obtained from the Federal Reserve Economic Data (FRED) website.
Methodology
1. Linear Regression: A linear regression model returns an equation that describes the relationship between the independent variables and the dependent variable. I used the Linear Regression tool in Alteryx together with the ARIMA tool to forecast the stock price for the year. The algorithm was trained on the historical data to learn how the variables affect the dependent variable; the test data was then used to predict the adjusted closing price for the year, giving a predicted stock price of $193.38.
2. Support Vector Machines (SVM): Support Vector Machines, also known as Support Vector Networks, are a popular set of supervised learning algorithms originally developed for classification (categorical target) problems that can also be used for regression (numerical target) problems. SVMs are memory efficient and can handle many predictor variables. The model finds the line (one predictor), plane (two predictors), or hyperplane (three or more predictors) that maximally separates the groups of records, based on a measure of distance, into different groups according to the target variable. A kernel function provides the measure of distance that causes records to be placed in the same or different groups; it involves taking a function of the predictor variables to define the distance metric. I used the SVM tool in Alteryx with the ARIMA tool to forecast the stock price for the year, giving a predicted stock price of $189.44.
3. Spline Model: The Spline Model tool was used because it provides Friedman's multivariate adaptive regression splines (MARS) algorithm. This statistical learning model self-determines which subset of fields best predicts a target field of interest and can capture highly nonlinear relationships and interactions between fields. I used the Spline tool in Alteryx with the ARIMA tool to forecast the stock price for the year, giving a predicted stock price of $201.84.
The results from the models were weighted by comparing the RMSE of each model.
A lower RMSE indicates that a model's predictions were closer to the actual values. However, a simpler model with the same RMSE as a more complex model is generally preferable, since simpler models are less likely to overfit. Though the Spline model had the lowest RMSE, the Linear Regression model used fewer variables. We therefore combined the three models with the ARIMA forecast in a model ensemble, which allows us to use the results of multiple models. The forecasted stock price is $197.99, a 1.5% increase, for December 31, 2019. Apart from economic parameters, a stock price is affected by news about the company and other factors such as demonetization or mergers/demergers. Certain intangible factors can be impossible to predict beforehand; hence, the model predicts that Amgen's stock price will continue to rise unless there is a drastic downturn in the company.
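Returning to the model-weighting step above: the write-up does not show the weighting itself, but one common scheme, assumed here purely for illustration (the RMSE numbers are placeholders, not the project's actual values), is to weight each model's forecast by its inverse RMSE:

```python
# Forecasts quoted in the write-up above; RMSEs are illustrative placeholders.
forecasts = {"linear_regression": 193.38, "svm": 189.44, "spline": 201.84}
rmse = {"linear_regression": 6.0, "svm": 6.5, "spline": 5.5}

# Inverse-RMSE weighting: more accurate models get proportionally more weight.
weights = {name: 1.0 / rmse[name] for name in forecasts}
total = sum(weights.values())
ensemble = sum(forecasts[name] * weights[name] / total for name in forecasts)
print(round(ensemble, 2))
```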
A comprehensive machine learning application that predicts breast cancer malignancy using cytology measurements. Features an interactive Streamlit web interface with real-time visualizations including radar charts for cell nuclei analysis. Implements logistic regression with data preprocessing pipelines for accurate benign/malignant classification.
Inspired by the immense success of artificial neural networks in image classification for computer vision, we propose a novel framework to detect one of the rife fraudulent financial manipulations in the cryptocurrency trading world, known as pump and dump. The representation of cryptocurrency financial charts was re-imagined to improve classification by taking advantage of recent time-series-to-spatial encoding techniques: Gramian Angular Fields (GAF), Markov Transition Fields (MTF), and Recurrence Plots (RP), which spatially encode temporal financial time series data in the form of images. The encoded images were then used to train several convolutional neural network architectures, which achieved precision, recall, and F1 values close to 99% on unseen data for this classification task. This is one of the first such studies of pump-and-dump detection in cryptocurrencies using computer vision. The approach has the potential to be extended to detecting predefined shapes in time series charts.
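The abstract above describes the encoding step without code; as an illustrative sketch (not the authors' implementation), the pyts library provides all three time-series-to-image transforms, with the window shape and channel stacking below being assumptions:

```python
import numpy as np
from pyts.image import GramianAngularField, MarkovTransitionField, RecurrencePlot

# Stand-in for windows of a price series: 100 windows, 64 time steps each.
windows = np.cumsum(np.random.randn(100, 64), axis=1)

# Each transform maps a 1-D window to a 2-D image (64 x 64 with defaults).
gaf = GramianAngularField(method="summation")
mtf = MarkovTransitionField()
rp = RecurrencePlot()

# Stack the three encodings as channels to obtain CNN-ready inputs.
images = np.stack([gaf.fit_transform(windows),
                   mtf.fit_transform(windows),
                   rp.fit_transform(windows)], axis=-1)
print(images.shape)  # (100, 64, 64, 3)
```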
ramazanunlu
A Deep Learning Approach for Classification and Early Detection of Control Chart Pattern Recognition
Public-Health-Bioinformatics
:syringe: :bar_chart: Flu Classification Suite
Krisloveless
:bar_chart: 1D-CNN architecture for scRNA-seq classification
abir0
An image classification project to classify various charts or diagrams.
christophostertag
Chart pattern classification from synthetic data with a CNN in Keras
hjkimk
source code for the paper: "Text Role Classification of Scientific Charts Using Multimodal Transformers"
moured
Our official implementation for the journal paper 'Swin-chart: An efficient approach for chart classification,' published in Pattern Recognition Letters.
Black-Lights
Automated soil classification from Cone Penetration Test (CPT) data using digitized boundary curves from the Robertson & Campanella (1983) soil classification chart.
rohit6996
A Python project that fetches Reddit discussions on any question, analyzes user comments with NLP sentiment classification, and visualizes the overall opinion as a chart.
souravray97
The Python and R code involves 3-4 techniques for predicting the classification of failures and non-failures in the data. The following steps were performed on the dataset: data cleaning and/or one-hot encoding for factor variables; partitioning the data into training and validation sets; performing a logistic regression and predicting on the validation dataset; constructing lift and decile-wise charts for the results obtained from the logistic regression; building a classification tree on the training dataset and pruning it using the minimum cp value; producing a confusion matrix for the tree, which achieved an accuracy of 98.2%; and fitting a neural network with one hidden layer on the data.
RobertMarton
NLP projects done outside of work time at the PKU Institute of Computational Linguistics. NLP topics include: Information Extraction, Relation Extraction, NER, Cross-Lingual, Graph, Knowledge Bases, Language Models, Bertology (BERT, Distilled BERT, Probe), Document-level Representation, Tune, NLG (Summarization, NMT, ChatBot), QA (MRC, Dialogue, Information System), and other topics (Text Style Transfer, Parsing, Chinese, Attack, Common Sense). ML topics include: Architecture (Transformer, relative position embedding, Attention, Normalization); Strategy (Metric Learning, Interpretability, Multi-task, Regularization, Optimization, Negative Sampling); and Data (unbalanced classification, noisy labels, extreme classification, tabular learning, applied data science, tables and charts, demos). CV topics include Video Prediction. Systems topics include Distribution.
DharmeshPatel33
Chart Classification using ResNet with ImageNet
arvinnick
Classification of price trends using AlexNet and price charts
hamzaaityoussef
A Python application for dataset management, visualization, and machine learning. Features include: Dataset Upload/Creation: Import or create datasets for analysis. Visualization: Generate interactive charts to explore data trends. ML Model Implementation: Apply various ML algorithms (regression, clustering, classification, etc.)
gh4n
:bar_chart: text classification using a recurrent neural net as a microservice
leiwng
Automatically evaluate chromosome AI segmentation and classification results from the final karyotype report chart
Submission for Cuvette’s Data Science TA Hiring Assignment. Includes Python ML (EDA, classification), SQL (Chinook DB), Tableau Dashboard, Excel Pivot & Charts, AI tool usage (ChatGPT prompts), and reflections on teaching & TA mindset. Video walkthrough included.
harshdeepsokhey
Paper Reviews for CSE-704: Practical Techniques in Deep Learning
anirudh-g
This is the repo of the dissertation project “Chart Type Classification and Object Recognition Using Deep Learning” for the MS Data Science programme at the University of Glasgow, academic year 2019-20.
krishd1809
Problem statement: Create a classification model to predict gender (male or female) based on different acoustic parameters.
Context: This database was created to identify a voice as male or female based upon acoustic properties of the voice and speech. The dataset consists of 3,168 recorded voice samples, collected from male and female speakers. The voice samples were preprocessed by acoustic analysis in R using the seewave and tuneR packages, with an analyzed frequency range of 0 Hz-280 Hz (the human vocal range).
Column description:
• meanfreq: mean frequency (in kHz)
• sd: standard deviation of frequency
• median: median frequency (in kHz)
• Q25: first quantile (in kHz)
• Q75: third quantile (in kHz)
• IQR: interquantile range (in kHz)
• skew: skewness (see note in specprop description)
• kurt: kurtosis (see note in specprop description)
• sp.ent: spectral entropy
• sfm: spectral flatness
• mode: mode frequency
• centroid: frequency centroid (see specprop)
• peakf: peak frequency (frequency with highest energy)
• meanfun: average fundamental frequency measured across the acoustic signal
• minfun: minimum fundamental frequency measured across the acoustic signal
• maxfun: maximum fundamental frequency measured across the acoustic signal
• meandom: average dominant frequency measured across the acoustic signal
• mindom: minimum dominant frequency measured across the acoustic signal
• maxdom: maximum dominant frequency measured across the acoustic signal
• dfrange: range of dominant frequency measured across the acoustic signal
• modindx: modulation index, calculated as the accumulated absolute difference between adjacent measurements of fundamental frequency divided by the frequency range
• label: male or female
Dataset: https://drive.google.com/file/d/1PGo2PZ2uP9NXUvA5yWhmNRURlkJYK1nU/view?usp=sharing
Steps to consider:
1) Remove/handle null values (if any).
2) Depict the percentage distribution of the label on a pie chart.
3) Considering all other columns as independent features and label as the dependent feature, split the dataset into training and testing data with test size = 20%.
4) Apply the following classifier models on the training dataset and generate predictions for the test dataset (a sketch of steps 3-6 follows after this list):
a. Decision Tree Classifier
b. Random Forest Classifier
c. KNN Classifier
d. Logistic Regression
e. SVM Classifier
5) Also generate a confusion matrix and classification report for each model generated in step 4.
6) Report the model with the best accuracy.
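A compact sketch of steps 3-6 (the local file name voice.csv and the default hyperparameters are assumptions, not part of the assignment):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("voice.csv")  # assumed file name for the linked dataset
X, y = df.drop(columns="label"), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = model.score(X_test, y_test)  # step 5: per-model metrics
    print(name, confusion_matrix(y_test, pred), classification_report(y_test, pred), sep="\n")

best = max(scores, key=scores.get)  # step 6: model with the best accuracy
print("Best model:", best, round(scores[best], 4))
```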
This course will introduce the learner to applied machine learning, focusing more on the techniques and methods than on the statistics behind these methods. The course will start with a discussion of how machine learning is different than descriptive statistics, and introduce the scikit learn toolkit through a tutorial. The issue of dimensionality of data will be discussed, and the task of clustering data, as well as evaluating those clusters, will be tackled. Supervised approaches for creating predictive models will be described, and learners will be able to apply the scikit learn predictive modelling methods while understanding process issues related to data generalizability (e.g. cross validation, overfitting). The course will end with a look at more advanced techniques, such as building ensembles, and practical limitations of predictive models. By the end of this course, students will be able to identify the difference between a supervised (classification) and unsupervised (clustering) technique, identify which technique they need to apply for a particular dataset and need, engineer features to meet that need, and write python code to carry out an analysis. This course should be taken after Introduction to Data Science in Python and Applied Plotting, Charting & Data Representation in Python and before Applied Text Mining in Python and Applied Social Analysis in Python.
Summary
The global Thermoforming Plastic market report offers a thorough analysis of the market's size, share, competitive environment, segmentation growth, and sales at the regional, international, and national levels. It also includes details on impacts, disclosures, value analyses, opportunities and recent changes, trading regulations, strategic market expansion analyses, and regional studies of local and international market participants. The study and forecast assessment of the global Thermoforming Plastic business comprise a geographic examination. The revenue performance of several local, regional, and national markets is examined in this study, which provides comprehensive volume statistics by country as well as market size data by region for past and upcoming years. The global competitive breakdowns of vendors include vendor information such as total revenue, competitive opportunities, business profiles, sales and revenue generated, global footprint, market share, and prices. For each participant analyzed in this study, sales, earnings, and market share data are provided. The study evaluates the financial performance of the market's top rivals as well as their business summaries, sector market shares, geographic reach, corporate strategies, technologies, mergers and acquisitions, recent developments, joint ventures, alliances, and partnerships.
The Thermoforming Plastic market research report includes definitions, category classifications, growth trends, dynamic analysis, product applications, industry chain structure, a business overview, analysis of national policies and plans, the competitive environment, product technologies, and more. It analyses how demand might change over the forecast period depending on growth drivers and threats. Global business trends are also taken into consideration, and the study looks at recent and upcoming technological developments in the global Thermoforming Plastic market to determine whether there are promising investment opportunities.
Key points covered in the report:
A thorough analysis of value and volume at the worldwide, sector, and regional levels.
A full review of the market's size from a global point of view, based on past facts and possible scenarios.
A geographic analysis covering individual regions and a comparison of their revenues.
Company-level analysis of ex-factory costs, output volume, market share, and sales for every manufacturer.
Key reasons to purchase this report:
A comprehensive study of market size, share, and dynamics, with a thorough survey of developments in the field.
An in-depth overview of revenue growth and an analysis of total business benefits.
The strategic landscape for commodity pricing and marketing, along with profiles of key players.
Coverage of the latest impacts on the target market.
The research report addresses the rapidly evolving market climate as well as the initial and future impact assessment.