Search Results

Found 33,117 repositories(showing 30)

SQL-Data-Analysis-and-Visualization-Projects

ptyadana

💛77

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

1.7k

572

MIT

Jupyter Notebook

Updated 4 hours ago

apache-sparkchallengesdata-analysis+16

Stock-Market-Analysis-and-Prediction

venky14

🧡67

Stock Market Analysis and Prediction is the project on technical analysis, visualization and prediction using data provided by Google Finance.

358

121

HTML

Updated 2 days ago

jupyter-notebookmatplotlibnumpy+5

all-classification-templetes-for-ML

sayantann11

🧡66

Classification - Machine Learning This is ‘Classification’ tutorial which is a part of the Machine Learning course offered by Simplilearn. We will learn Classification algorithms, types of classification algorithms, support vector machines(SVM), Naive Bayes, Decision Tree and Random Forest Classifier in this tutorial. Objectives Let us look at some of the objectives covered under this section of Machine Learning tutorial. Define Classification and list its algorithms Describe Logistic Regression and Sigmoid Probability Explain K-Nearest Neighbors and KNN classification Understand Support Vector Machines, Polynomial Kernel, and Kernel Trick Analyze Kernel Support Vector Machines with an example Implement the Naïve Bayes Classifier Demonstrate Decision Tree Classifier Describe Random Forest Classifier Classification: Meaning Classification is a type of supervised learning. It specifies the class to which data elements belong to and is best used when the output has finite and discrete values. It predicts a class for an input variable as well. There are 2 types of Classification: Binomial Multi-Class Classification: Use Cases Some of the key areas where classification cases are being used: To find whether an email received is a spam or ham To identify customer segments To find if a bank loan is granted To identify if a kid will pass or fail in an examination Classification: Example Social media sentiment analysis has two potential outcomes, positive or negative, as displayed by the chart given below. https://www.simplilearn.com/ice9/free_resources_article_thumb/classification-example-machine-learning.JPG This chart shows the classification of the Iris flower dataset into its three sub-species indicated by codes 0, 1, and 2. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-flower-dataset-graph.JPG The test set dots represent the assignment of new test data points to one class or the other based on the trained classifier model. Types of Classification Algorithms Let’s have a quick look into the types of Classification Algorithm below. Linear Models Logistic Regression Support Vector Machines Nonlinear models K-nearest Neighbors (KNN) Kernel Support Vector Machines (SVM) Naïve Bayes Decision Tree Classification Random Forest Classification Logistic Regression: Meaning Let us understand the Logistic Regression model below. This refers to a regression model that is used for classification. This method is widely used for binary classification problems. It can also be extended to multi-class classification problems. Here, the dependent variable is categorical: y ϵ {0, 1} A binary dependent variable can have only two values, like 0 or 1, win or lose, pass or fail, healthy or sick, etc In this case, you model the probability distribution of output y as 1 or 0. This is called the sigmoid probability (σ). If σ(θ Tx) > 0.5, set y = 1, else set y = 0 Unlike Linear Regression (and its Normal Equation solution), there is no closed form solution for finding optimal weights of Logistic Regression. Instead, you must solve this with maximum likelihood estimation (a probability model to detect the maximum likelihood of something happening). It can be used to calculate the probability of a given outcome in a binary model, like the probability of being classified as sick or passing an exam. https://www.simplilearn.com/ice9/free_resources_article_thumb/logistic-regression-example-graph.JPG Sigmoid Probability The probability in the logistic regression is often represented by the Sigmoid function (also called the logistic function or the S-curve): https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-function-machine-learning.JPG In this equation, t represents data values * the number of hours studied and S(t) represents the probability of passing the exam. Assume sigmoid function: https://www.simplilearn.com/ice9/free_resources_article_thumb/sigmoid-probability-machine-learning.JPG g(z) tends toward 1 as z -> infinity , and g(z) tends toward 0 as z -> infinity K-nearest Neighbors (KNN) K-nearest Neighbors algorithm is used to assign a data point to clusters based on similarity measurement. It uses a supervised method for classification. The steps to writing a k-means algorithm are as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-distribution-graph-machine-learning.JPG Choose the number of k and a distance metric. (k = 5 is common) Find k-nearest neighbors of the sample that you want to classify Assign the class label by majority vote. KNN Classification A new input point is classified in the category such that it has the most number of neighbors from that category. For example: https://www.simplilearn.com/ice9/free_resources_article_thumb/knn-classification-machine-learning.JPG Classify a patient as high risk or low risk. Mark email as spam or ham. Keen on learning about Classification Algorithms in Machine Learning? Click here! Support Vector Machine (SVM) Let us understand Support Vector Machine (SVM) in detail below. SVMs are classification algorithms used to assign data to various classes. They involve detecting hyperplanes which segregate data into classes. SVMs are very versatile and are also capable of performing linear or nonlinear classification, regression, and outlier detection. Once ideal hyperplanes are discovered, new data points can be easily classified. https://www.simplilearn.com/ice9/free_resources_article_thumb/support-vector-machines-graph-machine-learning.JPG The optimization objective is to find “maximum margin hyperplane” that is farthest from the closest points in the two classes (these points are called support vectors). In the given figure, the middle line represents the hyperplane. SVM Example Let’s look at this image below and have an idea about SVM in general. Hyperplanes with larger margins have lower generalization error. The positive and negative hyperplanes are represented by: https://www.simplilearn.com/ice9/free_resources_article_thumb/positive-negative-hyperplanes-machine-learning.JPG Classification of any new input sample xtest : If w0 + wTxtest > 1, the sample xtest is said to be in the class toward the right of the positive hyperplane. If w0 + wTxtest < -1, the sample xtest is said to be in the class toward the left of the negative hyperplane. When you subtract the two equations, you get: https://www.simplilearn.com/ice9/free_resources_article_thumb/equation-subtraction-machine-learning.JPG Length of vector w is (L2 norm length): https://www.simplilearn.com/ice9/free_resources_article_thumb/length-of-vector-machine-learning.JPG You normalize with the length of w to arrive at: https://www.simplilearn.com/ice9/free_resources_article_thumb/normalize-equation-machine-learning.JPG SVM: Hard Margin Classification Given below are some points to understand Hard Margin Classification. The left side of equation SVM-1 given above can be interpreted as the distance between the positive (+ve) and negative (-ve) hyperplanes; in other words, it is the margin that can be maximized. Hence the objective of the function is to maximize with the constraint that the samples are classified correctly, which is represented as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-machine-learning.JPG This means that you are minimizing ‖w‖. This also means that all positive samples are on one side of the positive hyperplane and all negative samples are on the other side of the negative hyperplane. This can be written concisely as : https://www.simplilearn.com/ice9/free_resources_article_thumb/hard-margin-classification-formula.JPG Minimizing ‖w‖ is the same as minimizing. This figure is better as it is differentiable even at w = 0. The approach listed above is called “hard margin linear SVM classifier.” SVM: Soft Margin Classification Given below are some points to understand Soft Margin Classification. To allow for linear constraints to be relaxed for nonlinearly separable data, a slack variable is introduced. (i) measures how much ith instance is allowed to violate the margin. The slack variable is simply added to the linear constraints. https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-machine-learning.JPG Subject to the above constraints, the new objective to be minimized becomes: https://www.simplilearn.com/ice9/free_resources_article_thumb/soft-margin-calculation-formula.JPG You have two conflicting objectives now—minimizing slack variable to reduce margin violations and minimizing to increase the margin. The hyperparameter C allows us to define this trade-off. Large values of C correspond to larger error penalties (so smaller margins), whereas smaller values of C allow for higher misclassification errors and larger margins. https://www.simplilearn.com/ice9/free_resources_article_thumb/machine-learning-certification-video-preview.jpg SVM: Regularization The concept of C is the reverse of regularization. Higher C means lower regularization, which increases bias and lowers the variance (causing overfitting). https://www.simplilearn.com/ice9/free_resources_article_thumb/concept-of-c-graph-machine-learning.JPG IRIS Data Set The Iris dataset contains measurements of 150 IRIS flowers from three different species: Setosa Versicolor Viriginica Each row represents one sample. Flower measurements in centimeters are stored as columns. These are called features. IRIS Data Set: SVM Let’s train an SVM model using sci-kit-learn for the Iris dataset: https://www.simplilearn.com/ice9/free_resources_article_thumb/svm-model-graph-machine-learning.JPG Nonlinear SVM Classification There are two ways to solve nonlinear SVMs: by adding polynomial features by adding similarity features Polynomial features can be added to datasets; in some cases, this can create a linearly separable dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/nonlinear-classification-svm-machine-learning.JPG In the figure on the left, there is only 1 feature x1. This dataset is not linearly separable. If you add x2 = (x1)2 (figure on the right), the data becomes linearly separable. Polynomial Kernel In sci-kit-learn, one can use a Pipeline class for creating polynomial features. Classification results for the Moons dataset are shown in the figure. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-machine-learning.JPG Polynomial Kernel with Kernel Trick Let us look at the image below and understand Kernel Trick in detail. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-with-kernel-trick.JPG For large dimensional datasets, adding too many polynomial features can slow down the model. You can apply a kernel trick with the effect of polynomial features without actually adding them. The code is shown (SVC class) below trains an SVM classifier using a 3rd-degree polynomial kernel but with a kernel trick. https://www.simplilearn.com/ice9/free_resources_article_thumb/polynomial-kernel-equation-machine-learning.JPG The hyperparameter coefθ controls the influence of high-degree polynomials. Kernel SVM Let us understand in detail about Kernel SVM. Kernel SVMs are used for classification of nonlinear data. In the chart, nonlinear data is projected into a higher dimensional space via a mapping function where it becomes linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-machine-learning.JPG In the higher dimension, a linear separating hyperplane can be derived and used for classification. A reverse projection of the higher dimension back to original feature space takes it back to nonlinear shape. As mentioned previously, SVMs can be kernelized to solve nonlinear classification problems. You can create a sample dataset for XOR gate (nonlinear problem) from NumPy. 100 samples will be assigned the class sample 1, and 100 samples will be assigned the class label -1. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-graph-machine-learning.JPG As you can see, this data is not linearly separable. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-non-separable.JPG You now use the kernel trick to classify XOR dataset created earlier. https://www.simplilearn.com/ice9/free_resources_article_thumb/kernel-svm-xor-machine-learning.JPG Naïve Bayes Classifier What is Naive Bayes Classifier? Have you ever wondered how your mail provider implements spam filtering or how online news channels perform news text classification or even how companies perform sentiment analysis of their audience on social media? All of this and more are done through a machine learning algorithm called Naive Bayes Classifier. Naive Bayes Named after Thomas Bayes from the 1700s who first coined this in the Western literature. Naive Bayes classifier works on the principle of conditional probability as given by the Bayes theorem. Advantages of Naive Bayes Classifier Listed below are six benefits of Naive Bayes Classifier. Very simple and easy to implement Needs less training data Handles both continuous and discrete data Highly scalable with the number of predictors and data points As it is fast, it can be used in real-time predictions Not sensitive to irrelevant features Bayes Theorem We will understand Bayes Theorem in detail from the points mentioned below. According to the Bayes model, the conditional probability P(Y|X) can be calculated as: P(Y|X) = P(X|Y)P(Y) / P(X) This means you have to estimate a very large number of P(X|Y) probabilities for a relatively small vector space X. For example, for a Boolean Y and 30 possible Boolean attributes in the X vector, you will have to estimate 3 billion probabilities P(X|Y). To make it practical, a Naïve Bayes classifier is used, which assumes conditional independence of P(X) to each other, with a given value of Y. This reduces the number of probability estimates to 2*30=60 in the above example. Naïve Bayes Classifier for SMS Spam Detection Consider a labeled SMS database having 5574 messages. It has messages as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-machine-learning.JPG Each message is marked as spam or ham in the data set. Let’s train a model with Naïve Bayes algorithm to detect spam from ham. The message lengths and their frequency (in the training dataset) are as shown below: https://www.simplilearn.com/ice9/free_resources_article_thumb/naive-bayes-spam-spam-detection.JPG Analyze the logic you use to train an algorithm to detect spam: Split each message into individual words/tokens (bag of words). Lemmatize the data (each word takes its base form, like “walking” or “walked” is replaced with “walk”). Convert data to vectors using scikit-learn module CountVectorizer. Run TFIDF to remove common words like “is,” “are,” “and.” Now apply scikit-learn module for Naïve Bayes MultinomialNB to get the Spam Detector. This spam detector can then be used to classify a random new message as spam or ham. Next, the accuracy of the spam detector is checked using the Confusion Matrix. For the SMS spam example above, the confusion matrix is shown on the right. Accuracy Rate = Correct / Total = (4827 + 592)/5574 = 97.21% Error Rate = Wrong / Total = (155 + 0)/5574 = 2.78% https://www.simplilearn.com/ice9/free_resources_article_thumb/confusion-matrix-machine-learning.JPG Although confusion Matrix is useful, some more precise metrics are provided by Precision and Recall. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-recall-matrix-machine-learning.JPG Precision refers to the accuracy of positive predictions. https://www.simplilearn.com/ice9/free_resources_article_thumb/precision-formula-machine-learning.JPG Recall refers to the ratio of positive instances that are correctly detected by the classifier (also known as True positive rate or TPR). https://www.simplilearn.com/ice9/free_resources_article_thumb/recall-formula-machine-learning.JPG Precision/Recall Trade-off To detect age-appropriate videos for kids, you need high precision (low recall) to ensure that only safe videos make the cut (even though a few safe videos may be left out). The high recall is needed (low precision is acceptable) in-store surveillance to catch shoplifters; a few false alarms are acceptable, but all shoplifters must be caught. Learn about Naive Bayes in detail. Click here! Decision Tree Classifier Some aspects of the Decision Tree Classifier mentioned below are. Decision Trees (DT) can be used both for classification and regression. The advantage of decision trees is that they require very little data preparation. They do not require feature scaling or centering at all. They are also the fundamental components of Random Forests, one of the most powerful ML algorithms. Unlike Random Forests and Neural Networks (which do black-box modeling), Decision Trees are white box models, which means that inner workings of these models are clearly understood. In the case of classification, the data is segregated based on a series of questions. Any new data point is assigned to the selected leaf node. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-machine-learning.JPG Start at the tree root and split the data on the feature using the decision algorithm, resulting in the largest information gain (IG). This splitting procedure is then repeated in an iterative process at each child node until the leaves are pure. This means that the samples at each node belonging to the same class. In practice, you can set a limit on the depth of the tree to prevent overfitting. The purity is compromised here as the final leaves may still have some impurity. The figure shows the classification of the Iris dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-classifier-graph.JPG IRIS Decision Tree Let’s build a Decision Tree using scikit-learn for the Iris flower dataset and also visualize it using export_graphviz API. https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-machine-learning.JPG The output of export_graphviz can be converted into png format: https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-output.JPG Sample attribute stands for the number of training instances the node applies to. Value attribute stands for the number of training instances of each class the node applies to. Gini impurity measures the node’s impurity. A node is “pure” (gini=0) if all training instances it applies to belong to the same class. https://www.simplilearn.com/ice9/free_resources_article_thumb/impurity-formula-machine-learning.JPG For example, for Versicolor (green color node), the Gini is 1-(0/54)2 -(49/54)2 -(5/54) 2 ≈ 0.168 https://www.simplilearn.com/ice9/free_resources_article_thumb/iris-decision-tree-sample.JPG Decision Boundaries Let us learn to create decision boundaries below. For the first node (depth 0), the solid line splits the data (Iris-Setosa on left). Gini is 0 for Setosa node, so no further split is possible. The second node (depth 1) splits the data into Versicolor and Virginica. If max_depth were set as 3, a third split would happen (vertical dotted line). https://www.simplilearn.com/ice9/free_resources_article_thumb/decision-tree-boundaries.JPG For a sample with petal length 5 cm and petal width 1.5 cm, the tree traverses to depth 2 left node, so the probability predictions for this sample are 0% for Iris-Setosa (0/54), 90.7% for Iris-Versicolor (49/54), and 9.3% for Iris-Virginica (5/54) CART Training Algorithm Scikit-learn uses Classification and Regression Trees (CART) algorithm to train Decision Trees. CART algorithm: Split the data into two subsets using a single feature k and threshold tk (example, petal length < “2.45 cm”). This is done recursively for each node. k and tk are chosen such that they produce the purest subsets (weighted by their size). The objective is to minimize the cost function as given below: https://www.simplilearn.com/ice9/free_resources_article_thumb/cart-training-algorithm-machine-learning.JPG The algorithm stops executing if one of the following situations occurs: max_depth is reached No further splits are found for each node Other hyperparameters may be used to stop the tree: min_samples_split min_samples_leaf min_weight_fraction_leaf max_leaf_nodes Gini Impurity or Entropy Entropy is one more measure of impurity and can be used in place of Gini. https://www.simplilearn.com/ice9/free_resources_article_thumb/gini-impurity-entrophy.JPG It is a degree of uncertainty, and Information Gain is the reduction that occurs in entropy as one traverses down the tree. Entropy is zero for a DT node when the node contains instances of only one class. Entropy for depth 2 left node in the example given above is: https://www.simplilearn.com/ice9/free_resources_article_thumb/entrophy-for-depth-2.JPG Gini and Entropy both lead to similar trees. DT: Regularization The following figure shows two decision trees on the moons dataset. https://www.simplilearn.com/ice9/free_resources_article_thumb/dt-regularization-machine-learning.JPG The decision tree on the right is restricted by min_samples_leaf = 4. The model on the left is overfitting, while the model on the right generalizes better. Random Forest Classifier Let us have an understanding of Random Forest Classifier below. A random forest can be considered an ensemble of decision trees (Ensemble learning). Random Forest algorithm: Draw a random bootstrap sample of size n (randomly choose n samples from the training set). Grow a decision tree from the bootstrap sample. At each node, randomly select d features. Split the node using the feature that provides the best split according to the objective function, for instance by maximizing the information gain. Repeat the steps 1 to 2 k times. (k is the number of trees you want to create, using a subset of samples) Aggregate the prediction by each tree for a new data point to assign the class label by majority vote (pick the group selected by the most number of trees and assign new data point to that group). Random Forests are opaque, which means it is difficult to visualize their inner workings. https://www.simplilearn.com/ice9/free_resources_article_thumb/random-forest-classifier-graph.JPG However, the advantages outweigh their limitations since you do not have to worry about hyperparameters except k, which stands for the number of decision trees to be created from a subset of samples. RF is quite robust to noise from the individual decision trees. Hence, you need not prune individual decision trees. The larger the number of decision trees, the more accurate the Random Forest prediction is. (This, however, comes with higher computation cost). Key Takeaways Let us quickly run through what we have learned so far in this Classification tutorial. Classification algorithms are supervised learning methods to split data into classes. They can work on Linear Data as well as Nonlinear Data. Logistic Regression can classify data based on weighted parameters and sigmoid conversion to calculate the probability of classes. K-nearest Neighbors (KNN) algorithm uses similar features to classify data. Support Vector Machines (SVMs) classify data by detecting the maximum margin hyperplane between data classes. Naïve Bayes, a simplified Bayes Model, can help classify data using conditional probability models. Decision Trees are powerful classifiers and use tree splitting logic until pure or somewhat pure leaf node classes are attained. Random Forests apply Ensemble Learning to Decision Trees for more accurate classification predictions. Conclusion This completes ‘Classification’ tutorial. In the next tutorial, we will learn 'Unsupervised Learning with Clustering.'

288

Python

Updated 3 days ago

Data-Analytics-Projects-in-python

shsarv

💛71

A collection of data analysis and visualization projects designed to uncover insights from diverse datasets. These projects include analyses on COVID-19 trends, stock trading patterns, housing market prices, IoT data, and more, showcasing the power of data-driven storytelling.

206

MIT

Jupyter Notebook

Updated 2 hours ago

data-analysisdata-analysis-projectdata-analysis-python+4

Data-Science-For-Beginners-from-scratch-course

SENATOROVAI

🧡56

Data science for beginners involves learning to extract insights from data using statistics, programming (Python/R), and visualization. Key steps include data collection, cleaning, analysis, modeling, and communicating findings. Beginners should start with Python, basic math (linear algebra/calculus), and build projects to create a portfolio.

152

122

MIT

Python

Updated 45 minutes ago

aialgebracalculus+17

Data-Analytics-Projects

Shorya22

🧡65

Explore a collection of end-to-end data analytics projects showcasing SQL, Python, and Power BI. Gain valuable insights and solutions to real-world problems through data extraction, analysis, and visualization. Ideal for beginners and professionals looking to enhance their skills in data analytics.

125

HTML

Updated 5 days ago

datadata-sciencedataanalysis+4

Stock_Market_Data_Analysis

jcwill415

🧡55

Scrape, analyze & visualize stock market data for the S&P500 using Python. Build a basic trading strategy using machine learning to assess company performance and determine buy, sell, hold. Read me & instructions available in Spanish. This is a working repo, with plans to expand the project from technical analysis to fundamental analysis.

Jupyter Notebook

Updated 1 week ago

beautifulsoup4data-manipulationdata-visualization+11

Python-Basic-programs

sanusanth

🧡55

What is Python? Executive Summary Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed. Often, programmers fall in love with Python because of the increased productivity it provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python programs is easy: a bug or bad input will never cause a segmentation fault. Instead, when the interpreter discovers an error, it raises an exception. When the program doesn't catch the exception, the interpreter prints a stack trace. A source level debugger allows inspection of local and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code a line at a time, and so on. The debugger is written in Python itself, testifying to Python's introspective power. On the other hand, often the quickest way to debug a program is to add a few print statements to the source: the fast edit-test-debug cycle makes this simple approach very effective. What is Python? Python is a popular programming language. It was created by Guido van Rossum, and released in 1991. It is used for: web development (server-side), software development, mathematics, system scripting. What can Python do? Python can be used on a server to create web applications. Python can be used alongside software to create workflows. Python can connect to database systems. It can also read and modify files. Python can be used to handle big data and perform complex mathematics. Python can be used for rapid prototyping, or for production-ready software development. Why Python? Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc). Python has a simple syntax similar to the English language. Python has syntax that allows developers to write programs with fewer lines than some other programming languages. Python runs on an interpreter system, meaning that code can be executed as soon as it is written. This means that prototyping can be very quick. Python can be treated in a procedural way, an object-oriented way or a functional way. Good to know The most recent major version of Python is Python 3, which we shall be using in this tutorial. However, Python 2, although not being updated with anything other than security updates, is still quite popular. In this tutorial Python will be written in a text editor. It is possible to write Python in an Integrated Development Environment, such as Thonny, Pycharm, Netbeans or Eclipse which are particularly useful when managing larger collections of Python files. Python Syntax compared to other programming languages Python was designed for readability, and has some similarities to the English language with influence from mathematics. Python uses new lines to complete a command, as opposed to other programming languages which often use semicolons or parentheses. Python relies on indentation, using whitespace, to define scope; such as the scope of loops, functions and classes. Other programming languages often use curly-brackets for this purpose. Applications for Python Python is used in many application domains. Here's a sampling. The Python Package Index lists thousands of third party modules for Python. Web and Internet Development Python offers many choices for web development: Frameworks such as Django and Pyramid. Micro-frameworks such as Flask and Bottle. Advanced content management systems such as Plone and django CMS. Python's standard library supports many Internet protocols: HTML and XML JSON E-mail processing. Support for FTP, IMAP, and other Internet protocols. Easy-to-use socket interface. And the Package Index has yet more libraries: Requests, a powerful HTTP client library. Beautiful Soup, an HTML parser that can handle all sorts of oddball HTML. Feedparser for parsing RSS/Atom feeds. Paramiko, implementing the SSH2 protocol. Twisted Python, a framework for asynchronous network programming. Scientific and Numeric Python is widely used in scientific and numeric computing: SciPy is a collection of packages for mathematics, science, and engineering. Pandas is a data analysis and modeling library. IPython is a powerful interactive shell that features easy editing and recording of a work session, and supports visualizations and parallel computing. The Software Carpentry Course teaches basic skills for scientific computing, running bootcamps and providing open-access teaching materials. Education Python is a superb language for teaching programming, both at the introductory level and in more advanced courses. Books such as How to Think Like a Computer Scientist, Python Programming: An Introduction to Computer Science, and Practical Programming. The Education Special Interest Group is a good place to discuss teaching issues. Desktop GUIs The Tk GUI library is included with most binary distributions of Python. Some toolkits that are usable on several platforms are available separately: wxWidgets Kivy, for writing multitouch applications. Qt via pyqt or pyside Platform-specific toolkits are also available: GTK+ Microsoft Foundation Classes through the win32 extensions Software Development Python is often used as a support language for software developers, for build control and management, testing, and in many other ways. SCons for build control. Buildbot and Apache Gump for automated continuous compilation and testing. Roundup or Trac for bug tracking and project management. Business Applications Python is also used to build ERP and e-commerce systems: Odoo is an all-in-one management software that offers a range of business applications that form a complete suite of enterprise management applications. Try ton is a three-tier high-level general purpose application platform.

Python

Updated 4 weeks ago

spotify-artists-analysis

khanhnamle1994

🧡60

An exploratory data analysis and data visualization project using data from Spotify Web API

MIT

Updated 3 weeks ago

data-visualizationggplot2music+5

WazeCCPProcessor

LouisvilleMetro

❤️25

Waze WARP takes your CCP data feed and processes it into your cloud provider for access, analysis, and visualization. An Open Government Coalition (OGC) project. @govintheopen

MIT

TypeScript

Updated 2 years ago

amazon-awsawscloudformation+6

WSJ_WebScraping_NLP

philippe-heitzmann

🧡65

Python data Manipulation, visualizations and Natural Language Processing analysis for Wall Street Journal web scraping project #2 for NYC Data Science Academy bootcamp

Jupyter Notebook

Updated 3 days ago

Analyzing-Behavioral-Patterns-in-ADHD-Patients

Okes2024

❤️45

This project generates synthetic ADHD and control patient data, applying machine learning and statistical analysis to uncover behavioral patterns, visualize differences, and evaluate predictive models for cognitive and activity-related traits.

Python

Updated 1 month ago

Sentiment-Shift-in-Therapy-Conversations-Over-Time

Okes2024

❤️35

This project analyzes emotional dynamics in therapy by modeling sentiment shifts across sessions using synthetic conversations. It detects trends, slopes, and change-points, providing insights into patient–therapist interactions and potential therapeutic progress over time through data-driven analysis and visualizations.

Python

Updated 3 months ago

Spotify-Data-Analysis-using-Python

mrankitgupta

🧡50

An exploratory data analysis (EDA) and data visualization project using data from Spotify using Python.

MIT

Jupyter Notebook

Updated 1 month ago

LearnBasicBigDataTech

weltond

❤️45

:rocket:Some projects on Big Data Analysis like Spark, Hive, Presto and Data Visualization like Superset

Java

Updated 2 months ago

Data-Analysis-And-Visualization-Projects

Sid-TheAnalyst

🧡55

Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.

Updated 1 week ago

excelmachine-learningnumpy+7

world-cup-2018

khanhnamle1994

❤️40

An exploratory data analysis and data visualization project for World Cup 2018

MIT

Jupyter Notebook

Updated 4 months ago

data-analysisdata-visualization

PokemonDataScience

Daalma7

🧡55

Data Science project involving all Pokémon data: from data gathering (web scraping) to machine learning techniques, including data cleaning, analysis, and visualization.

MIT

Jupyter Notebook

Updated 2 weeks ago

Market-Basket-Analysis

AmirhosseinHonardoust

🧡60

Python project for Market Basket Analysis. Generates synthetic retail transactions, mines frequent itemsets using Apriori & FP-Growth, derives association rules, and outputs CSVs + visualizations. Portfolio-ready example demonstrating data science methods for uncovering product co-purchase patterns.

MIT

Python

Updated 3 weeks ago

aprioriassociation-rulescustomer-behavior+9

Openrank-OpenGov-Hub

Ziccc1

❤️45

OpenGov Hub: An all-in-one open source governance dashboard. Built on OpenDigger ecosystem, it provides real-time project visualization, multi-dimensional analysis, AI risk alerts (18→24h), and natural language interaction. Enables data-driven governance for China's open source ecosystem.

MIT

TypeScript

Updated 2 months ago

SQL-PROJECTS

swaapnaa

🧡55

These projects demonstrate my proficiency in SQL and my capacity to analyze complex data. They exhibit my abilities in data investigation, visualization, and analysis.

Updated 1 week ago

data-cleaningdata-queryingdata-schema+3

phitter-kernel

phitter-hub

🧡55

Phitter is a python library for accurately fitting statistical distributions to datasets, offering intuitive usage, comprehensive visualization, and support for multiple distributions to enhance data analysis projects.

MIT

Jupyter Notebook

Updated 1 week ago

operations-researchprobability-distributionprobability-models+1

Data-Analysis-and-Analytics

Hazrat-Ali9

🧡60

👹Data Analysis and 🤖 Analytics combining 🤡 technical skills with 🦊 strategic thinking Ideal 🐳 analysts data scientists 🐲 and business intelligence 🦖 enthusiasts ready to 🐸 turn data into action 🐡 data cleaning exploration 🦜 visualization modeling 🏵 and reporting real world 🌳 case studies business scenarios 🍇 and insight-driven project

MIT

Updated 2 weeks ago

datadata-analysisdata-visualization+4

IBM-Data-Science-Specialization-Coursera

Ashleshk

❤️40

Data Science has been ranked as one of the hottest professions and the demand for data practitioners is booming. This Professional Certificate from IBM is intended for anyone interested in developing skills and experience to pursue a career in Data Science or Machine Learning. This program consists of 9 courses providing you with latest job-ready skills and techniques covering a wide array of data science topics including: open source tools and libraries, methodologies, Python, databases, SQL, data visualization, data analysis, and machine learning. You will practice hands-on in the IBM Cloud using real data science tools and real-world data sets. It is a myth that to become a data scientist you need a Ph.D. This Professional Certificate is suitable for anyone who has some computer skills and a passion for self-learning. No prior computer science or programming knowledge is necessary. We start small, re-enforce applied learning, and build up to more complex topics. Upon successfully completing these courses you will have done several hands-on assignments and built a portfolio of data science projects to provide you with the confidence to plunge into an exciting profession in Data Science. In addition to earning a Professional Certificate from Coursera, you will also receive a digital Badge from IBM recognizing your proficiency in Data Science.

MIT

Jupyter Notebook

Updated 3 months ago

TidyTuesday

fgazzelloni

❤️35

Explore fascinating TidyTuesday projects in my portfolio, showcasing data visualization and analysis skills.

Updated 6 months ago

data-sciencedatavisdatavisualisation+4

EDA---Projects

VenkyAdi

❤️45

Exploratory Data Analysis (EDA) Projects A collection of EDA projects exploring various datasets to uncover patterns, gain insights, and visualize trends across different industries. Projects include analyses of Amazon Prime content, banking fraud detection, logistics performance, hotel booking trends, and more.

Jupyter Notebook

Updated 2 months ago

amazonfedexmmt+1

Data-Analysis-Projects

aravindbc

🧡65

This repository contains a collection of data analysis and visualization projects that I have worked on using tools like Excel, SQL, Python, and Power BI.

Jupyter Notebook

Updated 3 days ago

excelpowerbipython+1

UCSD_Data_Science_and_Engineering

OrysyaStus

❤️45

In the Data Science and Engineering program, engineering professionals combine the skills of software programmer, database manager, and statistician to create mathematical models of the data, identify trends/deviations, then present them in effective visual ways that can be understood by others. Data scientists unlock new sources of economic value, provide fresh insights into science, and inform decision makers by analyzing large, diverse, complex, longitudinal, and distributed data sets generated from instruments, sensors, internet transactions, email, video, and other digital sources. Students entering the MAS program for a degree in Data Science and Engineering will undertake courses in programming, analysis, and applications management and visualization. This program requires three foundational courses, four core courses, and two electives totaling thirty-four units, plus a capstone team project course of four units, for a total of thirty-eight units.

Jupyter Notebook

Updated 2 months ago

FIFA-2019-Analysis

sharmaroshan

❤️40

This is a project based on the FIFA World Cup 2019 and Analyzes the Performance and Efficiency of Teams, Players, Countries and other related things using Data Analysis and Data Visualizations

GPL-3.0

Jupyter Notebook

Updated 1 year ago

beginnerdata-cleaningdata-manipulation+8

Carbon-Emission-Forecasting-using-LSTM-models.

Geraldine-Winston

❤️35

Harnessing LSTM neural networks, this project forecasts carbon emissions from historical data. Through streamlined preprocessing, dynamic modeling, and vivid visualizations, it transforms raw data into actionable, exportable insights for impactful environmental analysis and decision-making.

Updated 4 months ago

GitHub Explorer

Search Results

SQL-Data-Analysis-and-Visualization-Projects

Stock-Market-Analysis-and-Prediction

all-classification-templetes-for-ML

Data-Analytics-Projects-in-python

Data-Science-For-Beginners-from-scratch-course

Data-Analytics-Projects

Stock_Market_Data_Analysis

Python-Basic-programs

spotify-artists-analysis

WazeCCPProcessor

WSJ_WebScraping_NLP

Analyzing-Behavioral-Patterns-in-ADHD-Patients

Sentiment-Shift-in-Therapy-Conversations-Over-Time

Spotify-Data-Analysis-using-Python

LearnBasicBigDataTech

Data-Analysis-And-Visualization-Projects

world-cup-2018

PokemonDataScience

Market-Basket-Analysis

Openrank-OpenGov-Hub

SQL-PROJECTS

phitter-kernel

Data-Analysis-and-Analytics

IBM-Data-Science-Specialization-Coursera

TidyTuesday

EDA---Projects

Data-Analysis-Projects

UCSD_Data_Science_and_Engineering

FIFA-2019-Analysis

Carbon-Emission-Forecasting-using-LSTM-models.

SQL-Data-Analysis-and-Visualization-Projects

Stock-Market-Analysis-and-Prediction

all-classification-templetes-for-ML

Data-Analytics-Projects-in-python

Data-Science-For-Beginners-from-scratch-course

Data-Analytics-Projects

Stock_Market_Data_Analysis

Python-Basic-programs

spotify-artists-analysis

WazeCCPProcessor

WSJ_WebScraping_NLP

Analyzing-Behavioral-Patterns-in-ADHD-Patients

Sentiment-Shift-in-Therapy-Conversations-Over-Time

Spotify-Data-Analysis-using-Python

LearnBasicBigDataTech

Data-Analysis-And-Visualization-Projects

world-cup-2018

PokemonDataScience

Market-Basket-Analysis

Openrank-OpenGov-Hub

SQL-PROJECTS

phitter-kernel

Data-Analysis-and-Analytics

IBM-Data-Science-Specialization-Coursera

TidyTuesday

EDA---Projects

Data-Analysis-Projects

UCSD_Data_Science_and_Engineering

FIFA-2019-Analysis

Carbon-Emission-Forecasting-using-LSTM-models.