Found 4,775 repositories(showing 30)
PriyankaJhaTheDeveloper
This repository contains my Data Analytics portfolio projects ranging from SQL, Python, Tableau, Excel, and Hadoop (HiveQL).
Aryia-Behroziuan
An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. However, over time, attention moved to performing specific tasks, leading to deviations from biology. Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis. Deep learning consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.[68] Decision trees Main article: Decision tree learning Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making. Support vector machines Main article: Support vector machines Support vector machines (SVMs), also known as support vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.[69] An SVM training algorithm is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. Illustration of linear regression on a data set. Regression analysis Main article: Regression analysis Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization (mathematics) methods to mitigate overfitting and bias, as in ridge regression. When dealing with non-linear problems, go-to models include polynomial regression (for example, used for trendline fitting in Microsoft Excel[70]), logistic regression (often used in statistical classification) or even kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to higher-dimensional space. Bayesian networks Main article: Bayesian network A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet. A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams. Genetic algorithms Main article: Genetic algorithm A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.[71][72] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[73] Training models Usually, machine learning models require a lot of data in order for them to perform well. Usually, when training a machine learning model, one needs to collect a large, representative sample of data from a training set. Data from the training set can be as varied as a corpus of text, a collection of images, and data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model. Federated learning Main article: Federated learning Federated learning is an adapted form of distributed artificial intelligence to training machine learning models that decentralizes the training process, allowing for users' privacy to be maintained by not needing to send their data to a centralized server. This also increases efficiency by decentralizing the training process to many devices. For example, Gboard uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to Google.[74] Applications There are many applications for machine learning, including: Agriculture Anatomy Adaptive websites Affective computing Banking Bioinformatics Brain–machine interfaces Cheminformatics Citizen science Computer networks Computer vision Credit-card fraud detection Data quality DNA sequence classification Economics Financial market analysis[75] General game playing Handwriting recognition Information retrieval Insurance Internet fraud detection Linguistics Machine learning control Machine perception Machine translation Marketing Medical diagnosis Natural language processing Natural language understanding Online advertising Optimization Recommender systems Robot locomotion Search engines Sentiment analysis Sequence mining Software engineering Speech recognition Structural health monitoring Syntactic pattern recognition Telecommunication Theorem proving Time series forecasting User behavior analytics In 2006, the media-services provider Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%. A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million.[76] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a recommendation") and they changed their recommendation engine accordingly.[77] In 2010 The Wall Street Journal wrote about the firm Rebellion Research and their use of machine learning to predict the financial crisis.[78] In 2012, co-founder of Sun Microsystems, Vinod Khosla, predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.[79] In 2014, it was reported that a machine learning algorithm had been applied in the field of art history to study fine art paintings and that it may have revealed previously unrecognized influences among artists.[80] In 2019 Springer Nature published the first research book created using machine learning.[81] Limitations Although machine learning has been transformative in some fields, machine-learning programs often fail to deliver expected results.[82][83][84] Reasons for this are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.[85] In 2018, a self-driving car from Uber failed to detect a pedestrian, who was killed after a collision.[86] Attempts to use machine learning in healthcare with the IBM Watson system failed to deliver even after years of time and billions of dollars invested.[87][88] Bias Main article: Algorithmic bias Machine learning approaches in particular can suffer from different data biases. A machine learning system trained on current customers only may not be able to predict the needs of new customer groups that are not represented in the training data. When trained on man-made data, machine learning is likely to pick up the same constitutional and unconscious biases already present in society.[89] Language models learned from data have been shown to contain human-like biases.[90][91] Machine learning systems used for criminal risk assessment have been found to be biased against black people.[92][93] In 2015, Google photos would often tag black people as gorillas,[94] and in 2018 this still was not well resolved, but Google reportedly was still using the workaround to remove all gorillas from the training data, and thus was not able to recognize real gorillas at all.[95] Similar issues with recognizing non-white people have been found in many other systems.[96] In 2016, Microsoft tested a chatbot that learned from Twitter, and it quickly picked up racist and sexist language.[97] Because of such challenges, the effective use of machine learning may take longer to be adopted in other domains.[98] Concern for fairness in machine learning, that is, reducing bias in machine learning and propelling its use for human good is increasingly expressed by artificial intelligence scientists, including Fei-Fei Li, who reminds engineers that "There’s nothing artificial about AI...It’s inspired by people, it’s created by people, and—most importantly—it impacts people. It is a powerful tool we are only just beginning to understand, and that is a profound responsibility.”[99] Model assessments Classification of machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data in a training and test set (conventionally 2/3 training set and 1/3 test set designation) and evaluates the performance of the training model on the test set. In comparison, the K-fold-cross-validation method randomly partitions the data into K subsets and then K experiments are performed each respectively considering 1 subset for evaluation and the remaining K-1 subsets for training the model. In addition to the holdout and cross-validation methods, bootstrap, which samples n instances with replacement from the dataset, can be used to assess model accuracy.[100] In addition to overall accuracy, investigators frequently report sensitivity and specificity meaning True Positive Rate (TPR) and True Negative Rate (TNR) respectively. Similarly, investigators sometimes report the false positive rate (FPR) as well as the false negative rate (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The total operating characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates, thus TOC provides more information than the commonly used receiver operating characteristic (ROC) and ROC's associated area under the curve (AUC).[101] Ethics Machine learning poses a host of ethical questions. Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[102] For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[103][104] Responsible collection of data and documentation of algorithmic rules used by a system thus is a critical part of machine learning. Because human languages contain biases, machines trained on language corpora will necessarily also learn these biases.[105][106] Other forms of ethical challenges, not related to personal biases, are more seen in health care. There are concerns among health care professionals that these systems might not be designed in the public's interest but as income-generating machines. This is especially true in the United States where there is a long-standing ethical dilemma of improving health care, but also increasing profits. For example, the algorithms could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. There is huge potential for machine learning in health care to provide professionals a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these "greed" biases are addressed.[107] Hardware Since the 2010s, advances in both machine learning algorithms and computer hardware have led to more efficient methods for training deep neural networks (a particular narrow subdomain of machine learning) that contain many layers of non-linear hidden units.[108] By 2019, graphic processing units (GPUs), often with AI-specific enhancements, had displaced CPUs as the dominant method of training large-scale commercial cloud AI.[109] OpenAI estimated the hardware compute used in the largest deep learning projects from AlexNet (2012) to AlphaZero (2017), and found a 300,000-fold increase in the amount of compute required, with a doubling-time trendline of 3.4 months.[110][111] Software Software suites containing a variety of machine learning algorithms include the following: Free and open-source so
Pandey-Adarsh
HR Data Analytics portfolio project on a dataset of 80K records that deals with HR KPIs like Performance tracking, attrition rate. Project involved cleaning, transforming data and visualizing it to create a dashboard. I have used the MS Excel and Power BI's Query Editor for data cleaning and preprocessing and dashboard.
SAP-samples
This project contains Node.js scripts and Microsoft Excel templates for creating realistic sample fact data in SAP Analytics Cloud or other data sources.
A data analytics project using Python, Excel, and machine learning to forecast retail demand and optimize inventory levels. Includes scalable ETL pipelines, advanced forecasting models, and interactive dashboards, with weekly updates to showcase progress and commitment.
ritesh-29
Data Analytics Projects - analyzed with - (Excel)
Vishal2546
This repository contains my Data Analytics portfolio projects ranging from SQL, Python, Tableau, Excel, and Hadoop (HiveQL).
mesbahiba
Gain the job-ready skills for an entry-level data analyst role through this eight-course Professional Certificate from IBM and position yourself competitively in the thriving job market for data analysts, which will see a 20% growth until 2028 (U.S. Bureau of Labor Statistics). Power your data analyst career by learning the core principles of data analysis and gaining hands-on skills practice. You’ll work with a variety of data sources, project scenarios, and data analysis tools, including Excel, SQL, Python, Jupyter Notebooks, and Cognos Analytics, gaining practical experience with data manipulation and applying analytical techniques.
This repository contains the artifacts and deliverables from my internship project at Accenture North America's Data Analytics and Visualization Job Simulation on Forage. It includes the Excel dashboard, cleaned datasets, PowerPoint presentations, and other relevant documents.
pnguenda
# Pandas Homework - Pandas, Pandas, Pandas ## Background The data dive continues! Now, it's time to take what you've learned about Python Pandas and apply it to new situations. For this assignment, you'll need to complete **one of two** (not both) Data Challenges. Once again, which challenge you take on is your choice. Just be sure to give it your all -- as the skills you hone will become powerful tools in your data analytics tool belt. ### Before You Begin 1. Create a new repository for this project called `pandas-challenge`. **Do not add this homework to an existing repository**. 2. Clone the new repository to your computer. 3. Inside your local git repository, create a directory for the Pandas Challenge you choose. Use folder names corresponding to the challenges: **HeroesOfPymoli** or **PyCitySchools**. 4. Add your Jupyter notebook to this folder. This will be the main script to run for analysis. 5. Push the above changes to GitHub or GitLab. ## Option 1: Heroes of Pymoli  Congratulations! After a lot of hard work in the data munging mines, you've landed a job as Lead Analyst for an independent gaming company. You've been assigned the task of analyzing the data for their most recent fantasy game Heroes of Pymoli. Like many others in its genre, the game is free-to-play, but players are encouraged to purchase optional items that enhance their playing experience. As a first task, the company would like you to generate a report that breaks down the game's purchasing data into meaningful insights. Your final report should include each of the following: ### Player Count * Total Number of Players ### Purchasing Analysis (Total) * Number of Unique Items * Average Purchase Price * Total Number of Purchases * Total Revenue ### Gender Demographics * Percentage and Count of Male Players * Percentage and Count of Female Players * Percentage and Count of Other / Non-Disclosed ### Purchasing Analysis (Gender) * The below each broken by gender * Purchase Count * Average Purchase Price * Total Purchase Value * Average Purchase Total per Person by Gender ### Age Demographics * The below each broken into bins of 4 years (i.e. <10, 10-14, 15-19, etc.) * Purchase Count * Average Purchase Price * Total Purchase Value * Average Purchase Total per Person by Age Group ### Top Spenders * Identify the the top 5 spenders in the game by total purchase value, then list (in a table): * SN * Purchase Count * Average Purchase Price * Total Purchase Value ### Most Popular Items * Identify the 5 most popular items by purchase count, then list (in a table): * Item ID * Item Name * Purchase Count * Item Price * Total Purchase Value ### Most Profitable Items * Identify the 5 most profitable items by total purchase value, then list (in a table): * Item ID * Item Name * Purchase Count * Item Price * Total Purchase Value As final considerations: * You must use the Pandas Library and the Jupyter Notebook. * You must submit a link to your Jupyter Notebook with the viewable Data Frames. * You must include a written description of three observable trends based on the data. * See [Example Solution](HeroesOfPymoli/HeroesOfPymoli_starter.ipynb) for a reference on expected format. ## Option 2: PyCitySchools  Well done! Having spent years analyzing financial records for big banks, you've finally scratched your idealistic itch and joined the education sector. In your latest role, you've become the Chief Data Scientist for your city's school district. In this capacity, you'll be helping the school board and mayor make strategic decisions regarding future school budgets and priorities. As a first task, you've been asked to analyze the district-wide standardized test results. You'll be given access to every student's math and reading scores, as well as various information on the schools they attend. Your responsibility is to aggregate the data to and showcase obvious trends in school performance. Your final report should include each of the following: ### District Summary * Create a high level snapshot (in table form) of the district's key metrics, including: * Total Schools * Total Students * Total Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### School Summary * Create an overview table that summarizes key metrics about each school, including: * School Name * School Type * Total Students * Total School Budget * Per Student Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Top Performing Schools (By % Overall Passing) * Create a table that highlights the top 5 performing schools based on % Overall Passing. Include: * School Name * School Type * Total Students * Total School Budget * Per Student Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Bottom Performing Schools (By % Overall Passing) * Create a table that highlights the bottom 5 performing schools based on % Overall Passing. Include all of the same metrics as above. ### Math Scores by Grade\*\* * Create a table that lists the average Math Score for students of each grade level (9th, 10th, 11th, 12th) at each school. ### Reading Scores by Grade * Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school. ### Scores by School Spending * Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following: * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Scores by School Size * Repeat the above breakdown, but this time group schools based on a reasonable approximation of school size (Small, Medium, Large). ### Scores by School Type * Repeat the above breakdown, but this time group schools based on school type (Charter vs. District). As final considerations: * Use the pandas library and Jupyter Notebook. * You must submit a link to your Jupyter Notebook with the viewable Data Frames. * You must include a written description of at least two observable trends based on the data. * See [Example Solution](PyCitySchools/PyCitySchools_starter.ipynb) for a reference on the expected format. ## Hints and Considerations * These are challenging activities for a number of reasons. For one, these activities will require you to analyze thousands of records. Hacking through the data to look for obvious trends in Excel is just not a feasible option. The size of the data may seem daunting, but pandas will allow you to efficiently parse through it. * Second, these activities will also challenge you by requiring you to learn on your feet. Don't fool yourself into thinking: "I need to study pandas more closely before diving in." Get the basic gist of the library and then _immediately_ get to work. When facing a daunting task, it's easy to think: "I'm just not ready to tackle it yet." But that's the surest way to never succeed. Learning to program requires one to constantly tinker, experiment, and learn on the fly. You are doing exactly the _right_ thing, if you find yourself constantly practicing Google-Fu and diving into documentation. There is just no way (or reason) to try and memorize it all. Online references are available for you to use when you need them. So use them! * Take each of these tasks one at a time. Begin your work, answering the basic questions: "How do I import the data?" "How do I convert the data into a DataFrame?" "How do I build the first table?" Don't get intimidated by the number of asks. Many of them are repetitive in nature with just a few tweaks. Be persistent and creative! * Expect these exercises to take time! Don't get discouraged if you find yourself spending hours initially with little progress. Force yourself to deal with the discomfort of not knowing and forge ahead. Consider these hours an investment in your future! * As always, feel encouraged to work in groups and get help from your TAs and Instructor. Just remember, true success comes from mastery and _not_ a completed homework assignment. So challenge yourself to truly succeed! ### Copyright Trilogy Education Services © 2019. All Rights Reserved.
Gokul-Raja84
A collection of Data Analytics and Interactive Dashboards Projects in Excel using Real-world data across various domains
Explore the power of data analysis with this Sales Dashboard project! Gain insights into sales trends, customer behavior, and profitability using Excel. Follow along to dive deep into the world of data analytics!
RajdeepChoudhury
Conducted a comprehensive data analytics project to assess workforce experience distribution, organizational roles, and safety factor criticality within work zone construction projects. Cleaned and structured multiple datasets in Excel, covering respondents’ years of experience in construction and work zone projects.
Rainier-Listanco
First Excel Project tutorized by Luke Barousse
M0hannad
This repository contains my projects for Udacity's Business Analytics Nanodegree Program, Data projects where I analyze data and build models with Excel, query databases using SQL, and create data visualizations with Tableau.
summerchu24
This repository contains business-related data analytics projects using Excel, Python, Tableau, and StatPlus. The projects cover tasks like survey data analysis, brochure demand forecasting, and movie data visualization, showcasing data processing, analysis, and visualization techniques.
elwachira
A collection of beginner-friendly data analysis projects using Python, SQL, Excel, and data visualization tools. Built to demonstrate my skills in data cleaning, exploration, and storytelling as I grow in my data analytics journey.
Benisonjac
A Python Flask-based API that enables uploading, processing, and storing Excel or CSV data into PostgreSQL or MySQL databases. This project features smart data sanitization, automatic table creation, duplicate prevention, monthly data aggregation, and includes a ready-to-use REST API for robust data management and analytics.
rajeevtiwari8055
📊 My first Excel MIS Dashboard project analyzing Hospital ER performance. Built using Excel tools like Pivot Tables, Sparklines, Conditional Formatting, and Charts to track patient count, wait times, referrals, etc. A hands-on project during my data analytics journey.
rajeevtiwari8055
I built a detailed interactive dashboard and a new Excel project on a real-world E-commerce dataset to uncover key business insights, highlight hidden issues, and drive smart, data-based decisions. This project combines practical analytics with clear visuals to turn raw data into strategies.
Luckyyadav02
🚀 Thrilled to share my latest project: An Advanced Sales Analytics Dashboard! Built entirely in Microsoft Excel with VBA Macros, this dashboard analyzes sales data from multiple angles: 📊 Gender trends 🗺️ Top 5 performing states 📈 Channel-wise sales 📦 Order status breakdown 🧑🤝🧑 Age & demographic analysis
Junnielexia
This Excel repository contains various Markdown files where I meticulously document different data analysis projects accomplished using Excel. 📊💼
OlgaStratan
Data Analytics Projects with Excel, SQL and Tableau
Contains notes, projects, and Excel files from the Coursera course "Business Analytics with Excel" covering data analysis, pivot tables, dashboards, regression, and forecasting using Excel
ianjiteshan
Showcase of data analytics projects using SQL, Python, Excel, Power BI and Tableau, with interactive dashboards and business insights.
Ridwan-data
A data analytics project exploring cancer patient survival and diagnosis trends using Excel for data cleaning and Power BI for interactive dashboard visualization.
trinay126
End-to-end Credit Card Fraud Detection project using Excel. Includes fraud pattern analysis, time-based trends, high-value flags, rule-based detection, and forecasting. Features an interactive dashboard with slicers, KPIs, and charts. Ideal for Excel-based data analytics.
petritzeka
An AI-powered Excel + Python project that analyses electrical installation student quiz data, visualises progress with dashboards, and generates personalised study feedback using the OpenAI API. Combines data analytics, automation, and education insight inspired by real classroom teaching.
This project showcases a complete data cleaning and basic analytics workflow on a real-world-style employee health dataset, simulating inconsistencies often found in raw data. It includes both uncleaned and cleaned Excel files, plus a pivot-based dashboard to derive insights.
Sbonelondhlazi
A Google Data Analytics capstone project. Done in Microsoft Excel and Microsoft SQL Server