Found 2,554 repositories (showing 30)
pavan-ahire
A complete end-to-end MySQL project analyzing grocery store operations through database design, complex SQL queries, and business insights. Includes ERD modeling, customer & sales analysis, supplier performance evaluation, and data-driven recommendations for improving retail decision-making.
ShahadShaikh
Problem Statement. Introduction: So far in this course, you have learned about the Hadoop framework, RDBMS design, and Hive querying. You have understood how to work with an EMR cluster and write optimised queries on Hive. This assignment tests your skills in the Hive and Hadoop concepts learned throughout this course. Like big data analysts, you will be required to extract the data, load it into Hive tables, and gather insights from the dataset. Problem statement: With online sales gaining popularity, tech companies are exploring ways to improve their sales by analysing customer behaviour and gaining insights about product trends. Furthermore, the websites make it easier for customers to find the products they require without much searching. Needless to say, the role of big data analyst is among the most sought-after job profiles of this decade. Therefore, as part of this assignment, we challenge you, as a big data analyst, to extract data and gather insights from a real-life dataset of an e-commerce company. In the next video, you will learn the various stages in collecting and processing the e-commerce website data. One of the most popular use cases of big data is in eCommerce companies such as Amazon or Flipkart. So before we get into the details of the dataset, let us understand how eCommerce companies use these concepts to give customers product recommendations. This is done by tracking your clicks on their website and searching for patterns within them. This kind of data is called clickstream data. Let us understand how it works in detail. The clickstream data contains all the logs of how you navigated through the website. It also contains other details, such as the time spent on every page. Companies use data-ingestion frameworks such as Apache Kafka or AWS Kinesis to store this data in frameworks such as Hadoop.
From there, machine learning engineers or business analysts use this data to derive valuable insights. In the next video, Kautuk will give you a brief idea of the data used in this case study and the kind of analysis you can perform with it. For this assignment, you will be working with a public clickstream dataset of a cosmetics store. Using this dataset, your job is to extract the kind of valuable insights that data engineers generally come up with in an e-retail company. So now, let us understand the dataset in detail in the next video. You will find the data at the links given below. https://e-commerce-events-ml.s3.amazonaws.com/2019-Oct.csv https://e-commerce-events-ml.s3.amazonaws.com/2019-Nov.csv You can find the description of the attributes in the dataset given below. In the next video, you will learn about the various implementation stages involved in this case study. The implementation phase can be divided into the following parts: (1) Copying the dataset into HDFS: launch an EMR cluster that utilizes the Hive services, and move the data from the S3 bucket into HDFS. (2) Creating the database and launching Hive queries on your EMR cluster: create the structure of your database; use optimized techniques to run your queries as efficiently as possible; show the improvement in performance after applying optimization to any single query; and run Hive queries to answer the questions given below. (3) Cleaning up: drop your database and terminate your cluster. You are required to provide answers to the questions given below. Find the total revenue generated due to purchases made in October. Write a query to yield the total sum of purchases per month in a single output. Write a query to find the change in revenue generated due to purchases from October to November. Find distinct categories of products. Categories with null category code can be ignored.
Find the total number of products available under each category. Which brand had the maximum sales in October and November combined? Which brands increased their sales from October to November? Your company wants to reward the top 10 users of its website with a Golden Customer plan. Write a query to generate a list of the top 10 users who spend the most. Note: When writing your queries, please make the necessary optimizations, such as selecting the appropriate table format and using partitioned/bucketed tables. You will be awarded marks for enhancing the performance of your queries. Each question should have one query only. Use a 2-node EMR cluster with both the master and core nodes as M4.large. Make sure you terminate the cluster when you are done working with it. Since an EMR cluster can only be terminated and cannot be stopped, always keep a copy of your queries in a text editor so that you can copy-paste them every time you launch a new cluster. Do not leave PuTTY idle for too long. Do some activity, like pressing the space bar, at regular intervals. If the terminal becomes inactive, you don't have to start a new cluster. You can reconnect to the master node by opening the PuTTY terminal again, giving the host address, and loading the .ppk key file. For your information, certain queries might take longer on an emr-6.x release, so we suggest you use the emr-5.29.0 release for this case study. There are different options for storing the data in an EMR cluster. You can briefly explore them in this link. In your previous module on Hive querying, you copied the data to the local file system, i.e., to the master node's file system, and performed the queries there. Since the dataset in this case study is large, it is good practice to load the data into HDFS and not into the local file system. You can revisit the segment on 'Working with HDFS' from the earlier module on 'Introduction to Big data and Cloud'.
You may have to use CSVSerde with the default property values for loading the dataset into a Hive table. You can refer to this link for more details on using CSVSerde. Also, you may want to prevent the column names from being inserted into the Hive table. You can refer to this link on how to skip the headers.
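The query-style questions in the assignment above (revenue per month, the change from October to November, the top spenders) can be sketched on a toy sample. This is a minimal sketch rather than the assignment's solution: Python's built-in sqlite3 stands in for Hive, the rows are made up, and the column names event_time, event_type, brand, price, and user_id are assumptions about the dataset's schema, which the text above does not spell out.

```python
import sqlite3

# Hypothetical sample rows; the real data lives in the 2019-Oct/2019-Nov CSV files.
# Assumed columns: event_time, event_type, brand, price, user_id.
rows = [
    ("2019-10-01 00:00:00 UTC", "purchase", "runail", 10.0, 1),
    ("2019-10-05 12:30:00 UTC", "view", "runail", 10.0, 2),
    ("2019-10-09 08:15:00 UTC", "purchase", "irisk", 5.5, 2),
    ("2019-11-02 09:00:00 UTC", "purchase", "runail", 20.0, 1),
    ("2019-11-03 10:00:00 UTC", "cart", "irisk", 5.5, 3),
    ("2019-11-20 18:45:00 UTC", "purchase", "irisk", 4.5, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "event_time TEXT, event_type TEXT, brand TEXT, price REAL, user_id INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

# Total purchase revenue per month in a single output.
# substr(event_time, 6, 2) extracts the two-digit month from 'YYYY-MM-DD ...'.
monthly = conn.execute(
    """
    SELECT substr(event_time, 6, 2) AS month, SUM(price) AS revenue
    FROM events
    WHERE event_type = 'purchase'
    GROUP BY substr(event_time, 6, 2)
    ORDER BY month
    """
).fetchall()

# Change in purchase revenue from October to November, computed in one query
# with conditional aggregation.
change = conn.execute(
    """
    SELECT SUM(CASE WHEN substr(event_time, 6, 2) = '11' THEN price ELSE 0 END)
         - SUM(CASE WHEN substr(event_time, 6, 2) = '10' THEN price ELSE 0 END)
    FROM events
    WHERE event_type = 'purchase'
    """
).fetchone()[0]

# Top 10 users by total purchase spend.
top_users = conn.execute(
    """
    SELECT user_id, SUM(price) AS spend
    FROM events
    WHERE event_type = 'purchase'
    GROUP BY user_id
    ORDER BY spend DESC
    LIMIT 10
    """
).fetchall()

print(monthly)
print(change)
print(top_users)
```

On Hive, the same aggregates would run against a table rather than an in-memory database. Partitioning that table by month, in line with the assignment's note on partitioned tables, would let a month filter read only the matching partition.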
End-to-end retail sales data cleaning, analysis & forecasting using Python, SQL & Tableau. Analyze & forecast retail sales using Global Superstore data: Python (Pandas, Matplotlib), SQLite, Tableau dashboards.
ashish-jonnada
Exploratory Data Analysis on retail sales data to uncover customer spending patterns, category performance, and seasonal sales insights for business decision-making.
Store sales and profit analysis is the task of analyzing the performance of a retail store in terms of its sales and profits.
shivanand-Mathapati-Analyst
Developed an interactive Retail Sales Performance Dashboard in Microsoft Excel using advanced Pivot Tables to analyze Superstore data (2014–2017), uncovering revenue trends, customer insights, KPIs with YoY & MoM comparisons, and dynamic slicers.
This project tells a story about the retail sales performance of a cookie company in the United States using Power BI.
Srinivasareddyseelam
Retail sales data analysis project using SQL for insights and performance tracking.
MalayBhunia
SQL-based analysis of Apple retail sales data to uncover insights on sales trends, product performance, and warranty claims using large-scale datasets.
SQL-based retail sales analysis project using joins, aggregations, and window functions to uncover revenue trends, customer behavior, and product performance insights.
JamesMatini
Retail Sales Analysis SQL project demonstrating data cleaning, EDA, and business insights. Built a p1_retail_db database, cleaned raw data, ran SQL queries to analyze sales trends, customer behavior, and product performance, and generated reports to support data-driven retail decisions.
adiomuizz
This project involves a Power BI dashboard for retail analysis, offering insights into sales performance, district-level metrics, and new store openings through interactive visualizations and key performance indicators.
Priyanshubit
End-to-End Sales & Customer Analytics for a Retail Chain. The objective is to build a data pipeline and analysis suite for a retail company to monitor sales performance, customer behavior, and product trends, and to deliver a dashboard for business decisions.
virajbhutada
Gain valuable insights into retail sales with the "Walmart Retail Performance Dashboard" in Microsoft Excel. This user-friendly tool facilitates an in-depth analysis of key sales metrics, providing a comprehensive view of Walmart's performance. Make data-driven decisions for informed and strategic business outcomes.
FutureGoose
Analyzing a European bicycle retail business to enhance growth and profitability. Features in-depth EDA, business performance analysis, and strategic insights based on comprehensive sales data.
PiyushLuitel-07
"Unleash the power of data with Superstore Sales Analysis! This Python program utilizes advanced analytics to dissect sales trends, forecast future performance, and provide actionable insights. Elevate your retail strategy and maximize profitability. #DataAnalytics #RetailTech"
Chaitanya8639
This project presents a data-driven analysis of retail sales using Tableau. The goal was to address key questions raised by the CEO and CMO regarding revenue trends, customer performance, and regional demand, using 2011 sales data.
adiomuizz
This project analyzes the sales performance of a retail store, revealing total sales of $438K and a profit of $37K. The analysis identifies Maharashtra as the top-performing state and Clothing as the best-selling category.
itzme-vaishu
BlinkIT Retail Analysis Project (EDA + Power BI). This project analyzes retail data from BlinkIT – India’s Last Minute App, focusing on sales performance, item categories, outlet characteristics, and fat content distribution. It includes: Exploratory Data Analysis (EDA) using Python; Interactive dashboard built in Power
BouchentoufOthman
Retail sales analysis project using Python (Jupyter Notebook) for data cleaning, outlier detection, feature engineering, and RFM-based customer segmentation, followed by an interactive Power BI dashboard for visual insights on sales trends, customer segments, and regional performance.
bagdenatasha2001
The Vrinda Store Data Analysis Dashboard is a comprehensive project in Microsoft Excel developed to analyze and visualize the sales and performance data of Vrinda Store, a fictional retail business.
Mayank-Bhatt22
This SQL project analyzes a retail sales dataset by performing data cleaning, exploration, and business problem-solving. Queries include sales trends, top customers, category insights, monthly performance, and shift-wise order analysis. Built in MySQL for data-driven decision-making.
Rushikesh2010
Dashcart is a retail data analysis project focused on sales, profit, and customer insights. Using Excel and Power BI, it visualizes trends, highlights key metrics, and supports data-driven decision-making for business growth and performance optimization.
MOHAMMED-AL-SADEI
This project focuses on analyzing and visualizing the performance of a retail chain to uncover operational inefficiencies and highlight data-driven insights. The analysis was based on real sales and inventory data collected from multiple store branches.
Keerthi-muppulakunta
This repository features the Retail sales and customer insights dashboard demonstrating data cleaning and analysis using Excel, Python, and SQL, along with interactive visualization in Power BI. Explore the code and reports to gain insights into sales performance, customer behavior, and product trends. A complete end-to-end data analytics workflow
Drivers of growth in the PIM industry include rising demand for PIM solutions from the flourishing eCommerce industry and an increasing need to offer enhanced customer services. The global product information management market accounted for US$ 7.5 billion in 2019 and is anticipated to register a CAGR of 14.5%. The report is titled "Global Product Information Management Market, By Enterprise Type (Large Enterprise, Small & Medium Enterprise), By Industry (BFSI, Healthcare, Telecommunication & IT, Government, Retail, Transportation & Logistics, Management, Energy & Utility, Media & Entertainment, and Others), and By Region (North America, Europe, Asia Pacific, Latin America, and the Middle East & Africa) - Trends, Analysis and Forecast till 2029". Key Highlights: In October 2020, Pimcore updated its Pimcore platform and added new features, such as an editable dialog box, cache performance improvement, and tree sorting. In June 2020, Winshuttle formed a partnership with ABBYY, a digital intelligence company. The aim of the partnership is to help organizations and businesses with digital transformation, which involves extracting data from physical documents and automatically loading it into SAP. Analyst View: Increasing investment in product information management. Rising demand for centralized storage of product-related information is driving the product information market. Centralized data storage helps companies easily manage and organize all the data related to their products. Data sources are updated with a single change in the centralized data storage, saving the time and cost required for data management. Also, compliance and verification requirements are increasing due to the growing number of threats to information security. This provides safe and secure access to information stored in the centralized database.
Access is granted only after verification of all the required security credentials. Product information management facilitates quick and easy access to the repository of information, while strategic data storage techniques help maintain data quality. Indexing and linking help reduce the time required to complete various processes related to data storage, increasing operational efficiency. Marketing and sales of products are important processes for generating revenue. Growing PIM industry: The market enables manifestation of products to achieve client centricity and a unified customer view, and provides a centralized system for improving the efficiency of promotional activities. All the distribution channels are managed effectively by using this solution. Integration of Big Data and business intelligence applications with cloud storage offers tremendous growth opportunities to the market. Browse 60 market data tables* and 35 figures* through 140 slides and in-depth TOC on "Global Product Information Management Market". Key Market Insights from the report: The market report has been segmented on the basis of enterprise type, application, and region. By enterprise type, large enterprises hold the highest market share because adoption of PI solutions and services is higher among them. Large enterprises invest heavily in advanced technologies to increase their overall productivity and efficiency.
By application, the media & entertainment segment holds the largest share in the market. As most people are staying at home, the usage of media and entertainment has grown at double-digit rates. Product information offers high visibility, scalability, and service optimization that can handle the challenges arising from the sudden increase in demand in the media and entertainment vertical. By region, North America is the largest market for product information management. The emerging demand to maximize value from centralized master data and reference data, together with the ongoing demand for meaningful insights from this consolidated master data, is expected to further drive the adoption of PIM systems in the North American region during the coming years. The market in Asia-Pacific is expected to see growth opportunities owing to the fast adoption of multi-domain PI software, which is expected to enable better services in terms of performance, quality, and capacity during the forecast period. To know the upcoming trends and insights prevalent in this market, click the link below: https://www.prophecymarketinsights.com/market_insight/Global-Product-Information-Management-Market-4573 Competitive Landscape: The prominent players operating in the global product information management market include SAP AG, IBM Corporation, Oracle Corporation, Informatica LLC, Riversand Technologies, Inc., Stibo Systems, ADAM Software NV, Agility Multichannel Ltd., InRiver AB, and Pimcore GmbH. The report provides detailed information regarding the industrial base, productivity, strengths, manufacturers, and recent trends, which will help companies enlarge their businesses and promote financial growth. Furthermore, the report covers dynamic factors including segments, sub-segments, regional marketplaces, competition, dominant key players, and market forecasts.
In addition, the report covers recent collaborations, mergers, acquisitions, and partnerships, along with regulatory frameworks across different regions that affect the market trajectory. Recent technological advances and innovations influencing the global market are also included in the report.
aribafatima9000
Retail Sales Performance Analysis
veranhemakinya-4019
End-to-end retail sales analysis project using SQL, Python, Excel, and Power BI to generate business insights and executive-level dashboards.
Tejasphalke123
Retail sales analysis project using Excel PivotTables, slicers, charts, and forecasting to uncover trends, visualize KPIs, and support strategic decision-making.
thecodedcoder
Sales performance analysis for a retail business using Python and Pandas. Identifies a $35K annual profit leak in the Furniture category and provides actionable recommendations.