Found 104 repositories (showing 30)
phphoebe
ECommerce Startup Database Analysis Project with MySQL Workbench | SQL Aggregations, JOINs, CTEs | Web Analytics | Product Analytics
fikrionii
This project extracts and analyzes website traffic and performance data from the eCommerce database to quantify the company’s growth and tell its story.
ShahadShaikh
Problem Statement
Introduction: So far in this course, you have learned about the Hadoop framework, RDBMS design, and Hive querying. You have understood how to work with an EMR cluster and write optimised queries on Hive. This assignment aims to test the Hive and Hadoop skills and concepts you have learned throughout this course. Like big data analysts, you will be required to extract the data, load it into Hive tables, and gather insights from the dataset.
Problem Statement: With online sales gaining popularity, tech companies are exploring ways to improve their sales by analysing customer behaviour and gaining insights about product trends. Furthermore, the websites make it easier for customers to find the products they require without much searching. Needless to say, the role of big data analysts is among the most sought-after job profiles of this decade. Therefore, as part of this assignment, we will be challenging you, as a big data analyst, to extract data and gather insights from a real-life data set of an e-commerce company. In the next video, you will learn the various stages in collecting and processing the e-commerce website data.
One of the most popular use cases of Big Data is in eCommerce companies such as Amazon or Flipkart. So before we get into the details of the dataset, let us understand how eCommerce companies make use of these concepts to give customers product recommendations. This is done by tracking your clicks on their website and searching for patterns within them. This kind of data is called clickstream data. Let us understand how it works in detail. The clickstream data contains all the logs of how you navigated through the website. It also contains other details such as the time spent on every page. From this, they make use of data ingestion frameworks such as Apache Kafka or AWS Kinesis in order to store it in frameworks such as Hadoop.
From there, machine learning engineers or business analysts use this data to derive valuable insights. In the next video, Kautuk will give you a brief idea of the data used in this case study and the kind of analysis you can perform with it.
For this assignment, you will be working with a public clickstream dataset of a cosmetics store. Using this dataset, your job is to extract the kind of valuable insights that data engineers generally come up with in an e-retail company. So now, let us understand the dataset in detail in the next video.
You will find the data at the links given below.
https://e-commerce-events-ml.s3.amazonaws.com/2019-Oct.csv
https://e-commerce-events-ml.s3.amazonaws.com/2019-Nov.csv
You can find the description of the attributes in the dataset given below. In the next video, you will learn about the various implementation stages involved in this case study.
The implementation phase can be divided into the following parts:
Copying the data set into the HDFS: launch an EMR cluster that utilizes the Hive services, and move the data from the S3 bucket into the HDFS.
Creating the database and launching Hive queries on your EMR cluster: create the structure of your database, use optimized techniques to run your queries as efficiently as possible, show the improvement in performance after using optimization on any single query, and run Hive queries to answer the questions given below.
Cleaning up: drop your database, and terminate your cluster.
You are required to provide answers to the questions given below.
1. Find the total revenue generated due to purchases made in October.
2. Write a query to yield the total sum of purchases per month in a single output.
3. Write a query to find the change in revenue generated due to purchases from October to November.
4. Find distinct categories of products. Categories with null category code can be ignored.
5. Find the total number of products available under each category.
6. Which brand had the maximum sales in October and November combined?
7. Which brands increased their sales from October to November?
8. Your company wants to reward the top 10 users of its website with a Golden Customer plan. Write a query to generate a list of top 10 users who spend the most.
Note: To write your queries, please make necessary optimizations, such as selecting the appropriate table format and using partitioned/bucketed tables. You will be awarded marks for enhancing the performance of your queries. Each question should have one query only.
Use a 2-node EMR cluster with both the master and core nodes as M4.large. Make sure you terminate the cluster when you are done working with it. Since EMR can only be terminated and cannot be stopped, always keep a copy of your queries in a text editor so that you can copy-paste them every time you launch a new cluster.
Do not leave PuTTY idle for too long. Do some activity, like pressing the space bar, at regular intervals. If the terminal becomes inactive, you don't have to start a new cluster. You can reconnect to the master node by opening the PuTTY terminal again, giving the host address, and loading the .ppk key file.
For your information, if you are using an emr-6.x release, certain queries might take longer, so we suggest you use the emr-5.29.0 release for this case study.
There are different options for storing the data in an EMR cluster. You can briefly explore them in this link. In your previous module on Hive querying, you copied the data to the local file system, i.e., to the master node's file system, and performed the queries. Since the dataset in this case study is large, it is good practice to load the data into the HDFS and not into the local file system. You can revisit the segment on 'Working with HDFS' from the earlier module on 'Introduction to Big data and Cloud'.
You may have to use CSVSerde with the default property values for loading the dataset into a Hive table. You can refer to this link for more details on using CSVSerde. Also, you may want to prevent the column names from being inserted into the Hive table. You can refer to this link on how to skip the headers.
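The assignment asks for Hive queries; as a rough illustration of the aggregation logic behind three of the questions (revenue per month, the October-to-November change, and the top spenders), here is a minimal Python sketch. The column names (`event_time`, `event_type`, `brand`, `price`, `user_id`) are assumptions about the dataset's layout, the sample rows are made up, and the helper names are hypothetical. It is not the Hive solution the assignment expects.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up sample. The column names are assumptions about the
# dataset's layout; they are not confirmed by the assignment text.
SAMPLE_CSV = """event_time,event_type,brand,price,user_id
2019-10-01 00:00:00 UTC,purchase,runail,10.00,1
2019-10-05 12:30:00 UTC,cart,runail,4.00,1
2019-10-09 08:15:00 UTC,purchase,irisk,5.50,2
2019-11-02 09:00:00 UTC,purchase,runail,20.00,1
2019-11-03 10:00:00 UTC,view,irisk,7.00,3
2019-11-20 18:45:00 UTC,purchase,irisk,2.50,3
"""


def purchases(text):
    """Yield (month, brand, user_id, price) for purchase events only."""
    for row in csv.DictReader(io.StringIO(text)):
        if row["event_type"] == "purchase":
            month = row["event_time"][:7]  # "YYYY-MM" prefix of the timestamp
            yield month, row["brand"], row["user_id"], float(row["price"])


def revenue_by_month(text):
    """Sum purchase prices per month (hypothetical helper)."""
    totals = defaultdict(float)
    for month, _brand, _user, price in purchases(text):
        totals[month] += price
    return dict(totals)


def top_spenders(text, n=10):
    """Return the n users with the highest total purchase amount."""
    spend = defaultdict(float)
    for _month, _brand, user, price in purchases(text):
        spend[user] += price
    return sorted(spend.items(), key=lambda kv: kv[1], reverse=True)[:n]


monthly = revenue_by_month(SAMPLE_CSV)
change = monthly["2019-11"] - monthly["2019-10"]
print(monthly, change, top_spenders(SAMPLE_CSV, n=2))
```

In Hive, the same logic would typically be a filter on the event type followed by a grouped sum, with partitioning or bucketing chosen to suit the grouping column.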
pranitjaiswal
Designed an ER-diagram, coded, populated and analyzed an original database system using Microsoft SQL Server to support the operational and reporting needs of an e-commerce business.
Rising demand for PIM solutions from the flourishing eCommerce industry and the increasing need to offer enhanced customer services are driving the growth of the PIM market globally. The global product information management market accounted for US$ 7.5 billion in 2019 and is anticipated to register a CAGR of 14.5%. The report is titled "Global Product Information Management Market, By Enterprise Type (Large Enterprise, Small & Medium Enterprise), By Industry (BFSI, Healthcare, Telecommunication & IT, Government, Retail, Transportation & Logistics, Management, Energy & Utility, Media & Entertainment, and Others), and By Region (North America, Europe, Asia Pacific, Latin America, and the Middle East & Africa) - Trends, Analysis and Forecast till 2029". Key Highlights: In October 2020, Pimcore introduced new features and improvements. The company updated its Pimcore platform and added new features, such as an editable dialog box, cache performance improvement, and tree sorting. In June 2020, Winshuttle formed a partnership with ABBYY, a digital intelligence company. The aim behind the partnership is to help organizations and businesses in digital transformation, which involves extracting data from physical documents and automatically loading it into SAP. Analyst View: Increasing investment in product information management. Rising demand for centralized storage of product-related information is driving the product information market. Centralized data storage helps companies easily manage and organize all the data related to their products. Data sources are updated with a single change in the centralized data storage, saving the time and cost required for data management. Also, compliance and verification requirements are increasing due to the growing number of threats to information security. This provides safe and secure access to information stored in the centralized database.
Access is granted only after completing verification of all the security credentials required. Product information management facilitates quick and easy access to the repository of information, while strategic data storage techniques help in maintaining data quality. Indexing and linking help reduce the time required to complete various processes related to data storage, increasing operational efficiency. Marketing and sales of products are important processes to generate revenue. Growing PIM industry: The market enables manifestation of products to achieve client centricity and a unified customer view, and provides a centralized system for improving the efficiency of promotional activities. All the distribution channels are managed effectively by using this solution. Integration of Big Data and business intelligence applications with cloud storage offers tremendous growth opportunities to the market. Browse 60 market data tables* and 35 figures* through 140 slides and in-depth TOC on “Global Product Information Management Market”, By Enterprise Type (Large Enterprise, Small & Medium Enterprise), By Industry (BFSI, Healthcare, Telecommunication & IT, Government, Retail, Transportation & Logistics, Management, Energy & Utility, Media & Entertainment, and Others), and By Region (North America, Europe, Asia Pacific, Latin America, and the Middle East & Africa) - Trends, Analysis and Forecast till 2029. Key Market Insights from the report: The global product information management market accounted for US$ 7.5 billion in 2019 and is anticipated to register a CAGR of 14.5%. The market report has been segmented on the basis of enterprise type, application, and region. By enterprise type, large enterprises hold the highest market share because the adoption of PI solutions and services is higher among them. Large enterprises invest heavily in advanced technologies to increase their overall productivity and efficiency.
By application, the media & entertainment segment holds the largest share in the market. As most of the population is staying at home, the usage of media and entertainment has seen double-digit growth. Product information offers high visibility, scalability and service optimization that can handle challenges arising from a sudden increase in demand in the media and entertainment industry vertical. By region, North America is the largest market for product information management. The emerging demand to maximize value from centralized master data and reference data, along with the ongoing demand for meaningful insights from this consolidated master data, is expected to further drive the adoption of PIM systems in the North American region during the coming years. The market in Asia-Pacific is expected to witness potential growth opportunities owing to the fast adoption of multi-domain PI software, which is expected to enable better services in terms of performance, quality and capacity during the forecast period. To know the upcoming trends and insights prevalent in this market, click the link below: https://www.prophecymarketinsights.com/market_insight/Global-Product-Information-Management-Market-4573 Competitive Landscape: The prominent players operating in the global product information management market include SAP AG, IBM Corporation, Oracle Corporation, Informatica LLC, Riversand Technologies, Inc., Stibo Systems, ADAM Software NV, Agility Multichannel Ltd., InRiverAB and Pimcore GmbH. The report provides detailed information regarding the industrial base, productivity, strengths, manufacturers, and recent trends, which will help companies enlarge their businesses and promote financial growth. Furthermore, the report exhibits dynamic factors including segments, sub-segments, regional marketplaces, competition, dominant key players, and market forecasts.
In addition, the market includes recent collaborations, mergers, acquisitions, and partnerships along with regulatory frameworks across different regions impacting the market trajectory. Recent technological advances and innovations influencing the global market are included in the report.
annaz201816
MySQL-based E-Commerce Data Analysis Project — includes database schema and query insights.
Abhilash17br
eCommerce Database Analysis in SQL
sruthivarma05
This project demonstrates SQL for data analysis using an e-commerce dataset. It includes queries with filtering, joins, subqueries, aggregates, views, and indexes to extract insights and optimize performance.
hemantbuchade
No description available
Exploratory data analysis of an eCommerce database using SQL, with insights presented using Python.
sd-brewer
Data cleaning, normalization and analysis of an ecommerce dataset through creation of a PostgreSQL database.
tej-patel-yr
Comprehensive SQL project showcasing ecommerce database design, normalization, data integrity checks, analysis & reporting, query optimization, and schema enhancements.
colbystout
Using a fictitious ecommerce database in Google BigQuery, I used SQL and DAX to build a Power BI dashboard for analysis.
Tinor12
This project showcases SQL data analysis on a mock ecommerce database using MySQL. It includes queries for revenue, sales trends, customer behavior, joins, subqueries, views, and indexing. Screenshots and SQL scripts are provided for clear analysis and insights.
dbisiw07
Bonjour, Hi, I'm a freelance web developer based in Montreal. I have a bachelor's degree in Computer Science and am currently pursuing a Master of Engineering in Information Systems Security at Concordia University. I presently work on secure eCommerce web applications and internet security consultancy. My other specialties are database security, cryptography, and forensic analysis using FTK and EnCase. You can learn more about me from my LinkedIn profile.
lholopainen
No description available
juvi-coder
Using SQL and Tableau, I developed a Business Story based on the Fuzzy Factory Database. Business Concepts: Traffic Analysis, Bounce Rates, A/B Test Analysis, Funnel Analysis, Product Portfolio Analysis. SQL Concepts: Temporary Tables, Sub Queries, CASE and Pivot, Trend Analysis
Jenith2002
No description available
knromaric
SQL scripts that analyse a fictional ecommerce database by answering many key business questions
PriyanshuChaubey
E-Commerce Data Analytics project analyzing customer behavior, sales trends, and product performance using Python and data visualization.
Chinipanda
No description available
mdebabrata2004
No description available
gupta24aarti
SQL project exploring customer behavior, orders, products, and supplier insights from an eCommerce database. Includes advanced joins, subqueries, aggregate functions, indexing, view creation, and optimization techniques.
sheetalkalburgi
No description available
nour-hatem
End-to-end data analysis project for an e-commerce dataset. Built the database from scratch, performed analysis with SQL, and visualized key insights.
No description available
shreyagupta17
No description available
domivillacis
Extract and analyse data to generate insights with MySQL
lithium3812
No description available
RajLaxmi05
E-commerce sales analysis project using Python to explore monthly trends, category performance, and customer segment insights.