Found 460 repositories (showing 30)
akchaudhary57
I have created a job-ready "Data Analyst in 90 Days" course that covers analysis on several platforms, including Excel, Python, R, and SQL, along with visualization in Tableau.
Cryptoaj-hack
Decentralized Finance (DeFi) Development Services & Solutions
Eliminate the role of a middleman by availing yourself of decentralized finance (DeFi) development services & solutions. Get access to the major financial services through a blockchain network and experience the benefits of automation, a higher level of security, anonymity, interoperability, and transparency. Our wide range of services includes:
Market-Making Consulting: We take immense effort in establishing financial markets that understand the customers' proprietary algorithms. We aim to improve investors' access to liquidity and democratize the whole system. We render customized features according to the customer's expected return on investment.
Decentralized Crypto Banking: We ensure a frictionless user experience by facilitating the direct transfer of value between the involved parties, supported by decentralization. Our ready-to-launch white-label mobile payment apps render a variety of services such as wallet integration, value holding, and detailed transactional analysis.
DeFi Lottery System Development: We provide a no-loss lottery system that benefits our participants completely. We take steps to eliminate the custodianship of the pooled capital. We permit investing your capital in other related dapps and distribute the rewards, in the form of a major share of the interest earned, to a winner randomly selected by the smart contracts. We assure a regular flow of returns.
Derivatives Over DeFi Platform: We ensure seamless access to derivatives and maximize your earning potential by many notches. By establishing robust dapps, we enable traders to hedge their portfolio of investments and minimize risks by directly engaging with their peers through a democratic platform. We are experts in derivatives market-making and dapp platform development.
Decentralized Fund Management: All your crypto assets will be managed to yield high performance in a decentralized exchange through smart control and management. With in-depth experience in investment exchanges along with our strong knowledge of DeFi, we render our services at low fees and avoid potential risks.
DeFi Insurance System Development: We ensure that there are no risks present in our smart contract. With our robust provision of insurance services, we assure you that there will be no chance of uncontrollable liquidity requests. We contain futuristic risks, uncertainties, and emergencies through lucrative insurance deals.
DeFi Yield Farming Platform Development: Yield farming refers to the technique through which one can earn more cryptocurrencies by using one's existing holding of cryptos. Liquidity providers play a vital role in the success of yield farming. They stake their assets in liquidity pools and facilitate trading in cryptos by creating a market.
DeFi Staking Platform Development: DeFi staking involves a mechanism where crypto assets are staked on a supported wallet or exchange and passive income is earned. The rewards can be calculated based on the quantity of staked assets, the staking duration, the inflation rate, and the network issuance rate.
DeFi Lending Platform Development: DeFi lending platforms have been made popular by the likes of Aave and Compound. The basic features of a DeFi lending platform include flash loan facilities, a fiat payment gateway, and an exclusive margin trading facility. The advantages of DeFi lending include high immutability, better transparency, quick access, and resistance to transaction censorship.
DeFi Smart Contract Development: One of the pivotal reasons behind the tremendous growth of DeFi services is the heavy investment made in robust DeFi smart contract development. These contracts are created with the Solidity programming language, are highly encrypted, and automate the tasks to be executed based on certain pre-set terms and conditions.
DeFi Dapp Development: DeFi dapp development plays a critical role in avoiding the risk of a central point of failure. Dapps are highly secure when compared to centralized applications due to the absence of a central authority.
DeFi Tokens Development: DeFi token development has played a critical role in boosting the growth of decentralized applications. Their value is currently higher than Bitcoin's; they have a huge trading volume and have garnered a lot of attention from the mainstream crowd in recent times.
DeFi DEX Development Like Uniswap: Uniswap is one of the leading DeFi projects being undertaken. It is an innovative venture, as it utilizes incentivized liquidity pools instead of regular order books. Every user of Uniswap is rewarded with a percentage of the fees incurred on every Ethereum transaction for rendering liquidity to the system.
DeFi Wallet Development: Traders will have complete control over their funds through DeFi wallet development, without the interference of any authorities in the system. Supreme security is guaranteed for users without any compromise. By supplying customized private keys to every user, there will not be any chance of loss of data.
DeFi Marketing Services: To assist DeFi projects in gaining user engagement, marketing services are indispensable. From drafting white papers, video and content marketing, to legal advisory, marketing, and community management, our DeFi marketing and consulting services are well-versed to get the job done.
DeFi Synthetic Asset Development: Synthetic assets derive their value from underlying assets and derivatives, which are essentially smart contracts. In DeFi, synthetic assets have gained acclaim as they involve low risks and little chance of price fluctuations. Users can easily invest, trade, and own assets with no hassles.
DeFi Solutions for Ecommerce: Streamline your ecommerce business with DeFi and its pragmatic tools. With DeFi's solutions, benefits like the omission of intermediaries, faster shipping, supply chain management, and real-time tracking can be integrated with your ecommerce business, increasing profits.
DeFi Tokenization Development: Tokenization development is one of the pragmatic solutions DeFi offers. Users can now convert inoperative and underutilized assets into great profits by simply tokenizing their assets. With our DeFi tokenization, avail of ERC20, ERC721 & NFT tokens for your assets.
DeFi Crowdfunding Platform Development: Although a relatively new sector, DeFi crowdfunding has become the go-to mode of aggregating funds to support businesses and start-ups. Our DeFi crowdfunding platform services come with additional benefits in the likes of tax benefits, instant approval, fundraising calendars, and more.
DeFi Real Estate Platform Development: DeFi has revolutionized the ways of real estate management. Now real estate owners and investors, with the help of blockchain-based tokens, can make property investment seamless and manageable. With fractional ownership, financial inclusivity is now possible.
DeFi ICO Development: One of the leading fundraising methods, DeFi ICO services are distinguished. Creating utile tokens, community management, escalating coin value, and launching projects with diligence & guidance from market analysts and blockchain experts are all inclusive of our ICO development.
DeFi Exchange Development: Offering users a plethora of apparent benefits, DEXs are the prized innovation of DeFi. Offering high-end security, durable liquidity, complete anonymity, and financial inclusivity, DEXs make trading and transacting crypto accessible and lucrative for crypto enthusiasts.
DeFi Protocol Like Yearn.Finance: Yearn.Finance offers the best APY the market has to offer by referring to popular exchanges. This protocol offers its users the best yields in a highly secure network. With in-built smart contracts and open-source code, it supports a range of stablecoins offering huge returns.
DeFi Protocol Like Aave: The DeFi protocol Aave offers crypto traders a robust platform for lending and borrowing crypto, for which they earn high interest. The highlight features of Aave, flash loans and flexible interest rates, make it a profitable platform for crypto traders.
DeFi Exchange Like 1inch: The 1inch exchange now has the reputation of being the DEX offering users the lowest slippage. As an aggregator, 1inch connects several exchanges on one platform in a non-custodial ecosystem. With governance and farming features, trading on 1inch remains prominent.
Devtown-India
Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day. Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used. In this project, you will use data provided by Motivate, a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. You will compare the system usage between three large cities: Chicago, New York City, and Washington, DC. Day 1: Students will use Python to explore data related to bike share systems for the three major cities. You will write code to import the data and answer interesting questions about it by computing descriptive statistics. Students will also write a script that takes raw input to create an interactive experience in the terminal for presenting these statistics. Technologies covered are NumPy, Pandas, Matplotlib, Seaborn, and Jupyter Notebook. We will give the students a deep dive into the data analytical process. Day 2: We will give the students an insight into one of the major fields of machine learning, i.e., time series forecasting. We will take them through the relevant theory and help them understand its importance and the different techniques available to deal with it. After that, we will work hands-on with the bike share dataset, implementing different algorithms and understanding them to the core. We aim to provide students an insight into what exactly the job of a data analyst is and familiarize them with how the entire data analysis process works. The session will be hosted by Shaurya Sinha, a data analyst at Jio, and Parag Mittal, a software engineer at Microsoft.
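A minimal sketch of the Day 1 workflow described above, assuming the Motivate CSVs and column names commonly used in this exercise (chicago.csv, 'Start Time', 'Start Station', 'Trip Duration'); the actual course files may differ:

```python
import pandas as pd

# Hypothetical file/column names based on the Motivate bike-share exercise;
# adjust to the actual CSVs (chicago.csv, new_york_city.csv, washington.csv).
df = pd.read_csv("chicago.csv", parse_dates=["Start Time"])

# Basic descriptive statistics for the interactive terminal script.
total_trips = len(df)
most_common_hour = df["Start Time"].dt.hour.mode()[0]
most_common_start = df["Start Station"].value_counts().idxmax()
avg_duration_min = df["Trip Duration"].mean() / 60  # seconds -> minutes

print(f"Total trips: {total_trips}")
print(f"Most common start hour: {most_common_hour}")
print(f"Most popular start station: {most_common_start}")
print(f"Average trip duration: {avg_duration_min:.1f} minutes")
```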
Scraped job descriptions and leveraged the concepts of Natural Language Processing (NLP) and the GloVe algorithm to extract keywords from the data and perform analysis. Presents the vital keywords from data analyst job summaries on the Indeed website.
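The repository's exact pipeline isn't shown, but a rough sketch of GloVe-based keyword ranking might look like the following; the pre-trained model, seed terms, and tokenization are all assumptions for illustration:

```python
import re
import gensim.downloader as api

# Pre-trained GloVe vectors via gensim's downloader (an assumption: the original
# project may have used a different model or corpus).
glove = api.load("glove-wiki-gigaword-100")

# Hypothetical seed terms representing the data-analyst skill space.
seeds = ["data", "analysis", "sql", "python", "visualization"]

def rank_keywords(job_text, top_n=10):
    """Rank tokens in a job description by mean similarity to the seed terms."""
    tokens = set(re.findall(r"[a-z]+", job_text.lower()))
    scored = []
    for tok in tokens:
        if tok in glove:  # skip out-of-vocabulary tokens
            score = sum(glove.similarity(tok, s) for s in seeds) / len(seeds)
            scored.append((tok, score))
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_n]

print(rank_keywords("Seeking a data analyst with SQL, Tableau and Excel experience"))
```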
mesbahiba
Gain the job-ready skills for an entry-level data analyst role through this eight-course Professional Certificate from IBM and position yourself competitively in the thriving job market for data analysts, which will see a 20% growth until 2028 (U.S. Bureau of Labor Statistics). Power your data analyst career by learning the core principles of data analysis and gaining hands-on skills practice. You’ll work with a variety of data sources, project scenarios, and data analysis tools, including Excel, SQL, Python, Jupyter Notebooks, and Cognos Analytics, gaining practical experience with data manipulation and applying analytical techniques.
pnguenda
# Pandas Homework - Pandas, Pandas, Pandas ## Background The data dive continues! Now, it's time to take what you've learned about Python Pandas and apply it to new situations. For this assignment, you'll need to complete **one of two** (not both) Data Challenges. Once again, which challenge you take on is your choice. Just be sure to give it your all -- as the skills you hone will become powerful tools in your data analytics tool belt. ### Before You Begin 1. Create a new repository for this project called `pandas-challenge`. **Do not add this homework to an existing repository**. 2. Clone the new repository to your computer. 3. Inside your local git repository, create a directory for the Pandas Challenge you choose. Use folder names corresponding to the challenges: **HeroesOfPymoli** or **PyCitySchools**. 4. Add your Jupyter notebook to this folder. This will be the main script to run for analysis. 5. Push the above changes to GitHub or GitLab. ## Option 1: Heroes of Pymoli Congratulations! After a lot of hard work in the data munging mines, you've landed a job as Lead Analyst for an independent gaming company. You've been assigned the task of analyzing the data for their most recent fantasy game, Heroes of Pymoli. Like many others in its genre, the game is free-to-play, but players are encouraged to purchase optional items that enhance their playing experience. As a first task, the company would like you to generate a report that breaks down the game's purchasing data into meaningful insights. Your final report should include each of the following: ### Player Count * Total Number of Players ### Purchasing Analysis (Total) * Number of Unique Items * Average Purchase Price * Total Number of Purchases * Total Revenue ### Gender Demographics * Percentage and Count of Male Players * Percentage and Count of Female Players * Percentage and Count of Other / Non-Disclosed ### Purchasing Analysis (Gender) * Each of the below, broken down by gender: * Purchase Count * Average Purchase Price * Total Purchase Value * Average Purchase Total per Person by Gender ### Age Demographics * Each of the below, broken into bins of 4 years (i.e. <10, 10-14, 15-19, etc.): * Purchase Count * Average Purchase Price * Total Purchase Value * Average Purchase Total per Person by Age Group ### Top Spenders * Identify the top 5 spenders in the game by total purchase value, then list (in a table): * SN * Purchase Count * Average Purchase Price * Total Purchase Value ### Most Popular Items * Identify the 5 most popular items by purchase count, then list (in a table): * Item ID * Item Name * Purchase Count * Item Price * Total Purchase Value ### Most Profitable Items * Identify the 5 most profitable items by total purchase value, then list (in a table): * Item ID * Item Name * Purchase Count * Item Price * Total Purchase Value As final considerations: * You must use the Pandas Library and the Jupyter Notebook. * You must submit a link to your Jupyter Notebook with the viewable Data Frames. * You must include a written description of three observable trends based on the data. * See [Example Solution](HeroesOfPymoli/HeroesOfPymoli_starter.ipynb) for a reference on expected format. ## Option 2: PyCitySchools Well done! Having spent years analyzing financial records for big banks, you've finally scratched your idealistic itch and joined the education sector. In your latest role, you've become the Chief Data Scientist for your city's school district.
In this capacity, you'll be helping the school board and mayor make strategic decisions regarding future school budgets and priorities. As a first task, you've been asked to analyze the district-wide standardized test results. You'll be given access to every student's math and reading scores, as well as various information on the schools they attend. Your responsibility is to aggregate the data and showcase obvious trends in school performance. Your final report should include each of the following: ### District Summary * Create a high level snapshot (in table form) of the district's key metrics, including: * Total Schools * Total Students * Total Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### School Summary * Create an overview table that summarizes key metrics about each school, including: * School Name * School Type * Total Students * Total School Budget * Per Student Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Top Performing Schools (By % Overall Passing) * Create a table that highlights the top 5 performing schools based on % Overall Passing. Include: * School Name * School Type * Total Students * Total School Budget * Per Student Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Bottom Performing Schools (By % Overall Passing) * Create a table that highlights the bottom 5 performing schools based on % Overall Passing. Include all of the same metrics as above. ### Math Scores by Grade * Create a table that lists the average Math Score for students of each grade level (9th, 10th, 11th, 12th) at each school. ### Reading Scores by Grade * Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school. ### Scores by School Spending * Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following: * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Scores by School Size * Repeat the above breakdown, but this time group schools based on a reasonable approximation of school size (Small, Medium, Large). ### Scores by School Type * Repeat the above breakdown, but this time group schools based on school type (Charter vs. District). As final considerations: * Use the pandas library and Jupyter Notebook. * You must submit a link to your Jupyter Notebook with the viewable Data Frames. * You must include a written description of at least two observable trends based on the data. * See [Example Solution](PyCitySchools/PyCitySchools_starter.ipynb) for a reference on the expected format.
## Hints and Considerations * These are challenging activities for a number of reasons. For one, these activities will require you to analyze thousands of records. Hacking through the data to look for obvious trends in Excel is just not a feasible option. The size of the data may seem daunting, but pandas will allow you to efficiently parse through it. * Second, these activities will also challenge you by requiring you to learn on your feet. Don't fool yourself into thinking: "I need to study pandas more closely before diving in." Get the basic gist of the library and then _immediately_ get to work. When facing a daunting task, it's easy to think: "I'm just not ready to tackle it yet." But that's the surest way to never succeed. Learning to program requires one to constantly tinker, experiment, and learn on the fly. You are doing exactly the _right_ thing, if you find yourself constantly practicing Google-Fu and diving into documentation. There is just no way (or reason) to try and memorize it all. Online references are available for you to use when you need them. So use them! * Take each of these tasks one at a time. Begin your work, answering the basic questions: "How do I import the data?" "How do I convert the data into a DataFrame?" "How do I build the first table?" Don't get intimidated by the number of asks. Many of them are repetitive in nature with just a few tweaks. Be persistent and creative! * Expect these exercises to take time! Don't get discouraged if you find yourself spending hours initially with little progress. Force yourself to deal with the discomfort of not knowing and forge ahead. Consider these hours an investment in your future! * As always, feel encouraged to work in groups and get help from your TAs and Instructor. Just remember, true success comes from mastery and _not_ a completed homework assignment. So challenge yourself to truly succeed! ### Copyright Trilogy Education Services © 2019. All Rights Reserved.
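For the Heroes of Pymoli option, a minimal pandas sketch of the first few summary metrics could look like this (the file path and column names such as SN, Item ID, and Price are assumptions based on the assignment description):

```python
import pandas as pd

# Assumes the Heroes of Pymoli purchase file with columns such as
# SN, Age, Gender, Item ID, Item Name, Price (file name is an assumption).
purchases = pd.read_csv("Resources/purchase_data.csv")

# Player Count
total_players = purchases["SN"].nunique()

# Purchasing Analysis (Total)
unique_items = purchases["Item ID"].nunique()
avg_price = purchases["Price"].mean()
total_purchases = len(purchases)
total_revenue = purchases["Price"].sum()

# Gender Demographics: count and percentage of unique players per gender
players = purchases.drop_duplicates("SN")
gender_counts = players["Gender"].value_counts()
gender_pct = (gender_counts / total_players * 100).round(2)

summary = pd.DataFrame({
    "Total Players": [total_players],
    "Unique Items": [unique_items],
    "Average Price": [round(avg_price, 2)],
    "Total Purchases": [total_purchases],
    "Total Revenue": [total_revenue],
})
print(summary)
print(pd.DataFrame({"Count": gender_counts, "Percentage": gender_pct}))
```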
ShahadShaikh
Problem Statement
Introduction: So far in this course, you have learned about the Hadoop framework, RDBMS design, and Hive querying. You have understood how to work with an EMR cluster and write optimised queries on Hive. This assignment aims at testing your skills in the Hive and Hadoop concepts learned throughout this course. Like big data analysts, you will be required to extract the data, load it into Hive tables, and gather insights from the dataset.
Problem statement: With online sales gaining popularity, tech companies are exploring ways to improve their sales by analysing customer behaviour and gaining insights about product trends. Furthermore, the websites make it easier for customers to find the products they require without much scavenging. Needless to say, the role of big data analysts is among the most sought-after job profiles of this decade. Therefore, as part of this assignment, we will be challenging you, as a big data analyst, to extract data and gather insights from a real-life dataset of an e-commerce company. In the next video, you will learn the various stages in collecting and processing the e-commerce website data.
One of the most popular use cases of big data is in e-commerce companies such as Amazon or Flipkart. So before we get into the details of the dataset, let us understand how e-commerce companies make use of these concepts to give customers product recommendations. This is done by tracking your clicks on their website and searching for patterns within them. This kind of data is called clickstream data. The clickstream data contains all the logs of how you navigated through the website, along with other details such as the time spent on every page. Companies use data-ingestion frameworks such as Apache Kafka or AWS Kinesis to move this data into storage frameworks such as Hadoop. From there, machine learning engineers or business analysts use the data to derive valuable insights. In the next video, Kautuk will give you a brief idea of the data that is used in this case study and the kind of analysis you can perform with it.
For this assignment, you will be working with a public clickstream dataset of a cosmetics store. Using this dataset, your job is to extract the kind of valuable insights that data engineers generally come up with in an e-retail company. You will find the data at the links given below:
https://e-commerce-events-ml.s3.amazonaws.com/2019-Oct.csv
https://e-commerce-events-ml.s3.amazonaws.com/2019-Nov.csv
You can find the description of the attributes in the dataset given below. In the next video, you will learn about the various implementation stages involved in this case study.
The implementation phase can be divided into the following parts:
1. Copying the dataset into HDFS: launch an EMR cluster that utilizes the Hive services, and move the data from the S3 bucket into HDFS.
2. Creating the database and launching Hive queries on your EMR cluster: create the structure of your database, use optimized techniques to run your queries as efficiently as possible, show the improvement in performance after applying optimization to any single query, and run Hive queries to answer the questions given below.
3. Cleaning up: drop your database and terminate your cluster.
You are required to provide answers to the questions given below:
1. Find the total revenue generated due to purchases made in October.
2. Write a query to yield the total sum of purchases per month in a single output.
3. Write a query to find the change in revenue generated due to purchases from October to November.
4. Find distinct categories of products. Categories with a null category code can be ignored.
5. Find the total number of products available under each category.
6. Which brand had the maximum sales in October and November combined?
7. Which brands increased their sales from October to November?
8. Your company wants to reward the top 10 users of its website with a Golden Customer plan. Write a query to generate a list of the top 10 users who spend the most.
Note: To write your queries, please make the necessary optimizations, such as selecting the appropriate table format and using partitioned/bucketed tables. You will be awarded marks for enhancing the performance of your queries. Each question should have one query only. Use a 2-node EMR cluster with both the master and core nodes as m4.large. Make sure you terminate the cluster when you are done working with it. Since EMR can only be terminated and cannot be stopped, always keep a copy of your queries in a text editor so that you can paste them each time you launch a new cluster. Do not leave PuTTY idle for too long; do some activity, like pressing the space bar, at regular intervals. If the terminal does become inactive, you don't have to start a new cluster: you can reconnect to the master node by opening the PuTTY terminal again, providing the host address, and loading the .ppk key file. If you are using an emr-6.x release, certain queries might take longer; we would suggest using the emr-5.29.0 release for this case study. There are different options for storing the data in an EMR cluster; you can briefly explore them in this link. In your previous module on Hive querying, you copied the data to the local file system, i.e., to the master node's file system, and performed the queries. Since the dataset in this case study is large, it is good practice to load the data into HDFS rather than the local file system. You can revisit the segment on 'Working with HDFS' from the earlier module on 'Introduction to Big Data and Cloud'. You may have to use CSVSerde with the default property values to load the dataset into a Hive table; you can refer to this link for more details on using CSVSerde. You may also want to prevent the column names from being inserted into the Hive table; you can refer to this link on how to skip the headers.
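The graded work is expected in Hive on EMR, but as a rough local sanity check the October/November revenue questions can be sketched in pandas; the column names below follow the public cosmetics clickstream schema and should be verified against the actual files:

```python
import pandas as pd

# Local pandas sanity check only; the assignment itself expects Hive queries.
# Assumed columns: event_time, event_type, price, brand.
oct_df = pd.read_csv("2019-Oct.csv", usecols=["event_time", "event_type", "price", "brand"])
nov_df = pd.read_csv("2019-Nov.csv", usecols=["event_time", "event_type", "price", "brand"])

def monthly_purchase_revenue(df):
    """Total value of rows whose event_type is 'purchase'."""
    purchases = df[df["event_type"] == "purchase"]
    return purchases["price"].sum()

rev_oct = monthly_purchase_revenue(oct_df)
rev_nov = monthly_purchase_revenue(nov_df)
print(f"October revenue: {rev_oct:,.2f}")
print(f"November revenue: {rev_nov:,.2f}")
print(f"Change in revenue: {rev_nov - rev_oct:,.2f}")

# Top brand by combined purchase value across both months.
combined = pd.concat([oct_df, nov_df])
purchases = combined[combined["event_type"] == "purchase"]
print(purchases.groupby("brand")["price"].sum().sort_values(ascending=False).head(1))
```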
MohamedMuneerM
Data Analyst job posting analysis
Analysis of 2,250+ data analyst job postings using Python and Tableau
I analysed the data analyst jobs dataset to find insights such as the most in-demand data analyst roles, the most competitive ones, and so on.
Slavunia
Analysis of data analyst job postings from 2023 that uncovers key insights about the global and German data analysis job markets
SHIVASHANKAR-V07
SQL-based analysis of Data Analyst jobs (2023) - salaries, skills demand, and career insights using PostgreSQL, inspired by Luke Barousse’s SQL course.
windyguo2046
Task Your task in this assignment is to aggregate the data found in the Citi Bike Trip History Logs to build a data dashboard, story, or report. You may work with a timespan of your choosing. If you're really ambitious, you can merge multiple datasets from different periods. Try to provide answers to the following questions: How many trips have been recorded total during the chosen period? By what percentage has total ridership grown? How has the proportion of short-term customers and annual subscribers changed? What are the peak hours in which bikes are used during summer months (for whatever year of data you selected)? What are the peak hours in which bikes are used during winter months (for whatever year of data you selected)? What are the top 10 stations in the city for starting a journey? (Based on data, why do you hypothesize these are the top locations?) What are the top 10 stations in the city for ending a journey? (Based on data, why?) What are the bottom 10 stations in the city for starting a journey? (Based on data, why?) What are the bottom 10 stations in the city for ending a journey (Based on data, why?) What is the gender breakdown of active participants (Male v. Female)? How does the average trip duration change by age? What is the average distance in miles that a bike is ridden? Which Bikes (by ID) are most likely due for repair or inspection this year? How variable is the utilization by bike ID? Additionally, city officials would like to see the following visualizations: A static map that plots all bike stations with a visual indication of the most popular locations to start and end a journey with zip code data overlaid on top. A dynamic map that shows how each station's popularity changes over time (by month and year) -- with commentary pointing to any interesting events that may be behind these phenomena. Lastly, as a chronic over-achiever, you must also: Find at least two unexpected phenomena in the data and provide a visualization and analysis to document their presence. Considerations Remember, the people reading your analysis will NOT be data analysts. Your audience will be city officials, public administrators, and heads of New York City departments. Your data and analysis needs to be presented in a way that is focused, concise, easy-to-understand, and visually compelling. Your visualizations should be colorful enough to be included in press releases, and your analysis should be thoughtful enough for dictating programmatic changes. Assessment Your final product will be assessed on the following metrics: Completeness of Analysis Analytic Rigor Readability Visual Attraction Professionalism Hints You may need to get creative in how you combine each of the CSVs. Don't just assume Tableau is the right tool for the job. At this point, you have a wealth of technical skills and research abilities. Dig for an approach that works and just go with it. Don't just assume the CSV format hasn't changed since 2013. Subtle changes to the formats in any of your columns can blockade your analysis. Ensure your data is consistent and clean throughout your analysis. (Hint: Start and End Time change at some point in the history logs). Consider building your dashboards with small extracts of the data (i.e. single files) before attempting to import the whole thing. What you will find is that importing all 20+ million records of data will create performance issues quickly. Welcome to "Big Data". 
While utilizing all of the data may seem like a nice power play, consider the time-course in making your analysis. Is data from 2013 the most relevant for making bike replacement decisions today? Probably not. Don't let overwhelming data fool you. Ground your analysis in common sense. Remember, data alone doesn't "answer" anything. You will need to accompany your data visualizations with clear and directed answers and analysis. As is often the case, your clients are asking for a LOT of answers. Be considerate about their need-to-know and the importance of not "cramming in everything". Of course, answer each question, but do so in a way that is organized and presentable. Since this is a project for the city, spend the appropriate time thinking through decisions on color schemes, fonts, and visual story-telling. The Citi Bike program has a clear visual footprint. As a suggestion, look for ways to have your data visualizations match their aesthetic tones. Pay attention to labels. What exactly is "time duration"? What's the value of "age of birth"? You will almost certainly need calculated fields to get what you need. Keep a close eye out for obvious outliers or false data. Not everyone who signs up for the program is answering honestly. In answering the question of "why" a phenomenon is happening, consider adding other pieces of information on socioeconomics or other geographic data. Tableau has a map "layer" feature that you may find handy. Don't be afraid to manipulate your data and play with settings in Tableau. Tableau is meant to be explored. We haven't covered all that you need -- so you will need to keep an eye out for new tricks. The final "format" of your deliverable is up to you. It can be an embedded Tableau dashboard, a Tableau Story, a Tableau visualization + PDF -- you name it. The bottom line is: This is your story to tell. Use the medium you deem most effective. (But you should definitely be using Tableau in some way!) Treat this as a serious endeavor! This is an opportunity to show future employers that you have what it takes to be a top-notch analyst.
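A few of the requested metrics can be prototyped outside Tableau before building the dashboards. A minimal pandas sketch, assuming one monthly Citi Bike trip file and the older column naming (tripduration, starttime, usertype, start station name), which changes across years:

```python
import pandas as pd

# File name and column names are assumptions; Citi Bike changed its CSV
# schema over the years, so adapt these to the files you actually download.
trips = pd.read_csv("201907-citibike-tripdata.csv", parse_dates=["starttime"])

total_trips = len(trips)
user_mix = trips["usertype"].value_counts(normalize=True)   # Subscriber vs Customer
peak_hours = trips["starttime"].dt.hour.value_counts().head(3)
top_start = trips["start station name"].value_counts().head(10)
avg_duration_min = trips["tripduration"].mean() / 60         # seconds -> minutes

print(f"Total trips: {total_trips}")
print("Share of rider types:\n", user_mix)
print("Top 3 start hours:\n", peak_hours)
print("Top 10 start stations:\n", top_start)
print(f"Average trip duration: {avg_duration_min:.1f} min")
```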
jennifermarie6sl
Data Analyst Jobs Analysis
Mdkhalidsiddique
No description available
avish-attri
No description available
This project seeks to analyse the data analyst job listings found on LinkedIn. The aim of the jobs analysis is to find hidden insights about data analyst jobs such as what skills are most in demand by employers.
sumraannn29
No description available
praneethkvs
Analysis of more than 5000 "Data Analyst" Jobs posted in 2023 scraped from the internet.
This project uses Python to scrape data from the Naukri employment portal about data scientist and analyst roles, cleans the data using tools like Pandas and NumPy, and then visualises the data using Power BI.
This project analyzes a dataset of entry-level data analyst job postings in Europe collected from LinkedIn. The analysis identifies top hiring companies, countries with the most opportunities, and the most common job titles for data analysts. The results are visualized using bar charts for easy interpretation.
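A minimal sketch of the kind of bar-chart summary described above, assuming a hypothetical CSV of scraped postings with company, country, and title columns:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names; adjust to the actual scraped dataset.
jobs = pd.read_csv("linkedin_entry_level_data_analyst.csv")

top_companies = jobs["company"].value_counts().head(10)
top_countries = jobs["country"].value_counts().head(10)
top_titles = jobs["title"].value_counts().head(10)

# Three horizontal bar charts, one per question in the analysis.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
top_companies.plot.barh(ax=axes[0], title="Top hiring companies")
top_countries.plot.barh(ax=axes[1], title="Countries with most postings")
top_titles.plot.barh(ax=axes[2], title="Most common job titles")
plt.tight_layout()
plt.show()
```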
AnshulSilhare
Python analysis of 100K+ US Data & Business Analyst job postings - skills demand, salary trends, and market insights.
andrewhryn
🐍 🇨🇦 Scrapes job listings from Indeed to extract data analyst positions in Canada. Empower your job search with Python web scraping and data analysis!
ddavid37
End-to-end SQL-based analysis of 2023’s data analyst job market—identifying top-paying roles, in-demand skills, and career growth strategies using PostgreSQL and real job posting data.
AzmeryLaizo
Web scraper developed to extract data from Google job board for the job titles database developer, data analyst, software engineer and web developer and perform text analysis to extract frequency of technical skills.
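A rough sketch of the skill-frequency step, assuming a hypothetical skill vocabulary and a list of scraped posting texts (the original scraper's own keyword list is not reproduced here):

```python
import re
from collections import Counter

# Hypothetical skill vocabulary; the original project likely used its own list.
SKILLS = ["python", "sql", "excel", "tableau", "power bi", "r", "java", "aws"]

def skill_frequencies(job_texts):
    """Count how many postings mention each skill at least once."""
    counts = Counter()
    for text in job_texts:
        text = text.lower()
        for skill in SKILLS:
            # Word-boundary match so a short skill like "r" does not match
            # inside other words.
            if re.search(rf"\b{re.escape(skill)}\b", text):
                counts[skill] += 1
    return counts

postings = [
    "Data analyst role requiring SQL, Excel and Tableau",
    "Software engineer position: Java, AWS, Python",
]
print(skill_frequencies(postings).most_common())
```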
sharbanee7781
Analyzed the US data job market with Python to uncover in-demand and high-paying skills for Data Analysts. Used Pandas, Seaborn, and Matplotlib for data cleaning and visualization. Insights include skill demand trends, salary analysis, and optimal skills for career growth.
akashjborah97
Target: To analyze the ML job market in India using segmentation analysis to find companies likely to hire an ML Engineer/Data Analyst with respect to their skill set. Techniques and algorithms used: machine learning with Python libraries (NumPy, Pandas, scikit-learn, Matplotlib), the elbow method, stability-based structure analysis, and k-means clustering. In this project, we took a dataset from the website naukri.com, analysed the skills and companies, and performed segmentation using a clustering algorithm. Results: Segmentation analysis is an important step before we embark on any plan, so it is important to learn how to analyze the job market and the skills demanded by companies. By analyzing the trend, we observed that cluster 0 contains companies inclined towards hiring people with Python skills in Data Science and Machine Learning. Cluster 1 contains companies likely to hire people whose skills are not oriented towards Data Analysis. Cluster 2 contains companies inclined towards hiring people with Python and R skills in Data Science. Cluster 3 contains companies inclined towards hiring people with Python skills in Machine Learning. Cluster 4 contains companies likely to hire people whose skills are not oriented towards Data Analysis. Cluster 5 contains companies likely to hire people with Python, Machine Learning, and minimal Data Science skills. The skills most demanded by recruiters are Python, Data Science, Machine Learning, and other IT skills. For the analysis of companies based on experience demanded, it was observed that Wipro, HiringSign, Global Logic, Gojek, etc. did not appear among the top companies before segmentation but appeared after segmentation was carried out for the minimum, average, and maximum experience data.
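A minimal scikit-learn sketch of the elbow method and k-means step described above, using a tiny made-up skills table in place of the Naukri dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical input: one row per job posting with a free-text "skills" field
# (the original Naukri dataset and its preprocessing are not reproduced here).
jobs = pd.DataFrame({"skills": [
    "python machine learning data science",
    "java spring microservices",
    "python r data science statistics",
    "python machine learning deep learning",
    "sales communication ms office",
]})

X = TfidfVectorizer().fit_transform(jobs["skills"])

# Elbow method: plot inertia for a range of k and look for the bend.
ks = range(1, 5)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("Inertia")
plt.title("Elbow method for choosing k")
plt.show()

# Fit the chosen model and attach cluster labels to each posting.
jobs["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(jobs)
```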
AdekoyaOlatolokikiAyomide
How to share data with a statistician This is a guide for anyone who needs to share data with a statistician or data scientist. The target audiences I have in mind are: Collaborators who need statisticians or data scientists to analyze data for them Students or postdocs in various disciplines looking for consulting advice Junior statistics students whose job it is to collate/clean/wrangle data sets The goals of this guide are to provide some instruction on the best way to share data to avoid the most common pitfalls and sources of delay in the transition from data collection to data analysis. The Leek group works with a large number of collaborators and the number one source of variation in the speed to results is the status of the data when they arrive at the Leek group. Based on my conversations with other statisticians this is true nearly universally. My strong feeling is that statisticians should be able to handle the data in whatever state they arrive. It is important to see the raw data, understand the steps in the processing pipeline, and be able to incorporate hidden sources of variability in one's data analysis. On the other hand, for many data types, the processing steps are well documented and standardized. So the work of converting the data from raw form to directly analyzable form can be performed before calling on a statistician. This can dramatically speed the turnaround time, since the statistician doesn't have to work through all the pre-processing steps first. What you should deliver to the statistician To facilitate the most efficient and timely analysis this is the information you should pass to a statistician: The raw data. A tidy data set A code book describing each variable and its values in the tidy data set. An explicit and exact recipe you used to go from 1 -> 2,3 Let's look at each part of the data package you will transfer. The raw data It is critical that you include the rawest form of the data that you have access to. This ensures that data provenance can be maintained throughout the workflow. Here are some examples of the raw form of data: The strange binary file your measurement machine spits out The unformatted Excel file with 10 worksheets the company you contracted with sent you The complicated JSON data you got from scraping the Twitter API The hand-entered numbers you collected looking through a microscope You know the raw data are in the right format if you: Ran no software on the data Did not modify any of the data values You did not remove any data from the data set You did not summarize the data in any way If you made any modifications of the raw data it is not the raw form of the data. Reporting modified data as raw data is a very common way to slow down the analysis process, since the analyst will often have to do a forensic study of your data to figure out why the raw data looks weird. (Also imagine what would happen if new data arrived?) The tidy data set The general principles of tidy data are laid out by Hadley Wickham in this paper and this video. 
While both the paper and the video describe tidy data using R, the principles are more generally applicable: each variable you measure should be in one column; each different observation of that variable should be in a different row; there should be one table for each "kind" of variable; and if you have multiple tables, they should include a column that allows them to be joined or merged. While these are the hard and fast rules, there are a number of other things that will make your data set much easier to handle. First is to include a row at the top of each data table/spreadsheet that contains full variable names. So if you measured age at diagnosis for patients, you would head that column with the name AgeAtDiagnosis instead of something like ADx or another abbreviation that may be hard for another person to understand. Here is an example of how this would work from genomics. Suppose that for 20 people you have collected gene expression measurements with RNA-sequencing. You have also collected demographic and clinical information about the patients including their age, treatment, and diagnosis. You would have one table/spreadsheet that contains the clinical/demographic information. It would have four columns (patient id, age, treatment, diagnosis) and 21 rows (a row with variable names, then one row for every patient). You would also have one spreadsheet for the summarized genomic data. Usually this type of data is summarized at the level of the number of counts per exon. Suppose you have 100,000 exons; then you would have a table/spreadsheet that had 21 rows (a row for gene names, and one row for each patient) and 100,001 columns (one column for patient ids and one column for each exon). If you are sharing your data with the collaborator in Excel, the tidy data should be in one Excel file per table. They should not have multiple worksheets, no macros should be applied to the data, and no columns/cells should be highlighted. Alternatively, share the data in a CSV or TAB-delimited text file. (Beware however that reading CSV files into Excel can sometimes lead to non-reproducible handling of date and time variables.) The code book: For almost any data set, the measurements you calculate will need to be described in more detail than you can or should sneak into the spreadsheet. The code book contains this information. At minimum it should contain: information about the variables (including units!) in the data set not contained in the tidy data; information about the summary choices you made; and information about the experimental study design you used. In our genomics example, the analyst would want to know what the unit of measurement for each clinical/demographic variable is (age in years, treatment by name/dose, level of diagnosis and how heterogeneous). They would also want to know how you picked the exons you used for summarizing the genomic data (UCSC/Ensembl, etc.). They would also want to know any other information about how you did the data collection/study design. For example, are these the first 20 patients that walked into the clinic? Are they 20 highly selected patients by some characteristic like age? Are they randomized to treatments? A common format for this document is a Word file. There should be a section called "Study design" that has a thorough description of how you collected the data, and a section called "Code book" that describes each variable and its units.
How to code variables When you put variables into a spreadsheet there are several main categories you will run into depending on their data type: Continuous Ordinal Categorical Missing Censored Continuous variables are anything measured on a quantitative scale that could be any fractional number. An example would be something like weight measured in kg. Ordinal data are data that have a fixed, small (< 100) number of levels but are ordered. This could be for example survey responses where the choices are: poor, fair, good. Categorical data are data where there are multiple categories, but they aren't ordered. One example would be sex: male or female. This coding is attractive because it is self-documenting. Missing data are data that are unobserved and you don't know the mechanism. You should code missing values as NA. Censored data are data where you know the missingness mechanism on some level. Common examples are a measurement being below a detection limit or a patient being lost to follow-up. They should also be coded as NA when you don't have the data. But you should also add a new column to your tidy data called, "VariableNameCensored" which should have values of TRUE if censored and FALSE if not. In the code book you should explain why those values are missing. It is absolutely critical to report to the analyst if there is a reason you know about that some of the data are missing. You should also not impute/make up/ throw away missing observations. In general, try to avoid coding categorical or ordinal variables as numbers. When you enter the value for sex in the tidy data, it should be "male" or "female". The ordinal values in the data set should be "poor", "fair", and "good" not 1, 2 ,3. This will avoid potential mixups about which direction effects go and will help identify coding errors. Always encode every piece of information about your observations using text. For example, if you are storing data in Excel and use a form of colored text or cell background formatting to indicate information about an observation ("red variable entries were observed in experiment 1.") then this information will not be exported (and will be lost!) when the data is exported as raw text. Every piece of data should be encoded as actual text that can be exported. The instruction list/script You may have heard this before, but reproducibility is a big deal in computational science. That means, when you submit your paper, the reviewers and the rest of the world should be able to exactly replicate the analyses from raw data all the way to final results. If you are trying to be efficient, you will likely perform some summarization/data analysis steps before the data can be considered tidy. The ideal thing for you to do when performing summarization is to create a computer script (in R, Python, or something else) that takes the raw data as input and produces the tidy data you are sharing as output. You can try running your script a couple of times and see if the code produces the same output. In many cases, the person who collected the data has incentive to make it tidy for a statistician to speed the process of collaboration. They may not know how to code in a scripting language. In that case, what you should provide the statistician is something called pseudocode. 
It should look something like: Step 1 - take the raw file, run version 3.1.2 of summarize software with parameters a=1, b=2, c=3 Step 2 - run the software separately for each sample Step 3 - take column three of outputfile.out for each sample and that is the corresponding row in the output data set You should also include information about which system (Mac/Windows/Linux) you used the software on and whether you tried it more than once to confirm it gave the same results. Ideally, you will run this by a fellow student/labmate to confirm that they can obtain the same output file you did. What you should expect from the analyst When you turn over a properly tidied data set it dramatically decreases the workload on the statistician. So hopefully they will get back to you much sooner. But most careful statisticians will check your recipe, ask questions about steps you performed, and try to confirm that they can obtain the same tidy data that you did with, at minimum, spot checks. You should then expect from the statistician: An analysis script that performs each of the analyses (not just instructions) The exact computer code they used to run the analysis All output files/figures they generated. This is the information you will use in the supplement to establish reproducibility and precision of your results. Each of the steps in the analysis should be clearly explained and you should ask questions when you don't understand what the analyst did. It is the responsibility of both the statistician and the scientist to understand the statistical analysis. You may not be able to perform the exact analyses without the statistician's code, but you should be able to explain why the statistician performed each step to a labmate/your principal investigator. Contributors Jeff Leek - Wrote the initial version. L. Collado-Torres - Fixed typos, added links. Nick Reich - Added tips on storing data as text. Nick Horton - Minor wording suggestions.
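To illustrate the censored-data convention described above (code missing values as NA and add a companion "VariableNameCensored" column), here is a minimal pandas sketch with a made-up detection limit:

```python
import pandas as pd

# Hypothetical measurements where values below the detection limit were
# recorded as the string "<LOD" by the instrument.
raw = pd.DataFrame({"PatientId": [1, 2, 3, 4],
                    "Concentration": ["0.8", "<LOD", "1.2", "<LOD"]})

tidy = raw.copy()
# Record which observations were censored, following the
# "VariableNameCensored" convention, before converting to numbers.
tidy["ConcentrationCensored"] = tidy["Concentration"].eq("<LOD")
# Censored observations become NA in the measurement column itself.
tidy["Concentration"] = pd.to_numeric(tidy["Concentration"], errors="coerce")

print(tidy)
```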
Lspringer24
# Tableau Homework - Citi Bike Analytics ### Before You Begin * This assignment will be saved to your Tableau Public account rather than GitHub. * If you haven't already, be sure to create a Tableau Public account [here](https://public.tableau.com/s/). * The free tier of Tableau only lets you save to their public server. This means that each time you save your file it will be uploaded to your Tableau Public profile. * You are able to load and continue working on the same workbook. * When you are finished with your assignment, you will turn in the URL to your Tableau Public workbook along with any additional files used for your analysis. ## Background Congratulations on your new job! As the new lead analyst for the [New York Citi Bike](https://en.wikipedia.org/wiki/Citi_Bike) Program, you are now responsible for overseeing the largest bike sharing program in the United States. In your new role, you will be expected to generate regular reports for city officials looking to publicize and improve the city program. Since 2013, the Citi Bike Program has implemented a robust infrastructure for collecting data on the program's utilization. Through the team's efforts, each month bike data is collected, organized, and made public on the [Citi Bike Data](https://www.citibikenyc.com/system-data) webpage. However, while the data has been regularly updated, the team has yet to implement a dashboard or sophisticated reporting process. City officials have a number of questions on the program, so your first task on the job is to build a set of data reports to provide the answers. ## Task **Your task in this assignment is to aggregate the data found in the Citi Bike Trip History Logs and find two unexpected phenomena.** **Design 2-5 visualizations for each discovered phenomenon (4-10 total). You may work with a timespan of your choosing. Optionally, you may merge multiple datasets from different periods.** **The following are some questions you may wish to tackle. Do not limit yourself to these questions; they are suggestions for a starting point. Be creative!** * How many trips have been recorded in total during the chosen period? * By what percentage has total ridership grown? * How has the proportion of short-term customers and annual subscribers changed? * What are the peak hours in which bikes are used during summer months? * What are the peak hours in which bikes are used during winter months? * Today, what are the top 10 stations in the city for starting a journey? (Based on data, why do you hypothesize these are the top locations?) * Today, what are the top 10 stations in the city for ending a journey? (Based on data, why?) * Today, what are the bottom 10 stations in the city for starting a journey? (Based on data, why?) * Today, what are the bottom 10 stations in the city for ending a journey? (Based on data, why?) * Today, what is the gender breakdown of active participants (Male v. Female)? * How effective has gender outreach been in increasing female ridership over the timespan? * How does the average trip duration change by age? * What is the average distance in miles that a bike is ridden? * Which bikes (by ID) are most likely due for repair or inspection in the timespan? * How variable is the utilization by bike ID? **Next, as a chronic over-achiever:** * Use your visualizations (it does not have to be all of them) to design a dashboard for each phenomenon. * The dashboards should be accompanied by an analysis explaining why the phenomenon may be occurring.
**City officials would also like to see one of the following visualizations:** * **Basic:** A static map that plots all bike stations with a visual indication of the most popular locations to start and end a journey, with zip code data overlaid on top. * **Advanced:** A dynamic map that shows how each station's popularity changes over time (by month and year). Again, with zip code data overlaid on the map. * The map you choose should also be accompanied by a write-up unveiling any trends that were noticed during your analysis. **Finally, create your final presentation** * Create a Tableau story that brings together the visualizations, requested maps, and dashboards. * This is what will be presented to the officials, so be sure to make it professional, logical, and visually appealing. ## Considerations Remember, the people reading your analysis will **NOT** be data analysts. Your audience will be city officials, public administrators, and heads of New York City departments. Your data and analysis need to be presented in a way that is focused, concise, easy-to-understand, and visually compelling. Your visualizations should be colorful enough to be included in press releases, and your analysis should be thoughtful enough for dictating programmatic changes. ## Submission Your final submission should include: * A link to your Tableau Public workbook that includes: * 4-10 Total "Phenomenon" Visualizations * 2 Dashboards * 1 City Official Map * 1 Story * A text or markdown file with your analysis of the phenomena you uncovered from the data. ## Assessment Your final product will be assessed on the following metrics: * Analytic Rigor * Readability * Visual Attraction ## Hints * You may need to get creative in how you combine each of the CSV files. Don't just assume Tableau is the right tool for the job. At this point, you have a wealth of technical skills and research abilities. Dig for an approach that works and just go with it. * Don't just assume the CSV format hasn't changed since 2013. Subtle changes to the formats in any of your columns can blockade your analysis. Ensure your data is consistent and clean throughout your analysis. (Hint: Start and End Time change at some point in the history logs). * Consider building your visualizations with small extracts of the data (i.e. single files) before attempting to import the whole thing. What you will find is that importing all 20+ million records of data will create performance issues quickly. Welcome to "Big Data." * While utilizing all of the data may seem like a nice power play, consider the time-course in making your analysis. Is data from 2013 the most relevant for making bike replacement decisions today? Probably not. Don't let overwhelming data fool you. Ground your analysis in common sense. * Remember, data alone doesn't "answer" anything. You will need to accompany your data visualizations with clear and directed answers and analysis. * As is often the case, your clients are asking for a LOT of answers. Be considerate about their need-to-know and the importance of not "cramming in everything". Of course, answer each question, but do so in a way that is organized and presentable. * Since this is a project for the city, spend the appropriate time thinking through decisions on color schemes, fonts, and visual story-telling. The Citi Bike program has a clear visual footprint. As a suggestion, look for ways to have your data visualizations match their aesthetic tones. * Pay attention to labels. What exactly is "time duration"?
What's the value of "age of birth"? You will almost certainly need calculated fields to get what you need. * Keep a close eye for obvious outliers or false data. Not everyone who signs up for the program is answering honestly. * In answering the question of "why" a phenomenon is occurring, consider adding other pieces of information on socioeconomic or other geographic data. Tableau has a map "layer" feature that you may find handy. * Don't be afraid to manipulate your data and play with settings in Tableau. Tableau is meant to be explored. We haven't covered all that you need -- so you will need to keep an eye out for new tricks. * Treat this as a serious endeavor! This is an opportunity to show future employers that you have what it takes to be a top-notch analyst. * Good luck! ### Copyright Data Boot Camp (C) 2019. All Rights Reserved.
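One of the suggested questions, the average distance in miles that a bike is ridden, can be approximated from station coordinates before bringing the result into Tableau. A sketch assuming the older Citi Bike column names for station latitude/longitude; note that station-to-station straight-line distance understates the true ridden distance:

```python
import numpy as np
import pandas as pd

# Column names vary across Citi Bike releases; these are assumptions to adapt.
trips = pd.read_csv("201907-citibike-tripdata.csv")

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles (vectorised)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 3958.8 * 2 * np.arcsin(np.sqrt(a))  # Earth radius ~3958.8 miles

trips["distance_miles"] = haversine_miles(
    trips["start station latitude"], trips["start station longitude"],
    trips["end station latitude"], trips["end station longitude"],
)
print(f"Average straight-line trip distance: {trips['distance_miles'].mean():.2f} miles")
```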
stefanopedicinogit
Project description: You work as an analyst for the telecom operator Megaline. The company offers its clients two prepaid plans, Surf and Ultimate. The commercial department wants to know which of the plans brings in more revenue in order to adjust the advertising budget. You are going to carry out a preliminary analysis of the plans based on a relatively small client selection. You'll have the data on 500 Megaline clients: who the clients are, where they're from, which plan they use, and the number of calls they made and text messages they sent in 2018. Your job is to analyze clients' behavior and determine which prepaid plan brings in more revenue. Description of the plans Note: Megaline rounds seconds up to minutes, and megabytes to gigabytes. For calls, each individual call is rounded up: even if the call lasted just one second, it will be counted as one minute. For web traffic, individual web sessions are not rounded up. Instead, the total for the month is rounded up. If someone uses 1025 megabytes this month, they will be charged for 2 gigabytes.
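A minimal sketch of the rounding rules described above (each call rounded up to whole minutes; monthly traffic summed and then rounded up to whole gigabytes), independent of the actual plan prices:

```python
import math

def billable_minutes(call_durations_seconds):
    """Each call is rounded up to whole minutes, even a one-second call."""
    return sum(math.ceil(sec / 60) for sec in call_durations_seconds)

def billable_gigabytes(monthly_megabytes_used):
    """Web sessions are summed for the month, then rounded up to whole GB."""
    return math.ceil(sum(monthly_megabytes_used) / 1024)

# Example from the description: 1025 MB used in a month is billed as 2 GB.
print(billable_minutes([1, 61, 120]))   # 1 + 2 + 2 = 5 billable minutes
print(billable_gigabytes([1025]))       # 2 GB
```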