Found 389 repositories (showing 30)
mesbahiba
Gain the job-ready skills for an entry-level data analyst role through this eight-course Professional Certificate from IBM and position yourself competitively in the thriving job market for data analysts, which will see a 20% growth until 2028 (U.S. Bureau of Labor Statistics). Power your data analyst career by learning the core principles of data analysis and gaining hands-on skills practice. You’ll work with a variety of data sources, project scenarios, and data analysis tools, including Excel, SQL, Python, Jupyter Notebooks, and Cognos Analytics, gaining practical experience with data manipulation and applying analytical techniques.
abhilashvijayannair
For this project, you will assume the role of a Data Scientist / Data Analyst working for a new startup investment firm that helps customers invest their money in stocks. Your job is to extract financial data like historical share price and quarterly revenue reportings from various sources using Python libraries and webscraping on popular stocks. After collecting this data you will visualize it in a dashboard to identify patterns or trends. The stocks we will work with are Tesla, Amazon, AMD, and GameStop. Dashboard Analytics Displayed A dashboard often provides a view of key performance indicators in a clear way. Analyzing a data set and extracting key performance indicators will be practiced. Prompts will be used to support learning in accessing and displaying data in dashboards. Learning how to display key performance indicators on a dashboard will be included in this assignment. We will be using Plotly in this course for data visualization and is not a requirement to take this course. Watson Studio In the Python for Data Science, AI and Development course you utilized Skills Network Labs for hands-on labs. For this project you will use Skills Network Labs and Watson Studio. Skills Network Labs is a sandbox environment for learning and completing labs in courses. Whereas Watson Studio, a component of IBM Cloud Pak for Data, is a suite of tools and a collaborative environment for data scientists, data analysts, AI and machine learning engineers and domain experts to develop and deploy your projects. Review criteria There are two hands-on labs on Extracting Stock Data and one assignment to complete. You will be judged by completing two quizzes and one peer review assignment. The quizzes will test you based on the output of the hands-on labs. In the peer review assignment you will share and take screen shots of the outcomes of your assignment.
pnguenda
# Pandas Homework - Pandas, Pandas, Pandas ## Background The data dive continues! Now, it's time to take what you've learned about Python Pandas and apply it to new situations. For this assignment, you'll need to complete **one of two** (not both) Data Challenges. Once again, which challenge you take on is your choice. Just be sure to give it your all -- as the skills you hone will become powerful tools in your data analytics tool belt. ### Before You Begin 1. Create a new repository for this project called `pandas-challenge`. **Do not add this homework to an existing repository**. 2. Clone the new repository to your computer. 3. Inside your local git repository, create a directory for the Pandas Challenge you choose. Use folder names corresponding to the challenges: **HeroesOfPymoli** or **PyCitySchools**. 4. Add your Jupyter notebook to this folder. This will be the main script to run for analysis. 5. Push the above changes to GitHub or GitLab. ## Option 1: Heroes of Pymoli  Congratulations! After a lot of hard work in the data munging mines, you've landed a job as Lead Analyst for an independent gaming company. You've been assigned the task of analyzing the data for their most recent fantasy game Heroes of Pymoli. Like many others in its genre, the game is free-to-play, but players are encouraged to purchase optional items that enhance their playing experience. As a first task, the company would like you to generate a report that breaks down the game's purchasing data into meaningful insights. 
Your final report should include each of the following: ### Player Count * Total Number of Players ### Purchasing Analysis (Total) * Number of Unique Items * Average Purchase Price * Total Number of Purchases * Total Revenue ### Gender Demographics * Percentage and Count of Male Players * Percentage and Count of Female Players * Percentage and Count of Other / Non-Disclosed ### Purchasing Analysis (Gender) * The below each broken by gender * Purchase Count * Average Purchase Price * Total Purchase Value * Average Purchase Total per Person by Gender ### Age Demographics * The below each broken into bins of 4 years (i.e. <10, 10-14, 15-19, etc.) * Purchase Count * Average Purchase Price * Total Purchase Value * Average Purchase Total per Person by Age Group ### Top Spenders * Identify the the top 5 spenders in the game by total purchase value, then list (in a table): * SN * Purchase Count * Average Purchase Price * Total Purchase Value ### Most Popular Items * Identify the 5 most popular items by purchase count, then list (in a table): * Item ID * Item Name * Purchase Count * Item Price * Total Purchase Value ### Most Profitable Items * Identify the 5 most profitable items by total purchase value, then list (in a table): * Item ID * Item Name * Purchase Count * Item Price * Total Purchase Value As final considerations: * You must use the Pandas Library and the Jupyter Notebook. * You must submit a link to your Jupyter Notebook with the viewable Data Frames. * You must include a written description of three observable trends based on the data. * See [Example Solution](HeroesOfPymoli/HeroesOfPymoli_starter.ipynb) for a reference on expected format. ## Option 2: PyCitySchools  Well done! Having spent years analyzing financial records for big banks, you've finally scratched your idealistic itch and joined the education sector. In your latest role, you've become the Chief Data Scientist for your city's school district. 
In this capacity, you'll be helping the school board and mayor make strategic decisions regarding future school budgets and priorities. As a first task, you've been asked to analyze the district-wide standardized test results. You'll be given access to every student's math and reading scores, as well as various information on the schools they attend. Your responsibility is to aggregate the data to and showcase obvious trends in school performance. Your final report should include each of the following: ### District Summary * Create a high level snapshot (in table form) of the district's key metrics, including: * Total Schools * Total Students * Total Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### School Summary * Create an overview table that summarizes key metrics about each school, including: * School Name * School Type * Total Students * Total School Budget * Per Student Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Top Performing Schools (By % Overall Passing) * Create a table that highlights the top 5 performing schools based on % Overall Passing. Include: * School Name * School Type * Total Students * Total School Budget * Per Student Budget * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) 
### Bottom Performing Schools (By % Overall Passing) * Create a table that highlights the bottom 5 performing schools based on % Overall Passing. Include all of the same metrics as above. ### Math Scores by Grade\*\* * Create a table that lists the average Math Score for students of each grade level (9th, 10th, 11th, 12th) at each school. ### Reading Scores by Grade * Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school. ### Scores by School Spending * Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following: * Average Math Score * Average Reading Score * % Passing Math (The percentage of students that passed math.) * % Passing Reading (The percentage of students that passed reading.) * % Overall Passing (The percentage of students that passed math **and** reading.) ### Scores by School Size * Repeat the above breakdown, but this time group schools based on a reasonable approximation of school size (Small, Medium, Large). ### Scores by School Type * Repeat the above breakdown, but this time group schools based on school type (Charter vs. District). As final considerations: * Use the pandas library and Jupyter Notebook. * You must submit a link to your Jupyter Notebook with the viewable Data Frames. * You must include a written description of at least two observable trends based on the data. * See [Example Solution](PyCitySchools/PyCitySchools_starter.ipynb) for a reference on the expected format. ## Hints and Considerations * These are challenging activities for a number of reasons. For one, these activities will require you to analyze thousands of records. Hacking through the data to look for obvious trends in Excel is just not a feasible option. The size of the data may seem daunting, but pandas will allow you to efficiently parse through it. 
* Second, these activities will also challenge you by requiring you to learn on your feet. Don't fool yourself into thinking: "I need to study pandas more closely before diving in." Get the basic gist of the library and then _immediately_ get to work. When facing a daunting task, it's easy to think: "I'm just not ready to tackle it yet." But that's the surest way to never succeed. Learning to program requires one to constantly tinker, experiment, and learn on the fly. You are doing exactly the _right_ thing, if you find yourself constantly practicing Google-Fu and diving into documentation. There is just no way (or reason) to try and memorize it all. Online references are available for you to use when you need them. So use them! * Take each of these tasks one at a time. Begin your work, answering the basic questions: "How do I import the data?" "How do I convert the data into a DataFrame?" "How do I build the first table?" Don't get intimidated by the number of asks. Many of them are repetitive in nature with just a few tweaks. Be persistent and creative! * Expect these exercises to take time! Don't get discouraged if you find yourself spending hours initially with little progress. Force yourself to deal with the discomfort of not knowing and forge ahead. Consider these hours an investment in your future! * As always, feel encouraged to work in groups and get help from your TAs and Instructor. Just remember, true success comes from mastery and _not_ a completed homework assignment. So challenge yourself to truly succeed! ### Copyright Trilogy Education Services © 2019. All Rights Reserved.
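The age-binned purchasing analysis described under Option 1 could be sketched roughly as follows. This is a minimal illustration rather than the assignment's reference solution: the column names (`SN`, `Age`, `Price`), the tiny inline data set, and the exact bin edges are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical purchase records; the real data file and its column names may differ.
purchases = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Age": [8, 8, 12, 17, 23],
    "Price": [1.0, 3.0, 2.0, 4.0, 5.0],
})

# Assumed bin edges for <10, 10-14, 15-19, 20-24, 25+.
# right=False makes each bin closed on the left: [low, high).
edges = [0, 10, 15, 20, 25, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25+"]
purchases["Age Group"] = pd.cut(purchases["Age"], bins=edges, labels=labels, right=False)

# observed=True keeps only the age groups that actually occur in the data.
grouped = purchases.groupby("Age Group", observed=True)
summary = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
})
# The per-person average divides by unique players, not by number of purchases.
summary["Avg Total per Person"] = summary["Total Purchase Value"] / grouped["SN"].nunique()
print(summary)
```

The same group-then-aggregate pattern should carry over to the gender breakdown and, with different bins, to the PyCitySchools spending and size tables.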
ShahadShaikh
Problem Statement Introduction So far, in this course, you have learned about the Hadoop Framework, RDBMS design, and Hive Querying. You have understood how to work with an EMR cluster and write optimised queries on Hive. This assignment aims at testing your skills in Hive, and Hadoop concepts learned throughout this course. Similar to Big Data Analysts, you will be required to extract the data, load them into Hive tables, and gather insights from the dataset. Problem Statement With online sales gaining popularity, tech companies are exploring ways to improve their sales by analysing customer behaviour and gaining insights about product trends. Furthermore, the websites make it easier for customers to find the products they require without much scavenging. Needless to say, the role of big data analysts is among the most sought-after job profiles of this decade. Therefore, as part of this assignment, we will be challenging you, as a big data analyst, to extract data and gather insights from a real-life data set of an e-commerce company. In the next video, you will learn the various stages in collecting and processing the e-commerce website data. One of the most popular use cases of Big Data is in eCommerce companies such as Amazon or Flipkart. So before we get into the details of the dataset, let us understand how eCommerce companies make use of these concepts to give customers product recommendations. This is done by tracking your clicks on their website and searching for patterns within them. This kind of data is called a clickstream data. Let us understand how it works in detail. The clickstream data contains all the logs as to how you navigated through the website. It also contains other details such as time spent on every page, etc. From this, they make use of data ingesting frameworks such as Apache Kafka or AWS Kinesis in order to store it in frameworks such as Hadoop. 
From there, machine learning engineers or business analysts use this data to derive valuable insights. In the next video, Kautuk will give you a brief idea on the data that is used in this case study and the kind of analysis you can perform with the same. For this assignment, you will be working with a public clickstream dataset of a cosmetics store. Using this dataset, your job is to extract valuable insights which generally data engineers come up within an e-retail company. So now, let us understand the dataset in detail in the next video. You will find the data in the link given below. https://e-commerce-events-ml.s3.amazonaws.com/2019-Oct.csv https://e-commerce-events-ml.s3.amazonaws.com/2019-Nov.csv You can find the description of the attributes in the dataset given below. In the next video, you will learn about the various implementation stages involved in this case study. The implementation phase can be divided into the following parts: Copying the data set into the HDFS: Launch an EMR cluster that utilizes the Hive services, and Move the data from the S3 bucket into the HDFS Creating the database and launching Hive queries on your EMR cluster: Create the structure of your database, Use optimized techniques to run your queries as efficiently as possible Show the improvement of the performance after using optimization on any single query. Run Hive queries to answer the questions given below. Cleaning up Drop your database, and Terminate your cluster You are required to provide answers to the questions given below. Find the total revenue generated due to purchases made in October. Write a query to yield the total sum of purchases per month in a single output. Write a query to find the change in revenue generated due to purchases from October to November. Find distinct categories of products. Categories with null category code can be ignored. 
Find the total number of products available under each category. Which brand had the maximum sales in October and November combined? Which brands increased their sales from October to November? Your company wants to reward the top 10 users of its website with a Golden Customer plan. Write a query to generate a list of top 10 users who spend the most. Note: To write your queries, please make necessary optimizations, such as selecting the appropriate table format and using partitioned/bucketed tables. You will be awarded marks for enhancing the performance of your queries. Each question should have one query only. Use a 2-node EMR cluster with both the master and core nodes as M4.large. Make sure you terminate the cluster when you are done working with it. Since EMR can only be terminated and cannot be stopped, always have a copy of your queries in a text editor so that you can copy-paste them every time you launch a new cluster. Do not leave PuTTY idle for so long. Do some activity like pressing the space bar at regular intervals. If the terminal becomes inactive, you don't have to start a new cluster. You can reconnect to the master node by opening the puTTY terminal again, giving the host address and loading .ppk key file. For your information, if you are using emr-6.x release, certain queries might take a longer time, we would suggest you use emr-5.29.0 release for this case study. There are different options for storing the data in an EMR cluster. You can briefly explore them in this link. In your previous module on hive querying, you copied the data to the local file system, i.e., to the master node's file system and performed the queries. Since the size of the dataset is large here in this case study, it is a good practice to load the data into the HDFS and not into the local file system. You can revisit the segment on 'Working with HDFS' from the earlier module on 'Introduction to Big data and Cloud'. 
You may have to use CSVSerde with the default properties value for loading the dataset into a Hive table. You can refer to this link for more details on using CSVSerde. Also, you may want to skip the column names from getting inserted into the Hive table. You can refer to this link on how to skip the headers.
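Before running the Hive queries on the full files, the revenue questions can be sanity-checked on a small sample in Python. The sketch below is only an illustration under assumptions: the column names (`event_time`, `event_type`, `price`, and so on), the `purchase` event label, and the sample rows and brand values are hypothetical stand-ins, not taken from the attribute description.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in an assumed layout of the clickstream CSV files.
SAMPLE = """event_time,event_type,product_id,brand,price,user_id
2019-10-01 00:00:00 UTC,purchase,1,brand_a,2.50,10
2019-10-02 08:15:00 UTC,cart,2,brand_b,4.00,11
2019-10-03 09:30:00 UTC,purchase,2,brand_b,4.00,11
2019-11-05 12:00:00 UTC,purchase,1,brand_a,2.50,10
2019-11-06 13:00:00 UTC,purchase,3,brand_c,6.00,12
"""

def revenue_by_month(rows):
    """Sum price over purchase events, keyed by the 'YYYY-MM' prefix of event_time."""
    totals = defaultdict(float)
    for row in rows:
        if row["event_type"] == "purchase":  # ignore views, cart events, etc.
            totals[row["event_time"][:7]] += float(row["price"])
    return dict(totals)

# DictReader consumes the header row, so column names never end up in the data.
totals = revenue_by_month(csv.DictReader(io.StringIO(SAMPLE)))
change = totals["2019-11"] - totals["2019-10"]
print(totals, change)
```

A Hive query answering the same question would group purchase rows by month in the same way; the small Python version is mainly useful for checking expected totals on a sample before paying for cluster time.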
andrewhryn
"📊 🇺🇸 Explore Data Analyst job trends across the USA with SQL! Uncover top-paying jobs, in-demand skills, and key market trends in the Data Analyst field.
rafabelokurows
Insights on skills and salaries using real data (scraped from Linkedin) - https://rafabelokurows.github.io/data-analyst-job-skills/
GaurabKundu1
As a Data Analyst, I have been tasked with collecting data from various sources and identifying trends for this year's report on emerging skills. My first task is to collect the top programming skills that are most in demand from various sources, including job postings, training portals, and surveys. Once I have collected enough data, I will begin analyzing it to identify insights and trends, which may include the following: What are the top programming languages in demand? What are the top database skills in demand? What are the popular IDEs?
SHIVASHANKAR-V07
SQL-based analysis of Data Analyst jobs (2023) - salaries, skills demand, and career insights using PostgreSQL, inspired by Luke Barousse’s SQL course.
windyguo2046
Task Your task in this assignment is to aggregate the data found in the Citi Bike Trip History Logs to build a data dashboard, story, or report. You may work with a timespan of your choosing. If you're really ambitious, you can merge multiple datasets from different periods. Try to provide answers to the following questions: How many trips have been recorded total during the chosen period? By what percentage has total ridership grown? How has the proportion of short-term customers and annual subscribers changed? What are the peak hours in which bikes are used during summer months (for whatever year of data you selected)? What are the peak hours in which bikes are used during winter months (for whatever year of data you selected)? What are the top 10 stations in the city for starting a journey? (Based on data, why do you hypothesize these are the top locations?) What are the top 10 stations in the city for ending a journey? (Based on data, why?) What are the bottom 10 stations in the city for starting a journey? (Based on data, why?) What are the bottom 10 stations in the city for ending a journey (Based on data, why?) What is the gender breakdown of active participants (Male v. Female)? How does the average trip duration change by age? What is the average distance in miles that a bike is ridden? Which Bikes (by ID) are most likely due for repair or inspection this year? How variable is the utilization by bike ID? Additionally, city officials would like to see the following visualizations: A static map that plots all bike stations with a visual indication of the most popular locations to start and end a journey with zip code data overlaid on top. A dynamic map that shows how each station's popularity changes over time (by month and year) -- with commentary pointing to any interesting events that may be behind these phenomena. 
Lastly, as a chronic over-achiever, you must also: Find at least two unexpected phenomena in the data and provide a visualization and analysis to document their presence. Considerations Remember, the people reading your analysis will NOT be data analysts. Your audience will be city officials, public administrators, and heads of New York City departments. Your data and analysis needs to be presented in a way that is focused, concise, easy-to-understand, and visually compelling. Your visualizations should be colorful enough to be included in press releases, and your analysis should be thoughtful enough for dictating programmatic changes. Assessment Your final product will be assessed on the following metrics: Completeness of Analysis Analytic Rigor Readability Visual Attraction Professionalism Hints You may need to get creative in how you combine each of the CSVs. Don't just assume Tableau is the right tool for the job. At this point, you have a wealth of technical skills and research abilities. Dig for an approach that works and just go with it. Don't just assume the CSV format hasn't changed since 2013. Subtle changes to the formats in any of your columns can blockade your analysis. Ensure your data is consistent and clean throughout your analysis. (Hint: Start and End Time change at some point in the history logs). Consider building your dashboards with small extracts of the data (i.e. single files) before attempting to import the whole thing. What you will find is that importing all 20+ million records of data will create performance issues quickly. Welcome to "Big Data". While utilizing all of the data may seem like a nice power play, consider the time-course in making your analysis. Is data from 2013 the most relevant for making bike replacement decisions today? Probably not. Don't let overwhelming data fool you. Ground your analysis in common sense. Remember, data alone doesn't "answer" anything. 
You will need to accompany your data visualizations with clear and directed answers and analysis. As is often the case, your clients are asking for a LOT of answers. Be considerate about their need-to-know and the importance of not "cramming in everything". Of course, answer each question, but do so in a way that is organized and presentable. Since this is a project for the city, spend the appropriate time thinking through decisions on color schemes, fonts, and visual story-telling. The Citi Bike program has a clear visual footprint. As a suggestion, look for ways to have your data visualizations match their aesthetic tones. Pay attention to labels. What exactly is "time duration"? What's the value of "age of birth"? You will almost certainly need calculated fields to get what you need. Keep a close eye for obvious outliers or false data. Not everyone who signs up for the program is answering honestly. In answering the question of "why" a phenomena is happening, consider adding other pieces of information on socioeconomics or other geographic data. Tableau has a map "layer" feature that you may find handy. Don't be afraid to manipulate your data and play with settings in Tableau. Tableau is meant to be explored. We haven't covered all that you need -- so you will need to keep an eye out for new tricks. The final "format" of your deliverable is up to you. It can be an embedded Tableau dashboard, a Tableau Story, a Tableau visualization + PDF -- you name it. The bottom line is: This is your story to tell. Use the medium you deem most effective. (But you should definitely be using Tableau in some way!) Treat this as a serious endeavor! This is an opportunity to show future employers that you have what it takes to be a top-notch analyst.
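For the question about average distance ridden, one possible approach is to estimate each trip's length from its start and end station coordinates. The sketch below uses the haversine (great-circle) formula; the tuple layout of a trip and the sample coordinates are assumptions for illustration, and a straight-line distance will generally understate the route actually ridden.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def average_trip_miles(trips):
    """Average straight-line distance over (start_lat, start_lon, end_lat, end_lon) tuples."""
    distances = [haversine_miles(*trip) for trip in trips]
    return sum(distances) / len(distances)

# Hypothetical trips: one spanning a degree of longitude at the equator, one round trip.
sample_trips = [(0.0, 0.0, 0.0, 1.0), (40.73, -73.99, 40.73, -73.99)]
one_degree = haversine_miles(0.0, 0.0, 0.0, 1.0)
average = average_trip_miles(sample_trips)
print(round(one_degree, 2), round(average, 2))
```

Round trips that start and end at the same station come out as zero miles with this method, so they may need to be excluded or handled separately.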
Crmitsolution
Quick Start to Field Service Lightning With Field Service Lightning, you can streamline operations across the full service chain on a single platform, resulting in a more integrated client experience. You can redefine the connected customer experience by giving your agents, dispatchers, and mobile staff the tools they need to provide a strong customer experience anytime, anywhere, and on any mobile device with Field Service Lightning. How does Field Service Lightning Benefit the Workforce Field Service Lightning is a single platform that links your entire workforce, allowing you to provide your clients with faster, smarter, and more personalized on-site service. It not only allows your employees to stay on top of things, but it also allows customers to easily book their own appointments through a customer community. From the bottom of the food chain to the top, Field Service Lightning delivers vital tools for all team members. Mobile Employees Use the all-in-one mobile app to get all of the information they need to execute each job properly. They can see the parts they'll need, the steps they'll need to complete each assignment, and even get directions to the location. Support Agents Have access to appointment scheduling, which allows them to see detailed case feeds, real-time milestone monitoring, and worker skills and knowledge. Dispatchers From the dispatcher dashboard, you can view and manage all scheduled tasks, coordinate resources, and use Map View and Field Service Management Tools for real-time monitoring of all field employees. Managers Capable of managing field resource management, as well as critical customer and employee KPIs, in order to ensure that operations are as effective and productive as possible in the salesforce service cloud. Capture Data and Focus on Success The appointment scheduling procedure is automated with salesforce field service management. 
As a result, field service requests are resolved faster and the appropriate service resource is allocated to the project, resulting in increased customer satisfaction. The productivity of service resources is boosted by having all of the tools needed on mobile devices to handle work orders and service reports rapidly. Everything is stored in one location, from installations and repairs to preventative maintenance, in order to retain that important 360-degree client view. Consider several salesforce field service lightning pricing before integration. Outcomes Create work orders quickly from any case. Create and manage field service work orders, as well as access Knowledge articles and track SLA compliance with Milestones. Work orders are linked to Accounts, Contacts, Assets, Cases, Entitlements, and other Salesforce Field Service Lightning objects, allowing you to pull data from several sources. Optimize scheduling and assign jobs sensibly. From the Service Console or a Customer Community, you may book truth-based service appointments right away with Field Service Lightning Implementation. To boost employee productivity, jobs are automatically assigned to the proper resource based on time, talents, location, and any business rules with intelligent scheduling. Companies have incorporated sophisticated scheduling and tracking of employees, equipment, and trucks to guarantee the proper parts are accessible for the job with economical field service salesforce pricing. Increase your first-time repair rate by leveraging the power of AI. Field Service Lightning Consulting analysts may utilize image recognition with AI Vision to quickly identify assets and parts in photos. To avoid confusion and extra trips back to the office, make sure the correct product part is repaired or replaced. You can automatically prescribe the right set up steps for technicians based on an image's classification, so they can provide faster, smarter Salesforce Field Service.
TheDevNick
A scheduling desktop user interface application for WGU course C195 Task 1: Java Application Development
Introduction: Throughout your career in software design and development, you will be asked to create applications with various features and criteria based on a variety of business requirements. For this assessment, you will create your own Java application with requirements that mirror those you will encounter in a real-world job assignment. The skills you will showcase in this assessment are also directly relevant to technical interview questions for future employment. This application should become a portfolio piece for you to show to future employers.
Several attachments and links have been included to help you complete this task. Refer to the “MySQL Virtual Access Instructions” attachment for help accessing the database for your application. Note that this database is for functional purposes only and does not include any pre-existing data. The attached “Database ERD” shows the entity relationship diagram (ERD) for this database, which you can reference as you create your application.
The preferred integrated development environment (IDE) for this assignment is NetBeans. Use the web link “NetBeans Installation Instructions” to install this connector. If you choose to use another IDE, you must export your project into NetBeans format for submission. When you have completed this task, you must submit a zip file with all the necessary code files to compile, support, and run your application.
Scenario: You are working for a software company that has been contracted to develop a scheduling desktop user interface application. The contract is with a global consulting organization that conducts business in multiple languages and has main offices in Phoenix, Arizona; New York, New York; and London, England. The consulting organization has provided a MySQL database that your application must pull data from. The database is used for other systems and therefore its structure cannot be modified. The organization outlined specific business requirements that must be included as part of the application. From these requirements, a system analyst at your company created solution statements for you to implement in developing the application. These statements are listed in the requirements section.
Requirements: Your submission must be your original work. No more than a combined total of 30% of the submission and no more than a 10% match to any one individual source can be directly quoted or closely paraphrased from sources, even if cited correctly. Use the Turnitin Originality Report available in Taskstream as a guide for this measure of originality. You must use the rubric to direct the creation of your submission because it provides detailed criteria that will be used to evaluate your work. Each requirement below may be evaluated by more than one rubric aspect. The rubric aspect titles may contain hyperlinks to relevant portions of the course.
A. Create a log-in form that can determine the user’s location and translate log-in and error control messages (e.g., “The username and password did not match.”) into two languages.
B. Provide the ability to enter and maintain customer records in the database, including name, address, and phone number.
C. Write lambda expression(s) to schedule and maintain appointments, capturing the type of appointment and a link to the specific customer record in the database.
D. Provide the ability to view the calendar by month and by week.
E. Provide the ability to automatically adjust appointment times based on user time zones and daylight saving time.
F. Write exception controls to prevent each of the following. You may use the same mechanism of exception control more than once, but you must incorporate at least two different mechanisms of exception control.
• scheduling an appointment outside business hours
• scheduling overlapping appointments
• entering nonexistent or invalid customer data
• entering an incorrect username and password
G. Use lambda expressions to create standard pop-up and alert messages.
H. Write code to provide reminders and alerts 15 minutes in advance of an appointment, based on the user’s log-in.
I. Provide the ability to generate each of the following reports:
• number of appointment types by month
• the schedule for each consultant
• one additional report of your choice
J. Provide the ability to track user activity by recording timestamps for user log-ins in a .txt file. Each new record should be appended to the log file, if the file already exists.
AnshulSilhare
Python analysis of 100K+ US Data & Business Analyst job postings - skills demand, salary trends, and market insights.
Exploring job market trends for data analyst roles: skills in demand, salary shifts, and hiring patterns over time.
This project seeks to analyse the data analyst job listings found on LinkedIn. The aim of the jobs analysis is to find hidden insights about data analyst jobs such as what skills are most in demand by employers.
ddavid37
End-to-end SQL-based analysis of 2023’s data analyst job market—identifying top-paying roles, in-demand skills, and career growth strategies using PostgreSQL and real job posting data.
AzmeryLaizo
Web scraper developed to extract data from Google job board for the job titles database developer, data analyst, software engineer and web developer and perform text analysis to extract frequency of technical skills.
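The skill-frequency step described above could look roughly like the following. This is a minimal sketch, not the repository's actual code: the skill vocabulary, the sample postings, and the `skill_frequency` helper are all hypothetical, and the scraping step is assumed to have already produced plain-text job descriptions.

```python
import re
from collections import Counter

# Hypothetical skill vocabulary; a real project would curate a longer list.
SKILLS = ["python", "sql", "java", "excel", "tableau", "javascript"]


def skill_frequency(postings, skills=SKILLS):
    """Count how many postings mention each skill at least once."""
    counts = Counter()
    for text in postings:
        lowered = text.lower()
        for skill in skills:
            # Guard both sides of the match so "java" is not counted
            # inside "javascript".
            pattern = r"(?<![a-z0-9])" + re.escape(skill) + r"(?![a-z0-9+#])"
            if re.search(pattern, lowered):
                counts[skill] += 1
    return counts


# Made-up postings standing in for scraped job descriptions.
postings = [
    "Data analyst with SQL and Python experience; Tableau a plus.",
    "Web developer: JavaScript, HTML, CSS.",
    "Database developer skilled in SQL and Java.",
]
freq = skill_frequency(postings)
for skill, n in freq.most_common():
    print(skill, n)
```

Counting postings rather than raw occurrences keeps one verbose posting from dominating the ranking.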
sharbanee7781
Analyzed the US data job market with Python to uncover in-demand and high-paying skills for Data Analysts. Used Pandas, Seaborn, and Matplotlib for data cleaning and visualization. Insights include skill demand trends, salary analysis, and optimal skills for career growth.
akashjborah97
Target: To analyze the ML job market in India using segmentation analysis to find companies likely to hire an ML Engineer/Data Analyst based on his/her skill set. Techniques and algorithms used: machine learning using Python with libraries (numpy, pandas, scikit-learn, matplotlib), elbow method, stability-based structure analysis, k-means clustering. In this project, we took a dataset from the website naukri.com, analysed the skills and companies, and performed segmentation using a clustering algorithm. Results: Segmentation analysis is an important step before embarking on any plan, so it is important to learn how to analyze the job market and the skills that companies demand. By analyzing the trend, we observed that cluster 0 contains companies inclined towards hiring people with Python skills in Data Science and Machine Learning. Cluster 1 contains companies likely to hire people whose skills are not oriented towards Data Analysis. Cluster 2 contains companies inclined towards hiring people with Python and R skills in Data Science. Cluster 3 contains companies inclined towards hiring people with Python skills in Machine Learning. Cluster 4 contains companies likely to hire people whose skills are not oriented towards Data Analysis. Cluster 5 contains companies likely to hire people with Python and Machine Learning skills and minimal Data Science. The skills most in demand among recruiters are Python, Data Science, Machine Learning and other IT skills. In the company analysis based on experience demanded, Wipro, HiringSign, Global Logic, Gojek and others did not appear among the top companies before segmentation but appeared after segmentation was carried out on the minimum, average and maximum experience data.
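To illustrate the elbow method with k-means mentioned above, here is a small from-scratch sketch in plain Python. It is not the project's code, which reportedly uses scikit-learn. The 2-D points are made-up stand-ins for per-company skill features, and the deterministic farthest-first initialization is a simplification chosen so the toy run is reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def farthest_first(points, k):
    """Deterministic init: start at points[0], then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

# Elbow method: compute inertia for a range of k and look for the bend.
inertias = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this toy data the inertia drops sharply up to k = 3 and only slightly afterwards, which is the "elbow" used to choose the number of segments.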
Lspringer24
# Tableau Homework - Citi Bike Analytics

### Before You Begin

* This assignment will be saved to your tableau public account rather than github.
* If you haven't already, be sure to create a tableau public account [here](https://public.tableau.com/s/).
* The free tier of tableau only lets you save to their public server. This means that each time you save your file it will be uploaded to your tableau public profile.
* You are able to load and continue working on the same workbook.
* When you are finished with your assignment, you will turn in the URL to your tableau public workbook along with any additional files used for your analysis.

## Background

Congratulations on your new job! As the new lead analyst for the [New York Citi Bike](https://en.wikipedia.org/wiki/Citi_Bike) Program, you are now responsible for overseeing the largest bike sharing program in the United States. In your new role, you will be expected to generate regular reports for city officials looking to publicize and improve the city program.

Since 2013, the Citi Bike Program has implemented a robust infrastructure for collecting data on the program's utilization. Through the team's efforts, each month bike data is collected, organized, and made public on the [Citi Bike Data](https://www.citibikenyc.com/system-data) webpage.

However, while the data has been regularly updated, the team has yet to implement a dashboard or sophisticated reporting process. City officials have a number of questions on the program, so your first task on the job is to build a set of data reports to provide the answers.

## Task

**Your task in this assignment is to aggregate the data found in the Citi Bike Trip History Logs and find two unexpected phenomena.**

**Design 2-5 visualizations for each discovered phenomenon (4-10 total). You may work with a timespan of your choosing. Optionally, you may merge multiple datasets from different periods.**

**The following are some questions you may wish to tackle. Do not limit yourself to these questions; they are suggestions for a starting point. Be creative!**

* How many trips have been recorded total during the chosen period?
* By what percentage has total ridership grown?
* How has the proportion of short-term customers and annual subscribers changed?
* What are the peak hours in which bikes are used during summer months?
* What are the peak hours in which bikes are used during winter months?
* Today, what are the top 10 stations in the city for starting a journey? (Based on data, why do you hypothesize these are the top locations?)
* Today, what are the top 10 stations in the city for ending a journey? (Based on data, why?)
* Today, what are the bottom 10 stations in the city for starting a journey? (Based on data, why?)
* Today, what are the bottom 10 stations in the city for ending a journey? (Based on data, why?)
* Today, what is the gender breakdown of active participants (Male v. Female)?
* How effective has gender outreach been in increasing female ridership over the timespan?
* How does the average trip duration change by age?
* What is the average distance in miles that a bike is ridden?
* Which bikes (by ID) are most likely due for repair or inspection in the timespan?
* How variable is the utilization by bike ID?

**Next, as a chronic over-achiever:**

* Use your visualizations (does not have to be all of them) to design a dashboard for each phenomenon.
* The dashboards should be accompanied with an analysis explaining why the phenomena may be occurring.

**City officials would also like to see one of the following visualizations:**

* **Basic:** A static map that plots all bike stations with a visual indication of the most popular locations to start and end a journey with zip code data overlaid on top.
* **Advanced:** A dynamic map that shows how each station's popularity changes over time (by month and year). Again, with zip code data overlaid on the map.
* The map you choose should also be accompanied by a write-up unveiling any trends that were noticed during your analysis.

**Finally, create your final presentation**

* Create a Tableau story that brings together the visualizations, requested maps, and dashboards.
* This is what will be presented to the officials, so be sure to make it professional, logical, and visually appealing.

## Considerations

Remember, the people reading your analysis will **NOT** be data analysts. Your audience will be city officials, public administrators, and heads of New York City departments. Your data and analysis need to be presented in a way that is focused, concise, easy-to-understand, and visually compelling. Your visualizations should be colorful enough to be included in press releases, and your analysis should be thoughtful enough for dictating programmatic changes.

## Submission

Your final submission should include:

* A link to your Tableau Public workbook that includes:
  * 4-10 Total "Phenomenon" Visualizations
  * 2 Dashboards
  * 1 City Official Map
  * 1 Story
* A text or markdown file with your analysis on the phenomena you uncovered from the data.

## Assessment

Your final product will be assessed on the following metrics:

* Analytic Rigor
* Readability
* Visual Attraction

## Hints

* You may need to get creative in how you combine each of the CSV files. Don't just assume Tableau is the right tool for the job. At this point, you have a wealth of technical skills and research abilities. Dig for an approach that works and just go with it.
* Don't just assume the CSV format hasn't changed since 2013. Subtle changes to the formats in any of your columns can blockade your analysis. Ensure your data is consistent and clean throughout your analysis. (Hint: Start and End Time change at some point in the history logs).
* Consider building your visualizations with small extracts of the data (i.e. single files) before attempting to import the whole thing. What you will find is that importing all 20+ million records of data will create performance issues quickly. Welcome to "Big Data."
* While utilizing all of the data may seem like a nice power play, consider the time-course in making your analysis. Is data from 2013 the most relevant for making bike replacement decisions today? Probably not. Don't let overwhelming data fool you. Ground your analysis in common sense.
* Remember, data alone doesn't "answer" anything. You will need to accompany your data visualizations with clear and directed answers and analysis.
* As is often the case, your clients are asking for a LOT of answers. Be considerate about their need-to-know and the importance of not "cramming in everything". Of course, answer each question, but do so in a way that is organized and presentable.
* Since this is a project for the city, spend the appropriate time thinking through decisions on color schemes, fonts, and visual story-telling. The Citi Bike program has a clear visual footprint. As a suggestion, look for ways to have your data visualizations match their aesthetic tones.
* Pay attention to labels. What exactly is "time duration"? What's the value of "age of birth"? You will almost certainly need calculated fields to get what you need.
* Keep a close eye for obvious outliers or false data. Not everyone who signs up for the program is answering honestly.
* In answering the question of "why" a phenomenon is occurring, consider adding other pieces of information on socioeconomic or other geographic data. Tableau has a map "layer" feature that you may find handy.
* Don't be afraid to manipulate your data and play with settings in Tableau. Tableau is meant to be explored. We haven't covered all that you need -- so you will need to keep an eye out for new tricks.
* Treat this as a serious endeavor! This is an opportunity to show future employers that you have what it takes to be a top-notch analyst.
* Good luck!

### Copyright

Data Boot Camp (C) 2019. All Rights Reserved.
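The hints about changing timestamp formats and dishonest sign-up data could be handled in a pre-processing step before loading into Tableau. The sketch below is only an illustration: the three timestamp layouts, the `max_age` cutoff, and both helper names are assumptions, so check the actual trip files before relying on them.

```python
from datetime import datetime

# Assumed timestamp layouts; verify against the real CSV files.
KNOWN_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%Y %H:%M:%S",
    "%m/%d/%Y %H:%M",
)


def parse_trip_time(raw):
    """Parse a start/stop time written in any of the assumed layouts."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {raw!r}")


def rider_age(birth_year, trip_year, max_age=100):
    """Return the rider's age, or None if the birth year is missing
    or implausible (max_age is an arbitrary cutoff for this sketch)."""
    try:
        age = trip_year - int(float(birth_year))
    except (TypeError, ValueError, OverflowError):
        return None
    return age if 0 < age <= max_age else None


print(parse_trip_time("2013-07-01 00:00:00"))
print(parse_trip_time("7/1/2015 00:00:02"))
print(rider_age("1985", 2019))
print(rider_age("1885", 2019))
```

Normalizing to `datetime` objects up front means files from different periods can be concatenated and written back out in one consistent format.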
Sujalk63
Jobs & Skills Analysis of India using Python - A Data Analyst Project
fabenp
Data skills needed according to data analyst job descriptions gathered with the Google API
Python
itisWasp
Repo for analyzing trending skills in data-oriented jobs such as data science, data analyst, data engineer, etc. from the Google Jobs portal.
umavybhavi-netizen
SQL project exploring high-paying data analyst jobs and the most valuable skills in demand.
NadiaRozman
Comprehensive Python analysis of job market trends, in‑demand skills, and pay for data analysts.
zhanglll-yh
Everything about data science jobs: classification of data analyst roles, what analysts mainly do, skills needed, how to find jobs, etc.
Examined skills required for data analyst positions advertised on Google Jobs using real-world data from the API.
aparna190417
Data Analyst job market analysis, salary trends, skills demand, and prediction using Python, ML, and Tableau.
wolathedataguy
SQL project analyzing data analyst job market, highlighting skills, salaries, demand, and insights for career guidance.
NadiaRozman
SQL analysis project exploring job market trends for data analysts: salaries, in‑demand skills and career insights.