Found 4,325 repositories (showing 30)
ananas-analytics
A hackable data integration & analysis tool that enables non-technical users to edit data processing jobs and visualise data on demand.
dimitrinicolas
Automated image editing, optimization, and analysis via a CLI and a web interface. You give lepto your input and output directories, the plugins you want to use, and their options. Then lepto does its job, and you keep your original files and the structure of the input directory. Some plugins can even collect data (like primary colors) from your images and save it in a JSON file.
JustForFunnnn
A website of IT job-position data & analysis that helps you better understand the requirements and trends of the IT job market.
xploitspeeds
* READ THE README FOR INFO!! * Incoming Tags- z score statistics,find mean median mode statistics in ms excel,variance,standard deviation,linear regression,data processing,confidence intervals,average value,probability theory,binomial distribution,matrix,random numbers,error propagation,t statistics analysis,hypothesis testing,theorem,chi square,time series,data collection,sampling,p value,scatterplots,statistics lectures,statistics tutorials,business mathematics statistics,share stock market statistics in calculator,business analytics,GTA,continuous frequency distribution,statistics mathematics in real life,modal class,n is even,n is odd,median mean of series of numbers,math help,Sujoy Krishna Das,n+1/2 element,measurement of variation,measurement of central tendency,range of numbers,interquartile range,casio fx991,casio fx82,casio fx570,casio fx115es,casio 9860,casio 9750,casio 83gt,TI BAII+ financial,casio piano,casio calculator tricks and hacks,how to cheat in exam and not get caught,grouped interval data,equation of triangle rectangle curve parabola hyperbola,graph theory,operation research(OR),numerical methods,decision making,pie chart,bar graph,computer data analysis,histogram,statistics formula,matlab tutorial,find arithmetic mean geometric mean,find population standard deviation,find sample standard deviation,how to use a graphic calculator,pre algebra,pre calculus,absolute deviation,TI Nspire,TI 84 TI83 calculator tutorial,texas instruments calculator,grouped data,set theory,IIT JEE,AIEEE,GCSE,CAT,MAT,SAT,GMAT,MBBS,JELET,JEXPO,VOCLET,Indiastudychannel,IAS,IPS,IFS,GATE,B-Tech,M-Tech,AMIE,MBA,BBA,BCA,MCA,XAT,TOEFL,CBSE,ICSE,HS,WBUT,SSC,IUPAC,Narendra Modi,Sachin Tendulkar Farewell Speech,Dhoom 3,Arvind Kejriwal,maths revision,how to score good marks in exams,how to pass math exams easily,JEE 12th physics chemistry maths PCM,JEE maths shortcut techniques,quadratic equations,competition exams tips and ticks,competition maths,govt job,JEE KOTA,college 
math,mean value theorem,L hospital rule,tech guru awaaz,derivation,cryptography,iphone 5 fingerprint hack,crash course,CCNA,converting fractions,solve word problem,cipher,game theory,GDP,how to earn money online on youtube,demand curve,computer science,prime factorization,LCM & GCF,gauss elimination,vector,complex numbers,number systems,vector algebra,logarithm,trigonometry,organic chemistry,electrical math problem,eigen value eigen vectors,runge kutta,gauss jordan,simpson 1/3 3/8 trapezoidal rule,solved problem example,newton raphson,interpolation,integration,differentiation,regula falsi,programming,algorithm,gauss seidal,gauss jacobi,taylor series,iteration,binary arithmetic,logic gates,matrix inverse,determinant of matrix,matrix calculator program,sex in ranchi,sex in kolkata,vogel approximation VAM optimization problem,North west NWCR,Matrix minima,Modi method,assignment problem,transportation problem,simplex,k map,boolean algebra,android,casio FC 200v 100v financial,management mathematics tutorials,net present value NPV,time value of money TVM,internal rate of return IRR Bond price,present value PV and future value FV of annuity casio,simple interest SI & compound interest CI casio,break even point,amortization calculation,HP 10b financial calculator,banking and money,income tax e filing,economics,finance,profit & loss,yield of investment bond,Sharp EL 735S,cash flow casio,re finance,insurance and financial planning,investment appraisal,shortcut keys,depreciation,discounting
jobright-ai
Collection of 2026 New Grad Jobs in Data Analysis!
akchaudhary57
I have created a job-ready Data Analyst Course in 90 Days, which covers analysis on several platforms (Excel, Python, R, and SQL) along with visualization in Tableau.
Aryia-Behroziuan
An artificial neural network (ANN) is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. However, over time, attention moved to performing specific tasks, leading to deviations from biology. Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games, and medical diagnosis. Deep learning consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing.
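The neuron model described above (a non-linear function of the weighted sum of a neuron's inputs, optionally gated by a threshold) can be sketched in a few lines of plain Python. This is an illustrative sketch only, not code from the repository: the bias term, the logistic sigmoid activation, the names `neuron` and `forward`, and the example weights are all assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic sigmoid, one common choice of non-linear activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float = 0.0,
           threshold: Optional[float] = None) -> float:
    """Output of one artificial neuron.

    The aggregate signal is the weighted sum of the inputs plus a bias.
    With a threshold, the neuron only "fires" (outputs 1.0) when the
    aggregate signal reaches it; otherwise the sum goes through a sigmoid.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    if threshold is not None:
        return 1.0 if z >= threshold else 0.0
    return sigmoid(z)


Layer = Tuple[List[List[float]], List[float]]  # (one weight row per neuron, biases)


def forward(inputs: Sequence[float], layers: List[Layer]) -> List[float]:
    """Propagate a signal from the input layer through each layer in turn."""
    signal = list(inputs)
    for weight_rows, biases in layers:
        signal = [neuron(signal, w, b) for w, b in zip(weight_rows, biases)]
    return signal


# Hypothetical 2-input, 3-hidden, 1-output network with made-up weights.
network: List[Layer] = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
output = forward([1.0, 2.0], network)
```

In a real network the weights would be learned (for example by gradient descent) rather than written by hand; the sketch only shows how a signal flows from layer to layer.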
Some successful applications of deep learning are computer vision and speech recognition.[68]

Decision trees
Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making.

Support vector machines
Support vector machines (SVMs), also known as support vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.[69] The resulting model is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVMs in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
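The kernel trick mentioned above can be checked numerically. The sketch below is an illustration, not any particular library's SVM: it uses the degree-2 polynomial kernel k(x, z) = (x · z)^2 on 2-D inputs, which equals an ordinary dot product after the explicit mapping phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2). The kernel therefore computes a 3-D feature-space inner product without ever building the features. The support vectors, labels, and coefficients in the decision function are hypothetical values chosen by hand, not the output of SVM training.

```python
import math
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z) ** 2."""
    return dot(x, z) ** 2


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit 3-D feature map for 2-D inputs: dot(phi(x), phi(z)) == poly2_kernel(x, z)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def decision_function(x, support_vectors, labels, alphas, b=0.0) -> float:
    """Kernelized SVM score: sum_i alpha_i * y_i * k(sv_i, x) + b. Its sign gives the class."""
    return sum(a * y * poly2_kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


# Hypothetical "trained" values. With them the score works out to 4 * x1 * x2,
# so the classifier separates quadrants: an XOR-like pattern that no straight
# line in the original 2-D input space can split.
SV = [(1.0, 1.0), (1.0, -1.0)]
Y = [+1, -1]
ALPHA = [1.0, 1.0]
```

For example, `poly2_kernel((1, 2), (3, 4))` is (1·3 + 2·4)^2 = 121, and `dot(phi((1, 2)), phi((3, 4)))` gives the same value up to floating-point rounding.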
Regression analysis
Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization methods to mitigate overfitting and bias, as in ridge regression. When dealing with non-linear problems, go-to models include polynomial regression (for example, used for trendline fitting in Microsoft Excel[70]), logistic regression (often used in statistical classification), or even kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to a higher-dimensional space.

Bayesian networks
A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). In a simple example, rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet. Similarly, a Bayesian network could represent the probabilistic relationships between diseases and symptoms: given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.
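The rain/sprinkler/wet-grass network can be made concrete with a small sketch of exact inference by enumeration. The text gives only the graph structure, so every probability below is an illustrative assumption (the values commonly used in textbook versions of this example), and the function names are hypothetical. Enumeration is exponential in the number of hidden variables and is only practical for tiny networks; larger networks need the more efficient inference algorithms the text refers to.

```python
from itertools import product

# Illustrative conditional probability tables (assumed values, not from the text).
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}  # P(sprinkler on | rain)
P_WET_GIVEN = {                                    # P(grass wet | sprinkler, rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def bernoulli(p_true: float, value: bool) -> float:
    """Probability of a boolean outcome given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    """Chain rule along the DAG: P(R) * P(S | R) * P(W | S, R)."""
    return (bernoulli(P_RAIN, rain)
            * bernoulli(P_SPRINKLER_GIVEN_RAIN[rain], sprinkler)
            * bernoulli(P_WET_GIVEN[(sprinkler, rain)], wet))


def p_rain_given_wet() -> float:
    """P(rain | grass wet), summing out the unobserved sprinkler variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence
```

With these assumed tables, observing wet grass raises the probability of rain from the prior 0.2 to about 0.36 (0.16038 / 0.44838).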
Genetic algorithms
A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.[71][72] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[73]

Training models
Machine learning models usually require a lot of data to perform well. When training a machine learning model, one typically needs to collect a large, representative sample of data to serve as the training set. Data in the training set can be as varied as a corpus of text, a collection of images, or data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model.

Federated learning
Federated learning is an adapted form of distributed artificial intelligence for training machine learning models. It decentralizes the training process, preserving users' privacy because their data does not need to be sent to a centralized server. This also increases efficiency by distributing the training process across many devices.
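The text does not say how the decentralized updates are combined. One common scheme is federated averaging (FedAvg): the server broadcasts a model, each client trains it briefly on its own data, and the server averages the returned weights, weighted by each client's number of examples. The sketch below applies that scheme to a one-parameter linear model; the model, learning rate, round counts, and client data are hypothetical, and it is not a description of how Gboard or any production system works.

```python
from typing import List, Sequence, Tuple

Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_update(w: float, data: Dataset, lr: float = 0.05, epochs: int = 5) -> float:
    """A few gradient-descent steps on the client's mean squared error for y ~ w * x."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(clients: List[Dataset], rounds: int = 50, w: float = 0.0) -> float:
    """Server loop: broadcast w, let each client train locally, then average.

    Only model weights travel to the server; the (x, y) pairs never leave
    their client. Each client's result is weighted by its number of examples.
    """
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_weights = [local_update(w, d) for d in clients]
        w = sum(len(d) * wk for d, wk in zip(clients, local_weights)) / total
    return w


# Hypothetical clients whose private data all follow y = 2x.
CLIENTS = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w_global = federated_averaging(CLIENTS)
```

Because every client's data here is consistent with the same slope, the averaged model converges to w = 2; with heterogeneous client data, FedAvg generally converges to a compromise between the clients' local optima.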
Gboard, for example, uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to Google.[74]

Applications
There are many applications for machine learning, including agriculture, anatomy, adaptive websites, affective computing, banking, bioinformatics, brain–machine interfaces, cheminformatics, citizen science, computer networks, computer vision, credit-card fraud detection, data quality, DNA sequence classification, economics, financial market analysis,[75] general game playing, handwriting recognition, information retrieval, insurance, internet fraud detection, linguistics, machine learning control, machine perception, machine translation, marketing, medical diagnosis, natural language processing, natural language understanding, online advertising, optimization, recommender systems, robot locomotion, search engines, sentiment analysis, sequence mining, software engineering, speech recognition, structural health monitoring, syntactic pattern recognition, telecommunication, theorem proving, time series forecasting, and user behavior analytics. In 2006, the media-services provider Netflix held the first "Netflix Prize" competition to find a program that better predicted user preferences and improved the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%.
A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million.[76] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a recommendation") and changed its recommendation engine accordingly.[77] In 2010, The Wall Street Journal wrote about the firm Rebellion Research and its use of machine learning to predict the financial crisis.[78] In 2012, Sun Microsystems co-founder Vinod Khosla predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.[79] In 2014, it was reported that a machine learning algorithm had been applied in the field of art history to study fine art paintings and that it may have revealed previously unrecognized influences among artists.[80] In 2019, Springer Nature published the first research book created using machine learning.[81]

Limitations
Although machine learning has been transformative in some fields, machine-learning programs often fail to deliver expected results.[82][83][84] The reasons for this are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.[85] In 2018, a self-driving car from Uber failed to detect a pedestrian, who was killed in the collision.[86] Attempts to use machine learning in health care with the IBM Watson system failed to deliver even after years of effort and billions of dollars of investment.[87][88]

Bias
Machine learning approaches in particular can suffer from different data biases.
A machine learning system trained only on current customers may not be able to predict the needs of new customer groups that are not represented in the training data. When trained on human-made data, machine learning is likely to pick up the same constitutional and unconscious biases already present in society.[89] Language models learned from data have been shown to contain human-like biases.[90][91] Machine learning systems used for criminal risk assessment have been found to be biased against black people.[92][93] In 2015, Google Photos would often tag black people as gorillas,[94] and in 2018 this was still not well resolved: Google was reportedly still using a workaround that removed all gorillas from the training data, so the system could not recognize real gorillas at all.[95] Similar issues with recognizing non-white people have been found in many other systems.[96] In 2016, Microsoft tested a chatbot that learned from Twitter, and it quickly picked up racist and sexist language.[97] Because of such challenges, the effective use of machine learning may take longer to be adopted in other domains.[98] Concern for fairness in machine learning, that is, reducing bias in machine learning and propelling its use for human good, is increasingly expressed by artificial intelligence scientists, including Fei-Fei Li, who reminds engineers that "There's nothing artificial about AI... It's inspired by people, it's created by people, and—most importantly—it impacts people. It is a powerful tool we are only just beginning to understand, and that is a profound responsibility."[99]

Model assessments
Classification machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data into a training set and a test set (conventionally 2/3 for training and 1/3 for testing) and evaluates the performance of the trained model on the test set.
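The holdout method described above can be sketched as a shuffle followed by a 2/3 to 1/3 cut. This is a minimal illustration under stated assumptions: the function names are hypothetical, the fixed seed is only there to make the split reproducible, and the "model" in the example is a hand-written threshold rule standing in for one that would normally be fitted on the training set.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(data: Sequence[T], train_fraction: float = 2 / 3,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the data, then cut it into a training set and a held-out test set."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]


def accuracy(predict: Callable, test_set: Sequence[Tuple]) -> float:
    """Fraction of held-out (x, label) pairs the model predicts correctly."""
    return sum(predict(x) == label for x, label in test_set) / len(test_set)


# Hypothetical labelled data: the label is whether x >= 15.
examples = [(x, x >= 15) for x in range(30)]
train, test = holdout_split(examples)
score = accuracy(lambda x: x >= 15, test)  # stand-in for a model fitted on `train`
```

Shuffling before cutting matters: data that arrives sorted (as here) would otherwise put all of one class in the test set.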
In comparison, the K-fold cross-validation method randomly partitions the data into K subsets and then performs K experiments, each using one subset for evaluation and the remaining K − 1 subsets for training the model. In addition to the holdout and cross-validation methods, the bootstrap, which samples n instances with replacement from the dataset, can be used to assess model accuracy.[100]

In addition to overall accuracy, investigators frequently report sensitivity and specificity, meaning the true positive rate (TPR) and the true negative rate (TNR), respectively. Similarly, investigators sometimes report the false positive rate (FPR) as well as the false negative rate (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The total operating characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates, so TOC provides more information than the commonly used receiver operating characteristic (ROC) and ROC's associated area under the curve (AUC).[101]

Ethics
Machine learning poses a host of ethical questions. Systems that are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[102] For example, using job hiring data from a firm with racist hiring policies may lead a machine learning system to duplicate the bias by scoring job applicants by their similarity to previous successful applicants.[103][104] Responsible collection of data and documentation of the algorithmic rules used by a system are thus a critical part of machine learning. Because human languages contain biases, machines trained on language corpora will necessarily also learn these biases.[105][106] Other ethical challenges, not related to personal biases, arise more often in health care.
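For the model-assessment rates listed above (TPR, TNR, FPR, and FNR), a minimal sketch of how they follow from the four confusion-matrix counts is given below. The function name and the sample labels are hypothetical. The function returns the raw counts alongside the rates, echoing the point that a rate on its own hides its numerator and denominator; it is not an implementation of the TOC curve itself.

```python
from typing import Dict, Sequence


def confusion_rates(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Counts and the four rates for a binary classifier (True = positive class).

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "TPR": tp / (tp + fn),  # sensitivity
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (fp + tn),  # equals 1 - TNR
        "FNR": fn / (fn + tp),  # equals 1 - TPR
    }


# Hypothetical labels and predictions for eight cases.
y_true = [True, True, True, False, False, False, False, True]
y_pred = [True, False, True, False, True, True, False, True]
report = confusion_rates(y_true, y_pred)
```

Here TPR = 3/4 and TNR = 2/4; the counts show that both rates rest on only four cases each, which the rates alone do not reveal.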
There are concerns among health care professionals that these systems might not be designed in the public's interest but as income-generating machines. This is especially true in the United States, where there is a long-standing ethical dilemma between improving health care and increasing profits. For example, algorithms could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. Machine learning has huge potential to give health care professionals a powerful tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these "greed" biases, are addressed.[107]

Hardware
Since the 2010s, advances in both machine learning algorithms and computer hardware have led to more efficient methods for training deep neural networks (a particular narrow subdomain of machine learning) that contain many layers of non-linear hidden units.[108] By 2019, graphics processing units (GPUs), often with AI-specific enhancements, had displaced CPUs as the dominant method of training large-scale commercial cloud AI.[109] OpenAI estimated the hardware compute used in the largest deep learning projects from AlexNet (2012) to AlphaZero (2017) and found a 300,000-fold increase in the amount of compute required, with a doubling-time trendline of 3.4 months.[110][111]

Software
Software suites containing a variety of machine learning algorithms include the following: Free and open-source so
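The two hardware figures above can be cross-checked with a line of arithmetic: a 300,000-fold increase corresponds to log2(300,000), about 18.2, doublings, and at one doubling every 3.4 months that implies about 62 months, a little over five years. This agrees with the 2012 (AlexNet) to 2017 (AlphaZero) span. The variable names are illustrative.

```python
import math

FOLD_INCREASE = 300_000  # reported growth in training compute
DOUBLING_MONTHS = 3.4    # reported doubling-time trendline

doublings = math.log2(FOLD_INCREASE)  # number of times compute doubled
months = doublings * DOUBLING_MONTHS  # time span implied by the trendline
years = months / 12
print(f"{doublings:.1f} doublings over {months:.0f} months (~{years:.1f} years)")
```

Running it prints `18.2 doublings over 62 months (~5.2 years)`.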
sharmaroshan
Data visualization is emerging as one of the most essential skills in almost all IT and non-IT sectors and jobs. Data visualizations help businesses make wiser decisions that can lead to bigger profits, and help them understand the root causes and behavior of the people and customers associated with them. In this repository I discuss line plots, bar plots, scatter plots, and pie charts in depth. Apart from that, I discuss scientific plots, 3D plots, animated plots, and interactive plots to visualize any kind of business problem, of any complexity.
discdiver
Jupyter notebook for scraping and analyzing the technology skills most in demand in data scientist job listings.
Atharva-Phatak
Data Analysis of Job Postings on Glassdoor.
RafaelCartenet
Model Context Protocol (MCP) server for Databricks that empowers AI agents to autonomously interact with Unity Catalog metadata. Enables data discovery, lineage analysis, and intelligent SQL execution. Agents explore catalogs/schemas/tables, understand relationships, discover notebooks/jobs, and execute queries - greatly reducing ad-hoc query time.
jay-johnson
Create and manage multiple Kubernetes clusters using KVM on a bare-metal Fedora 29 server. Includes Helm + rook-ceph + nginx ingress + the stock analysis engine (Jupyter + Redis cluster + MinIO + automated cron jobs for data collection). Works on Kubernetes v1.16.0 (v1.16.3 was not working).
Data Science has been ranked as one of the hottest professions, and the demand for data practitioners is booming. This Professional Certificate from IBM is intended for anyone interested in developing skills and experience to pursue a career in Data Science or Machine Learning. The program consists of 9 courses providing you with the latest job-ready skills and techniques, covering a wide array of data science topics including open source tools and libraries, methodologies, Python, databases, SQL, data visualization, data analysis, and machine learning. You will practice hands-on in the IBM Cloud using real data science tools and real-world data sets. It is a myth that you need a Ph.D. to become a data scientist. This Professional Certificate is suitable for anyone who has some computer skills and a passion for self-learning. No prior computer science or programming knowledge is necessary. We start small, reinforce applied learning, and build up to more complex topics. Upon successfully completing these courses, you will have done several hands-on assignments and built a portfolio of data science projects to give you the confidence to plunge into an exciting profession in Data Science. In addition to earning a Professional Certificate from Coursera, you will also receive a digital Badge from IBM recognizing your proficiency in Data Science.
Cryptoaj-hack
Decentralized Finance (DeFi) Development Services & Solutions
Eliminate the role of the middleman by availing yourself of decentralized finance (DeFi) development services and solutions. Get access to major financial services through a blockchain network and experience the benefits of automation, a higher level of security, anonymity, interoperability, and transparency. Our wide range of services includes:
Market-Making Consulting: We make every effort to establish financial markets that understand customers' proprietary algorithms. We aim to improve investors' access to liquidity and to democratize the whole system. We provide customized features according to the customer's expected return on investment.
Decentralized Crypto Banking: We ensure a frictionless user experience by facilitating the direct transfer of value between the parties involved, supported by decentralization. Our ready-to-launch white-label mobile payment apps provide a variety of services such as wallet integration, value holding, and detailed transaction analysis.
DeFi Lottery System Development: We provide a no-loss lottery system that fully benefits participants. We take steps to eliminate custodianship of the pooled capital. The pooled capital can be invested in other related dApps, and the smart contracts distribute a major share of the interest earned to a randomly selected winner. We assure a regular flow of returns.
Derivatives over DeFi Platform: We provide seamless access to derivatives and help maximize your earning potential. By building robust dApps, we enable traders to hedge their investment portfolios and minimize risks by dealing directly with their peers through a democratic platform. We are experts in derivatives market-making and dApp platform development.
Decentralized Fund Management: All your crypto assets are managed for high performance on a decentralized exchange through smart control and management.
With in-depth experience in investment exchanges and strong knowledge of DeFi, we offer our services at low fees and avoid potential risks.
DeFi Insurance System Development: We ensure that no risks are present in our smart contracts. With our robust insurance services, we assure you that there will be no chance of uncontrollable liquidity requests. We contain future risks, uncertainties, and emergencies through lucrative insurance deals.
DeFi Yield Farming Platform Development: Yield farming is a technique for earning more cryptocurrency from one's existing crypto holdings. Liquidity providers play a vital role in the success of yield farming: they stake their assets in liquidity pools and facilitate crypto trading by creating a market.
DeFi Staking Platform Development: DeFi staking is a mechanism in which crypto assets are staked in a supported wallet or exchange to earn passive income. Rewards can be calculated from the quantity of staked assets, the staking duration, the inflation rate, and the network issuance rate.
DeFi Lending Platform Development: DeFi lending platforms have been popularized by the likes of Aave and Compound. The basic features of a DeFi lending platform include flash loans, a fiat payment gateway, and an exclusive margin trading facility. The advantages of DeFi lending include high immutability, better transparency, quick access, and resistance to transaction censorship.
DeFi Smart Contract Development: One of the pivotal reasons behind the tremendous growth of DeFi services is heavy investment in robust smart contract development. DeFi smart contracts are typically written in the Solidity programming language and automatically execute tasks based on pre-set terms and conditions.
DeFi dApp Development: DeFi dApp development plays a critical role in avoiding the risk of a central point of failure.
dApps are highly secure compared to centralized applications because there is no central authority.
DeFi Token Development: DeFi token development has played a critical role in boosting the growth of decentralized applications. Their value is currently higher than bitcoin. They have huge trading volumes and have garnered a lot of attention from the mainstream crowd in recent times.
DeFi DEX Development Like Uniswap: Uniswap is one of the leading DeFi projects. It is innovative because it uses incentivized liquidity pools instead of regular order books: users who supply liquidity to a pool are rewarded with a share of the fees charged on every trade in that pool.
DeFi Wallet Development: With DeFi wallet development, traders have complete control over their funds without interference from any authority in the system. Supreme security is guaranteed for users without any compromise. Because every user is supplied with their own private keys, there is no chance of data loss.
DeFi Marketing Services: Marketing services are indispensable for helping DeFi projects gain user engagement. From drafting white papers, video, and content marketing to legal advisory, marketing, and community management, our DeFi marketing and consulting services are well equipped to get the job done.
DeFi Synthetic Asset Development: Synthetic assets are essentially smart contracts that derive their value from underlying assets and derivatives. In DeFi, synthetic assets have gained acclaim because they involve low risk and little chance of price fluctuation. Users can easily invest in, trade, and own assets without hassle.
DeFi Solutions for E-commerce: Streamline your e-commerce business with DeFi and its pragmatic tools. With DeFi solutions, benefits such as the removal of intermediaries, faster shipping, supply chain management, and real-time tracking can be integrated into your e-commerce business, increasing profits.
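The Uniswap description above replaces an order book with a liquidity pool, which prices every trade with a formula instead of matching buyers and sellers. A minimal sketch of that constant-product rule (x · y = k) is shown below, with a 0.3% fee taken from the input amount, the form used by Uniswap v2. The function names, the integer token amounts, and the example reserves are illustrative; real deployments are Solidity contracts, not Python.

```python
from typing import Tuple


def get_amount_out(amount_in: int, reserve_in: int, reserve_out: int,
                   fee_bps: int = 30) -> int:
    """Output amount for a swap against a constant-product (x * y = k) pool.

    The fee (30 basis points = 0.3%) is deducted from the input but stays in
    the pool, which is how liquidity providers earn their share of trading fees.
    """
    if amount_in <= 0 or reserve_in <= 0 or reserve_out <= 0:
        raise ValueError("amounts and reserves must be positive")
    amount_in_with_fee = amount_in * (10_000 - fee_bps)
    numerator = amount_in_with_fee * reserve_out
    denominator = reserve_in * 10_000 + amount_in_with_fee
    return numerator // denominator  # round down so the pool never overpays


def swap(reserves: Tuple[int, int], amount_in: int) -> Tuple[Tuple[int, int], int]:
    """Apply one swap and return the new (reserve_in, reserve_out) and the amount paid out."""
    reserve_in, reserve_out = reserves
    amount_out = get_amount_out(amount_in, reserve_in, reserve_out)
    return (reserve_in + amount_in, reserve_out - amount_out), amount_out


# Hypothetical pool holding 1,000 units of each token; a trader sells 10 units.
new_pool, paid_out = swap((1000, 1000), 10)
```

The trader receives 9 units rather than 10 because of price impact and the fee, and the product of the reserves grows from 1,000,000 to 1,000,910; that growth is the fee income accruing to the liquidity providers.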
DeFi Tokenization Development: Tokenization is one of the pragmatic solutions DeFi offers. Users can now turn idle and underutilized assets into profit simply by tokenizing them. With our DeFi tokenization services, you can use ERC-20 and ERC-721 (NFT) tokens for your assets.
DeFi Crowdfunding Platform Development: Although it is a relatively new sector, DeFi crowdfunding has become a go-to way of aggregating funds to support businesses and start-ups. Our DeFi crowdfunding platform services come with additional benefits such as tax benefits, instant approval, fundraising calendars, and more.
DeFi Real Estate Platform Development: DeFi has revolutionized real estate management. With the help of blockchain-based tokens, real estate owners and investors can make property investment seamless and manageable. Fractional ownership makes financial inclusivity possible.
DeFi ICO Development: DeFi ICOs are one of the leading fundraising methods. Our ICO development includes creating utility tokens, community management, increasing coin value, and launching projects with diligence and guidance from market analysts and blockchain experts.
DeFi Exchange Development: Offering users many clear benefits, DEXs are the prized innovation of DeFi. With high-end security, durable liquidity, complete anonymity, and financial inclusivity, DEXs make trading and transacting crypto accessible and lucrative for crypto enthusiasts.
DeFi Protocol Like Yearn.finance: Yearn.finance aims to offer the best APY available in the market by routing funds across popular DeFi platforms. This protocol offers its users the best yields in a highly secure network. With built-in smart contracts and open-source code, it supports a range of stablecoins offering huge returns.
DeFi Protocol Like Aave: The DeFi protocol Aave offers crypto traders a robust platform for lending and borrowing crypto, on which lenders earn high interest.
Aave's highlight features, flash loans and flexible interest rates, make it a profitable platform for crypto traders.
DeFi Exchange Like 1inch: 1inch Exchange now has a reputation as the DEX offering users the lowest slippage. As an aggregator, 1inch connects several exchanges to one platform in a non-custodial ecosystem. With governance and farming features, trading on 1inch remains prominent.
Devtown-India
Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd just like to go for a ride. Regardless, each bike can serve several users per day. Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used. In this project, you will use data provided by Motivate, a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. You will compare system usage among three large cities: Chicago, New York City, and Washington, DC.
Day 1: In this project, students will use Python to explore data related to bike share systems for three major cities in the United States: Chicago, New York City, and Washington, DC. They will write code to import the data and answer interesting questions about it by computing descriptive statistics. They will also write a script that takes raw input to create an interactive experience in the terminal to present these statistics. Technologies covered include NumPy, Pandas, Matplotlib, Seaborn, and Jupyter Notebook. We will give the students a deep dive into the data analysis process.
Day 2: We will give the students an insight into one of the major fields of machine learning, i.e., time series forecasting. We will take them through the relevant theory and help them understand its importance and the different techniques available for it.
After that, we will work hands-on with the bike-share dataset, implementing different algorithms and understanding them in depth. We aim to give students insight into what exactly a data analyst does and to familiarize them with how the entire data analysis process works. The session will be hosted by Shaurya Sinha, a data analyst at Jio, and Parag Mittal, a software engineer at Microsoft.
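The Day 1 exercise (import trip data, compute descriptive statistics, present them interactively in the terminal) can be sketched with the Python standard library alone. This is a minimal sketch, not the course material: it uses `csv` and `statistics` instead of Pandas, and the column names `Start Time`, `Start Station`, and `Trip Duration` (in seconds), as well as the timestamp format, are assumptions about the trip files.

```python
import csv
import statistics
from collections import Counter
from datetime import datetime

# Assumed column names and timestamp format for the trip CSV files.
START_TIME = "Start Time"
START_STATION = "Start Station"
DURATION = "Trip Duration"  # assumed to be in seconds
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def trip_stats(rows):
    """Compute descriptive statistics over an iterable of trip dictionaries."""
    hours = Counter()
    stations = Counter()
    durations = []
    for row in rows:
        start = datetime.strptime(row[START_TIME], TIME_FORMAT)
        hours[start.hour] += 1
        stations[row[START_STATION]] += 1
        durations.append(float(row[DURATION]))
    if not durations:
        raise ValueError("no trips to summarize")
    return {
        "trips": len(durations),
        "most_common_start_hour": hours.most_common(1)[0][0],
        "most_common_start_station": stations.most_common(1)[0][0],
        "mean_duration_s": statistics.mean(durations),
        "median_duration_s": statistics.median(durations),
    }


def main():
    """Interactive terminal loop: ask for a CSV path and print its statistics."""
    while True:
        path = input("Trip CSV to analyze (blank to quit): ").strip()
        if not path:
            break
        try:
            with open(path, newline="") as f:
                stats = trip_stats(csv.DictReader(f))
        except (OSError, KeyError, ValueError) as exc:
            print(f"Could not analyze {path}: {exc}")
            continue
        for key, value in stats.items():
            print(f"{key}: {value}")
```

Calling `main()` starts the prompt loop. Keeping `trip_stats` separate from the terminal interaction means it can be called directly on any iterable of row dictionaries, which makes the statistics easy to test.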
PhantomInsights
Data ETL & Analysis on thousands of job listings from the official Mexican job board (2020 edition).
Scraped job descriptions and used Natural Language Processing (NLP) concepts and the GloVe algorithm to extract keywords from the data and analyze them, presenting the key terms in data analyst job summaries from the Indeed website.
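The project above pairs NLP with GloVe embeddings. As a simpler illustration of the basic keyword-extraction step, the sketch below swaps in plain term-frequency counting with a stopword filter; it does not use GloVe. The stopword list and the `top_keywords` helper are hypothetical.

```python
import re
from collections import Counter

# A small, illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "the", "to", "of", "in", "for", "with", "on",
    "is", "are", "be", "will", "you", "our", "we", "as", "or", "at",
}


def top_keywords(summaries, k=5):
    """Return the k most frequent non-stopword tokens across job summaries."""
    counts = Counter()
    for text in summaries:
        # Letters, plus "+" and "#" so tokens such as "c++" and "c#" survive.
        tokens = re.findall(r"[a-z][a-z+#]*", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 1)
    return [word for word, _ in counts.most_common(k)]
```

Embedding-based approaches such as GloVe can additionally group related terms (for example, near-synonyms), which raw counts cannot.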
alexander-n-thomas
This project is for the notebooks, code, and data for the "Vocabulary Analysis of Job Descriptions" tutorial at PyData 2017 Seattle
sch-paulo
Data analysis of the Brazilian data job market through LinkedIn scraping. Extracts and standardizes job titles and skills, with insights visualized in Power BI dashboards.
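A minimal sketch of a job-title standardization step, assuming titles arrive as raw Portuguese or English strings. The `CANONICAL_TITLES` patterns and helper names are hypothetical; the repository's actual rules are not described here.

```python
import re
import unicodedata

# Hypothetical canonical titles and the lower-cased, accent-free patterns
# that map onto them; a real mapping would be built from the scraped data.
CANONICAL_TITLES = [
    ("Data Engineer", re.compile(r"\b(data engineer|engenheiro de dados)\b")),
    ("Data Scientist", re.compile(r"\b(data scientist|cientista de dados)\b")),
    ("Data Analyst", re.compile(r"\b(data analyst|analista de dados)\b")),
]


def normalize(text):
    """Lower-case, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def standardize_title(raw_title):
    """Map a raw job title to a canonical title, or "Other" if none matches."""
    title = normalize(raw_title)
    for canonical, pattern in CANONICAL_TITLES:
        if pattern.search(title):
            return canonical
    return "Other"
```

Normalizing before matching means that variants such as "Analista de Dados Sênior" and "analista de dados senior" fall into the same bucket.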
dan-grant-hunter
An analysis of abilities, skills and tech skills data from the O*NET database as well as classification of around 500 random LinkedIn job titles.
Hyperion101010
A scraper built on the Python requests library that can scrape over 400,000 job profiles from naukri.com. It also creates Excel files automatically from the scraped data to facilitate data analysis.
Predicting failures is one of the important steps in improving the reliability of cloud computing systems, and it also makes it possible to avoid failure incidents and the associated cost overhead. Breakthroughs in machine learning and cloud storage create an opportunity to use the huge volumes of generated data to predict when a system or its hardware will malfunction or fail, and statistical analysis of workload data from cloud providers can help improve system reliability. This research examines job usage data for tasks in the large “Google Cluster Workload Traces 2019” dataset, using several resampling techniques (Random Under Sampling, Random Oversampling, and Synthetic Minority Oversampling Technique, or SMOTE) to handle the imbalanced dataset. It then applies multiple machine learning algorithms to job failure prediction on both the imbalanced and balanced datasets: the traditional algorithms Logistic Regression, Decision Tree Classifier, Random Forest Classifier, Gradient Boosting Classifier, and Extreme Gradient Boosting (XGBoost) Classifier, and the deep learning models Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The imbalanced and balanced settings are compared in terms of model accuracy, error rate, sensitivity, F-measure, and precision. The results show that the Extreme Gradient Boosting Classifier and the Gradient Boosting Classifier are the best-performing algorithms both with and without imbalance-handling techniques, and that SMOTE is the best choice for handling imbalanced data. Although the LSTM and GRU deep learning models may not be the best in terms of accuracy, their ROC curves are better than those of the XGBoost and Gradient Boosting classifiers.
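The three resampling techniques named above can be sketched with the standard library. This sketch assumes feature rows are lists of floats and labels are hashable. The `smote` function follows the usual SMOTE idea of interpolating between a minority sample and one of its k nearest minority neighbours, but it is a simplified illustration, not the study's implementation or any library's API.

```python
import math
import random


def _group_by_label(X, y):
    """Group feature rows by class label, preserving first-seen label order."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def random_undersample(X, y, rng):
    """Randomly keep only as many rows per class as the smallest class has."""
    groups = _group_by_label(X, y)
    target = min(len(rows) for rows in groups.values())
    out_X, out_y = [], []
    for label, rows in groups.items():
        for features in rng.sample(rows, target):
            out_X.append(features)
            out_y.append(label)
    return out_X, out_y


def random_oversample(X, y, rng):
    """Duplicate random rows of smaller classes until all match the largest."""
    groups = _group_by_label(X, y)
    target = max(len(rows) for rows in groups.values())
    out_X, out_y = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            out_X.append(features)
            out_y.append(label)
    return out_X, out_y


def smote(minority, n_new, k, rng):
    """Generate n_new synthetic minority rows by SMOTE-style interpolation.

    Each new row lies at a random point on the segment between a randomly
    chosen minority row and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Pass a seeded `random.Random` as `rng` for reproducible results. Resampling is normally applied only to the training split, so that evaluation still reflects the original class balance.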
aws-samples
In this pattern, data records are ingested and then modified with simple transformations, such as field-level substitutions and data enrichment from relatively small, static datasets. Kinesis Data Firehose invokes a Lambda function as records are received by the delivery stream. The Lambda function performs a simple processing job and returns the transformed or enriched records to Kinesis Data Firehose, which then buffers the modified records and sends them to the configured destinations. A copy of the source records is saved in S3 as a backup and for future analysis.
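A minimal sketch of such a transformation function in Python. It follows the Firehose data-transformation contract: records arrive with base64-encoded `data`, and each returned record echoes its `recordId`, sets a `result`, and carries base64-encoded output. The masked `email` field, the `region` lookup table, and the JSON payload shape are hypothetical and are not taken from the aws-samples code.

```python
import base64
import json

# Hypothetical small, static lookup table used for enrichment.
REGION_NAMES = {
    "us-east-1": "US East (N. Virginia)",
    "eu-west-1": "Europe (Ireland)",
}


def lambda_handler(event, context):
    """Transform a batch of Kinesis Data Firehose records.

    Each returned record echoes its recordId, sets a result ("Ok", "Dropped"
    or "ProcessingFailed"), and carries base64-encoded output data.
    """
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
            # Field-level substitution: mask a hypothetical "email" field.
            if "email" in payload:
                payload["email"] = "***"
            # Enrichment from the small, static lookup table.
            payload["region_name"] = REGION_NAMES.get(payload.get("region"), "unknown")
            # Newline-delimit records so they stay separable once Firehose
            # concatenates them in the destination.
            encoded = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(encoded).decode("ascii"),
            })
        except (ValueError, TypeError):
            # Bad base64 or JSON raises ValueError subclasses; an unhashable
            # "region" value raises TypeError. Return the record unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning failed records with `ProcessingFailed` instead of raising lets the rest of the batch through. Firehose can then deliver the failed records to an error output for inspection.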
theaifutureguy
A comprehensive web scraping and data analysis platform that extracts job listings from theprotocol.it to analyze technology demand trends for developers in the Polish job market.
MNC-Aubin
No description available
AymaneSab
From a Data Developer's perspective, this project aims to conduct a comprehensive analysis of the emerging job market in data-related fields. The primary goals include targeted recruitment initiatives, talent acquisition, and skill development strategies. Additionally, the project focuses on creating clear and informative visualizations to enhance stakeholder unde
marcgarnica13
Understanding gender differences in professional European football through Machine Learning interpretability and match actions data. This repository contains the full data pipeline implemented for the study *Understanding gender differences in professional European football through Machine Learning interpretability and match actions data*. We evaluated the main differentiating features of European male and female football players in match-action data, under the assumption that significant differences and established patterns would be found between genders. A methodology for unbiased feature extraction and objective analysis is presented, based on data integration and machine learning explainability algorithms. Female (1511) and male (2700) data points were collected from event data categorized by game period and player position. Each data point included the main tactical variables supported by research and industry to evaluate and classify football styles and performance. We set up a supervised classification pipeline to predict the gender of each player from their actions in the game. The comparison methodology did not include any qualitative enrichment or subjective analysis, to prevent biased data enhancement or gender-related processing. The pipeline had three representative binary classification models: a logic-based Decision Tree, a probabilistic Logistic Regression, and a multilayer perceptron Neural Network. Each model tried to capture the differences between male and female data points, and we extracted the results using machine learning explainability methods to understand the underlying mechanics of the models. Predictive accuracy was consistently good across the different models deployed. ## Installation Install the required Python packages ``` pip install -r requirements.txt ``` To handle heterogeneity and performance efficiently, we use PySpark from [Apache Spark](https://spark.apache.org/). PySpark provides an end-user Python API for Spark jobs.
You might want to check how to set up a local or remote Spark cluster in [their documentation](https://spark.apache.org/docs/latest/api/python/index.html). ## Repository structure This repository is organized as follows: - Preprocessed data from the two different data streams is collected in [the data folder](data/). For the Opta files, it contains the event-based metrics computed from each match of the 2017 Women's Championship and a single file calculating the event-based metrics from the 2016 Men's Championship published [here](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5). Even though we cannot publish the original data source, the two Python scripts implemented to homogenize and integrate both data streams into event-based metrics are included in [the data gathering folder](data_gathering/) folder contains the graphical images and media used for the report. - The [data cleaning folder](data_cleaning/) contains descriptor scripts for both data streams and [the final integration](data_cleaning/merger.py). - [Classification](classification/) contains all the Jupyter notebooks for each model in the experiment, as well as some persisted models for testing.
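The study combines three classifiers with dedicated explainability methods. As a much simpler, hypothetical stand-in for the logistic-regression branch alone, the sketch below fits a logistic regression by batch gradient descent in plain Python and ranks features by absolute weight. It is not the repository's PySpark pipeline. Coefficient ranking is only a rough interpretability signal, and it assumes features are on a comparable (for example, standardized) scale.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for features, label in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (p - label).
            error = sigmoid(b + sum(wi * xi for wi, xi in zip(w, features))) - label
            for j, xj in enumerate(features):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, features):
    """Return 1 if the predicted probability exceeds 0.5, else 0."""
    return 1 if b + sum(wi * xi for wi, xi in zip(w, features)) > 0 else 0


def rank_features(w, names):
    """Order features by absolute weight (meaningful only on a shared scale)."""
    return sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)
```

For example, with hypothetical per-player action metrics as columns of `X` and a binary label in `y`, `rank_features(w, names)` lists first the actions whose weights push predictions hardest toward either class.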
cdeweyx
Exploratory data analysis of job rejection emails
maciejzj
Data pipeline and meta-analysis dashboard for IT job postings from the web.
chesterking123
This repository contains my work from the virtual internship provided by Australia's NSW Government, analyzing part-time jobs in workforce data.