Found 340 repositories(showing 30)
Datavault-UK
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
st0pp3r
Online resources related to Detection Engineering. Detection rules, detection logic, attack samples, detection tests and emulation tools, logging configuration and best practices, event log references, resources, labs, data manipulation online tools, blogs, newsletters, good reads, books, trainings, podcasts, videos and twitter/x accounts.
Aryia-Behroziuan
An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. However, over time, attention moved to performing specific tasks, leading to deviations from biology. Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis. Deep learning consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.[68] Decision trees Main article: Decision tree learning Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making. Support vector machines Main article: Support vector machines Support vector machines (SVMs), also known as support vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.[69] An SVM training algorithm is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. Illustration of linear regression on a data set. Regression analysis Main article: Regression analysis Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization (mathematics) methods to mitigate overfitting and bias, as in ridge regression. When dealing with non-linear problems, go-to models include polynomial regression (for example, used for trendline fitting in Microsoft Excel[70]), logistic regression (often used in statistical classification) or even kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to higher-dimensional space. Bayesian networks Main article: Bayesian network A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet. A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams. Genetic algorithms Main article: Genetic algorithm A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.[71][72] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[73] Training models Usually, machine learning models require a lot of data in order for them to perform well. Usually, when training a machine learning model, one needs to collect a large, representative sample of data from a training set. Data from the training set can be as varied as a corpus of text, a collection of images, and data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model. Federated learning Main article: Federated learning Federated learning is an adapted form of distributed artificial intelligence to training machine learning models that decentralizes the training process, allowing for users' privacy to be maintained by not needing to send their data to a centralized server. This also increases efficiency by decentralizing the training process to many devices. For example, Gboard uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to Google.[74] Applications There are many applications for machine learning, including: Agriculture Anatomy Adaptive websites Affective computing Banking Bioinformatics Brain–machine interfaces Cheminformatics Citizen science Computer networks Computer vision Credit-card fraud detection Data quality DNA sequence classification Economics Financial market analysis[75] General game playing Handwriting recognition Information retrieval Insurance Internet fraud detection Linguistics Machine learning control Machine perception Machine translation Marketing Medical diagnosis Natural language processing Natural language understanding Online advertising Optimization Recommender systems Robot locomotion Search engines Sentiment analysis Sequence mining Software engineering Speech recognition Structural health monitoring Syntactic pattern recognition Telecommunication Theorem proving Time series forecasting User behavior analytics In 2006, the media-services provider Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%. A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million.[76] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a recommendation") and they changed their recommendation engine accordingly.[77] In 2010 The Wall Street Journal wrote about the firm Rebellion Research and their use of machine learning to predict the financial crisis.[78] In 2012, co-founder of Sun Microsystems, Vinod Khosla, predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.[79] In 2014, it was reported that a machine learning algorithm had been applied in the field of art history to study fine art paintings and that it may have revealed previously unrecognized influences among artists.[80] In 2019 Springer Nature published the first research book created using machine learning.[81] Limitations Although machine learning has been transformative in some fields, machine-learning programs often fail to deliver expected results.[82][83][84] Reasons for this are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.[85] In 2018, a self-driving car from Uber failed to detect a pedestrian, who was killed after a collision.[86] Attempts to use machine learning in healthcare with the IBM Watson system failed to deliver even after years of time and billions of dollars invested.[87][88] Bias Main article: Algorithmic bias Machine learning approaches in particular can suffer from different data biases. A machine learning system trained on current customers only may not be able to predict the needs of new customer groups that are not represented in the training data. When trained on man-made data, machine learning is likely to pick up the same constitutional and unconscious biases already present in society.[89] Language models learned from data have been shown to contain human-like biases.[90][91] Machine learning systems used for criminal risk assessment have been found to be biased against black people.[92][93] In 2015, Google photos would often tag black people as gorillas,[94] and in 2018 this still was not well resolved, but Google reportedly was still using the workaround to remove all gorillas from the training data, and thus was not able to recognize real gorillas at all.[95] Similar issues with recognizing non-white people have been found in many other systems.[96] In 2016, Microsoft tested a chatbot that learned from Twitter, and it quickly picked up racist and sexist language.[97] Because of such challenges, the effective use of machine learning may take longer to be adopted in other domains.[98] Concern for fairness in machine learning, that is, reducing bias in machine learning and propelling its use for human good is increasingly expressed by artificial intelligence scientists, including Fei-Fei Li, who reminds engineers that "There’s nothing artificial about AI...It’s inspired by people, it’s created by people, and—most importantly—it impacts people. It is a powerful tool we are only just beginning to understand, and that is a profound responsibility.”[99] Model assessments Classification of machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data in a training and test set (conventionally 2/3 training set and 1/3 test set designation) and evaluates the performance of the training model on the test set. In comparison, the K-fold-cross-validation method randomly partitions the data into K subsets and then K experiments are performed each respectively considering 1 subset for evaluation and the remaining K-1 subsets for training the model. In addition to the holdout and cross-validation methods, bootstrap, which samples n instances with replacement from the dataset, can be used to assess model accuracy.[100] In addition to overall accuracy, investigators frequently report sensitivity and specificity meaning True Positive Rate (TPR) and True Negative Rate (TNR) respectively. Similarly, investigators sometimes report the false positive rate (FPR) as well as the false negative rate (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The total operating characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates, thus TOC provides more information than the commonly used receiver operating characteristic (ROC) and ROC's associated area under the curve (AUC).[101] Ethics Machine learning poses a host of ethical questions. Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[102] For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[103][104] Responsible collection of data and documentation of algorithmic rules used by a system thus is a critical part of machine learning. Because human languages contain biases, machines trained on language corpora will necessarily also learn these biases.[105][106] Other forms of ethical challenges, not related to personal biases, are more seen in health care. There are concerns among health care professionals that these systems might not be designed in the public's interest but as income-generating machines. This is especially true in the United States where there is a long-standing ethical dilemma of improving health care, but also increasing profits. For example, the algorithms could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. There is huge potential for machine learning in health care to provide professionals a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these "greed" biases are addressed.[107] Hardware Since the 2010s, advances in both machine learning algorithms and computer hardware have led to more efficient methods for training deep neural networks (a particular narrow subdomain of machine learning) that contain many layers of non-linear hidden units.[108] By 2019, graphic processing units (GPUs), often with AI-specific enhancements, had displaced CPUs as the dominant method of training large-scale commercial cloud AI.[109] OpenAI estimated the hardware compute used in the largest deep learning projects from AlexNet (2012) to AlphaZero (2017), and found a 300,000-fold increase in the amount of compute required, with a doubling-time trendline of 3.4 months.[110][111] Software Software suites containing a variety of machine learning algorithms include the following: Free and open-source so
iobruno
Data Engineering examples for Airflow, Prefect; dbt for BigQuery, Redshift, ClickHouse, Postgres, DuckDB; PySpark for Batch processing; Kafka for Stream processing
Repository này chứa toàn bộ tài liệu và bài lab cho khóa học Big Data Engineering, bao gồm các công nghệ và công cụ hiện đại trong Data Engineering stack. Tất cả các labs được thiết kế để học viên có thể thực hành từ cơ bản đến nâng cao với các use cases thực tế.
bhuiyanmobasshir94
This program provides the skills you need to advance your career in data engineering and recommends training to support your preparation for the industry-recognized Google Cloud Professional Data Engineer certification. Through a combination of presentations, demos, and labs, you will enable data-driven decision making by collecting, transforming, and publishing data; and you'll gain real world experience through a number of hands-on Qwiklabs projects.
DibyaRath
Production-grade distributed data engineering labs focused on Spark internals, large-scale performance tuning, skew handling, streaming state management, and FAANG-level system design.
Contents and labs for the Data Engineering Specialization offered by deeplearning.ai
PacktPublishing
This is the code repository for Ultimate AWS Data Engineering Bootcamp - 15 Real-World Labs, published by Packt
k9luo
ECE 155 | Engineering Design with Embedded Systems | Winter 2015 All labs are Android applications and were tested on Nexus 5 and Nexus 7 devices. Lab 1 - Sensor Reader Displays informaton from an android phone sensors. Sensors include Light sensor: Displays light intensity Accelerometer: Displays X,Y,Z acceleration and displays it on a graph with respect to time Magnetic Field Sensor: Displays X,Y,Z field values Rotation Sensor: Displays X,Y,Z rotation values Lab 2 - Step Counter Detects and counts user steps (device held in both hands with screen facing upwards) Uses accelerometer data to detect a step in the user's motion. Step counter is then incremented and displayed on screen. Step counter can be reset by pressing the reset button.> Lab 3 - Displacement Tracker Measures user displacement in NS/EW coordinates Uses existing code from Lab 2 and adds the user's direction to track their displacement. North has to be calibrated by facing North and hitting the calibrate button. Lab 4 - Map Guide Gives directions to the user to move from one point of a room to another. Uses existing code from Lab 3 to track the user's displacement. A map in an SVG format must be added to the app's files directory. The user can set their initial location and direction by pressing and holding on the map.
kfprugger
Azure Machine Learning, Data Engineering, and Data Collection Labs for 2019 and Beyond!
Sri Venkateshwara University (SVU) strives to create professionals who are not only adept in academics but also in application for the benefit of humanity. We foster a culture of learning by doing. We believe in nurturing students who are at the forefront of innovation by offering an environment of research & development to make us Best University in Uttar Pradesh (UP). SVU believes in experiential learning. To facilitate this, we have an ultra-modern infrastructure that motivates students to experiment & excel in their area of interest. The Best University of Moradabad has laboratories & workshops that signify our commitment to core research, thus enabling innovation. SVU is the only institution to have set up labs in collaboration with the industry. This way we can train our students on the latest skills & make them employable. Students sharpen their practical skills under the watch full eyes of trainers & become competent professionals. For the overall development of the students, we organize cultural programs. Students take part in these programs & exhibit their talent to become confident professionals. The annual fest attracts students from all over the country & showcase their talent to make us the Top University in India. We equipped the computing labs with the latest software & hardware to augment the technical skills of the students. SVU’s library is an epitome of knowledge. It has over 3000 books & journals that ensure the students are never short on intellectual input. The team of industry trainers educate them on the key skills so crucial for employment & make us the Best University in Gajraula. The specially created engineering labs assist engineers to refine their technical acumen so much needed for the country. The Chairman Dr. Sudhir Giri believes in removing all the economic & social barriers that can hinder education. Hence, SVU provides many scholarships & grants to meritorious students. Up till now, the college has enabled over 500000 students to attain their academic desires to make us the Best Private University in Uttar Pradesh (UP). The group is running a dozen educational institutions that include medical colleges in India & abroad. Our commitment towards education & healthcare has enabled Dr Sudhir Giri to win the International Glory Man of the year Award 2021. The Best Private University in Moradabad is on the Delhi Moradabad highway, well connected with rail & road. The green surroundings provide peace of mind that enables research based learning. The carefully recruited faculty is the pride of the university. They have years of industrial & academic experience so vital for the students. They transfer key skills & make us the Best Private University in Gajraula. The faculty encourages students to undertake research & sharpen their skills that will enable them to get jobs. Majority of the faculty members are doctorates who educate the students to become competent professionals. The faculty takes part in FDP in order to develop a culture of research. The specialty of SVU is the internship. We have partnered with leading industries for providing internship to the students. We believe that education without applicability is incomplete. Students gain hands on exposure through internship & become job ready. We place most of the students during internship to make us the Top University in India. SVU, the Best University in Uttar Pradesh (UP), adopts a futuristic teaching pedagogy. We strive for experiential learning of our students through role plays, projects & presentation. The students take part in the learning activity & imbibe concepts that enable their placements. The AC seminar & conference halls allow knowledge dispersion for the development of the students. The University is running over 150 undergraduate (UG), postgraduate (PG) courses, (Ph.D.), diploma and certificate courses in various fields of Applied Sciences, Medical Science, Humanities & Social Sciences. We also run courses in Languages, Design, Agriculture, Engineering & Technology, Nursing, Pharmacy, Paramedical, Commerce & Management, Law, Library & information Sciences, Mass Comm. & Journalism to enhance the employability of the youth. SVU has a culture of project based learning. Students do projects in each semester under the guidance of faculty. They complete these projects in earmarked industries to garner hands-on skills. Through these projects, we train students on the hot skills so crucial for employment to make us the Best University in Moradabad. SVU’s Research & Development (R&D) wing encourages students to work on research areas important for the country. We have partnered with leading research institutions to undertake research. The breath-taking infrastructure of the best university in Gajraula motivates researchers to achieve their goals for research. Owing to our dedication, SVU has received grants from GOI for research on areas of national importance. The faculty members provide guidance to the scholars until they achieve their aim. We have set up the incubation center to provide fillip to new ideas that foster entrepreneurship. We want to be an institution that supports the ‘Make in India’ vision of the government. The center supports new ideas that enable the young entrepreneurs to create startups & become successful. Under the strong leadership of Dr. Sudhir Giri, till date we have successfully incubated 150 start-ups. This speaks of our exemplary education & make us the Best Private University in Uttar Pradesh (UP). These startups are not only creating wealth but also providing employment to the needy. The industrialists have lamented that the epicenter for entrepreneurship will be the educational institutions. We need to provide them with the support & infrastructure for this. The annual hackathon attracts individuals who showcase their business acumen to make us the Best Private University in Moradabad. SVU has a dedicated International Research & collaboration Cell (IRCC) that collaborates with universities abroad. Faculty & students who want to pursue studies abroad the IRCC starts admission formalities for them. We have partnered with reputed institutions for providing excellent research collaborations. Those who wish to do P. HD abroad the IRCC help them gain admission & make us the Top University in India. A lot of our faculty members are pursuing their research internationally & contributing to the welfare of humanity. SVU strives to make our students feel comfortable at the campus. Separate hostel for boys & girls with 24 hour security is available at SVU. The cafeteria serves nutritious food to the students. Gym, recreation hall & the sports ground help to relax our students & make us the Best University in Uttar Pradesh (UP). The campus has an in house ATM & convenience store for the benefit of the students. SVU enables placement through exemplary training. We train on communication & interpersonal skills in order to refine the personality of the students. We make them practice mock interviews & group discussion that help to clear placement tests. Ninety percent of the students get placed before their last semester to make us the best university in Moradabad. We have hired industrial trainers in order to provide training on block chain, machine learning, artificial intelligence (AI), and python & data science. These trainers have years of experience that enables them in training the students. The students gain key insights on these technologies & sharpen their acumen to make us the Best University in Gajraula.
piyushpatel2005
This repository contains course labs and quiz from Deeplearning.ai Data Engineering specialization
themightywolfie
Students in engineering are required to create a docx file for all the practicals they perform in labs. The most annoying and tedious task is to format the data. To save time, I created a Python program to generate a docx file with predefined formatting and also generate a pdf of the same docx
hamzaamr21
Labs pour le cours Big Data Engineering
ankitchoudhary-vcf
Data Structures and Algorithms , Oops , Software Engineering Labs
whoiswelliton
Some of the projects, algorithms, encryptions, data structures, databases, labs and simulations developed during my bachelor degree in Computer Engineering at UTFPR-PB
ZipCodeCore
No description available
ZipCodeCore
No description available
oliviaslee
UC Berkeley Data-X (Industrial Engineering 135: Applied Data Science with Venture Application) semester project in partnership with Honda's 99P Labs + SHARE Mobility
d-laub
Data cleaning, analysis, and visualization for Chemical Engineering labs.
jhoogmoed
This ROS package maps user's gaze data from a Pupil Labs eye tracker, onto a different video feed, using markers, homography, and feature matching. This package is made by T.A.B. De Boer, J. Hoogmoed, N.M. Looye , J.R.P. van der Toorn, and R.P. de Vos as bachelor design project. Delf University of Technology, Mechanical Engineering, department of BioMechanical Engineering.
ali-john
This repo contains all data structures labs and assignments given at National University of science and Technology(NUST) in computer engineering department for batch 2023.
A comprehensive learning path and hands-on resource for the AWS Certified Data Engineer - Associate (DEA-C01) exam. This repository contains structured lessons, practical labs, detailed exam notes, and a collection of practice questions to help you master the core data engineering concepts on AWS.
ZipCodeCore
No description available
MohamedHossamEl-Din
This is a repository containing hands-on labs and exercises from ( Python Project for Data Engineering) course, a part of Coursera IBM Data Engineering Professional Certificate..
Zu1uDe1ta
No description available
STMM-Hub
A collection of my hands-on labs and assignments from the IBM Data Engineering Professional Certificate on Coursera.
BaidarSamir
Software Engineering For Data Science Labs
datatweets
AWS Data Engineering training focused on AWS Glue, Amazon Athena, and PySpark. Includes local notebooks, low-cost AWS setup, sample banking datasets, Glue job demos, Athena SQL labs, best practices, and a capstone credit-risk pipeline.