Found 24 repositories (showing 24)
GoogleCloudPlatform
No description available
connor-hitchcock
During this year-long university project course I worked in a team of eight to develop a web application aimed at preventing Kiwis from throwing away one third of their food. By providing food companies with an e-commerce platform to sell products close to expiry to cost-conscious individuals, our team hopes to address this issue.

The project ran within a Scrum and agile framework, in close communication with the product owner to develop the application he had envisioned. It consisted of six sprints: at the start of each sprint we planned which stories to take on, split them into tasks in substantial detail, and logged time and completion in Jira. To keep everyone on the same page we held two standups a week with our scrum master. We also used a range of workflow strategies to improve code quality and minimise risk, including code reviews before finishing each task, task branching to prevent merge conflicts, substantial automated unit and integration testing with JUnit, and automated acceptance testing with Cucumber. To keep team members accountable for their mistakes, we created a wiki covering strict code styles, decision-making policies, a definition of done, a yellow-card policy, a Git policy, user manuals, and our testing procedures.

Our technology stack followed a client-server pattern: Vue.js on the frontend, Spring Boot on the backend, RESTful APIs connecting the two, MariaDB for external data storage, and Gradle, SonarQube, npm, Git, and a CI/CD pipeline to improve code quality and enable seamless collaboration within the team. I took on a leadership role by helping teammates solve complex problems, completing admin tasks such as setting up the CI/CD pipeline and Cucumber, and acting as a bridge between our team and the product owner and scrum team.
This project is still ongoing and will be finished in October.
An advanced, open-source framework for retrieving, processing, and visualizing diverse cloud data. Built with Python, Docker, and integrated CI/CD workflows, this solution offers RESTful API integration, high-performance data analytics, and interactive visualization capabilities for scalable cloud data management.
Srilekha-1106
Implemented Azure Databricks for real-time data processing and governance using Unity Catalog, Spark Structured Streaming, Delta Lake features, Medallion Architecture, and end-to-end CI/CD pipelines. Focused on incremental loading, compute cluster management, maintaining data quality, and creating workflows.
PreethamVA
This MLOps project showcases an end-to-end pipeline for vehicle insurance data, covering data processing, model training, deployment, and CI/CD automation. It highlights real-world ML workflows using modern tools and best practices, making it ideal for recruiters and developers exploring production-ready ML systems.
daminasaws
A toolkit of pre-built automation scripts and tools for common tasks such as file processing, data manipulation, system administration, and CI/CD workflows, built with Python or Bash scripting.
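A minimal sketch of what two helpers in such a toolkit might look like, assuming a Python implementation; the function names (`batch_rename`, `filter_rows`) are illustrative, not taken from the repository.

```python
import csv
import io
from pathlib import Path


def batch_rename(directory: Path, old_ext: str, new_ext: str) -> list[Path]:
    """Rename every file matching *old_ext in `directory` to use new_ext.

    Extensions include the leading dot, e.g. ".txt" -> ".md".
    Returns the list of new paths.
    """
    renamed = []
    for path in directory.glob(f"*{old_ext}"):
        target = path.with_suffix(new_ext)
        path.rename(target)
        renamed.append(target)
    return renamed


def filter_rows(csv_text: str, column: str, value: str) -> list[dict]:
    """Return the rows of a CSV document whose `column` equals `value`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == value]
```

Keeping each helper dependency-free (stdlib only) is what makes this kind of toolkit easy to drop into a CI/CD job.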
No description available
kumarm-foxtel
No description available
victorgmrqs
No description available
chungtseng96
No description available
GDBSD
Setting up a CI/CD pipeline for data-processing workflow
DavidEMDias
A complete data pipeline demonstrating cloud storage, data processing, data transformations, CI/CD workflows, and visualizations. Provides a practical foundation for building data engineering and analytics solutions.
An automated, serverless bioinformatics pipeline designed for secure genomic data processing. This project demonstrates the integration of **Financial-grade DevOps (CI/CD, Guardrails)** into **Biotech data workflows**.
thunchanokbow
CI/CD pipelines are becoming an increasingly important part of data engineering. GitHub Actions and GitLab CI let us define workflows that automate that process.
etl-kenobi
A modular, reusable Data Quality validation framework designed for enterprise-scale data pipelines. Built on Azure with PySpark and Delta Lake, this project demonstrates CI/CD integration, batch ingestion workflows, and extensible DQ checks for production-ready data processing.
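The extensible-checks idea described above can be sketched framework-agnostically. The following is a plain-Python stand-in for the project's PySpark/Delta Lake implementation, with illustrative rule names (`not_null`, `in_range`) not taken from the repository.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class CheckResult:
    name: str
    passed: bool
    failed_rows: int


def not_null(column: str) -> Callable[[dict], bool]:
    """Row-level rule: the column must be present and non-empty."""
    return lambda row: row.get(column) not in (None, "")


def in_range(column: str, lo: float, hi: float) -> Callable[[dict], bool]:
    """Row-level rule: the column must parse as a number within [lo, hi]."""
    return lambda row: lo <= float(row[column]) <= hi


def run_checks(rows: Iterable[dict],
               checks: dict[str, Callable[[dict], bool]]) -> list[CheckResult]:
    """Apply every named rule to every row; a check passes only if no row fails."""
    rows = list(rows)
    results = []
    for name, rule in checks.items():
        failures = sum(1 for row in rows if not rule(row))
        results.append(CheckResult(name, failures == 0, failures))
    return results
```

New checks extend the registry dict without touching `run_checks`, which is the property that makes this pattern reusable across pipelines.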
kamal-marouane
Automated cloud-based data pipeline using Apache Spark and Kafka for large-scale cluster analysis. Infrastructure provisioned with Terraform, leveraging Google Cloud data-processing services for batch jobs and real-time streaming. Includes CI/CD workflows and monitoring for optimized performance.
codeSmithDave
Full-stack plant inventory system built with Next.js/React + ASP.NET Core 9. Features large CSV processing, paginated APIs, EF Core/SQL Server integration, and CI/CD workflows. Designed for scalable data management (1M+ records).
S-Delowar
Building a production-ready ETL pipeline with automated workflows, cloud integration, and CI/CD deployment. By leveraging Airflow, Docker, and AWS services, the pipeline ensures scalability, automation, and reliability for handling large-scale data processing tasks.
AliGaffarToksoy
Enterprise-grade real-time event and log processing pipeline designed to handle high-throughput data streams. Built with Kafka for scalable ingestion, OpenSearch (ELK) for indexing and visualization, Terraform for infrastructure automation, and Jenkins for CI/CD, enabling reliable, automated and observable data workflows.
shivkhurana
Automated data-processing pipeline designed to detect and redact PII (Personally Identifiable Information) from server logs using NLP (spaCy) and regex. Containerized with Docker and integrated into a GitHub Actions CI/CD workflow for automated compliance testing.
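A sketch of the regex half of that redaction step (the spaCy NER half, which would catch names, is omitted here); the patterns below are illustrative and deliberately simple, not the project's actual rules.

```python
import re

# Each label maps to a compiled pattern; matches are replaced by "[LABEL]".
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(line: str) -> str:
    """Replace every PII match in a log line with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        line = pattern.sub(f"[{label}]", line)
    return line
```

Running `redact` line by line over a log stream keeps the pipeline memory-bounded, which matters when containerized with a small footprint.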
hector-en
This project demonstrates deploying a classification model using Azure DevOps, focusing on predicting customer license status. It covers the CI/CD pipeline setup, Docker containerization, and integration with Azure services for real-time data processing, enhancing operational workflows in the licensing domain.
aniruddhapal
A content-based movie recommender system built with scikit-learn and deployed as a live Streamlit app on Render. This project demonstrates an end-to-end MLOps workflow, including data processing pipelines, artifact optimization for a 512MB RAM limit, and robust deployment with CI/CD.
tahsinac
An NLP pipeline covering data processing, model training, and evaluation for text summarization with HuggingFace Transformers. Predictions are served via FastAPI, with a CI/CD workflow in GitHub Actions handling containerization, image pushes to Amazon ECR, and continuous deployment to EC2 for API serving.
ryanheng99
A full-stack data engineering project that ingests real-time Bitcoin price data from CoinGecko, processes it, trains an ARIMA model for forecasting, and serves predictions via a FastAPI endpoint. The entire workflow is automated using CI/CD with GitHub Actions and containerized with Docker.
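The forecasting step in a pipeline like this can be illustrated with a pure-Python AR(1) fit, a deliberately simplified stand-in for the project's ARIMA model (which would normally come from a library such as statsmodels).

```python
def ar1_fit(series: list[float]) -> tuple[float, float]:
    """Fit x[t] = a + b * x[t-1] by ordinary least squares."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b = cov / var
    a = my - b * mx
    return a, b


def forecast(series: list[float], steps: int) -> list[float]:
    """Iterate the fitted recurrence forward `steps` times from the last value."""
    a, b = ar1_fit(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out
```

On a perfectly linear series the fit recovers a slope of exactly one unit per step, which makes the recurrence easy to sanity-check before swapping in a real ARIMA model behind the same `forecast` interface.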