Found 811 repositories(showing 30)
HolmesGPT
SRE Agent - CNCF Sandbox Project
SmythOS
The SmythOS Runtime Environment (SRE) is an open-source, cloud-native runtime for agentic AI. Secure, modular, and production-ready, it lets developers build, run, and manage intelligent agents across local, cloud, and edge environments.
Tracer-Cloud
Build your own AI SRE agents. The open source toolkit for the AI era ✨
a2wio
A2W's SRE agent for Kubernetes
axiomhq
An SRE agent that does what you can't. Queries your observability stack. Finds root causes. Doesn't panic. Doesn't guess. Doesn't care about your feelings. You're welcome.
fuzzylabs
A Site Reliability Engineer AI agent that can monitor application and infrastructure logs, diagnose issues, and report on diagnostics.
Arvo-AI
Aurora — Open source AI-powered agentic incident management & root cause analysis for SREs. LangGraph agents investigate across AWS, Azure, GCP, Kubernetes. Integrates with PagerDuty, Datadog, Grafana, Slack. Apache 2.0.
scitix
AI-powered SRE platform — read-only infrastructure diagnostics with deep investigation, security governance, and team collaboration
microsoft
Azure SRE Agent is an AI-powered reliability assistant that helps teams diagnose and resolve production issues, reduce operational toil, and lower mean time to resolution
qicesun
An Autonomous AI SRE Agent for Kubernetes, built with Java Spring Boot & LangChain4j. Implements OODA loop for self-healing.
last9
A curated list of AI-powered DevOps & SRE (Site Reliability Engineering) agents, tools, and resources for automating and enhancing reliability practices
matthansen0
Fully automated deployment of Azure SRE Agent, multi-pod AKS, and "break scenarios" with prompt guides to experiment with Azure SRE Agent.
aptible
Unpage is the open source framework for building SRE agents with infrastructure context and secure access to any dev tool.
avivl
An autonomous SRE agent that monitors cloud logs across multiple platforms, leveraging AI models from various providers to detect anomalies, perform root cause analysis, and automate remediation by creating GitHub Pull Requests.
vapering
Deep SRE Agent is a cutting-edge intelligent SRE (Site Reliability Engineering) experimental platform designed to explore the application of LLMs (Large Language Models) in the field of SRE.
benmoggee
SRE Agent — An AI-powered MCP server for production incident triage. Takes natural-language symptom reports, plans structured investigations using Gemini, executes parallel workers (logs, metrics, deploys, runbooks), synthesizes root-cause reports, and proposes remediation patches with human approval gates.
microsoft
GitAGU (Git Agent Unblock) - A centralized platform for discovering, configuring, and integrating AI agents into your development workflow. Simplifies AI adoption across the entire Software Development Life Cycle
boshu2
Operational patterns for AI agents and infrastructure from solo to enterprise scale. Applies DevOps + SRE principles to the intersection: infrastructure FOR AI + AI FOR infrastructure. Alpha framework extracting meta-patterns from mission critical environments.
paulasilvatech
Agentic Operations & Observability Workshop: Hands-on guide for implementing AI-enhanced observability in cloud apps using Azure Monitor, Application Insights, and Azure SRE Agent. Covers metrics, logs, traces, multi-cloud monitoring, and agentic DevOps best practices.
itbench-hub
Code repository for SRE agent as part of ITBench
BagelHole
Agent-ready DevOps, security, infrastructure, and compliance knowledge base with 80+ skills across Kubernetes, Terraform, AWS/Azure/GCP, AI platform operations, container hardening, SOC2/ISO27001, and incident response—plus ready-to-run scripts, templates, and playbooks for SRE, platform, and security teams.
73ai
InfraGPT is an AI SRE Copilot for the Cloud that provides infrastructure management agents through Slack integration. The system consists of multiple services that work together to deliver intelligent DevOps workflows.
martinimarcello00
Autonomous agent for Kubernetes incident detection, diagnosis, and mitigation using LLMs and modular workflows. Integrates LangChain, LangGraph, and MCP servers to enable automated SRE tasks in cloud-native environments.
mstrYoda
An autonomous Kubernetes troubleshooting and healing agent powered by AI Agents and LLMs
Lethe044
Autonomous SRE agent built on Hermes - detects, heals, and learns from production incidents. Uses Memory + Skills + Cron + Gateway + Subagents + Atropos RL.
neeltom92
AI Powered SRE Observability agent
aryan877
devops sre agent, winner of wemakedevs future stack gen ai hackathon 2025
sid-rp
Self-improving Kubernetes SRE agent trained on real cluster incidents using GRPO + OpenEnv v0.2.1
sre-agents
✨ An SRE agent skilled in ECS operations. Make the experiences of ECS Ops fast 🚀, stable ⚖️, and easy ✔️.
imran-siddique
[DEPRECATED] Moved to microsoft/agent-governance-toolkit