Warpspeed 2025 to Riquell: The Future of On-Call Without Burnout

The Beginning: A Shot in the Dark

When we first heard about Warpspeed 2025, an agentic AI hackathon organized by Devfolio and Lightspeed India in Bangalore, we knew we had to be there. The numbers were intimidating - 2000+ registrations for a hackathon where only around 65 teams would make it to the final round. But sometimes, the best adventures begin with the longest odds.

Our team came together almost serendipitously: Akash Singh, Harsh Kumar Gupta, Himanshu, and I decided to take on this challenge together. Shubhang Sinha joined us later as an additional AI engineer, strengthening our technical foundation. What we didn't know then was that we were about to turn a shared frustration into a grand prize-winning solution.

The Spark of an Idea: DreamOps

The idea for DreamOps didn't come from a brainstorming session or market research - it came from pain. Real, 3 AM, production-is-down, your-phone-is-buzzing pain.

Akash Singh, being a DevOps engineer himself, had lived through countless nights of being jolted awake by PagerDuty alerts. Picture this: It's 3 AM, your production database is down, users are angry, and you're stumbling in the dark trying to diagnose what went wrong while half-asleep. This was Akash's reality, and the reality of thousands of on-call engineers worldwide.

The problems were clear:

  • Constant sleep interruptions and alert fatigue
  • Manual log analysis across multiple systems under pressure
  • 30-60 minutes of stressful debugging for common issues
  • Inconsistent remediation quality when exhausted
  • Burnout from repetitive tasks that could be automated

So Akash proposed a solution: DreamOps - an AI-powered on-call partner that would handle routine incidents automatically, letting engineers actually sleep through the night.

Special Thanks to Point Blank Club

Before diving into the technical journey, I want to express our heartfelt gratitude to all the seniors at Point Blank Club who took the time to validate our idea in the early stages. Your insights, feedback, and encouragement gave us the confidence to push forward with DreamOps. The validation from experienced developers and mentors was invaluable in shaping our approach and believing in the potential impact of our solution.

The Technical Challenge: Building the Impossible in 24 Hours

What we were attempting was ambitious - building an AI agent that could automatically triage and resolve infrastructure issues using Claude AI and advanced integrations. For many of us, including myself, this was uncharted territory.

I'll be honest - I didn't know much about MCP (Model Context Protocol) or AI agents when we started. But that's the beauty of hackathons - they push you beyond your comfort zone and force rapid learning under pressure.

The Architecture We Built

DreamOps became an intelligent incident response platform with these core components:

  • AI-First Architecture: Claude AI integration for advanced reasoning and root cause analysis
  • Model Context Protocol (MCP): Seamless integration with 10+ tools
  • Confidence Scoring: Only auto-executes actions with ≥80% confidence
  • Risk Assessment: Categorizes commands as low/medium/high risk
  • Production-Ready Stack: Python FastAPI backend, Next.js frontend
  • Deep Integrations: Kubernetes, PagerDuty, Grafana, GitHub, Slack, Notion
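Here's a minimal sketch of how the confidence and risk checks in that list could fit together. The 80% threshold and the low/medium/high levels come from the list above; the action names, the risk table, and the choice to auto-execute only low-risk actions are assumptions made for illustration, not the actual DreamOps code.

```python
from enum import Enum

CONFIDENCE_THRESHOLD = 0.80  # from the post: auto-execute only at >= 80% confidence


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical action names and risk levels, for illustration only.
ACTION_RISK = {
    "fetch_logs": Risk.LOW,
    "restart_pod": Risk.LOW,
    "scale_deployment": Risk.MEDIUM,
    "rollback_deployment": Risk.MEDIUM,
    "drain_node": Risk.HIGH,
    "delete_namespace": Risk.HIGH,
}


def classify_risk(action: str) -> Risk:
    """Look up an action's risk level; unknown actions are treated as high risk."""
    return ACTION_RISK.get(action, Risk.HIGH)


def should_auto_execute(action: str, confidence: float) -> bool:
    """Assumed policy: only low-risk actions at or above the threshold run unattended."""
    return classify_risk(action) is Risk.LOW and confidence >= CONFIDENCE_THRESHOLD
```

Treating unknown actions as high risk keeps the gate conservative: anything the table doesn't recognize goes to a human.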

How It Actually Works

When PagerDuty sends an alert, our AI agent:

  1. Instantly analyzes the incident with full Kubernetes context
  2. Diagnoses root cause using logs, metrics, and documentation
  3. Executes remediation commands automatically (with safety checks)
  4. Only escalates truly complex issues that need human intervention

We even implemented what we playfully called "YOLO Mode" - when enabled, DreamOps autonomously executes remediation commands for common issues like pod crashes, memory issues, and deployment failures. Don't worry though, every action is risk-assessed and confidence-scored!
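The four steps above can be sketched as a small pipeline. Everything named here is a hypothetical stand-in: the alert fields, the function names, and especially the lookup table, which takes the place of the AI diagnosis step (in DreamOps that reasoning is done by Claude with Kubernetes context, not by a dictionary).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the AI diagnosis: symptom -> (root cause, fix, confidence).
KNOWN_ISSUES = {
    "CrashLoopBackOff": ("pod is crash-looping", "restart_pod", 0.92),
    "OOMKilled": ("container exceeded its memory limit", "raise_memory_limit", 0.85),
    "ImagePullBackOff": ("image cannot be pulled", None, 0.60),
}


@dataclass
class Diagnosis:
    root_cause: str
    remediation: Optional[str]  # None means no automated fix is known
    confidence: float


def diagnose(alert: dict) -> Diagnosis:
    """Steps 1-2: analyze the alert and look up a likely root cause."""
    symptom = alert.get("symptom", "")
    if symptom in KNOWN_ISSUES:
        cause, fix, confidence = KNOWN_ISSUES[symptom]
        return Diagnosis(cause, fix, confidence)
    return Diagnosis("unknown", None, 0.0)


def handle_alert(alert: dict, yolo_mode: bool = False, threshold: float = 0.80) -> str:
    """Steps 3-4: remediate automatically when allowed, otherwise escalate."""
    diagnosis = diagnose(alert)
    can_fix = diagnosis.remediation is not None and diagnosis.confidence >= threshold
    if yolo_mode and can_fix:
        # A real agent would run the remediation here and verify the result.
        return f"executed:{diagnosis.remediation}"
    return f"escalated:{diagnosis.root_cause}"
```

For example, `handle_alert({"symptom": "CrashLoopBackOff"}, yolo_mode=True)` returns `"executed:restart_pod"`, while the same alert with YOLO Mode off is escalated with its diagnosed root cause.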

The Team Behind the Magic

Let me properly introduce the incredible team that made this possible:

  • Akash Singh - The visionary and lead developer who conceived DreamOps from his real-world DevOps pain points. His deep understanding of infrastructure challenges was the foundation of our solution.
  • Harsh Kumar Gupta - Our full-stack developer who worked across both frontend and backend systems to create a cohesive user experience.
  • Himanshu - Our backend developer who focused on the core server infrastructure and data processing pipelines.
  • Myself - As an AI engineer, I managed the backend systems, alert processing, and the integrations between tools, despite initially being unfamiliar with MCP and AI agents. The learning curve was steep but rewarding.
  • Shubhang Sinha - Our additional AI engineer who joined us later, bringing specialized knowledge in machine learning and AI systems that helped refine our agent's capabilities.

Each team member brought unique strengths, but more importantly, we shared the same vision of making on-call duty humane again.

The Results That Blew Everyone Away

The numbers spoke for themselves:

  • 80% faster incident resolution (2-5 minutes vs 30-60 minutes)
  • 2-4 hours saved per on-call shift
  • Zero 3 AM wake-up calls for routine issues
  • Consistent remediation quality regardless of time of day
  • 90% reduction in middle-of-night escalations
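As a quick sanity check on the first figure, here is the range of reductions implied by the per-incident times quoted above. This is plain arithmetic on the stated 2-5 and 30-60 minute ranges, not additional measured data.

```python
def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Percentage by which resolution time drops from before to after."""
    return (before_minutes - after_minutes) / before_minutes * 100


# Least favorable pairing: slowest automated fix vs fastest manual fix.
worst_case = percent_reduction(before_minutes=30, after_minutes=5)
# Most favorable pairing: fastest automated fix vs slowest manual fix.
best_case = percent_reduction(before_minutes=60, after_minutes=2)

print(f"{worst_case:.1f}% to {best_case:.1f}%")  # prints "83.3% to 96.7%"
```

So the 80% headline sits at the conservative end of what the quoted ranges imply.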

Victory at Warpspeed 2025

After 24 hours of intense building, debugging, and refining, we presented DreamOps to the judges. The moment they announced us as the Grand Prize winners was surreal.

The official announcement read: "Grand Prize goes to DreamOps by Akash Singh, Inchara J, Himanshu Singh, and Harsh Kumar Gupta. DreamOps is an AI agent that tackles late-night debugging. It automatically triages and resolves common programming issues, cutting debugging time from 30-60 minutes to just 2-5 minutes. Engineers can now rest easy while AI handles routine problems, escalating only complex ones."

We won $3,000, but more importantly, we had validation that we'd solved a problem that resonated with every engineer in the room.

Beyond the Hackathon: Evolution to Riquell

Winning Warpspeed 2025 was just the beginning. What started as DreamOps has evolved into Riquell - a more sophisticated AI copilot that helps DevOps and SRE teams find and fix production issues faster, without needing to write complex scripts or dig through dozens of dashboards.

The hackathon judges were blown away by our approach to solving a problem that every engineer in the room had experienced. While other teams built incremental improvements, we reimagined incident response from the ground up with AI at the core.

Technical Evolution: From Prototype to Production

Since the hackathon, we've rebuilt our architecture with significant improvements:

How Riquell Works:
Right now, when a pager alert fires, engineers have to jump between logs, metrics, and tracing tools to figure out what went wrong. It's a stressful, manual process that can take hours and often hits at the worst possible time.

Riquell connects directly to systems like PagerDuty and starts triaging incidents the moment an alert comes in. It pulls real-time telemetry, routes signals to specialized AI agents for logs, metrics, and traces, and uses retrieval-augmented generation along with a system knowledge graph to understand the full context of the issue.
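To make the routing idea concrete, here's a minimal sketch of signals being dispatched by type to specialist handlers. The `kind` field, the handler names, and their return values are hypothetical; in Riquell the specialists are AI agents, which are stubbed out here as plain functions.

```python
from collections import defaultdict
from typing import Callable, Dict, List


# Hypothetical specialist handlers standing in for the AI agents.
def logs_agent(signals: List[dict]) -> str:
    return f"logs agent reviewed {len(signals)} signal(s)"


def metrics_agent(signals: List[dict]) -> str:
    return f"metrics agent reviewed {len(signals)} signal(s)"


def traces_agent(signals: List[dict]) -> str:
    return f"traces agent reviewed {len(signals)} signal(s)"


AGENTS: Dict[str, Callable[[List[dict]], str]] = {
    "log": logs_agent,
    "metric": metrics_agent,
    "trace": traces_agent,
}


def route_signals(signals: List[dict]) -> Dict[str, str]:
    """Group telemetry by kind and hand each group to its specialist."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for signal in signals:
        grouped[signal["kind"]].append(signal)
    # Kinds without a registered agent are skipped rather than guessed at.
    return {kind: AGENTS[kind](items) for kind, items in grouped.items() if kind in AGENTS}
```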

Three-Tiered Resolution System:
Once the issue is analyzed, Riquell offers three resolution modes depending on confidence level and risk:

  1. YOLO Mode: For low-risk, high-confidence issues like pod crashes or restarts, Riquell acts on its own. There's a built-in rollback mechanism to undo changes if the fix doesn't stabilize things.
  2. Approval Mode: Riquell prepares the complete fix and shows it to the engineer first. Once approved, it executes the steps automatically.
  3. Human-in-the-loop Mode: For more complex cases, Riquell guides the engineer step-by-step, offering context-rich suggestions and reasoning.

All of this happens inside the tools teams already use, with overlays added to existing dashboards to simplify investigation and resolution.
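A rough sketch of how the three modes and the rollback safety net might be wired up. The numeric cutoffs, the `choose_mode` and `apply_with_rollback` names, and the callback-based design are all assumptions for illustration; the post only says the mode depends on confidence and risk.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    YOLO = "yolo"
    APPROVAL = "approval"
    HUMAN_IN_THE_LOOP = "human_in_the_loop"


def choose_mode(confidence: float, risk: str) -> Mode:
    """Pick a resolution mode; both cutoffs below are assumed values."""
    if risk == "low" and confidence >= 0.80:
        return Mode.YOLO
    if risk in {"low", "medium"} and confidence >= 0.50:
        return Mode.APPROVAL
    return Mode.HUMAN_IN_THE_LOOP


def apply_with_rollback(
    apply_fix: Callable[[], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Apply a fix, then undo it if the health check still fails."""
    apply_fix()
    if is_healthy():
        return True
    rollback()
    return False
```

Passing the fix, health check, and rollback as callables keeps the safety logic independent of whichever tool actually performs the change.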

Advanced Tech Stack:

  • Frontend: Next.js for the SaaS interface and real-time dashboards
  • Backend: Go as the primary language for backend systems, with Python FastAPI for the incident response workflow that receives PagerDuty webhooks
  • AI & LLM Stack:
    • Claude AI as the primary high-capability reasoning engine for sophisticated root cause analysis and remediation planning
    • Agno framework used to build the AI agent
    • Vector-based search implemented to enable semantic search of on-call notes and incident data
    • Knowledge graph and RAG: the codebase is turned into a knowledge graph, making it easier for the agent to make edits and suggest relevant knowledge base articles
    • Reinforcement learning combined with RAG, to lessen reliance on the vector DB in production and so reduce cost and complexity
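To give a feel for the vector-based search bullet, here's a toy version of ranking on-call notes by cosine similarity. Note the swap: real semantic search uses learned embeddings and a vector store, while this sketch uses simple word-count vectors so it runs with only the standard library. The sample notes and function names are made up.

```python
import math
import re
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Turn text into a word-count vector (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, notes: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k notes most similar to the query."""
    query_vector = vectorize(query)
    ranked = sorted(
        notes,
        key=lambda note: cosine_similarity(query_vector, vectorize(note)),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up on-call notes for the example.
NOTES = [
    "Payment service pods restart when memory limit is exceeded",
    "Rotate database credentials every quarter",
    "Checkout latency spikes after cache eviction",
]
```

Calling `search("pods exceeded memory limit", NOTES)` returns the first note, since it is the only one sharing any words with the query.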

Deep Integration Ecosystem:
Our MCP server integrations include:

  • AWS ECS/EKS: Deployment infrastructure
  • PagerDuty: Source of incident alerts that trigger workflows via webhook
  • Grafana: Gathering context and validating alerts with quantitative data
  • Kubernetes: Investigating live status of services, describing pods, and pulling logs
  • GitHub: Correlating production issues with recent code modifications
  • Notion: Knowledge base for runbooks and architectural diagrams
  • Datadog: Performance monitoring and tracing
  • Atlassian: Issue tracking and team collaboration
  • Slack: Team communication and notifications
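As one example of what an integration buys you, here's a sketch of the GitHub correlation idea: given an incident time, list the changes merged shortly before it. The two-hour window, the `suspect_commits` name, and the commit dictionaries are hypothetical; a real integration would pull this data from GitHub rather than from a hand-built list.

```python
from datetime import datetime, timedelta
from typing import List


def suspect_commits(
    incident_time: datetime,
    commits: List[dict],
    window: timedelta = timedelta(hours=2),
) -> List[dict]:
    """Return commits merged within `window` before the incident, newest first."""
    start = incident_time - window
    recent = [c for c in commits if start <= c["merged_at"] <= incident_time]
    return sorted(recent, key=lambda c: c["merged_at"], reverse=True)


# Made-up data: an incident at 03:00 and four merges around it.
incident = datetime(2025, 1, 1, 3, 0)
commits = [
    {"sha": "a1", "merged_at": datetime(2025, 1, 1, 2, 30)},
    {"sha": "b2", "merged_at": datetime(2024, 12, 31, 20, 0)},
    {"sha": "c3", "merged_at": datetime(2025, 1, 1, 1, 15)},
    {"sha": "d4", "merged_at": datetime(2025, 1, 1, 3, 30)},
]
suspects = [c["sha"] for c in suspect_commits(incident, commits)]
```

Here `suspects` ends up as `["a1", "c3"]`: the merge from the previous evening falls outside the window, and the one after the incident can't have caused it.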

Continuous Learning:
Riquell also learns continuously. It observes how incidents are handled, gathers feedback from engineers, and uses reinforcement learning to improve its accuracy and decision-making over time.

Future Technical Innovations

The roadmap ahead includes some exciting technical challenges we're exploring:

Predictive Incident Prevention:

  • Machine learning models that analyze historical patterns and system metrics to predict issues before they occur
  • Anomaly detection algorithms that can identify subtle drift in system behavior
  • Proactive scaling and resource optimization based on predicted load patterns
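Since this part is still on the roadmap, take the following only as an illustration of the simplest form anomaly detection can take: flagging a reading that sits far from the recent mean, measured in standard deviations (a z-score). The threshold of 3 and the function name are assumptions, and a production detector for subtle drift would need something more robust than this.

```python
import statistics
from typing import List


def is_anomalous(history: List[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than z_threshold standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # A perfectly flat history: any change at all counts as unusual.
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold
```

With a CPU history like `[50, 52, 49, 51, 50, 48, 51]`, a new reading of 90 is flagged while a reading of 51 is not.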

Advanced Observability Integration:

  • Planned deep integration with observability tools like New Relic, Pyroscope, and OpenTelemetry
  • Custom instrumentation that provides richer context to our AI agents
  • Real-time correlation between business metrics and infrastructure health

Multi-Cloud Intelligence:

  • Cross-cloud incident correlation and resolution across AWS, GCP, and Azure
  • Cloud-agnostic infrastructure abstractions for universal deployment
  • Cost-impact analysis that factors incident resolution into infrastructure spending

What We Learned

This experience taught us several valuable lessons:

  1. Real problems make the best products - Our solution resonated because it addressed genuine pain points that every engineer in the room had experienced
  2. Technical challenges are surmountable - Even without deep expertise in certain areas, determination and rapid learning can bridge gaps
  3. Team diversity is strength - Each member's unique background contributed to our comprehensive solution
  4. Validation matters - Getting feedback from experienced developers (shoutout to Point Blank Club seniors!) helped refine our approach

Looking Forward

Riquell isn't stopping at the hackathon victory. We continue pushing the boundaries of what's possible when AI meets DevOps, exploring new frontiers in intelligent infrastructure management.

We started this as a hackathon project called DreamOps, won the Warpspeed hackathon, and have been building ever since. Now, Riquell is becoming a full product aimed at making incident response faster, safer, and a lot less stressful.

The Real Victory

Yes, we won $3,000 and the Grand Prize at Lightspeed Warpspeed 2025. But honestly? The real win is what we built and the problem we're solving.

We've created something that lets engineers sleep through the night instead of being woken up by routine production issues. We've built a platform that transforms the most stressful part of being a developer into something manageable and automated. From DreamOps to Riquell, we're continuously evolving our approach to make incident response not just faster, but fundamentally more intelligent.

Conclusion

From 2000+ registrations to 65 finalist teams to Grand Prize winners - the journey of Warpspeed 2025 taught us that with the right idea, the right team, and enough determination, you can build something that truly matters.

To the Point Blank Club seniors who believed in our idea from the early stages, to the judges who recognized the potential of DreamOps, and to every engineer who has ever been woken up by a 3 AM alert - this one's for you.

Riquell is real. It's happening. And it's just the beginning of making on-call duty humane again.

Because 3 AM debugging sessions should be a thing of the past.


The future of incident response is here. Ready to dream easy while AI takes care of your on-call duty?