Posts

How We Architected Karpenter NodePools for a Multi-Workload EKS Cluster

A practical guide to designing NodePool tiers, weights, taints, and disruption policies based on what actually worked for us in production.

Why I Am Writing This

Most Karpenter content online falls into two buckets. One is the AWS documentation, which is accurate but does not help you make architectural decisions. The other is the "Hello World" blog post that shows a single NodePool with one instance category and stops there.

Neither helped us when we had to run a cluster with mixed workloads: some pods needed memory-heavy nodes for Elasticsearch and JVM apps, some needed compute-heavy nodes for batch processing, some were latency-sensitive web services, and a few were GPU workloads that we did not want anywhere near the rest of the cluster.

So we ended up designing a five-tier NodePool architecture. It has been running in production for a while now and has survived enough incidents that I trust it. This post walks through the design, why each decision was made,...

Introducing ExamLyftAI — A Modern Assessment Platform for Indian Coaching Institutes

I wanted to take a moment to introduce a new product I've been building over the last several months: ExamLyftAI (https://examlyftai.com). ExamLyftAI is an assessment platform designed for Indian students preparing for competitive examinations and academic studies, including JEE, NEET, GATE, CAT, UPSC, CBSE/ISC boards, state boards, and other competitive exams. It is offered as a B2B SaaS product to coaching institutes, schools, and tutoring centres that prepare these students.

The problem it solves

Indian coaching institutes today face a growing operational gap. Teachers are excellent. Students are motivated. But the workflow connecting the two is largely manual:

- Question papers are written by hand and photocopied overnight
- Tests are graded one paper at a time, two days after they're taken
- Performance is tracked in spreadsheets that rarely get opened from one month to the next
- Solutions are written on the board because there is no way to render the teacher's method digitally
- Weak ...

Prometheus Devops Interview Questions Part-3

Q9: Explain PromQL and provide examples of common queries.

PromQL (Prometheus Query Language) is Prometheus's functional language for querying time series data. Think of it as SQL for metrics: it helps you extract meaningful insights from your monitoring data.

Core Data Types:

- Instant Vector: a set of time series, each with a single sample at the same timestamp
- Range Vector: a set of time series, each with a range of samples over a time window
- Scalar: a simple numeric floating-point value

Essential Query Examples:

Basic Metric Selection:

```promql
# Get all HTTP requests
http_requests_total

# Filter by specific conditions
http_requests_total{method="GET", status="200"}

# Use regex for flexible matching
http_requests_total{status=~"2.."}  # All 2xx status codes
```

Rate Calculations:

```promql
# Per-second average request rate over the last 5 minutes
rate(http_requests_total[5m])

# Total increase over the last 1 hour
increase(http_requests_total[1h])
```

Aggregations:

```promql
# Sum all requests across instances
sum...
```