Deliver full-stack visibility, SLIs/SLOs, and proactive capacity/performance insights.

Key Responsibilities

  • Architect and operate Prometheus/Thanos, Grafana, Loki, Tempo/Jaeger, and the Blackbox exporter.
  • Build service-level indicators (latency, error rate, saturation) aligned with KPIs.
  • Automate capacity forecasts and right-sizing via Kubernetes metrics and Kubecost APIs.
  • Align IT with business goals by developing SLIs and SLOs that reflect business KPIs, bridging technical operations and business objectives.

Core Skills & Tools
Prometheus, Thanos, Grafana Loki/Tempo, Elastic APM, Kubecost, KEDA, Grafana IRM.

Requirements

Qualifications

  • 4+ years of monitoring/performance experience in microservices environments.
  • Experience integrating with ServiceNow or Remedy for event correlation.

Location

Dubai, United Arab Emirates

Job Overview
Job Posted: 1 week ago
Job Expires:
Job Type: Full Time
