We are seeking an experienced Observability Subject Matter Expert (SME) with deep expertise in observability architectures and leading monitoring platforms. The role is responsible for designing, implementing, and optimizing end-to-end observability solutions for applications, infrastructure, and networks. The ideal candidate has extensive hands-on experience with platforms such as ELK (Elasticsearch, Logstash, Kibana), Dynatrace, BMC TrueSight, and SolarWinds, and will deliver seamless monitoring, alerting, and analytics that improve IT operations and service reliability.

Key Responsibilities:

• Observability Strategy & Architecture: Design and implement comprehensive observability solutions to monitor applications, infrastructure, and network performance.

• Monitoring Tool Implementation & Optimization: Deploy and fine-tune monitoring solutions using ELK, Dynatrace, BMC TrueSight, and SolarWinds.

• Log Management & Analysis: Establish centralized logging, log parsing, and correlation for improved event detection and troubleshooting.

• Metrics & Performance Monitoring: Define KPIs, dashboards, and alerts for proactive IT service monitoring.

• Incident Management & Root Cause Analysis: Collaborate with IT operations, DevOps, and SRE teams to diagnose and resolve performance issues.

• Automation & Integration: Integrate monitoring tools with ITSM platforms, AIOps solutions, and automation frameworks for enhanced efficiency.

• Capacity Planning & Optimization: Analyze historical trends and real-time data to optimize resource allocation and performance.

• Stakeholder Collaboration: Work closely with developers, network engineers, system administrators, and business units to ensure observability best practices are followed.

• Continuous Improvement: Stay current on emerging observability technologies and recommend improvements to existing processes and tools.

Requirements:

• Bachelor's degree in Computer Science, Information Technology, or related field (or equivalent experience).

• Expertise in Observability & Monitoring Platforms: 8+ years of hands-on experience with the ELK Stack, Dynatrace, BMC TrueSight, SolarWinds, and similar platforms.

• Strong Knowledge of Infrastructure & Application Monitoring: Experience monitoring cloud, on-premises, and hybrid environments.

• Experience with Log & Event Correlation: Ability to configure and analyze logs for anomaly detection and security insights.

• Automation & Scripting: Proficiency in scripting languages such as Python, PowerShell, or Bash for automation.

• Cloud & DevOps Understanding: Experience with cloud platforms (AWS, Azure, GCP) and CI/CD pipelines.

• ITIL & Incident Management Exposure: Understanding of ITIL processes and IT service management (ITSM) practices.

• Networking & Security Awareness: Knowledge of network monitoring, SNMP, and security monitoring practices.

• Excellent Communication & Documentation Skills: Ability to present findings, create technical documentation, and train teams on observability best practices.

Preferred Qualifications:

• Certifications in Dynatrace, ELK, BMC TrueSight, or SolarWinds.

• Experience with AIOps, machine learning for anomaly detection, or AI-driven observability.

• Background in Site Reliability Engineering (SRE) or DevOps.

• Familiarity with Infrastructure as Code (IaC) tools such as Terraform and Ansible.

Location

Riyadh, Riyadh Province, Saudi Arabia

Job Overview
Job Posted: 2 weeks ago
Job Type: Full Time