Senior Site Reliability Engineer

at Akamai Technologies

Full Time

Do you enjoy collaborating with teams to solve complex challenges?

Do you have a passion for cutting-edge technologies and tackling distributed-systems problems?

Join our highly skilled Storage Team!

We design, deploy, and manage applications and infrastructure that support Akamai's internal and customer-facing cloud storage platforms. We do this while maintaining Akamai's mission to make life better for billions of people, billions of times a day.

Partner with the best

As a Senior Site Reliability Engineer, collaborate to build and manage storage platforms like Block Storage and Object Storage. Develop tools to automate lifecycle processes for petabyte-scale systems. Utilise open-source technologies, including Ceph and Kubernetes, ensuring storage systems remain reliable, available, and optimised for performance.

As a Senior Site Reliability Engineer, you will be responsible for:

Architecting new highly available storage systems and infrastructure that support a variety of workloads from compute customers.
Automating workflows and deployments using Bash/Python/Go and SaltStack/Ansible, and coding Kubernetes operators for reliability.
Supporting large-scale Kubernetes clusters deployed worldwide, with thousands of nodes, and the applications running on them.
Improving observability and monitoring tooling and dashboards for in-depth analysis of platform and application behaviour.
Collaborating with various teams, including developers and planners, for coordination, knowledge sharing, and feedback.
Improving performance and reliability by identifying bottlenecks and troubleshooting across microservices, Kubernetes, the network stack (OSI model), Linux, and Ceph.

Do what you love

To be successful in this role you will:

Have professional experience in a Site Reliability, Development, or Systems Engineering role with large-scale distributed systems.
Have professional experience with Kubernetes, including knowledge of Operators, Istio, Cilium, cert-manager, and Argo CD.
Be familiar with observability tooling and practices, such as complex Grafana queries, percentiles, SLOs, LogQL, and monitoring best practices.
Be familiar with benchmarking tools for storage and web requests, and with concepts such as IOPS, throughput, 99th-percentile latency, and object/block size.
Have experience with automation tools such as Terraform, Ansible, GitHub Actions, Jenkins, or SaltStack.
Have experience troubleshooting Linux systems.
Be comfortable with on-call rotations.

Build your career at Akamai

Our ability to shape digital life today relies on developing exceptional people like you. The kind that can turn impossible into possible. We’re doing everything we can to make Akamai a great place to work. A place where you can learn, grow and have a meaningful impact.

With our company moving so fast, it’s important that you’re able to build new skills, explore new roles, and try out different opportunities. There are many ways to build your career at Akamai, and we want to support you as much as possible. We offer a range of development opportunities, from programs such as GROW and Mentoring, to internal events like the APEX Expo and tools such as LinkedIn Learning, all to help you expand your knowledge and experience here.

Learn more

Not sure if this job is the right match for you, or want to learn more before you apply? Schedule a 15-minute exploratory call with the recruiter, who will be happy to share more details.

Location

Poland

Engineer

Job Overview

Job Posted:

3 months ago