Do you enjoy collaborating with teams to solve complex challenges?

Do you have a passion for cutting-edge technologies and tackling distributed systems problems?

Join our highly skilled Storage Team!

We design, deploy, and manage the applications and infrastructure that support Akamai's internal and customer-facing cloud storage platforms. We do this while upholding Akamai's mission to make life better for billions of people, billions of times a day.

Partner with the best

As a Senior Site Reliability Engineer, you will collaborate with others to build and manage storage platforms such as Block Storage and Object Storage. You will develop tools that automate lifecycle processes for petabyte-scale systems, and you will utilise open-source technologies, including Ceph and Kubernetes, to ensure that storage systems remain reliable, available, and optimised for performance.

As a Senior Site Reliability Engineer, you will be responsible for:

  • Architecting new, highly available storage systems and infrastructure that support a variety of workloads from compute customers.
  • Automating workflows and deployments using Bash/Python/Go and SaltStack/Ansible, and writing Kubernetes operators to improve reliability.
  • Supporting large-scale, globally deployed Kubernetes clusters with thousands of nodes, along with the applications running on them.
  • Improving observability and monitoring tooling, including dashboards for in-depth analysis of platform and application behaviour.
  • Collaborating with various teams, including developers and planners, on coordination, knowledge sharing, and feedback.
  • Improving performance and reliability by identifying bottlenecks and troubleshooting issues across microservices, Kubernetes, networking at every layer of the OSI model, Linux, and Ceph.

Do what you love

To be successful in this role you will:

  • Have professional experience in a Site Reliability, Development, or Systems Engineering role working with large-scale distributed systems.
  • Have professional experience with Kubernetes, including knowledge of Operators, Istio, Cilium, cert-manager, and Argo CD.
  • Be familiar with observability tooling and practices, such as writing complex Grafana and LogQL queries, working with percentiles and SLOs, and following monitoring best practices.
  • Be familiar with benchmarking tools for storage and web requests, and with concepts such as IOPS, throughput, 99th percentile latency, and object/block size.
  • Have experience with automation tools such as Terraform, Ansible, GitHub Actions, Jenkins, or SaltStack.
  • Have experience troubleshooting Linux systems.
  • Be comfortable participating in on-call rotations.

Build your career at Akamai

Our ability to shape digital life today relies on developing exceptional people like you: people who can turn the impossible into the possible. We're doing everything we can to make Akamai a great place to work, a place where you can learn, grow, and have a meaningful impact.

With our company moving so fast, it's important that you're able to build new skills, explore new roles, and try out different opportunities. There are many ways to build your career at Akamai, and we want to support you as much as possible. We offer a wide range of development opportunities, from programs such as GROW and Mentoring, to internal events like the APEX Expo, to tools such as LinkedIn Learning, all to help you expand your knowledge and experience here.

Learn more

Not sure whether this job is the right match for you, or want to learn more about it before you apply? Schedule a 15-minute exploratory call with the recruiter, who will be happy to share more details.

Location

Poland

Job Overview

Job Posted: 1 week ago
Job Type: Full Time