AI/ML DevOps Engineer

Our Infrastructure AI and Data Engineering Team is responsible for providing the foundational firm-wide AI Enablement platform. We are transitioning this platform onto Kubernetes (K8s) and are seeking an experienced DevOps Engineer to lead this effort. The ideal candidate will help drive our cloud-native infrastructure initiatives and lead the implementation of DevOps best practices across our organization. This is a unique opportunity not only to join one of the leading hedge funds in the world, but also to provide leadership on the core AI Enablement platform, which is used across every part of the business on a daily basis.

Key Responsibilities:

  • Design and implement high-availability solutions for critical AI infrastructure
  • Partner with AI/ML teams to optimize platform performance and scalability
  • Drive architectural decisions for the next generation of the AI platform
  • Lead the development and maintenance of CI/CD pipelines using tools like Jenkins or GitHub Actions
  • Architect and implement Infrastructure as Code (IaC) solutions using Terraform or similar tools
  • Optimize container orchestration platforms (Kubernetes) and microservices architecture
  • Improve and maintain monitoring, alerting and incident response systems (Datadog, OpsGenie)
  • Lead incident response and participate in the on-call rotation
  • Mentor junior team members and contribute to technical documentation
  • Collaborate with development teams to improve deployment processes and system reliability

Required Qualifications:

  • 5+ years of experience in DevOps, Site Reliability Engineering, or similar roles
  • Strong experience with cloud platforms (AWS/GCP/Azure)
  • Expert knowledge of containerization (Docker) and orchestration (Kubernetes and Helm)
  • Proficiency in Infrastructure as Code and configuration management tools
  • Experience with high-performance, low-latency systems
  • Track record of successfully delivering large-scale infrastructure projects
  • Experience with CI/CD tools and methodologies
  • Deep understanding of networking, security, and system architecture
  • Excellent troubleshooting and analytical skills
  • Strong communication skills to collaborate with various stakeholders

Preferred Qualifications:

  • Experience in a financial services or hedge fund environment
  • Experience with Python (FastAPI)
  • Knowledge of machine learning operations (MLOps)
  • Experience with data processing frameworks and big data technologies
  • Experience with multi-cloud and/or on-premises Kubernetes
  • Experience running CUDA-enabled accelerated workloads

Location:

London - 62 Buckingham Gate, United Kingdom

Job Type: Full Time