Lead MLOps Engineer

We live in dynamic times where technology is reshaping how we live, and GXS is committed to redefining financial services by leveraging technology. Our aim is to enable underserved groups to easily access transparent financial services embedded in their everyday activities, helping them achieve a better quality of life. To do this, we're building a cutting-edge digital bank with a strong foundation in data, technology, and trust, to solve problems and serve our customers.

As a Lead MLOps Engineer (Individual Contributor), you will be a key player in deploying, monitoring, and maintaining machine learning models in production environments. This role requires deep expertise in bridging the gap between data science and engineering, ensuring seamless integration of machine learning models into operational workflows. You will work closely with data scientists, software engineers, and DevOps teams to automate and streamline the model lifecycle, from development to deployment and monitoring.

Responsibilities:

  • Design, develop, and implement end-to-end MLOps pipelines for machine learning projects, including data pipelines, model training environments, and deployment mechanisms using cloud services and container orchestration tools.

  • Drive the implementation of automation solutions for continuous integration, continuous delivery, and continuous training (CI/CD/CT) of machine learning models to streamline the development and deployment processes.

  • Collaborate with machine learning engineers to understand model requirements and optimize deployment processes.

  • Implement and oversee monitoring solutions for machine learning applications in production, ensuring high availability, performance, and reliability. Lead incident response and root cause analysis, and implement robust fixes.

  • Drive initiatives to continuously assess and optimize the performance of the infrastructure serving machine learning models in production, including resource allocation, cost reduction, and latency improvements.

  • Manage the end-to-end lifecycle of machine learning models in production, including updates, version control, and retirement of models that no longer meet the performance criteria.

  • Establish and maintain comprehensive documentation for operational procedures, system configurations, and best practices.

  • Develop automation scripts and tools to improve the efficiency and reliability of ML workflows.

Skills and knowledge

  • 7+ years of strong practical experience with AWS services, particularly those related to computing, storage, networking, and security.

  • Strong experience with containerization and orchestration technologies such as Docker, Kubernetes, and Helm.

  • Deep understanding of MLOps principles and experience with tools such as MLflow, Kubeflow, or Vertex AI/SageMaker.

  • Proficiency in infrastructure as code (IaC) using Terraform or similar tools.

  • Solid background in CI/CD methodologies and tools (e.g., GitLab CI/CD).

  • Programming skills in Python and familiarity with ML libraries and frameworks (e.g., TensorFlow, PyTorch).

  • Demonstrated experience in deploying and maintaining ML models in a production environment.

Location

Singapore - OneNorth

Job Overview

Job Posted: 4 days ago
Job Type: Full Time