AI/ML Infrastructure Engineer

at Rakuten Asia Pte Ltd

Full Time

Job Description:

Situated in the heart of Singapore's Central Business District, Rakuten Asia Pte. Ltd. is Rakuten's Asia Regional headquarters. Established in August 2012 as part of Rakuten's global expansion strategy, Rakuten Asia comprises various businesses that provide essential value-added services to Rakuten's global ecosystem. Through advertisement product development, product strategy, and data management, among others, Rakuten Asia is strengthening Rakuten Group's core competencies to take the lead in an increasingly digitalized world.

The Machine Learning and Deep Learning Engineering Department (MDE) is a group of engineers and scientists who specialize in natural language processing (NLP), search, and recommendation systems. We conduct state-of-the-art research and apply cutting-edge technologies, such as transformer models, dense retrieval, distributed GPU training, and large-scale machine learning, to a variety of Rakuten products and services. We are looking for passionate experts in machine learning research and engineering to join us in our journey to define the next-generation e-commerce experience.

The GPU Engineering team is at the forefront of delivering a robust GPU infrastructure and cutting-edge ML platforms that power the development and deployment of ML models across various teams of ML engineers and researchers within Rakuten. Use cases include semantic search, visual search, recommendation, LLMs, and more.

As an MLOps Engineer in the GPU Engineering team, you will be at the heart of Rakuten's ML operations, focusing on the deployment, monitoring, and management of ML models. You'll work closely with ML Engineers across the department to provide a reliable infrastructure that supports rapid model development, training, and deployment. Your expertise will contribute to the efficiency and scalability of our ML projects, directly impacting Rakuten's product innovation and service excellence.

Responsibilities

Design, implement, and maintain ML pipelines for automated training, testing, and deployment of machine learning models, ensuring scalability and efficiency.
Work collaboratively with ML engineers to troubleshoot and optimize model performance, ensuring models are production-ready and meet defined SLAs.
Manage and monitor Kubernetes clusters and related infrastructure to support high-volume ML workloads, implementing best practices for security and resilience.
Develop and maintain documentation on ML infrastructure, tools, and best practices, providing guidance and support to ML teams.
Continuously evaluate and incorporate new technologies and tools to enhance the ML platform's capabilities and performance.

Qualifications

Experience: At least 1 year of experience in MLOps, with a proven track record of managing ML infrastructure and pipelines.
Education: Bachelor’s or higher degree in Computer Science, Engineering, or a related technical discipline.
Kubernetes Proficiency: Deep understanding of Kubernetes (K8s) infrastructure and its application in managing ML workloads.
Programming Skills: Proficiency in Python and familiarity with ML frameworks (e.g., TensorFlow, PyTorch).
CI/CD Tools: Experience with CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI) and container technologies (e.g., Docker).
Strong communication and teamwork skills.
Passion for technology and solving challenging problems.

Rakuten is an equal opportunities employer and welcomes applications regardless of sex, marital status, ethnic origin, sexual orientation, religious belief, or age.

Location

Crimson House Singapore

Engineer Machine Learning

Job Overview

Job Posted:

2 months ago
