(Location: Remote US)

About the role:

We are looking for a skilled Sr. ML Platform Engineer with a specialized focus on cloud infrastructure, including API development that enables seamless integration between cloud-based services and High-Performance Computing (HPC) environments. The successful candidate will play a pivotal role in designing and implementing APIs that enable efficient communication and data exchange between cloud platforms and HPC systems.

Responsibilities:

  • Design, develop, and maintain robust APIs that facilitate communication and data exchange between cloud-based services, particularly AWS, and HPC environments
  • Collaborate with cross-functional teams to understand the unique requirements of both cloud-based services and HPC systems, ensuring that the APIs developed meet the specific needs of these environments
  • Implement best practices for API design, including security, scalability, and performance optimization to ensure efficient interaction between cloud services and HPC clusters
  • Use services such as Cloudflare to enhance API performance, security, and reliability in cloud-to-HPC communication, optimizing for speed and resilience
  • Work closely with HPC engineers to identify and address integration challenges, striving for seamless connectivity between diverse systems and cloud-based platforms
  • Drive innovation by proposing and implementing new API strategies, enhancing the efficiency and functionality of data exchange between AWS, Cloudflare Workers, and on-premises HPC environments
  • Create comprehensive documentation and provide training to internal teams on the use and integration of developed APIs, focusing on AWS and Cloudflare environments
  • Monitor API performance and address issues related to data transfer, ensuring reliability and consistent operation between AWS, Cloudflare, and HPC systems (Slurm/AWS HyperPod)
  • Collaborate with the security team to ensure that the APIs comply with industry standards and best practices for data privacy and protection, especially in AWS and Cloudflare environments
  • Participate in incident management and root cause analysis to improve system reliability
  • Build containers with REST APIs for Gen AI functionality and host them on AWS and Azure

Requirements:

  • 8 years of experience in cloud computing and API development, with a deep understanding of High-Performance Computing environments, particularly in an AWS setting
  • Strong knowledge of HPC cluster management and job scheduling with Slurm and AWS HyperPod
  • Proficiency in programming languages such as Python and TypeScript, essential for API development and integration within AWS and/or Cloudflare Workers environments
  • Demonstrated expertise in API design, implementation, and maintenance, ensuring security and performance best practices within AWS and Cloudflare
  • Knowledge of containerization technologies (e.g., Docker, Kubernetes) for deployment of APIs within AWS, Cloudflare, and HPC systems
  • Experience with automating CI/CD pipelines
  • Familiarity with authentication and authorization protocols (e.g., OAuth, JWT) to ensure secure data exchange between AWS, Cloudflare, and HPC environments
  • Strong problem-solving skills and the ability to troubleshoot complex issues related to API integrations in a hybrid cloud-HPC setup, particularly in AWS and Cloudflare environments
  • Excellent communication and collaboration skills to work effectively with diverse teams and stakeholders in AWS and Cloudflare ecosystems

Compensation:

The salary range for this role is $130,000 to $190,000. Individual pay within the range is based on factors such as job-related skills and experience. Total compensation also includes stock options and benefits.

Equal Employment Opportunity:

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.

Salary

$130,000 - $190,000 per year

Location

United States

Job Overview
Job Posted: 1 week ago
Job Expires:
Job Type: Full Time
