Head of ML Infrastructure

at Hippocratic AI

Full Time

About Us:

Hippocratic AI is developing the first safety-focused Large Language Model (LLM) for healthcare. Our mission is to dramatically improve healthcare accessibility and outcomes by bringing deep healthcare expertise to every person. No other technology has the potential for this level of global impact on health.

Why Join Our Team:

• Innovative mission: We are creating a safe, healthcare-focused LLM that can transform health outcomes on a global scale.
• Visionary leadership: Hippocratic AI was co-founded by CEO Munjal Shah alongside physicians, hospital administrators, healthcare professionals, and AI researchers from top institutions including Johns Hopkins, Stanford, Google, Meta, Microsoft, and NVIDIA.
• Strategic investors: We have raised $137 million from top investors including General Catalyst, Andreessen Horowitz, Premji Invest, SV Angel, NVentures (NVIDIA's venture capital arm), and Greycroft.
• Team and expertise: We are working with top experts in healthcare and artificial intelligence to ensure the safety and efficacy of our technology.

For more information, visit www.HippocraticAI.com.

We value in-person teamwork and believe the best ideas happen together. Our team is expected to be in the office five days a week in Palo Alto, CA, unless explicitly noted otherwise in the job description.

Position Overview:

We are seeking a highly skilled and innovative Head of ML Infrastructure to lead the design, development, and operation of our orchestration platform for a heterogeneous constellation of Large Language Models (LLMs). The ideal candidate will have deep expertise in infrastructure orchestration, multi-cloud environments, and tools such as Kubernetes and Terraform. This role is critical to ensuring that our AI systems are scalable, reliable, and seamlessly integrated into our broader technology ecosystem.

Key Responsibilities:

Orchestration Platform Development:

• Architect and implement an advanced orchestration platform to manage a diverse set of LLMs efficiently.
• Design solutions to optimize performance, scalability, and availability across various deployment environments.

Infrastructure Management:

• Use Kubernetes, Terraform, and other Infrastructure as Code (IaC) tools to automate and manage ML infrastructure.
• Collaborate with DevOps and cloud engineering teams to ensure seamless integration with CI/CD pipelines.
• Establish robust monitoring, logging, and alerting systems for ML infrastructure.

Multi-Cloud Strategy:

• Design and execute strategies to leverage multiple cloud providers for cost optimization, redundancy, and compliance.
• Manage cloud-native services to support model deployment and orchestration at scale.

Performance Optimization:

• Work closely with ML engineers to fine-tune model deployment strategies, focusing on latency, throughput, and fault tolerance.
• Conduct capacity planning and develop tools for model lifecycle management.

Leadership & Collaboration:

• Lead a team of infrastructure engineers, fostering a culture of innovation, collaboration, and excellence.
• Act as a bridge between ML research, engineering, and operations teams to align infrastructure capabilities with business needs.
• Stay abreast of emerging technologies and methodologies in ML infrastructure and orchestration.

Qualifications:

Technical Skills:

• Proven experience in building and managing ML infrastructure platforms, particularly for LLMs or other advanced AI systems.
• Expertise in Kubernetes, Terraform, and other IaC tools.
• Deep understanding of multi-cloud architectures (e.g., AWS, Azure, Google Cloud) and hybrid cloud solutions.
• Strong programming skills in Python, Go, or a similar language, with experience in building automation and orchestration tools.
• Familiarity with modern ML frameworks and tools (e.g., TensorFlow, PyTorch, Hugging Face).

Leadership & Communication:

• Demonstrated success in leading infrastructure teams and managing large-scale projects.
• Excellent problem-solving and decision-making skills.
• Strong communication skills, with the ability to convey complex technical ideas to non-technical stakeholders.

Education & Experience:

• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent work experience).
• 8+ years of experience in infrastructure engineering, with at least 3 years in a leadership role.

Location

Palo Alto

Machine Learning

Job Overview

Job Posted:

7 months ago

