Join the team building software that will be used by the entire world. Work with highly skilled software engineers to implement a large-scale toolset that tests deep learning models and frameworks on the most powerful computers. The ability to work in a multifaceted, fast-paced environment is required, as are strong social skills. In this role, you will interact with internal partners, users, and members of the open source community to implement solutions for building, testing, integrating, and releasing NVIDIA Deep Learning Frameworks on the most powerful, enterprise-grade GPU clusters, capable of hundreds of petaFLOPS. Are you ready for this challenge?

What You’ll Be Doing:

  • Automating and optimizing testing of Deep Learning models from different data domains.

  • Developing shared utilities for setting up systems, running tests, and recording results.

  • Configuring, maintaining, and building upon deployments of industry-standard tools (e.g. GitLab, Kubernetes, Docker, Terraform).

  • Taking part in architecture and design decisions for backend, infrastructure, and software release.

  • Leading best practices for building, testing, and releasing software.

  • Identifying infrastructure needs and translating them into action.

  • Building tools for automatic content generation that save dozens of engineering hours.

What We Need To See:

  • BSc or MSc degree in Computer Science, Computer Architecture, or a related technical field, or equivalent experience.

  • 6+ years of work experience in software development.

  • Excellent Python programming skills.

  • Knowledge of and love for DevOps/MLOps practices.

  • Experience in architecture and system design.

  • Strong experience in setting up, maintaining, and automating continuous integration systems.

  • Willingness to take action and strong analytical skills.

  • Strong time-management and organizational skills for coordinating multiple initiatives, priorities, and the integration of new technologies and products into very complex projects.

  • Knowledge of algorithms and AI fundamentals.

  • Good communication and documentation habits.

Ways To Stand Out From The Crowd:

  • Solid understanding of Linux environments.

  • Experience with containerization technologies such as Docker.

  • Hands-on experience creating integration, delivery, and deployment pipelines for ML/DL products.

  • Familiarity with large-scale distributed computing systems and cloud platforms.

  • Experience with HPC-based compute clusters and scheduling solutions like Slurm.

The base salary range is 180,000 USD - 339,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Job Overview

Salary: $180,000 - $339,250 per year
Location: US, CA, Santa Clara
Job Posted: 2 months ago
Job Type: Full Time
