NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for a talented deep learning performance architect to join our AI performance modelling, analysis and optimization efforts. In this position, you will have a chance to work on DL performance analysis and optimization on state-of-the-art hardware architectures for various LLM and multimodal workloads. You will contribute to our dynamic, technology-focused company.
What you'll be doing:
Analyze state-of-the-art DL networks (LLM, VLA, multimodal models, etc.), and identify and prototype performance opportunities to influence the SW and architecture teams for NVIDIA's current and next-gen inference products.
Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.
Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uniprocessor and multiprocessor configurations.
Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.
What we need to see:
Pursuing a BS or higher degree in a relevant technical field (CS, EE, CE, Math, etc.).
Strong programming skills in Python, C, and C++.
Strong background in computer architecture.
Ways to stand out from the crowd:
Experience with GPU computing and parallel programming models such as CUDA and OpenCL.
Experience with workload analysis on other deep learning accelerators.
Background in deep neural network training, inference, and optimization in leading frameworks (e.g. PyTorch, TensorFlow, TensorRT).