In this role, the candidate will need to understand deep learning (DL) workload characteristics and have the hands-on ability to measure and analyze them, then use that data to project and estimate the power and performance of the latest DL workloads.
Responsibilities
The ideal candidate will have both a software and a hardware background, be able to run sensitivity analyses on hardware knobs, and understand how to measure and improve the performance of DL workloads.
The candidate should have worked on simulators and have experience with benchmarking DL models.
The ideal candidate should have 5+ years of experience analyzing and improving the performance of DL workloads running on accelerators.
Programming and debugging code written in Python/C++/CUDA/HIP/OpenCL will be required, as well as the ability to model and to work with the hardware teams to measure the power and performance of key kernels running on RTL and performance simulators.
Knowledge of performance and power modeling is a plus.
Solid understanding of the fundamentals of computer architecture, memory hierarchy, caches and fabrics is a prerequisite for the role.
Requirements
Excellent problem-solving, written and verbal communication, and organizational skills; highly self-motivated.
Ability to work well in a team and be productive under aggressive schedules.
Education and Experience
PhD or Master’s degree in Computer Engineering or Computer Science with 5+ years of experience working on DL models.
Coursework in computer architecture, parallel computing, compilers, and digital design is required.
Location
(US) Santa Clara, CA; Austin, TX; Portland, OR; Fort Collins, CO