The Efficiency Engineering team crafts innovative tools and applications that help IT operations and DevOps teams reach new levels of efficiency. We're a tight-knit crew of experienced developers, engineers, and problem solvers driven by a shared vision: streamlining operations, reducing manual workload, and empowering teams to do their best work.

To enhance collaboration and cross-functional partnerships, among other goals, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager or department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities:
- Design and develop large-scale ML system architecture, solving system-level problems in high concurrency, reliability, and scalability.
- Develop end-to-end deep model inference solutions for internal business units such as Search, and for related Large Language Model (LLM) based systems.
- Provide highly automated, high-performance model optimization solutions for frameworks such as PyTorch and TensorFlow. Techniques include subgraph matching, compilation optimization, model quantization, and support for heterogeneous hardware.
- Manage the large-scale GPU computing cluster for our global businesses, improving utilization through methods such as elastic scheduling, GPU overselling, and task orchestration.
- Collaborate cross-functionally with the algorithm department on joint optimization of algorithms and systems.

Location

San Jose, California, United States

Job Overview
Job Posted: 3 days ago
Job Expires:
Job Type: Full Time
