We are seeking a highly skilled GPU Compute Architect with a strong background in microarchitecture (uArch) and Register-Transfer Level (RTL) design to join our team. This individual will play a critical role in prototyping and designing advanced compute arithmetic components, such as MAC (Multiply-Accumulate) arrays and ALUs (Arithmetic Logic Units), for GPUs tailored to AI applications, with a focus on delivering comprehensive power and area estimates for each design option.
Key Responsibilities:
Design and prototype advanced compute arithmetic units (e.g., MAC arrays, ALUs) for GPUs targeting AI and deep learning workloads.
Develop and optimize GPU microarchitectures to enhance performance, energy efficiency, and scalability for AI-specific applications.
Create and refine RTL implementations to validate and benchmark new design concepts.
Conduct detailed performance modelling and analysis to identify bottlenecks and propose innovative solutions for next-generation GPU designs.
Produce comprehensive power and area estimates for proposed designs, enabling informed trade-off analysis and decision-making.
Collaborate with cross-functional teams, including software, hardware, and machine learning experts, to align architecture design with application requirements.
Research and integrate emerging technologies and methodologies in GPU compute design for AI workloads.
Lead the evaluation of design trade-offs in terms of performance, area, and power metrics.
Drive innovation in custom compute unit design, ensuring compatibility with broader GPU pipeline architecture.
Required:
Master's or Ph.D. in Electrical Engineering, Computer Engineering, Computer Science, or a related field.
Proven experience in GPU/ASIC architecture design with a focus on compute arithmetic, gained through coursework or relevant projects.
Expertise in microarchitecture design and RTL coding (e.g., SystemVerilog).
Strong understanding of GPU pipelines, parallel computing concepts, and AI/ML workloads.
Proven experience in designing and optimizing MAC arrays, ALUs, or similar compute units.
Solid knowledge of hardware modelling and simulation tools (e.g., Synopsys VCS, ModelSim).
Experience in producing and interpreting power and area estimates for complex hardware designs.
Proficiency in performance analysis tools and techniques.
Strong problem-solving skills with the ability to innovate and think outside the box.
Preferred:
Familiarity with high-level synthesis (HLS) tools and methodologies.
Background in machine learning algorithms and their hardware acceleration.
Understanding of power optimization techniques and methodologies for compute-intensive hardware.
The listed requirements may be obtained through a combination of industry-relevant job experience, internship experience, and/or schoolwork, classes, or research.
Work Model for this Role
This role will be eligible for our hybrid work model, which allows employees to split their time between working on-site at their assigned Intel site and off-site.
* Job posting details (such as work model, location, or time type) are subject to change.