Principal Backend Engineer (AI Infra)
Coupang is reimagining the shopping experience with the goal of wowing each customer from the instant they open the Coupang app to the moment an order is delivered to their door.
Powered by an outstanding end-to-end e-commerce and logistics network and a fanatical culture of customer centricity, Coupang has broken tradeoffs around speed, selection and price. Today, we provide exceedingly fast shipping speeds on millions of items including fresh groceries, delivered within hours nationwide, 365 days a year.
We are doing this for millions of consumers in Korea. Korea is home to one of the largest and fastest growing e-commerce opportunities anywhere in the world.
Job Overview:
As a Principal Backend Engineer (AI Infra) on the Cloud Platform team, you will focus on hybrid cloud platform infrastructure, services and processes, improving multi-cloud infrastructure and system efficiency, productivity, availability, scalability and quality at 10X scale for ML workloads.
You will dive deep into business problems, understand current customer pain points, the system domain and any current platform limitations, and then work closely with engineering teams and leaders to design and architect new systems and services, leveraging both on-premise and cloud infrastructure, that scale 10X better in availability, reliability, performance and efficiency.
This includes providing system architecture artifacts, taking accountability for infrastructure choices that perform and scale at the 10X level while remaining efficient, and giving guidance on the right machine types, storage, network designs, etc. You will be part of a global team that works from different locations but still operates as one connected unit.
Key Skills and Role Responsibilities:
Strong experience managing and deploying large-scale cloud infrastructure and services in hybrid cloud environments, including OS, network, storage, security, monitoring and logging.
Strong experience managing and deploying ML workloads on both cloud and on-premise infrastructure.
Strong experience with DevOps tools and coding: CI/CD, IaC, Terraform, Ansible, Python and automation.
Strong experience with Kubernetes, Docker, Linux and cloud service integrations.
Experience managing storage, network and security in hybrid cloud environments.
Provide a roadmap and vision for scalable and robust growth for your cloud infrastructure and platform services
Strong technical, analytical and design capability to understand common and shared platforms, web and service APIs and their underlying data, and to provide appropriate network and infrastructure recommendations
100% accountable for reliable, scalable and optimized cloud architecture, design and platform services.
Solve complex technical problems in public cloud (AWS, Azure, GCP) and private cloud environments and improve operational excellence
Strong ability to dive deep into the current cloud infrastructure and systems architecture and to provide strategic partnership and improvements in network usage, role management, CDN, domains and hybrid cloud
Analyze complex distributed production deployments and recommend ways to optimize performance and/or automate processes by managing continuous integration servers, utilizing monitoring and testing tools
Identify opportunities to make disruptive improvements in cloud infrastructure usage, operations and services with a high degree of systematic automation.
Possess expert knowledge of performance (sub-millisecond latencies), scalability, availability (99.99% uptime) and enterprise architecture best practices
Exert expert technical influence across multiple teams, increasing their productivity and effectiveness by sharing your deep knowledge and experience.
Strong problem-solving skills, analytical capabilities, and attention to detail
Provide a roadmap and vision for scalable and robust growth for cloud and on-premise infrastructure and platform services
Collaborate with stakeholders and lead engineers on key mission-critical projects
Qualifications:
Preferred:
Req# R0052119