Bloomberg runs on data. It's our business and our product. From the biggest banks to elite hedge funds, financial institutions need timely, accurate data to capture opportunities and evaluate risk in fast-moving markets. With petabytes of data available, a solution to transform and analyze the data is critical to our success.
Bloomberg’s Data Science Platform was established to support development efforts around data-driven science, machine learning, and business analytics. It aims to provide scalable compute, specialized hardware, and first-class support for a variety of workloads, such as ML training jobs and inference services, Spark, and Jupyter. The platform provides a standard set of tooling for the Model Development Life Cycle, from experimentation and training to inference. It is built on containerization, container orchestration, and cloud architecture, on top of 100% open source foundations.
Production inference is a critical step in the Model Development Life Cycle for realizing the business value of Bloomberg's AI applications, and the advent of large language models (LLMs) presents new opportunities for expanding NLP capabilities in our products. Our inference solution is powered by the open source KServe project, a production-ready inference solution for both generative and predictive AI applications. We are poised for enormous user growth this year and have an ambitious roadmap in terms of new features as well as improved user experience. That’s where you come in. As a member of the inference team, you’ll have the opportunity to design and implement scalable, low-latency, high-throughput model inference solutions in a hybrid cloud environment. We are founding members of the KServe project, which standardizes ML inference within the Kubernetes ecosystem. As part of that, we regularly upstream features we develop, present at conferences, and collaborate with our peers in the industry. Open source is at the heart of our team. It's not just something we do in our free time, it is how we work.
Interact with data scientists to understand their production use cases and requirements, informing the next set of GenAI features for the inference platform.
Design solutions for problems such as scalable model deployment, low latency/high throughput inference, GPU resource optimizations and autoscaling.
Automate operation and improve telemetry of the inference platform in our infrastructure stack.
Design solutions for multi-cloud strategy.
Innovate and design solutions that meet strict production SLAs: low latency/high throughput, multi-tenancy, high availability, reliability across clusters/data centers, etc.
Fix and optimize generative inference application performance.
Provide developer and operational documentation.
Provide performance analysis and capacity planning for clusters.
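As a rough illustration of the performance-analysis side of this work, the sketch below summarizes per-request latency measurements into the metrics inference SLAs are typically stated in (mean, p50, p99, and throughput). The function name and metric choices are illustrative only, not part of Bloomberg's actual tooling:

```python
import statistics

def summarize_latencies(latencies_ms):
    """Summarize per-request latencies (in milliseconds) into common SLA metrics.

    Assumes latencies were collected from a single serial client, so
    throughput is derived from total wall time; concurrent load tests
    would measure throughput separately.
    """
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank percentile: the smallest value with at least 99% of
    # samples at or below it.
    p99_index = max(0, -(-n * 99 // 100) - 1)
    return {
        "mean_ms": statistics.mean(ordered),
        "p50_ms": ordered[n // 2],
        "p99_ms": ordered[p99_index],
        # Requests per second, valid only for one serial client.
        "throughput_rps": 1000.0 * n / sum(ordered),
    }

# Example: 100 requests taking 1ms..100ms.
metrics = summarize_latencies([float(i) for i in range(1, 101)])
```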
Strong communication and collaboration skills, with the ability to work effectively with cross-functional teams
Have a passion for providing reliable and scalable infrastructure.
4+ years programming experience in two or more languages (e.g., Python, Go, C++)
A Degree in Computer Science, Engineering or similar field of study or equivalent work experience
Experience designing and implementing low-latency, high-scalability inference platforms.
Design, develop, test and deploy inference solutions for LLMs
Explore emerging inference optimization techniques
Experience with debugging performance issues with distributed tracing.
Experience working with a distributed multi-tenancy and multi-cluster system.
Experience with distributed systems, e.g., Kubernetes, Kafka, RabbitMQ, ZooKeeper/etcd.
Strong knowledge of data structures and algorithms.
Linux systems experience (Network, OS, Filesystems).
Experience with Large Language Model inference, especially the vLLM and TensorRT-LLM runtimes.
Experience with Kubeflow/KServe, MLFlow, Sagemaker.
Experience working with GPU compute software and hardware.
Ability to identify and perform OS and hardware-level optimizations.
Open source involvement such as a well-curated blog, accepted contribution, or community presence.
Experience with cloud LLM providers such as AWS Bedrock, Gemini, or Azure OpenAI.
Experience with configuration management systems (Terraform, Ansible)
Experience with continuous integration tools and technologies (Jenkins, Git, Chat-ops)
Keynote: Platform Building Blocks: How to build ML infrastructure with CNCF projects - https://www.youtube.com/watch?v=ncED2EMcxZ8
The State and Future of Cloud Native Model Inference -
The Hitchhiker's Guide to Kubernetes Platforms: Don’t Panic, Just Launch! https://www.youtube.com/watch?v=a84mwXicpdc
Yearly based
New York