Team Introduction​:
The TikTok Flink Ecosystem Team plays a critical role in delivering real-time computing capabilities to power TikTok’s massive-scale recommendation, search, and advertising systems. This team is focused on building the infrastructure for stream processing at exabyte scale — enabling ultra-low-latency, high-reliability, and cost-efficient real-time data transformations.​

We are deeply involved in developing and optimizing Apache Flink and surrounding components like connectors, state backends, and runtime execution models to meet TikTok’s rapidly evolving data needs at EB-level throughput and scale.​

We also collaborate closely with ML infrastructure teams to bridge real-time stream processing and machine learning. This includes integrating Velox to accelerate model training, building multimodal data pipelines, and utilizing frameworks like Ray to orchestrate large-scale distributed ML workflows.


Responsibilities:​
- Design and develop core Flink operators, connectors, or runtime modules to support TikTok’s exabyte-scale real-time processing needs.​
​- Build and maintain low-latency, high-throughput streaming pipelines powering online learning, recommendation, and ranking systems.​
- ​Collaborate with ML engineers to design end-to-end real-time ML pipelines, enabling efficient feature generation, training data streaming, and online inference.​
- Leverage Velox for compute-optimized ML data transformation and training acceleration on multimodal datasets (e.g., video, audio, and text).​
- Use Ray to coordinate distributed machine learning workflows and integrate real-time feature pipelines with ML model training/inference.​
- Optimize Flink job performance, diagnose bottlenecks, and deliver scalable solutions across EB-scale streaming workloads.

Location

San Jose, California, United States

Job Overview
Job Posted:
2 days ago
Job Expires:
Job Type
Intern

Share This Job: