Data quality and diversity are the foundation for training the best agents in any domain. As a member of the Data Team at Reflection, you will play a pivotal role in shaping how we collect and analyze human, synthetic, and internet data. This is an interdisciplinary role that primarily requires engineering, research, and communication skills, along with sharp attention to detail and a willingness to “roll up your sleeves” and look at the data.
1. Experiment and Benchmark Design
Develop techniques for collecting, augmenting, filtering, or synthesizing training and evaluation data using creativity and analytical thinking
Design experiments, in collaboration with machine learning researchers, to assess the impact of different datasets on model performance
When required, manage human annotators working on data collection efforts – this could include tracking payments and hours, training annotators, and providing technical support, feedback, and quality control
2. Qualitative and Quantitative Data Analysis
Analyze collected data, e.g. coding tasks, both qualitatively and quantitatively
Evaluate model behavior to identify its strengths and weaknesses
Clearly communicate findings to machine learning research and product teams
3. Data Engineering
Design, implement, and optimize scalable data pipelines to support reinforcement learning and supervised finetuning
Leverage LLMs to perform data filtering, cleaning, and augmentation
Software engineering background with experience building data processing pipelines at scale, particularly with LLM integration
Proficiency in Python or other programming languages (Go, TypeScript, etc.)
Detail-oriented and analytical, with the ability to conduct careful qualitative and quantitative data analysis
Excellent organizational and communication skills to collaborate closely with cross-functional teams and manage human data operations
Experience with machine learning, reinforcement learning, and LLMs is a plus, but not strictly required
The opportunity to work at the forefront of AI research and data collection for training cutting-edge models.
Collaboration with a team of world-class researchers and engineers from top AI labs and companies.
Competitive compensation and benefits, with opportunities for professional growth.