Companies want to train their own large models on their own data. The current industry standard is to train on a random sample of your data, which is inefficient at best and actively harmful to model quality at worst. There is compelling research showing that smarter data selection can train better models faster; we know because we did much of this research. Given the high costs of training, this presents a huge market opportunity. We founded DatologyAI to translate this research into tools that enable enterprise customers to identify the right data on which to train, resulting in better models for cheaper.
Our team has pioneered deep learning data research, built startups, and created tools for enterprise ML.
Following our $11.65M Seed round last September, we've raised a $46M Series A led by Felicis Ventures. Our investors include Radical Ventures, Amplify Partners, Microsoft, Amazon, and notable angels like Jeff Dean, Geoff Hinton, Yann LeCun, and Elad Gil.
With over $57.5M in total funding, we're rapidly scaling our team and compute resources to revolutionize data curation across modalities. Join us in pushing the boundaries of what's possible in AI!
This role is based in Redwood City, CA. We are in person 4 days a week and offer relocation assistance to new employees. We provide visa sponsorship for candidates selected for this role.
About the Role
This role is closed for summer 2024 since we have reached capacity, but we will still accept intern candidates for fall and winter!
As a Research Intern at DatologyAI, you will conduct research investigating how interventions on training data can improve the quality and shape the behavior of deep learning models. Here is what your day-to-day would look like:
Transform messy literature into practical improvements. The research literature is vast, ambiguous, and constantly evolving. You will use your skills as a scientist to source, vet, implement, and improve promising ideas, both from the literature and of your own creation.
Perform high-risk, high-reward research. We want our interns to focus on problems that have massive potential to transform how data is ingested into future ML models. Rather than making incremental changes to current algorithms, we want you to work on novel project ideas that could change how we view data.
Conduct science driven by real-world needs. At DatologyAI, we understand that conference reviewers and academic benchmarks don’t always incentivize the most impactful research. Concrete customer needs and product improvements will guide your research.
Science is more than just experiments. We expect our Research Scientist Interns to collaborate closely with engineers, talk to customers, and shape the product vision.
About You
Ideal candidates should have strong coding skills and experience with one of the following:
We would like to hire students with practical experience and/or publications related to any of the following research topics:
Data research
Data pruning/curation
Curriculum learning
Synthetic data generation
Dataset distillation
Effects of training data on model behavior
Embedding models
Semantic search
Efficient ML
We would also love to hear from you if you have practical experience and/or publications related to training large vision (especially video), language, and multimodal models.
Or teach us something new that you are passionate about that could improve data curation!