Posted: Nov 15, 2024 Weekly Hours: 35 Role Number: 200579190
The Apple Machine Learning Research Team is seeking a strong graduate-level researcher with experience training large foundation models from scratch for computer vision and multimodal representation learning. The internship will likely emphasize understanding rather than generative tasks, but experience with both is a plus. The team is interested in many scientific and applied questions around pushing model capabilities, training- and inference-time efficiency and scalability, data efficiency, and promising approaches to fusing across modalities.
Description
The goal will be to continue developing strong vision-language models along the lines of https://arxiv.org/abs/2401.08541 that emphasize simplicity and scalability in pre-training large multimodal foundation models. We expect to work on finding optimal strategies for fusing modalities early, including exploring ways to leverage mixtures-of-experts in interleaved data settings. In addition, we are excited to explore novel use cases and capabilities of multimodal models trained with early fusion.
Minimum Qualifications
Candidates must be Master's- or PhD-level students with strong backgrounds in machine learning and computer science.
We expect the candidate to have solid hands-on coding skills and deep familiarity with training modern deep learning models from scratch.
The candidate must be a strong communicator and a good team player.
Preferred Qualifications
Experience with language modeling, mixtures-of-experts, computer vision, and scaling is a plus.
Analytical experience and mathematical versatility are also strong pluses.