As a Research Engineer Intern – Vision-Language Models for E2E Autonomous Driving, you'll explore the potential of vision-language models to enhance reasoning, scene understanding, and interpretability in end-to-end autonomous driving. You'll have the opportunity to work towards a publication at a top-tier venue by contributing to key areas of model development, including curating both real-world and synthetic training data, fine-tuning foundation vision-language models, and designing robust evaluation frameworks.
Responsibilities:
Lead model development efforts using vision-language models for end-to-end autonomous driving systems
Curate high-quality training datasets from both real-world trips and synthetic sources
Optimize model architectures and fine-tune pre-trained foundation models to enhance performance and adapt to specific challenges
Design and implement evaluation frameworks to rigorously assess model performance in real-world driving environments
Required Skills:
Pursuing an MS or PhD in CS, EE, mathematics, statistics, or a related field
Thorough understanding of deep learning principles and familiarity with vision-language models
2-3 years of experience implementing and training deep learning models in at least one deep learning framework (PyTorch, TensorFlow, JAX)
Preferred Skills:
Experience designing, training, or fine-tuning vision-language models, and familiarity with knowledge distillation, quantization, and vLLM
Experience with deep learning projects related to autonomous driving
Publication record in relevant venues (CVPR, ICLR, ICCV, ECCV, NeurIPS, AAAI, SIGGRAPH)