About Arc Institute
The Arc Institute is a new scientific institution conducting curiosity-driven basic science and technology development to understand and treat complex human diseases. Headquartered in Palo Alto, California, Arc is an independent research organization founded on the belief that many important research programs will be enabled by new institutional models. Arc operates in partnership with Stanford University, UCSF, and UC Berkeley.
While the prevailing university research model has yielded many tremendous successes, we believe institutional experimentation is an important way to make progress. Arc's experiments with the research model include:
- Funding: Arc fully funds Core Investigators’ (PIs’) research groups, liberating scientists from the typical constraints of project-based external grants.
- Technology: Biomedical research has become increasingly dependent on complex tooling. Arc Technology Centers develop, optimize, and deploy rapidly advancing experimental and computational technologies in collaboration with Core Investigators.
- Support: Arc aims to provide first-class operational, financial, and scientific support that will enable scientists to pursue long-term, high-risk, high-reward research that can meaningfully advance progress toward cures for diseases including neurodegeneration, cancer, and immune dysfunction.
- Culture: We believe that culture matters enormously in science and that excellence is difficult to sustain. We aim to create a culture that is focused on scientific curiosity, a deep commitment to truth, broad ambition, and selfless collaboration.
Arc has scaled to nearly 200 people to date. With $650M+ in committed funding and a new state-of-the-art lab facility in Palo Alto, Arc will continue to grow quickly in the coming years.
About the position
The Arc Institute is seeking a Machine Learning Scientist to join the Hsu and Konermann Labs (also affiliated with UC Berkeley Bioengineering and Stanford Biochemistry). The successful candidate will play a crucial role in advancing the state of the art in generative AI applied to biology, including our frontier DNA foundation model (Evo). You will focus on developing ML models for biological data, leveraging frontier approaches such as hybrid masked-causal objectives, discrete diffusion, and mechanistic interpretability techniques. You will also apply your models to important computational biology problems in genome mining, molecular technology development, and the invention of new therapeutic approaches.
This role is a unique opportunity to advance state-of-the-art machine learning in genomics, contribute to high-impact scientific discoveries, and help define how computational approaches shape our understanding and engineering of biology. You will work in a highly collaborative team with expert experimental biologists to realize the full impact of your work, and also have the opportunity to contribute to Institute-wide machine learning efforts such as Arc’s Virtual Cell Initiative.
About you
- You are passionate about developing machine learning models with real-world applications and scientific impact
- You have a strong understanding of modern deep learning, computational biology, and genetics
- You are excited about collaborating with a multidisciplinary team of experimental biologists and machine learning researchers at Arc
- You are known for your ability to analyze and visualize complex datasets, build high-quality ML tools, and draw meaningful conclusions
- You are familiar with the lab's recently reported projects in generative genomics and biological design: genomic language modeling, protein language modeling, and the discovery of new genome engineering tools
In this position you will
- Train and evaluate state-of-the-art machine learning models for molecular biology and genomics
- Develop methods for applying biological language models to the study of both eukaryotic and prokaryotic genome function and gene discovery
- Apply recent advances in mechanistic interpretability, such as sparse autoencoders, to the study of genomic language models such as Evo
- Develop, test, and maintain modular open-source software to accelerate adoption and application of genomic language models
- Contribute directly to ongoing discovery projects in the lab in the fields of generative genomics, genome engineering, and therapeutics
- Communicate analysis results effectively to both experimental and computational scientists
Requirements
- Ph.D. in Computer Science, Bioinformatics, Computational Biology, or Genetics/Genomics, with 0-5 years of post-degree experience in industry or academia
- Hands-on experience with training and evaluating the performance of machine learning models for large datasets
- High competency with Python, bash, and standard deep learning frameworks such as PyTorch
- Experience with Linux, git/GitHub, Docker, and Jupyter/RMarkdown notebooks
- Appreciation for how choices in experimental design affect the data analysis process
- Enjoyment of working collaboratively and cross-functionally with experimental scientists