Help architect, design and implement ThinkBio.Ai's AI insight and data platforms; design agentic architecture for the AI platform
Evaluate and analyze foundation models for protein structure, clinical data, genomic data and small molecule data; fine-tune foundation models and design loss functions relevant to biological applications
Design scalable data models for multi-modal biological data
Create a componentized architecture with structured component-level APIs
Design and implement knowledge graphs for biological data; apply deep-learning-based methods to create representations from knowledge graphs
Work closely with other team members and partners to identify the most critical data-centered challenges and address them using cutting-edge computational, statistical and machine learning methods
Requirements
Ph.D. in Computer Science, AI/ML, or Mathematics/Statistics
10+ years of experience and technical expertise in AI platforms, data platforms, and deep learning models
Solid understanding of computer science fundamentals
data structures: trees, directed and undirected graphs, hash tables, heaps
algorithms: search, sort, rank, graph-based algorithms
Solid understanding of statistical/mathematical concepts
probability theory
correlation metrics
Bayesian probability
statistical regression methods
Good understanding of GPU architectures – CUDA
Proficiency in Python, C, and C++
Expertise in AI frameworks: PyTorch, TensorFlow
Exposure to agentic frameworks like LangChain and LlamaIndex
Firm grasp of modern statistical methods and machine learning techniques, and their applications to large-scale data
Ability to manage projects with minimal supervision, using creative and analytical thinking
Ability to drive highly collaborative work across the organization and outside the company
Excellent oral and written communication skills
Preferred Skills
Experience and understanding of how bioinformatics and data science can best be applied to speed up drug discovery
Basic understanding of biological concepts and familiarity with the drug development process
Knowledge of bioinformatics tools and databases for analyzing genomics and proteomics data