• Manage Azure Infrastructure: Configure, maintain, and optimize Azure infrastructure for AI model development and deployment, ensuring scalability and performance.
  • Model Performance Monitoring: Implement and maintain monitoring systems to track model performance, proactively identifying and addressing issues as they arise.
  • Incident Response: Collaborate with the SRE team to respond promptly to outages and incidents related to model operations, ensuring minimal downtime and rapid issue resolution.

Requirements

  • Azure Infrastructure Experience: Proficiency in managing Azure infrastructure components, including virtual machines, storage, and networking, to support AI model development and deployment.
  • CI/CD Pipeline Experience: Experience with Continuous Integration/Continuous Deployment (CI/CD) pipelines, including the automation of model deployment processes.
  • Containerization in the Cloud: Strong knowledge of containerization technologies in the cloud, such as Docker and Kubernetes, for efficient deployment and scaling of machine learning models.
  • Machine Learning Expertise: Proficient in building and optimizing machine learning models, with a deep understanding of various ML algorithms and frameworks.
  • Programming Skills: Proficiency in programming languages commonly used in machine learning, such as Python and libraries like TensorFlow and PyTorch.
  • Data Management: Experience in data preprocessing, feature engineering, and data pipeline development for machine learning.
  • Collaborative Team Player: Excellent communication skills and the ability to work collaboratively with cross-functional teams, including AI engineers and SREs.
  • Documentation: Effective documentation skills to maintain clear and organized records of models, infrastructure configurations, and incident responses.
  •  

Location

Mumbai, Maharashtra, India

Job Overview
Job Posted:
1 month ago
Job Expires:
Job Type
Full Time

Share This Job: