DevOps Engineer - Core AI
We are seeking a skilled DevOps Engineer to join our Core AI Development Team. This team is responsible for ensuring the stability, scalability, and efficiency of our AI systems, which enable the firm to use Large Language Models (LLMs) in its daily workflows. As a DevOps Engineer, you will manage our build/release processes, develop metrics systems, and maintain the reliability of our Kubernetes-based infrastructure, using DataDog for monitoring. This role is critical to ensuring smooth operations and continuous improvement of our AI-driven solutions.
Why Join Us
As a DevOps Engineer in the Core AI Development Team, you will play a vital role in ensuring the success of our AI-driven solutions. You will work on cutting-edge technologies, collaborate with talented colleagues, and directly impact the firm's ability to leverage AI at scale. If you are passionate about DevOps, infrastructure reliability, and enabling innovation, we encourage you to apply.
Responsibilities
- Design, implement, and maintain scalable build/release pipelines to support rapid development cycles.
- Ensure the stability and scalability of our Kubernetes-based infrastructure.
- Develop and manage metrics systems leveraging DataDog to monitor system performance and reliability.
- Collaborate with development teams to automate workflows and improve CI/CD processes.
- Troubleshoot and resolve infrastructure-related issues, ensuring minimal downtime.
- Implement best practices for system security, reliability, and scalability.
- Stay up to date with advancements in DevOps tools and methodologies to continuously improve processes.
Required Skills / Experience
- Minimum 4 years of experience as a DevOps Engineer.
- Strong experience with Kubernetes, including deployment, scaling, and troubleshooting.
- Proficiency with CI/CD tools and pipelines (e.g., Jenkins, GitLab CI/CD, or similar).
- Experience with monitoring and observability tools, particularly DataDog.
- Solid understanding of infrastructure-as-code tools (e.g., Terraform, Helm).
- Strong scripting skills in Python, Bash, or similar languages.
- Familiarity with containerization technologies such as Docker.
- Proven ability to design and implement scalable and reliable systems.
- Academic degree in a quantitative field (Math, CS, Sciences) or equivalent experience/knowledge.
Desirable Skills / Experience
- Experience with cloud platforms (AWS, GCP, Azure) for infrastructure management.
- Familiarity with AI/LLM-related workflows and tools.
- Knowledge of networking concepts and security best practices.