Oracle Corporation's 'SaaS Engineering' team is setting up an exciting new team to work on advances in service reliability with teams of autonomous AI agents. This initiative aims to develop a robust system using advanced ML/AI tools to analyze system logs, predict failures, and autonomously resolve issues before they impact cloud services. The project combines the cutting-edge domains of anomaly detection and autonomous AI agents to enhance service resiliency. This newly formed team will be crucial in driving innovation and ensuring the reliability of Oracle's cloud services, making significant contributions to service uptime and customer satisfaction.
Requirements:
● Extensive Experience: 7-10 years of experience in machine learning engineering, with a strong focus on developing and deploying machine learning models in production environments.
● Technical Expertise: Strong expertise in Python and deep learning frameworks such as PyTorch and TensorFlow. Extensive experience with time-series data analysis, feature engineering, and model optimization techniques.
● Cloud & Containerization: Knowledge of cloud services (AWS, Azure, Oracle Cloud), containerization (Docker, Kubernetes), and microservices architecture. Proven track record of deploying machine learning models in a cloud environment.
● AI Agent Frameworks: Knowledge and hands-on experience with AI agent frameworks and libraries is a plus. Proven ability to develop and deploy autonomous AI agents in real-world applications.
● Anomaly Detection Expertise: Experience in designing and implementing anomaly detection systems for system logs. Proficiency in using statistical methods, machine learning algorithms, and signal processing techniques to identify and diagnose anomalies in large-scale log data.
● Problem-Solving Skills: Excellent problem-solving skills, with the ability to identify and address potential issues proactively. Experience in troubleshooting and optimizing machine learning models and systems.
● Leadership Skills: Proven leadership abilities with a track record of mentoring and guiding junior team members. Strong communication skills to effectively collaborate with cross-functional teams and present complex technical information to non-technical stakeholders.
● Educational Background: Advanced degree (Master’s or Ph.D.) in Computer Science, Machine Learning, Data Science, or a related field. Relevant certifications in machine learning or AI are a plus.
● Adaptability: Ability to thrive in a fast-paced, dynamic environment, managing multiple tasks and projects simultaneously. Eagerness to continuously learn and adapt to new technologies and methodologies.
Career Level - IC5
Responsibilities:
● Lead ML Model Development: Lead the design and development of advanced machine learning models for detecting anomalies in system logs. This includes selecting appropriate algorithms, designing model architecture, and ensuring robust model training and validation.
● Scalable Solution Architecture: Architect and implement scalable machine learning solutions for real-time failure prediction and automated resolution. Ensure that solutions are designed for high availability, fault tolerance, and scalability in a cloud environment.
● Model Optimization: Optimize and fine-tune models to enhance recall and precision, ensuring high performance and reliability. Use techniques such as hyperparameter tuning, cross-validation, and performance monitoring to achieve optimal results.
● Mentorship & Collaboration: Mentor junior ML engineers, providing guidance on best practices in machine learning, model development, and software engineering. Collaborate closely with data scientists and data engineers to integrate models into production and ensure end-to-end solution effectiveness.
● Continuous Learning & Innovation: Stay updated with the latest advancements in machine learning and AI. Incorporate best practices and new methodologies into the development process to maintain a cutting-edge approach.
● System Integration: Work closely with DevOps and engineering teams to deploy machine learning models into production environments. Ensure seamless integration with existing systems and workflows.
● Documentation & Reporting: Ensure thorough documentation of model development processes, system architectures, and project outcomes. Regularly report on project progress, challenges, and achievements to senior leadership and stakeholders.
Disclaimer: As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s problems. True innovation starts with diverse perspectives and various abilities and backgrounds.
When everyone’s voice is heard, we’re inspired to go beyond what’s been done before. It’s why we’re committed to expanding our inclusive workforce that promotes diverse insights and perspectives.
We’ve partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer a highly competitive suite of employee benefits designed on the principles of parity and consistency. We put our people first with flexible medical, life insurance and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by calling +1 888 404 2494, option one.
Disclaimer:
Oracle is an Equal Employment Opportunity Employer*. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
* Which includes being a United States Affirmative Action Employer
Yearly based
United States