Oracle's Enterprise Engineering GPU Compute team is seeking an experienced Director of Software Engineering to lead validation for New Product Introduction of the latest GPU SKUs and to oversee their health throughout their lifecycle.
This is a hands-on leadership role requiring experience with GPU servers, including their configuration, validation, benchmarking, debugging, diagnosis, and repair. You will drive technical strategy, mentor a world-class engineering team, and deliver solutions that set the quality bar for our GPU deliveries and for the health of running GPU systems.
Internal Responsibilities
Key Responsibilities
Strategic Leadership & Product Innovation
- Define and execute the technical and product strategy for GPU Validation across existing and new GPU SKUs.
- Ensure smooth and reliable delivery of new capacity via Validation.
- Work with customers to align on quality expectations and maximize customer fleet health.
- Continuously monitor and improve procedures and systems, pushing for greater automation.
- Leverage AI to build feedback loops that accelerate diagnosis and repair.
Technical Execution & Platform Development
- Implement DevOps and CI/CD automation, platform monitoring, and system performance optimization (latency, scale, cost).
- Evaluate, architect, and integrate emerging data engineering and AI technologies, assessing their relevance, maturity, and impact on Oracle's GPU engineering.
- Define and measure key metrics for system quality, customer satisfaction, and cost efficiency.
- Work with NVIDIA and AMD to define the future strategy for GPU validation, diagnosis, and repair.
- Automate diagnosis with increasing reliance on AI.
Customer Experience & Omnichannel Delivery
- Lead initiatives to deliver rapid and accurate diagnosis and repair for GPU servers.
- Drive improvements in customer self-service, satisfaction, and case deflection through innovative AI solutions.
Team Leadership & Culture
- Build and foster an innovative, high-performance engineering culture that attracts and retains top talent.
- Mentor, coach, and develop team members, creating clear career growth pathways.
- Lead and inspire geographically distributed, cross-functional teams.
- Establish best practices for scalable, secure, and robust deployment of data services, ensuring seamless integration with Oracle’s cloud and application ecosystem.
- Shape technical strategy and influence the platform roadmap by translating market and customer requirements into actionable engineering plans.
- Foster cross-functional collaboration with product, engineering, science, and enterprise customers to prototype and validate innovative workflows.
Required Qualifications
- 12+ years of software engineering leadership experience, including several years operating at the fleet infrastructure level.
- Proven track record of delivering high-availability cloud services with a strong focus on security, compliance, and operational excellence.
- Demonstrated ability to lead, inspire, and develop world-class engineering teams in fast-paced, dynamic environments.
- Experience collaborating with cross-functional, geographically distributed teams and stakeholders.
- Excellent communication skills, strategic thinking capabilities, and analytical problem-solving abilities.
Preferred Qualifications
- Experience with the Oracle Cloud Infrastructure (OCI) stack.
- Familiarity with regulatory requirements and compliance frameworks.