Shepherd is OCI's cloud-scale release orchestration platform, enabling the safe and reliable deployment of software, infrastructure, and configuration changes across regions, realms, and critical cloud services. As OCI continues to expand into sovereign, regulated, and large-scale deployment environments, Shepherd is evolving into a resilient orchestration control plane for cloud-scale change.
We are seeking a Lead Principal Platform Software Engineer to lead the evolution of Shepherd's architecture and engineering practices. This hands-on technical leadership role requires deep expertise in distributed systems, cloud platform architecture, deployment orchestration, and operational excellence. The successful candidate will drive architectural direction, deliver foundational software, lead complex cross-team initiatives, and establish engineering standards that improve reliability, maintainability, and developer productivity.
Responsibilities
• Lead the architecture, design, and implementation of Shepherd platform capabilities for release orchestration, deployment safety, rollback automation, dependency modeling, and operational workflows.
• Design and build highly available distributed systems that operate reliably across cloud-scale, multi-region, and partially connected environments.
• Drive long-term platform architecture, including APIs, service boundaries, persistence models, workflow execution, resiliency, compatibility, and extensibility.
• Partner across engineering, SRE, security, compliance, infrastructure, and OCI service teams to deliver cross-organizational technical initiatives.
• Establish engineering standards for code quality, testing, documentation, observability, operational readiness, and maintainability.
• Provide technical leadership through architecture reviews, design documents, production-ready code, mentoring, and incident investigations.
• Develop automation and AI-assisted engineering workflows that improve developer productivity while maintaining production quality, security, and reliability.
• Continuously improve platform reliability, deployment safety, operational efficiency, and engineering excellence.
Required Qualifications
• Strong experience designing, building, and operating large-scale distributed systems in production cloud environments.
• Deep understanding of cloud platform architecture, orchestration systems, APIs, resiliency, observability, and operational safety.
• Strong programming skills in Java, Python, Go, TypeScript, or similar languages.
• Experience leading complex cross-functional technical initiatives and influencing architecture across teams.
• Strong software engineering fundamentals, including testing, code quality, design patterns, and maintainability.
• Experience with databases, distributed systems, API design, and production debugging.
• Hands-on experience with AI-assisted software engineering and engineering automation.
• Excellent written and verbal communication, technical leadership, and mentoring skills.
Preferred Experience
• Experience with Oracle Cloud Infrastructure (OCI) or another hyperscale cloud platform.
• Experience with deployment orchestration, release automation, cloud control planes, or internal developer platforms.
• Experience with compliance-sensitive or sovereign cloud environments.
• Experience defining engineering standards, SLOs, rollout policies, and observability frameworks.
• Experience developing reusable platform libraries, automation frameworks, and reference implementations.