The Oracle Cloud Infrastructure team provides the opportunity to build and operate massive-scale, integrated cloud services in a broadly distributed, multi-tenant cloud environment. OCI is committed to delivering cloud products that meet the needs of customers tackling some of the world’s biggest challenges.
As a Principal Software Engineer on the Multi-cloud team, you will design, build, test, deploy, and operate services and platforms that enable cloud integration, interoperability, automation, and reliable service delivery. You will work with engineers across OCI, Oracle, and external partner teams to solve complex distributed systems problems.
This is an AI-first team. We use AI-assisted engineering tools, including Codex, as part of our daily development workflow. You will be expected to use AI effectively to improve productivity, create and maintain AI skills, automate engineering tasks, accelerate troubleshooting, and improve the quality and maintainability of our systems.
You should be comfortable working across the stack, learning new systems quickly, debugging complex production issues, and operating services with high availability expectations. This role includes participation in an on-call rotation, incident response, operational reviews, and continuous improvement of service reliability.
Basic Qualifications:
- 5+ years of experience designing, building, and operating production-scale distributed systems or cloud services.
- BS/MS in Computer Science or equivalent experience.
- Deep hands-on experience with Java is strongly preferred; strong experience in another systems or application programming language, such as Python, Go, C/C++, or C#, will also be considered. Candidates should be able to quickly become productive in a Java-based service environment.
- Demonstrated ability to apply AI-assisted development tools such as Codex, Claude Code, GitHub Copilot, or similar tools to improve engineering productivity, code quality, documentation, testing, debugging, and operational workflows.
- Ability to design, create, and guide adoption of reusable prompts, workflows, tools, or AI skills that improve team execution and engineering quality.
- Deep experience designing, deploying, operating, and troubleshooting Linux-based containerized services using Docker and Kubernetes, including service configuration, networking, scaling, health checks, observability, and production incident debugging.
- Deep experience designing, building, operating, and debugging RESTful APIs over HTTPS, including API contracts, authentication, authorization, error handling, idempotency, retries, timeouts, observability, backward compatibility, and production troubleshooting.
- Experience building or operating cloud services, distributed systems, or integration platforms.
- Strong foundation in computer science fundamentals, including data structures, algorithms, operating systems, networking, and distributed systems.
- Strong experience improving CI/CD, infrastructure automation, service observability, metrics, logging, tracing, alerting, and operational readiness for production services.
- Working knowledge of cloud infrastructure, identity, networking, security, or multi-cloud environments.
- Demonstrated ownership of production reliability, including incident response, root cause analysis, operational reviews, automation, testing, and durable corrective actions.
- Demonstrated ability to lead technical projects across multiple engineers or teams.
- Experience mentoring engineers through design reviews, code reviews, debugging, operational readiness, and production support.
- Willingness to participate in an on-call rotation and support production services.
Internal Responsibilities:
As a Principal Software Engineer on the Multi-cloud team, you will lead the design, development, deployment, and operation of Java-based cloud services, RESTful APIs, and containerized workloads running on Kubernetes. You will work across OCI, Oracle, and partner teams to deliver reliable integration platforms and automation capabilities.
You will own complex features and service improvements from architecture through production, including design, implementation, testing, debugging, observability, operational readiness, and long-term maintainability. You will help guide technical direction, review designs, mentor engineers, and raise the engineering bar for reliability, scalability, security, and supportability.
You will use AI-assisted development tools such as Codex, Claude Code, GitHub Copilot, or similar tools to improve engineering productivity, code quality, documentation, testing, and troubleshooting. You will also help create and promote reusable prompts, workflows, tools, or AI skills that improve team execution.
This role includes participation in an on-call rotation, production incident response, and leadership in improving service reliability, automation, monitoring, alerting, and post-incident follow-up.