At Oracle Cloud Infrastructure (OCI), we build the future of the cloud for enterprises as a diverse team of fellow creators and inventors. We act with the speed and attitude of a start-up, with the scale and customer focus of the leading enterprise software company in the world. Values are OCI’s foundation and how we deliver excellence. We strive for equity, inclusion, and respect for all. We are committed to the greater good in our products and our actions. We are constantly learning and taking opportunities to grow our careers and ourselves. We challenge each other to stretch beyond our past to build our future. You are the builder here. You will be part of a team of smart, motivated, and diverse people and given the autonomy and support to do your best work. It is a dynamic and flexible workplace where you’ll belong and be encouraged.
OCI Compute is looking for Systems/Software Developers with a strong Linux OS background to engineer Compute GPU/HPC infrastructure solutions and build an imaging service for large-scale Compute/HPC/AI/ML customer workloads, delivering high performance while providing strong availability guarantees to our customers. Your team will bring diverse expertise in systems, networking, storage, and software development to provide the stability, performance, and reliability that our customers need.
You will be responsible for developing and maintaining our core Linux images, solving complex Linux and hypervisor-level issues (e.g., QEMU), and building Platform and Custom GPU/HPC images that meet customer workload requirements. OCI has taken the lead in becoming the premier cloud provider of GPUs, and this is your opportunity to play a role in the Compute/HPC/AI/ML industry movement on the Linux platform.
Responsibilities:
- Design and develop image automation software in Java, Python, and other languages.
- Apply engineering principles to define robust and maintainable architectures and designs.
- Build cloud services on top of modern Infrastructure as a Service (IaaS) building blocks at OCI.
- Design and build distributed, scalable, fault-tolerant software systems.
- Participate in the entire software lifecycle: development, testing, CI/CD, and production operations.
- Collaborate broadly across multiple disciplines, from hardware designers to HPC/GPU developers.
- Identify requirements, scope solutions, estimate work, and schedule deliverables. Help establish and drive the adoption of outstanding coding standards and patterns, and help enhance our inclusive engineering culture.
- Balance product feature development with production operational concerns such as ops automation, structured logging, metrics instrumentation, and on-call participation.
Qualifications:
- 6-10+ years of experience developing and shipping enterprise distributed and/or cloud-native systems.
- Strong grasp of system design fundamentals and distributed systems architectural best practices.
- Demonstrated ability to write great code in Java, Python, or similar object-oriented languages.
- Experience building highly available services, with knowledge of common service-oriented design patterns and service-to-service communication protocols.
- Experience with HPC and GPU compute fundamentals.
- Strong desire to make an impact and thrive in collaborative and energetic environments.
- Ability to communicate technical concepts effectively, both verbally and through written designs.
- BS or MS degree in Computer Science or equivalent.
Preferred Skills and Experience:
- Experience with Linux/Windows core operating systems, including systems tuning.
- Experience with imaging tooling such as Ansible, Packer, and Oracle Image builder.
- Experience with Oracle Cloud Infrastructure.