U.S. Citizenship and eligibility for a Federal Security Clearance are required
Our Team
Building on our Cloud momentum, Oracle has formed a new organization: Oracle Health Data & Analytics Platform. This team will focus on product development and product strategy for Oracle Health while building out a complete platform supporting modernized, automated healthcare. This is a net new line of business, built with an entrepreneurial spirit that promotes an energetic and creative environment. We are unencumbered and will need your contribution to make it a world-class engineering center with a focus on excellence.
Oracle Health Data & Analytics Platform has a rare opportunity to play a critical role in how Oracle Health products impact and disrupt the healthcare industry by transforming how healthcare and technology intersect.
You will have the opportunity to:
- Reach billions of people with our products & services
- Create technology that truly impacts the world
- Have an immediate impact on developing technology
- Enjoy unlimited growth potential with inspiring work
- Work with the best minds in the industry
- Enjoy working in an open, diverse, and productive environment
About The Job
This role supports the core data platforms behind Oracle Health’s Data & Analytics Platform. As a Senior Site Reliability Engineer (SRE), you will own shared, mission-critical systems used by multiple products and teams.
You will work on the design and operation of large-scale, stateful distributed platforms, including Hadoop ecosystem components (HDFS, YARN, HBase) deployed on Oracle Big Data Service (BDS), Kafka, and Storm. These multi-tenant platforms are deployed and operated through Ansible- and Terraform-based automation and require strong architectural ownership to manage scale, change, and broad blast radius.
What You'll Do
Platform Ownership & Technical Leadership
- Own the end-to-end reliability, scalability, and operability of shared data platforms
- Define platform standards, architectural direction, and operational guardrails
- Influence cross-team technical decisions and long-term platform strategy
- Drive long-term platform evolution and influence reliability strategy across the data ecosystem
Architecture & Design
- Clearly articulate system behavior, dependencies, and failure modes
- Make principled trade-offs between reliability, performance, cost, and complexity
- Provide guidance and guardrails that enable downstream teams to use platforms safely and effectively
Operations Engineering
- Establish capacity models, scaling strategies, and operational best practices
- Design platforms that behave predictably under load, failure, and change
- Own platform lifecycle events: upgrades, expansions, decommissioning, and recovery
Distributed Systems Expertise
- Operate and evolve stateful distributed systems where data placement, replication, and recovery are critical
- Reason about failure modes such as backpressure, rebalancing, region movement, replication lag, and rolling upgrades
Security
- Operate and maintain Kerberized platforms, including authentication, authorization, and secure service-to-service communication
- Treat security as a first-class architectural concern
Automation
- Design and evolve an Ansible- and Terraform-driven automation framework
- Treat automation as production software: versioned, reviewed, tested, and improved
- Eliminate operational toil by encoding reliability and safety into the platform
Incident Leadership & Prevention
- Serve as the ultimate escalation point for complex or ambiguous incidents
- Focus on eliminating entire classes of failure, not just resolving individual issues
Representation
- Represent SRE and platform engineering in high-visibility and sensitive forums
- Communicate clearly with engineering leadership and partner teams
Responsibilities
The team operates within the Oracle Health Data & Analytics Platform, supporting one of Oracle Health’s core products, HealtheIntent. We operate the big data and streaming infrastructure that enables downstream teams to deliver reliable customer-facing solutions at scale, while continuously improving operability and efficiency.
Required Experience
- 4+ years operating large-scale, customer-facing distributed platforms
- Deep experience with HDFS, YARN, HBase, Kafka, Storm, or similar systems
- Strong background in Linux, networking, and distributed system troubleshooting
- Infrastructure-as-Code using Ansible and Terraform
- Scripting and automation using Python, Ruby, and Bash
- Hands-on experience operating Kerberized environments
- Proven ability to define and document technical architecture for complex systems
- Demonstrated ownership of shared platforms with broad blast radius and multiple downstream consumers
- Experience designing observability and capacity models for distributed platforms
Required Qualifications:
- U.S. Citizenship and eligibility for a Federal Security Clearance
- 5+ years of technical experience relevant to this position
- Ability to communicate effectively and build rapport with team members
- BS or MS in Computer Science, or equivalent