Leads complex data center hardware support services to ensure resilience, reliability, and availability at scale. Designs, governs, and signs off on advanced change activities and site augmentations, driving standardization of SOPs/MOPs across regions. Serves as the highest operational escalation point for incidents, leading cross-functional triage, root-cause analysis, and long-term corrective actions. Partners with engineering, capacity, networking, and security teams to anticipate risks, implement architectural improvements, and automate repeatable workflows. Mentors and develops teammates, uplifts best practices, and influences policy and process changes that improve performance across the group and organization.
Requirements:
- Bachelor of Science degree in Computer Science, Information Systems, or equivalent work experience.
- Recent graduates are encouraged to apply if they can demonstrate interest and knowledge in the position and a willingness to learn on the job – gap year students also considered.
- Prefer 2-5 years of data center experience; 4 years for development preferred
- Experience working in a mission-critical operations environment.
- The position has a physical component requiring the ability to lift up to 25 kg and may require working in cramped spaces or elevated locations.
- Thorough understanding of data center design, including electrical and cooling plant and the operation thereof.
- Airflow management (hot aisle, cold aisle, containment etc.)
- Rack and stack equipment, including commissioning of new racks.
- Strong structured cabling skills on fiber and copper (e.g., looming, term and test).
- Fully competent in own area of expertise; anticipates problems and develops contingency plans.
- Leadership experience is a must (teams of 10 or more people in a quickly changing environment).
- Able to carry out directives without supervision.
- Able to resolve issues on their own and apply stopgaps for damage control.
- Able to address not just technicians, but management as well
- Able to coordinate with peer-level ICs and formulate solutions to a multitude of problems.
- Able to guide junior technicians and instruct them on proper etiquette and protocol
- Able to lift 50 lbs.
- Able to climb a 9 ft. ladder.
- Ability to stand for up to 12 hours
Required systems & hardware expertise:
- Systems administration (Linux and/or Windows Servers - especially command line based).
- Hardware repair, diagnosis & break/fix.
- Networking fundamentals (DNS, TCP/IP, basic troubleshooting).
- IT Hardware Concepts (RAID, SAN, x86 architecture, SCSI, FC, Ethernet, iLO).
- Base Operating System Installs and Configuration.
- Core OS Services: SSH, Telnet, RDP, FTP, NFS, DNS, DHCP, Serial communication etc.
Please note: Visa sponsorship is not available for this role. For clarity purposes, this means that Oracle is not in a position now, or in the future, to offer US immigration sponsorship. This includes, but is not limited to, support of H-1B, TN, O-1, or F-1 e.g. EAD, OPT, CPT, I-20, F-1 visa stamp etc.
Internal Responsibilities
Key Responsibilities
Datacenter Services Operations–Break/Fix and Hardware Maintenance:
• Oversees lifecycle management for servers and components; sets standards for diagnosis, replacement, and performance validation without impacting critical workloads
• Coaches teams on advanced power distribution practices and complex hardware interventions; validates that work meets reliability, safety, and SLA objectives
Datacenter Services Operations–Network Configuration, Installation, and Augmentation:
• Leads the planning, risk assessment, and execution governance for large-scale network builds and migrations; validates architecture, cabling, and end-to-end connectivity
• Defines acceptance criteria and sign-off gates for network changes; ensures interoperability and error-free integration with upstream and downstream systems
Technical Support and Troubleshooting:
• Owns the escalation queue and trend analysis; directs complex investigations, establishes fix-forward plans, and ensures high-quality documentation of resolutions within SLAs
• Shapes the ticket taxonomy, triage playbooks, and automation triggers to improve response consistency and speed
Safety and Compliance–Safety, Security, Compliance, and Documentation:
• Governs adherence to SOPs/MOPs and site rules for all high-risk work; ensures alignment with physical security procedures, local regulations, and audit requirements; verifies functional testing of security controls
• Ensures comprehensive documentation, audit trails, and change records; prepares for and leads compliance reviews and remediations
Continuous Improvement:
• Leads post-incident reviews and enterprise-level RCAs; socializes learnings and implements durable design/process changes that reduce risk across the fleet
• Partners with engineering to mitigate availability risks, reduce single points of failure, and improve observability and alert fidelity
Collaboration, Vendor Relations, and Leadership:
• Mentors technicians and specialists; sets expectations for execution quality, safety, and customer focus; builds capability through training and coaching
• Influences staffing and readiness plans for on-call and peak events; models inclusive practices and cross-team collaboration
Core Responsibilities
Planning & Execution - Drives completion of complex and/or ambiguous work areas in accordance with broad project requirements. Provides leadership to team members on shifting priorities and resources according to team needs.
Collaboration & Partnership - Collaborates across Lines of Business to lead the delivery of impactful work and provide support for critical team objectives. Leverages advanced understanding of business, stakeholder, and/or customer needs to build partnerships within and outside of team.
Problem Solving - Proactively develops and shares procedures to identify and resolve complex issues, serving as the final non-managerial escalation point for the team. Collects and reviews data and/or information to troubleshoot the most complex errors.
Continuous Learning - Models continuous learning by actively seeking to expand knowledge and learning new skills and/or tools, and staying current with trends and best practices in one's field. Seeks out and leverages feedback and advanced training to refine and improve skills.
Continuous Improvement - Recommends, implements, and shares strategies to improve the efficiency and effectiveness of complex processes, protocols, and workflows for own and other teams.