Network Developer 4

Oracle Corporation • Seattle, WA, United States, US • 3d ago

We are the AI Infrastructure - Network Operations team at OCI. We support and operate the RDMA/RoCE network fabrics for OCI's largest AI and HPC customers. These fabrics are the foundation underneath OCI's AI, GPU and HPC services, and support major tier-0 vendors in the generative AI industry. If you're running an AI workload at OCI, we're running the RDMA network underneath your workload.

A Principal Network Engineer on our team supports the design, deployment, and operations of a large-scale global Oracle cloud computing environment (Oracle Cloud Infrastructure - OCI). The role focuses primarily on the operation and support of RDMA/RoCE network fabrics and systems, combining deep network understanding with automation skills to operate a production environment. As OCI is a cloud-based network with a global footprint, this support will include hundreds of thousands of network devices supporting millions of servers, connected over a mix of dedicated backbone infrastructure and the Internet.

Internal Responsibilities

Key Responsibilities

Lead network lifecycle management initiatives by defining technical objectives, delivery plans, and implementation procedures for large-scale network infrastructure projects.
Translate high-level network architectures into detailed designs and deployment plans while ensuring scalability, reliability, and operational readiness.
Serve as the technical lead for moderately complex network projects, coordinating the efforts of multiple engineers across design, deployment, automation, and operational support.
Design, implement, and support network solutions across data center, backbone, cloud, and service provider environments.
Partner with service owners and infrastructure teams to ensure network solutions are fully integrated with monitoring, observability, automation, and operational support systems.
Act as a Tier 2 and specialized escalation point for network incidents, driving root cause analysis, corrective actions, and long-term reliability improvements.
Lead the investigation and resolution of complex network issues and large-scale service-impacting events.
Develop automation solutions, tools, and scripts to improve operational efficiency, network reliability, deployment consistency, and incident response.
Contribute to the design and delivery of network automation frameworks and operational tooling.
Collaborate closely with product teams, program managers, network leadership, and PMO organizations to align infrastructure capabilities with product and service requirements.
Partner with vendor engineering teams and account managers to troubleshoot issues, evaluate new technologies, and drive operational improvements.
Participate in hardware evaluations, RFQ/RFP processes, and adoption of new networking technologies and platforms.
Drive technology decisions that support business, product, and service objectives.
Mentor junior engineers through technical guidance, troubleshooting support, design reviews, and knowledge sharing.
Contribute to engineering best practices, documentation standards, operational excellence, and continuous improvement initiatives.

Preferred Skills & Experience

Strong experience with large-scale network operations, design, and troubleshooting.
Expertise in routing and switching technologies, including BGP, OSPF, EVPN-VXLAN, MPLS, and data center networking.
Experience with network automation using Python, Ansible, APIs, or similar technologies.
Strong understanding of observability, monitoring, telemetry, and incident management.
Experience working with cloud infrastructure, hyperscale environments, or large-scale distributed systems.
Ability to lead technical projects and influence outcomes across multiple teams.
Strong written and verbal communication skills with the ability to work effectively across engineering, operations, and leadership teams.
