In this role you will lead the design and development team building advanced AI applications. You will use AI/ML to automate, optimize, and secure networks, focusing on self-provisioning, auto-ingesting, and auto-qualifying systems and on self-healing networks. The role requires skills in Python, ML frameworks, and AI model training, along with an understanding of networking protocols, data center design, infrastructure as a service, network monitoring, and network automation.
Internal Responsibilities
As a Principal AI Networking Developer, you will be responsible for building and optimizing large-scale AI systems, ensuring scalability, reliability, and performance. You will work collaboratively with cross-functional teams to drive the development and deployment of AI solutions. Strong problem-solving skills, attention to detail, and excellent communication skills are essential for this role. If you have a passion for building cutting-edge AI applications and are looking for a challenging role, we encourage you to apply.
- Design and implement scalable orchestration for serving and training AI/ML models.
- Explore and incorporate contemporary research on AI, agents, and inference systems into the software stack for designing, monitoring, troubleshooting and deploying networks.
- Evaluate, integrate, and optimize technologies across the stack for latency, throughput, and resource utilization in training and inference workloads.
- Lead initiatives in AI systems design, including Retrieval-Augmented Generation (RAG) and LLM fine-tuning.
- Design and develop scalable services and tools in Python/Go to support GPU-accelerated AI pipelines and observability frameworks.
Required/Preferred experience:
- Strong proficiency in Python and ML frameworks (PyTorch, TensorFlow)
- LLMs, embeddings, vector search, RAG pipelines, and fine-tuning
- Data engineering: Spark, Kafka, Flink, OCI Streaming/Data Flow
- Distributed systems and large-scale training/inference
- Handling network telemetry (NetFlow, packet captures, streaming telemetry)
- Network automation frameworks (Terraform, Ansible, NAPALM; Batfish is a plus)
- Containerization, model serving, GPU workflows, CI/CD, and MLOps tools
- Writing design docs, scoping features, and owning delivery end-to-end
Required Education and Work Experience:
BSEE, BSCS, BSCE, or equivalent. MSEE, MSCS, or MSCE is a plus. At least 7 years of experience building software systems, plus prior experience building AI applications and training models.