The AI Validation Platform team owns the cloud-agnostic, reliable, and cost-efficient platform that powers GM’s AV efforts. We’re proud to serve as the infrastructure platform for teams developing autonomous vehicles (L3/L4/L5). Our platform supports the simulated validation of state-of-the-art (SOTA) machine learning models, with a focus on performance, availability, concurrency, and scalability. We enable rapid innovation and development by prioritizing high-impact, ML-centric use cases.
We are seeking a Senior ML Infrastructure Engineer to help build and scale robust Compute platforms for Simulation workflows. In this role, you will focus on scaling the platform, driving efficient, high utilization of cutting-edge GPUs, and improving the platform’s reliability. The successful candidate will have experience building and running scalable distributed systems. They will rapidly test and promote ideas, have strong problem-solving skills, and demonstrate a bias for action.
You will play a key role in shaping the architecture, roadmap, and user experience of a robust service supporting our AI Validation / Simulation needs. The ideal candidate brings experience in designing distributed systems, strong problem-solving skills, and a get-it-done attitude. This is a high-impact opportunity to influence the future of AI infrastructure at GM.
- Collaborate with Simulation engineers, ML engineers, and researchers to understand critical workflows, translate them into platform requirements, and deliver incremental value.
- Lead technical decision-making on Compute architecture, cloud capacity provisioning, caching, and auto-scaling mechanisms.
- Drive the development of monitoring, observability, and metrics to ensure reliability, performance, and resource optimization.
At a Minimum We'd Like You To Have