The Role:
At General Motors, the Cloud Platform team is key to enabling our organizations to adopt the cloud safely and efficiently. Our team builds secure, reliable, and cost-effective services that enable our peers to deliver products with confidence; we ensure that best practices are baked in and effortless. As Site Reliability Engineers, we also use our expertise to contribute to General Motors’ transition to a reliability-first software organization, collaborating with our peers across teams on Production Readiness Reviews, SLOs, and Incident Reviews. By joining us, you will have a rare opportunity to help build a multi-cloud Platform as a Service that enables millions of General Motors customers to interact with their vehicles and services.
This is an SRE-flavored software engineering role, and we are looking for individuals who are passionate about building reliable, observable services and understanding best practices for running services in the cloud. The Cloud Platform team owns our platform end to end, delivering new features, supporting our peers, and ensuring we meet our SLAs.
As an engineer, you will participate in a scrum team to deliver high-quality software to production. You’ll be involved in designing features, working with peer teams to help them use our services effectively, and taking part in our 24/7 support process.
What You’ll Do:
- Developing and maintaining automation tools and infrastructure to streamline software deployment, configuration management, and system monitoring.
- Monitoring the performance and availability of software systems, identifying and resolving issues, and implementing proactive measures to prevent future incidents.
- Responding to incidents, conducting root cause analysis, participating in post-incident reviews, and implementing corrective actions to prevent similar incidents in the future.
- Collaborating with software development teams to ensure that reliability and scalability considerations are incorporated into software design and implementation.
- Identifying opportunities for process improvement, implementing best practices, and driving initiatives to enhance the reliability and performance of software systems.
Your Skills & Abilities (Required Qualifications):
- Proficiency in at least one programming language (e.g., Python, Go, Java) and familiarity with multiple language ecosystems.
What Will Give You A Competitive Edge (Preferred Qualifications):
- Experience with Git/source code management, CI/CD development, and open-source development.
- Experience with event-driven architectures or services such as Kafka.
- Hands-on experience with Infrastructure as Code tools such as Terraform, Terragrunt, Azure Resource Manager (ARM) templates, YAML pipelines, or Bicep.
- Working knowledge of AWS and Azure services such as Event Hubs, AKS, or EKS.
- Experience with observability using OpenTelemetry, Prometheus, or services such as Datadog.
- Kubernetes experience, including app deployment, service meshes such as Istio or Consul, and networking.
- Knowledge of cloud networking (e.g., VPC, VNet, subnets, DNS, load balancers), including troubleshooting and diagnostics.