Career Area:
Technology, Digital and Data
Job Description:
Your Work Shapes the World at Caterpillar Inc.
When you join Caterpillar, you're joining a global team who cares not just about the work we do – but also about each other. We are the makers, problem solvers, and future world builders who are creating stronger, more sustainable communities. We don't just talk about progress and innovation here – we make it happen, with our customers, where we work and live. Together, we are building a better world, so we can all enjoy living in it.
Senior Engineering Manager – Data, ML, DevOps & AI Ops
About the Role
We are seeking a Senior Engineering Manager to lead our Data, ML, DevOps, and AI Ops capabilities, driving the design, development, deployment, and intelligent operation of enterprise-scale data platforms, machine learning systems, and cloud-native infrastructure.
This role is accountable for operationalizing data and AI at scale, ensuring reliability, performance, security, and continuous optimization across data pipelines, ML platforms, application infrastructure, and production environments. You will enable advanced analytics, AI-driven applications, and digital transformation initiatives by embedding automation, observability, and AI-powered operations into the core engineering ecosystem.
You will lead a multidisciplinary organization spanning Data Engineering, ML Engineering, Platform Engineering, DevOps, and AI Ops, and play a critical role in enabling real-time insights, predictive intelligence, resilient platforms, and intelligent automation across the enterprise.
Key Responsibilities
Leadership & Strategy
- Provide strategic direction and technical leadership across Data Ops, ML Ops, DevOps, and AI Ops, fostering a culture of engineering excellence, automation, and operational rigor.
- Define and execute the end-to-end platform strategy spanning data pipelines, ML lifecycle, CI/CD, infrastructure, and intelligent operations.
- Partner with executive leadership on technology roadmaps, platform modernization, vendor strategy, and emerging capabilities in AI, DevOps, and cloud platforms.
Data, ML & Platform Engineering
- Architect and scale cloud-native data platforms supporting real-time and batch ingestion, transformation, analytics, and AI workloads.
- Drive ML Ops best practices for model training, deployment, monitoring, retraining, and governance across the full model lifecycle.
- Ensure seamless integration of data platforms, ML services, and application ecosystems.
DevOps & Platform Reliability
- Establish and mature DevOps practices, including CI/CD pipelines, infrastructure-as-code, automated testing, and release management for data, ML, and application platforms.
- Ensure high availability, performance, scalability, and cost efficiency across cloud infrastructure and platform services.
- Embed SRE principles, SLIs/SLOs, and resilience engineering into platform operations.
AI Ops & Intelligent Operations
- Lead the adoption of AI Ops capabilities for proactive monitoring, anomaly detection, incident correlation, root cause analysis, and predictive remediation.
- Integrate observability signals (logs, metrics, traces, events) across data, ML, and application platforms to enable intelligent, self-healing systems.
- Drive automation to reduce manual operational overhead and improve MTTR, reliability, and platform insights.
Governance, Security & Compliance
- Establish enterprise standards for data governance, model governance, security, privacy, and compliance across platforms.
- Ensure platforms meet enterprise, regulatory, and cybersecurity requirements by design.
Collaboration & Talent
- Collaborate with data scientists, product teams, architects, and business stakeholders to translate AI and platform strategies into production-ready solutions.
- Lead talent development, hiring, and organizational design, building a high-performing, globally scalable engineering organization.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 15+ years of experience in software, data, or platform engineering, with 5+ years in senior engineering leadership roles.
- Strong expertise across Data Engineering, ML Ops, DevOps, and production platform operations.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP) and container orchestration (Docker, Kubernetes).
- Proven experience with CI/CD pipelines, infrastructure-as-code (Terraform, ARM templates, CloudFormation), and automation frameworks.
- Solid understanding of streaming and data platforms (Kafka, Spark, Flink) and ML Ops tooling (MLflow, Kubeflow, SageMaker).
- Experience driving platform reliability, security, governance, and compliance at enterprise scale.
- Strong leadership, communication, and stakeholder management skills.
Preferred Qualifications
- Experience with AI Ops platforms, intelligent observability, and incident automation.
- Exposure to feature stores, model registries, real-time inference, and event-driven architectures.
- Knowledge of SRE practices, error budgets, and resilience engineering.
- Familiarity with GPU acceleration, distributed training, and high-performance computing.
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and log analytics platforms.
- Contributions to open-source projects or published work in data platforms, ML Ops, DevOps, or AI Ops.
Why Join Us?
- Lead enterprise-critical platforms at the intersection of Data, AI, DevOps, and Intelligent Operations.
- Shape how AI is built, deployed, and operated at scale, not just experimented with.
- Influence platform strategy and engineering culture across a global organization.
- Competitive compensation, flexible work options, and strong career growth opportunities.
Posting Dates:
March 13, 2026 - March 27, 2026
Caterpillar is an Equal Opportunity Employer. Qualified applicants of any age are encouraged to apply.