Job Description
The Program and Purchase Cost Optimization (PPCO) team at General Motors is seeking a highly motivated and technically skilled Data Analytics Engineer to lead analytics initiatives that enable product- and program-level cost optimization through robust data engineering, exploratory data analysis (EDA), and predictive modeling.
This role sits within the PPCO Systems and Data Analytics teams and focuses on building and maintaining the data foundations that power cost analytics initiatives across GM Finance. You will work with data from across the GM ecosystem, identifying interconnections between multiple enterprise applications from different functional areas (e.g., engineering, purchasing, finance, program management) and designing scalable data structures and pipelines that turn disparate and unstructured data into trusted, analysis-ready assets.
The ideal candidate has deep experience in data engineering, ETL/ELT, database and table management, and EDA, combined with strong skills in classical predictive modeling. You will use current data to understand behaviors and to predict product attributes such as manufacturing parameters, product cost, program development cost creep, and economic factors based on parameters and features extracted from existing data.
You will partner closely with Data Analysts, IT, and cross-functional stakeholders to design and implement data architectures, curate high-quality datasets, and develop predictive models that deliver meaningful, actionable insights to improve vehicle profitability and program performance.
Responsibilities
- Design and maintain data models and curated datasets (relational schemas, dimensional models, and feature characterizations) that support PPCO analytics and efficient downstream consumption for reporting in tools such as Power BI (while not being primarily responsible for dashboard design).
- Query, integrate, and engineer data from multiple enterprise systems and relational databases, and design, build, and operate robust ETL/ELT pipelines (e.g., Databricks/Spark) to produce unified, analysis-ready datasets with appropriate data quality checks, validation, and monitoring.
- Perform in-depth EDA on large, heterogeneous datasets; engineer derived features (characterizations, aggregations, encodings); understand data distributions; identify anomalies; and surface data quality risks and insights for the business team to address.
- Develop and validate predictive machine learning models and descriptive models (regression, clustering, decision trees, random forests, gradient boosting, time-series/panel models), using existing data to predict product attributes (geometries, cost, economic indicators) from known input parameters and historical patterns.
- Enable system integration across the GM ecosystem by designing data pipelines, workflow automation, and data transformations that take data from one system and turn it into a consumable format for the destination system.
- Automate and optimize existing cost engineering workflows by building scalable, Python-based data and analytics tools.
- Work across functional boundaries, including Engineering, Program Management, R&D, Finance, and Purchasing, to understand data sources, business logic, and use cases.
Required Qualifications
- Ability to translate ambiguous business questions into well-defined analytical and data engineering problems, and communicate findings and recommendations in a clear, structured manner to technical and non-technical stakeholders.
- Bachelor’s degree in computer science, engineering, statistics, mathematics, physics, or a related quantitative field (advanced degree preferred).
- 5+ years of experience as a data scientist, research scientist, or ML engineer (or ~2 years with an MS).
- Advanced Python proficiency (e.g., pandas, NumPy, scikit-learn, PySpark).
- Advanced SQL proficiency.
- Experience with Databricks, Spark, and/or other cloud-based data platforms for large-scale data processing.
- Experience designing and implementing ETL/ELT pipelines that integrate data from multiple transactional and analytical systems.
- Strong skills in EDA to understand data quality, structure, and relationships.
- Practical experience developing predictive models and machine learning (regression methods, clustering and segmentation, random forests, gradient boosting).
- Solid problem-solving skills and the ability to convert business questions into data and modeling problems with clear hypotheses and success criteria.
Preferred Qualifications
- Experience working with automotive, manufacturing, or engineering data.
- Exposure to advanced ML/AI techniques (e.g., deep learning, NLP, LLMs) is a plus but not required; the primary focus of this role is data engineering, EDA, and classical predictive modeling in support of PPCO’s cost optimization mission.
Location: Hybrid. This role is categorized as hybrid, meaning the successful candidate is expected to report to the Global HQ in Warren, MI three times per week at minimum, or at another frequency dictated by the business.
Relocation: This role is NOT eligible for relocation benefits.
GM DOES NOT PROVIDE IMMIGRATION-RELATED SPONSORSHIP FOR THIS ROLE. DO NOT APPLY FOR THIS ROLE IF YOU WILL NEED GM IMMIGRATION SPONSORSHIP NOW OR IN THE FUTURE. THIS INCLUDES DIRECT COMPANY SPONSORSHIP, ENTRY OF GM AS THE IMMIGRATION EMPLOYER OF RECORD ON A GOVERNMENT FORM, AND ANY WORK AUTHORIZATION REQUIRING A WRITTEN SUBMISSION OR OTHER IMMIGRATION SUPPORT FROM THE COMPANY (e.g., H-1B, OPT, STEM OPT, CPT, TN, J-1, etc.)