Lead Research Scientist, AI Safety and Reinforcement Learning

AMD • Full-time • Santa Clara, California, United States • 3w ago

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

We are hiring a Lead Research Scientist, AI Safety and Reinforcement Learning, focused on recursive self-improvement (RSI) in a bounded, engineering-first sense: systems where models, data generators, or toolchains participate in improving their own training signals, curricula, or verification, always under explicit governance, kill switches, and human oversight. You will research when such loops help (e.g. synthetic data quality, targeted self-play, automated curriculum refinement) versus when they amplify bias or reward hacking, and you will design measurement and containment so RSI-style pipelines remain auditable and safe for AMD’s AI-for-HW and generative-AI programs.

THE PERSON:

You are skeptical by default but constructive: you formalize assumptions, bound autonomy, and insist on counterfactual evaluation. You connect RSI concepts to concrete metrics—data efficiency, robustness, regression rates—not open-ended capability claims.

KEY RESPONSIBILITIES:

Research self-improving training loops: model-generated supervision, iterative distillation, self-critique, and automated curriculum updates with clear scope limits
Develop theory- and systems-grounded evaluations for capability drift, Goodhart effects, and distributional shift in closed-loop training
Partner with RL scientists on where RSI-style objectives intersect policy optimization and preference learning
Define red-team protocols and monitoring for RSI pilots; document rollback criteria before experiments touch shared infrastructure
Publish or produce technical reports where appropriate; align internal narrative with responsible deployment standards

PREFERRED EXPERIENCE:

Strong background in machine learning (ML), AI safety, reinforcement learning, or a related field, with publications or substantial work in iterative training, self-training, or open-ended learning
Experience with empirical safety evaluation, scalable oversight, or stress-testing of generative model training pipelines
Strong software skills for building controlled experimental harnesses and reproducible RSI microcosms

ACADEMIC CREDENTIALS:

PhD in Computer Science, Machine Learning, or related field strongly preferred.

#LI-BM1

#LI-Hybrid

Benefits offered are described in: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.
