Application Notice
We encourage you to apply thoughtfully by selecting one position that best matches your qualifications and interests. You may submit up to two active applications at a time. Please consider your location choice carefully—we recommend applying where you envision building your future.
The Firm
A New Era of AI-Driven, Multidimensional Consulting
Today’s businesses face complex global challenges that demand more than conventional consulting services. Andersen Consulting offers a seamless, multidimensional approach that combines expertise in business transformation, artificial intelligence, cybersecurity, sustainability, and digital strategy with Andersen Global’s established tax and legal capabilities. This integration of emerging technologies positions Andersen Consulting as the partner of choice in the $1 trillion consulting industry.
The Role
At Andersen Consulting, AI/LLMOps FDEs own the technical delivery of production AI systems end-to-end and sit across the table from engineering teams at global enterprises. The practice is early. The engineers who join now will define its architecture, its standards, and what world-class AI delivery looks like for clients who trust us with their most consequential systems.
Enterprise AI programs follow a predictable arc: a successful POC, a budget approval, and then months of stalled production work before the initiative quietly dies. The gap between a notebook that impresses stakeholders and a system that runs the business under real enterprise load is an engineering problem. Its name is LLMOps.
LLMs have been serious enterprise tools for roughly three years. We are not looking for someone who has mastered a stable field. We are looking for someone who has been in the room while the field was being invented: someone who built RAG pipelines that broke in production, debugged agent loops that silently degraded after staging, and designed eval suites from scratch because nothing off the shelf measured what actually mattered.
What You'll Do
Day-to-day, you embed directly with client teams and own a defined technical workstream from design through production:
- Design and deploy RAG pipelines using pgvector, Pinecone, Qdrant, or Weaviate, with deliberate chunking strategies, hybrid search, and re-ranking layers.
- Build multi-step agent workflows using LangChain, LangGraph, LlamaIndex, or the Anthropic Claude SDK, with tool use, structured outputs, and memory.
- Implement guardrails at the correct system boundary: output schema enforcement, PII detection, content filtering, and agent behavior constraints calibrated for each client's regulatory context.
- Design eval suites using off-the-shelf tools or custom Python harnesses that measure retrieval quality, answer faithfulness, hallucination rates, and latency under load.
- Instrument observability from day one: LLM call latency, retrieval quality metrics, agent decision traces, cost per query.
- Write production Python: typed, tested, linted, and deployable by another engineer.
- Deploy and serve models on AWS (Bedrock, SageMaker, EKS), Azure (Azure OpenAI Service, AKS), or GCP (Vertex AI, GKE) under latency-sensitive enterprise conditions.
- Treat prompt and context engineering as an engineering discipline: versioned system prompts, few-shot libraries, chain-of-thought elicitation, and context window budgeting, all of it tracked, tested, and iterated rather than adjusted informally.
- Build and deploy MCP servers to expose enterprise data sources, internal APIs, and tools to LLM agents in a standardized, auditable way.
- Contribute reusable accelerators, reference architectures, and internal tooling during bench time. Every asset you build should make the next engagement start faster.
- Travel is a real part of this role. Client work regularly requires on-site presence during discovery, architecture reviews, and go-live.
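To make the guardrail work above concrete: one of the simplest boundary checks is output-schema enforcement, validating a model's raw response before anything downstream consumes it. The sketch below is illustrative only; the field names and schema are hypothetical, not tied to any client system, and a production system would typically use a schema library rather than hand-rolled checks.

```python
import json

# Hypothetical required schema for a structured LLM response.
REQUIRED_FIELDS = {"answer": str, "sources": list, "confidence": float}

def enforce_output_schema(raw: str) -> dict:
    """Parse an LLM's raw text output and reject anything that does not
    match the expected schema -- a guardrail applied at the system
    boundary before the response reaches downstream code."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing required field: {field!r}")
        if not isinstance(data[field], expected_type):
            raise ValueError(
                f"field {field!r} should be {expected_type.__name__}, "
                f"got {type(data[field]).__name__}"
            )
    return data
```

The point is where the check sits, not how it is implemented: schema enforcement belongs at the boundary between the model and the rest of the system, so malformed output fails loudly instead of propagating.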
What You'll Build
- Production RAG pipelines with hybrid search, re-ranking, and query-time monitoring, designed to hold up under real query distributions and corpus drift, not just benchmark datasets.
- Multi-step agent orchestration systems with tool use, memory, and structured output validation, built to be reliable at enterprise load, not just demonstrable in a demo environment.
- Eval frameworks designed from scratch to measure what the client's system actually needs to get right: faithfulness, groundedness, latency percentiles, and failure mode frequency.
- Guardrail infrastructure positioned correctly in the call stack: input validation, output schema enforcement, and behavioral constraints for the specific regulatory context of each engagement.
- Observability stacks instrumented at the LLM call, retrieval, and agent decision layer, giving clients operational visibility instead of log files they can't act on.
- MCP servers exposing internal enterprise systems (databases, document stores, internal APIs) to LLM agents through a standardized, auditable interface.
- CI/CD pipelines for LLM systems with automated eval runs on prompt changes, model version regression testing, and deployment gating before anything reaches production.
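The deployment-gating idea in the last item above can be sketched in a few lines: a candidate prompt or model version only ships if its eval metrics stay within tolerance of the current baseline. The metric names and threshold below are illustrative assumptions, not a prescribed standard.

```python
# Hypothetical deployment gate: a candidate prompt/model version must not
# regress key eval metrics beyond an allowed tolerance before it ships.

def passes_deployment_gate(baseline: dict, candidate: dict,
                           max_regression: float = 0.02) -> bool:
    """Return True only if every 'higher is better' metric in the
    candidate eval run is within max_regression of the baseline."""
    for metric, baseline_score in baseline.items():
        candidate_score = candidate.get(metric)
        if candidate_score is None:
            return False  # candidate run is missing a gated metric
        if candidate_score < baseline_score - max_regression:
            return False  # regression beyond tolerance
    return True
```

In a real pipeline this check runs automatically on every prompt change or model version bump, and a failing gate blocks the release rather than emitting a warning someone might ignore.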
The Requirements
- 7+ years of total experience in data engineering, MLOps, software engineering, or closely adjacent infrastructure roles, with evidence of end-to-end ownership: architecture through production, including the post-launch work.
- 2–3 years of LLM-specific engineering experience is credible at this level; given how recently the field started, that is effectively a ceiling, not a floor. What distinguishes senior candidates is judgment: knowing when an agent architecture is the right tool, how to design for observability before it is needed, and how to have the hard conversation when a POC is not production-ready.
- Production Python. Typed, tested, and structured for multi-engineer deployment. We will ask about your testing practices and code review standards.
- At least one production LLM system shipped: a RAG pipeline, agent workflow, or LLM-powered application that handled real enterprise load, not just internal demos. The key question is what broke after launch and what you did about it.
- Hands-on vector retrieval experience with pgvector, Pinecone, Qdrant, Weaviate, or Chroma, including hybrid search design, embedding quality diagnosis, and re-ranking strategy selection, not just initial setup.
- Working knowledge of at least one agentic SDK (Anthropic Claude SDK, LangChain, LangGraph, LlamaIndex, or OpenAI SDK) in a non-trivial use case involving tool use, memory, or structured output handling.
- Cloud AI infrastructure depth in at least one of AWS (Bedrock, SageMaker, EKS), Azure (Azure OpenAI Service, AKS), or GCP (Vertex AI, GKE).
- LLM observability experience: you have instrumented a system before, not just read the documentation.
- Enough communication clarity to explain an architecture decision to an engineering team and a business stakeholder in the same meeting. These are client-facing roles.
- Travel up to 50%.
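As a small illustration of the hybrid-search design work named in the requirements: a common way to merge a dense-vector ranking with a keyword ranking is reciprocal rank fusion. The sketch below uses made-up document IDs and is one fusion strategy among several, not a recommended default for every engagement.

```python
# Minimal reciprocal rank fusion (RRF): merge a dense-vector ranking and
# a keyword (sparse) ranking into one hybrid ranking. k=60 is the
# conventional smoothing constant from the original RRF formulation.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

A re-ranking layer would then typically rescore only the top of this fused list with a heavier model, which is where the strategy-selection judgment the requirement describes comes in.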
Preferred Qualifications
- Model serving experience in latency-constrained environments.
- Guardrail implementation experience in regulated data contexts: PHI, PII, or MNPI.
- RAGAS or custom eval harness experience.
- MCP server development. Still rare enough that hands-on experience is high signal.
- Palantir Foundry or AIP experience, a meaningful differentiator as Andersen's Palantir practice scales alongside the LLMOps practice.
- Prior client-facing or consulting experience. FDEs who understand stakeholder management and expectation-setting ramp faster and derisk engagements earlier.
- Domain exposure in financial services (insurance, reinsurance, credit), healthcare, supply chain, or manufacturing.
Compensation and Benefits
Our firm offers a competitive base salary and comprehensive benefits package designed to support the well-being, growth, and long-term success of our people. We are committed to recognizing individual contributions and providing resources that enable our employees to thrive both personally and professionally.
Salary Range: For individuals hired to work in New York, the expected salary range for this role is $195,875–$266,171. Actual compensation will be determined based on the candidate’s qualifications, experience, and skill set.
Benefits: Employees (and their families) are eligible for medical, dental, vision, and basic life insurance coverage. Employees may enroll in the firm’s 401(k) plan upon hire. We offer 200 hours of paid time off annually, along with twelve paid holidays each calendar year. For a full listing of benefit offerings, please visit https://www.andersen.com/careers.
Applicants must be currently authorized to work in the United States on a full-time basis upon hire. Andersen will not consider candidates for this position who require sponsorship for employment visa status now or in the future (e.g., H-1B status).
Andersen Tax is an equal opportunity employer committed to fostering an inclusive workplace. We evaluate all applicants and employees without regard to race, color, religion, national origin, ancestry, sex (including pregnancy, childbirth, and related medical conditions), sexual orientation, gender identity or expression, age, disability, genetic information, marital status, military or veteran status, or any other characteristic protected under applicable federal, state, or local law. All qualified applicants, including those with criminal histories, will be considered in a manner consistent with applicable law. We provide reasonable accommodations to qualified individuals with disabilities as required by law.