Senior Staff Machine Learning Engineer, Data & Eval

Remote, USA • Full-time • Posted 2026-05-31

In this Senior Staff role, you will set technical direction and lead execution for ML evaluation and the end-to-end data flywheel powering CSxAI products (e.g., assistive agents, issue resolution, and tooling).
Your work will define how we measure quality, how we turn feedback into learning signals, and how we continuously improve models and products safely and efficiently.
You will partner closely with product, engineering, design, and operations to build evaluation systems that are trusted, scalable, and actionable, connecting offline metrics to online outcomes.
Work with large-scale structured and unstructured data; explore, experiment, build, and continuously improve Machine Learning models and pipelines for Airbnb product, business, and operational use cases.
Work collaboratively with cross-functional partners, including product managers, operations, and data scientists, to identify opportunities for business impact; understand, refine, and prioritize requirements for machine learning; and drive engineering decisions.
Develop, productionize, and operate Machine Learning models and pipelines hands-on and at scale, covering both batch and real-time use cases.
Leverage third-party and in-house Machine Learning tools and infrastructure to develop reusable, highly differentiating, and high-performing Machine Learning systems that enable fast model development, low-latency serving, and easy upkeep of model quality.

Educational Background: PhD in Computer Science, Mathematics, Statistics, or related technical field (or equivalent practical experience).
Industry Experience: 10+ years building, testing, and shipping ML/AI systems end-to-end, including 2+ years of experience with GenAI/LLM systems in production.
Leadership Experience: 5+ years leading large, ambiguous technical initiatives as a senior IC, influencing roadmap and engineering/science direction across teams.
Technical Proficiency:
Deep expertise in evaluation methodology (offline/online alignment, metric design, human-in-the-loop evaluation, A/B testing, power analysis, regression testing).
Hands-on experience with GenAI systems, including orchestration, retrieval, tool calling, and memory.
Experience building data pipelines and quality systems (labeling workflows, dataset curation, versioning, monitoring, and governance).
Solid ML fundamentals and best practices (model selection, training/serving, monitoring, reliability, and model lifecycle management).

This role may also be eligible for bonus, equity, benefits, and Employee Travel Credits.
