Principal Research Scientist – Scaling
- Description:
- Lead and grow a multidisciplinary research team focused on LLM scaling, efficiency, and systems performance.
- Define and execute the scaling research roadmap in alignment with Databricks’ strategic objectives.
- Drive algorithmic innovations for large-scale training and inference, including optimizers, low-precision techniques, and model adaptation methods.
- Oversee the design and execution of large-scale experiments, and benchmark the results against state-of-the-art methods.
- Optimize distributed training, parallelism, memory management, and hardware utilization in collaboration with systems and infrastructure teams.
- Translate research breakthroughs into customer-facing capabilities in the Databricks AI platform.
- Establish metrics, evaluation protocols, and best practices for scaling-focused research, and drive their adoption across the organization.
- Champion responsible deployment by ensuring model behavior, reliability, and safety remain first-class considerations.
- Work hands-on with the team to develop high-quality Python and PyTorch code for research, prototyping, and production integration.
- Mentor and develop research scientists and engineers through technical guidance and career support.
- Requirements:
- Proven ability to lead a research team developing novel techniques for foundation model efficiency or related topics.
- Strong track record of industry impact.
- Deep expertise in at least one of: generative AI, LLMs, distributed ML systems, model optimization, or responsible AI.
- Demonstrated focus on scaling and efficiency for large-scale neural networks.
- Strong programming skills and demonstrated ability to write high-quality, efficient code in Python and PyTorch.
- Demonstrated ability to translate research innovation into scalable product capabilities with product and engineering teams.
- Excellent communication, leadership, and stakeholder management skills.
- Experience influencing cross-functional roadmaps and aligning research with business impact.
- Prior work at the intersection of systems and ML, such as distributed training frameworks, compiler and kernel optimization, or memory-/compute-efficient model design (preferred).
- Strong industry and academic network in large-scale ML, with ongoing collaborations or conference service such as PC or area chair roles (preferred).
- First-author publications at top ML/systems conferences such as ICLR, ICML, NeurIPS, or MLSys, or influential open-source contributions / widely used deployed systems, especially in optimization or efficiency (preferred).
- Benefits:
- Competitive base salary range of $280,000 to $350,000 USD.
- Eligibility for an annual performance bonus.
- Eligibility for equity as part of the total compensation package.
- Comprehensive benefits and perks offered regionally.
- Compensation may be adjusted based on skills, experience, certifications, training, and work location.