
Machine Learning Engineer — Distillation

Featherless AI
Remote (worldwide)


No Relocation

Posted: January 22, 2026

Job Description

About the Role

We’re looking for a Machine Learning Engineer focused on model distillation to help us build smaller, faster, and more efficient models without sacrificing quality. You’ll work at the intersection of research and production—taking cutting-edge techniques and turning them into systems that scale.

This is a hands-on role with real ownership: you’ll design distillation pipelines, run large-scale experiments, and ship models used in production.

What You’ll Do

  • Design and implement knowledge distillation pipelines (teacher–student, self-distillation, multi-teacher, etc.)

  • Distill large foundation models into smaller, faster, and cheaper models for inference

  • Run and analyze large-scale training experiments to evaluate quality, latency, and cost tradeoffs

  • Collaborate with research to translate new distillation ideas into production-ready code

  • Optimize training and inference performance (memory, throughput, latency)

  • Contribute to internal tooling, evaluation frameworks, and experiment tracking

  • (Optional) Contribute back to open-source models, tooling, or research
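For candidates newer to the area, the teacher–student setup in the first bullet can be sketched in a few lines of plain Python. This is a toy illustration of the classic soft-target distillation loss (Hinton et al., 2015), not a description of our production pipeline:

```python
import math

def softmax(logits, temperature=1.0):
    # Softened probabilities: a higher temperature flattens the distribution,
    # exposing more of the teacher's "dark knowledge" about wrong classes.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    # Soft-target distillation loss: KL(teacher || student) at temperature T,
    # scaled by T^2 so gradient magnitudes stay comparable as T changes.
    p = softmax(teacher_logits, temperature)   # teacher's softened targets
    q = softmax(student_logits, temperature)   # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (temperature ** 2) * kl

# A student that matches the teacher exactly incurs zero loss;
# any mismatch makes the loss positive.
print(kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # 0.0
print(kd_loss([2.0, 0.5, -1.0], [0.5, 2.0, -1.0]) > 0)  # True
```

In practice this term is usually combined with the ordinary cross-entropy on hard labels, and the interesting engineering is in scaling it to large models and datasets, which is exactly what this role covers.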

What We’re Looking For

  • Strong background in machine learning or deep learning

  • Hands-on experience with model distillation (LLMs or other neural networks)

  • Solid understanding of training dynamics, loss functions, and optimization

  • Experience with PyTorch (or JAX) and modern ML tooling

  • Comfort running experiments on multi-GPU or distributed setups

  • Ability to reason about model quality vs. performance tradeoffs

  • Pragmatic mindset: you care about shipping, not just papers

Nice to Have

  • Experience distilling LLMs or large sequence models

  • Experience with inference optimization (quantization, pruning, kernels, etc.)

  • Familiarity with evaluation for language models

  • Open-source contributions or research publications

  • Experience in early-stage or fast-moving startups

Why Join

  • Work on core model quality and cost efficiency—not side projects

  • High ownership and direct impact on product and roadmap

  • Small, senior team with strong research + engineering culture

  • Competitive compensation + meaningful equity

  • Remote-friendly, async-first environment
