Bagel Labs
Bagel Labs is an AI research lab that trains frontier diffusion models across commodity hardware instead of one uniform GPU cluster. Our method, Distributed Diffusion Models (DDM), trains many smaller expert models independently with no gradient synchronization, then combines them at inference with a lightweight router. Paris-1 proved the idea on images and Paris-2 proved it on video, where three 11B experts and a router beat a monolithic baseline trained on the same compute by more than 50% on FVD. We are now applying DDM to physical AI, where the training runs are long, expensive, and easy to break.
We ignore years of experience and pedigree. If you have strong systems taste and can make messy research infrastructure reliable under pressure, we want to hear from you. Every requirement below is flexible for someone with the engineering judgment to back it up.
You will build the systems layer that turns frontier research into results we can trust. Training across mixed hardware with no gradient sync breaks the usual playbook, so much of this infrastructure does not exist yet, and you will invent it. The work spans distributed training, GPU orchestration, observability, benchmark harnesses, experiment tracking, and data and model pipelines. Physical AI workloads are the focus, but the core skill is building high-leverage ML systems that researchers actually want to use.
You have made messy experiments reliable before, maybe in ML infrastructure, distributed training, research engineering, GPU systems, or data infrastructure. Strong candidates come from large model training, video generation, inference systems, infrastructure startups, or research teams where prototypes had to grow into dependable systems. You think about the researcher on the other side of your tools, and you build things they will actually use.