AI2 released BAR, short for Branched Adapters and Routers — a recipe for post-training language models one capability at a time by training domain experts independently on disjoint data slices, then merging the resulting experts into a single mixture-of-experts model for inference. The released arXiv paper, titled Train Separately, Merge Together, frames the method as an answer to a practical problem that has been gnawing at industrial post-training pipelines for more than a year: when a lab wants to upgrade a single capability — say, adding a new long-context SFT set, improving a math reasoning track, or patching a safety behavior — the conventional recipe requires rerunning the full post-training pipeline against the joint objective, because supervised fine-tuning and reinforcement-learning-from-human-feedback objectives interact in ways that make isolated updates regress prior capabilities.

BAR decouples the problem: each capability is trained on its own branch using lightweight adapters, each branch is routed to its own expert slot, and the resulting merged MoE model exposes all capabilities at inference without ever having computed the cross-capability joint loss. AI2 reports that this allows independent expert upgrading in production — the practical win is that a team working on the math track does not have to coordinate with the team working on safety or agentic tool use to ship an update.

The paper is open, and AI2 confirms the recipe is being applied to the Olmo model family. The importance lies less in a headline benchmark number and more in the operational claim: most large labs have accumulated dozens of post-training datasets, and their current merge strategy is a combination of data mixing, per-task LoRA, and careful rehearsal — none of which scale cleanly. If BAR's claims hold under replication, this is the kind of infrastructure paper that changes how open-weight model families are maintained over time.
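To make the train-separately, merge-together idea concrete, here is a minimal PyTorch sketch of the merge step as described above: adapters trained independently on disjoint capability slices are stacked behind a per-token router into one MoE layer, with no joint loss ever computed. The class and function names (`CapabilityExpert`, `MergedMoELayer`) are illustrative assumptions, not APIs from the BAR paper, and the paper's actual routing scheme may differ.

```python
import torch
import torch.nn as nn

class CapabilityExpert(nn.Module):
    """One independently trained adapter (e.g. a math, safety, or
    long-context branch). Hypothetical stand-in for a BAR branch."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ff(x)

class MergedMoELayer(nn.Module):
    """Merge already-trained experts behind a per-token router.
    The experts' weights are untouched; only the router is new."""
    def __init__(self, experts: list, d_model: int):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.router = nn.Linear(d_model, len(experts))  # per-token gating

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        gates = torch.softmax(self.router(x), dim=-1)          # (b, s, n)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (b, s, d, n)
        # Gate-weighted mixture of expert outputs, per token.
        return (outs * gates.unsqueeze(2)).sum(dim=-1)         # (b, s, d)

# Usage: two experts trained on disjoint slices, merged for inference.
d_model = 16
experts = [CapabilityExpert(d_model, 32) for _ in range(2)]
layer = MergedMoELayer(experts, d_model)
y = layer(torch.randn(3, 5, d_model))
```

Upgrading one capability in this scheme means retraining a single `CapabilityExpert` and swapping it into `self.experts`, which is the independent-upgrade property the release emphasizes. A real deployment would presumably use sparse top-k routing rather than the dense softmax mixture shown here.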
Early community reaction has centered on whether the MoE routing overhead offsets the avoided compute for joint retraining, and on how BAR's expert isolation interacts with constitutional-AI-style objectives that intentionally span many capability axes.
- arXiv cross-listings: Listed under Agents, Efficiency, RL, cs.LG — indicating the contribution spans multiple research subfields.
- AI2 Blog: AI2 published an accompanying blog post framing the contribution as infrastructure for modular post-training.