EXPERIMENT

RetroChimera

Planning and conducting chemical syntheses remains a major bottleneck in the discovery of functional small molecules, and this bottleneck limits the potential of generative AI in molecular design.

RetroChimera is a retrosynthesis model. It takes as input a target molecule that one wants to synthesize, encoded as a string of characters in SMILES notation, and proposes several candidate chemical reactions that could produce that molecule. Each reaction is represented as a set of reactant molecules, which are encoded in SMILES just like the input.
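To make this input and output format concrete, here is a minimal sketch of how the predictions for one target might be represented. The ReactionPrediction type and its score field are hypothetical, not RetroChimera's actual output schema, and the two reactions are illustrative rather than real model output. The reaction SMILES convention used for printing (reactants joined by ".", then ">>", then the product) is standard. The example target is aspirin, with two familiar acetylation routes from salicylic acid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionPrediction:
    """One candidate reaction proposed for a target molecule (hypothetical type)."""
    reactants: tuple[str, ...]  # each reactant is a SMILES string
    score: float  # hypothetical model confidence; higher is better

    def to_reaction_smiles(self, product: str) -> str:
        # Standard reaction SMILES: reactants joined by ".", then ">>", then the product.
        return ".".join(self.reactants) + ">>" + product


# Target: aspirin (acetylsalicylic acid), encoded as SMILES.
target = "CC(=O)Oc1ccccc1C(=O)O"

# Illustrative candidates: salicylic acid + acetic anhydride, or + acetyl chloride.
predictions = [
    ReactionPrediction(("Oc1ccccc1C(=O)O", "CC(=O)OC(C)=O"), 0.8),
    ReactionPrediction(("Oc1ccccc1C(=O)O", "CC(=O)Cl"), 0.2),
]

# Present candidates from most to least confident.
ranked = sorted(predictions, key=lambda p: p.score, reverse=True)
for p in ranked:
    print(p.to_reaction_smiles(target))
```

Keeping reactants as a tuple of plain SMILES strings mirrors the description above: the output uses the same encoding as the input, so any candidate reactant can itself be fed back in as a new target.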

Inspired by the way chemists draw on different strategies when devising reactions, RetroChimera is built on a framework for constructing highly accurate reaction models: it combines the predictions of diverse constituent models with complementary inductive biases, using a learning-based ensembling strategy.
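The general idea of learning how to combine ranked predictions from several models can be illustrated with a small sketch. It assumes each constituent model returns a ranked list of candidate reactant sets, and it swaps in a simple, well-known technique (weighted reciprocal-rank fusion, with a mixing weight fitted by grid search on validation data) as a stand-in; this is not RetroChimera's actual ensembling method. The model names model_a and model_b, the functions fuse and fit_two_model_weights, and the toy data are all hypothetical.

```python
from collections import defaultdict

# A candidate is an unordered set of reactant SMILES strings.
Candidate = frozenset[str]


def fuse(ranked_lists: dict[str, list[Candidate]],
         weights: dict[str, float],
         k: float = 60.0) -> list[Candidate]:
    """Weighted reciprocal-rank fusion: each model adds weight / (rank + k) to a candidate."""
    scores: dict[Candidate, float] = defaultdict(float)
    for model, candidates in ranked_lists.items():
        for rank, cand in enumerate(candidates, start=1):
            scores[cand] += weights[model] / (rank + k)
    # Highest fused score first; ties broken by sorted reactant SMILES for determinism.
    return sorted(scores, key=lambda c: (-scores[c], sorted(c)))


def fit_two_model_weights(validation, models, grid):
    """Grid-search the mixing weight between two models to maximize top-1 accuracy."""
    best_weights, best_acc = {}, -1.0
    for w in grid:
        weights = {models[0]: w, models[1]: 1.0 - w}
        hits = sum(fuse(lists, weights)[0] == truth for lists, truth in validation)
        acc = hits / len(validation)
        if acc > best_acc:  # keep the first weight that reaches the best accuracy
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Placeholder reactant sets standing in for real SMILES.
A, B, C = frozenset({"r1"}), frozenset({"r2"}), frozenset({"r3"})

# Each example: (ranked lists from each model, the recorded true reactant set).
validation = [
    ({"model_a": [B, A], "model_b": [C, A]}, A),  # both models rank the truth second
    ({"model_a": [A, B], "model_b": [B, A]}, A),
    ({"model_a": [C, B], "model_b": [B, C]}, C),
]

weights, acc = fit_two_model_weights(
    validation, ("model_a", "model_b"), [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
)
print(weights, acc)
```

On this toy data the search settles on a weight of 0.6 for model_a and the fused top-1 ranking is correct on all three examples, whereas model_a alone is correct on two and model_b alone on none. The first example is solved only because both models agree on the true answer as their second choice; an ensemble can only exploit such agreement when its constituents make different mistakes, which is why complementary inductive biases matter.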

In experiments spanning several orders of magnitude in training-data scale, as well as time-based data splits, the researchers showed that RetroChimera outperforms all major existing models by a large margin. This advantage comes both from the strong individual performance of its constituent models and from the scalability of the ensembling strategy. Moreover, they found that PhD-level organic chemists preferred RetroChimera's predictions over the reactions it was trained on in terms of quality.

RetroChimera is now available for experimental purposes on Azure AI Foundry.