Researchers developed an AI system called ProteinSGM, which uses generative diffusion to create entirely new therapeutic proteins. The model learns from image representations to generate fully new proteins that appear to be biophysically real, meaning they fold into configurations that enable them to carry out specific functions within cells. The researchers hope the system will help advance the field of generative biology, promising to speed drug development by making the design and testing of entirely new therapeutic proteins more efficient and flexible.
ProteinSGM, a new system developed by researchers at the University of Toronto, could help advance the field of generative biology. By making the design and testing of entirely new therapeutic proteins more efficient and flexible, it promises to speed drug development. Philip M. Kim, a professor in the Donnelly Centre for Cellular and Biomolecular Research at U of T’s Temerty Faculty of Medicine, explains that “our model learns from image representations to generate fully new proteins, at a very high rate.” The proteins produced by ProteinSGM appear to be biophysically real, meaning they fold into configurations that enable them to carry out specific functions within cells.
The study was published in the journal Nature Computational Science, making it the first work of its kind to appear in a peer-reviewed journal. The research team also published a pre-print of the model last summer through the open-access server bioRxiv. ProteinSGM is not the only system of its kind, however: two similar pre-prints were published in December, RF Diffusion by the University of Washington and Chroma by Generate Biomedicines.
Proteins are made up of chains of amino acids that fold into three-dimensional shapes, which in turn dictate protein function. These shapes evolved over billions of years and are varied and complex but limited in number. With a better understanding of how existing proteins fold, researchers have begun to design folding patterns not produced in nature. However, a major challenge has been to imagine folds that are both possible and functional. According to Kim, “It’s been very hard to predict which folds will be real and work in a protein structure. By combining biophysics-based representations of protein structure with diffusion methods from the image generation space, we can begin to address this problem.”
The success of ProteinSGM lies in its ability to draw from a vast set of image-like representations of existing proteins that encode their structure accurately. These images are then fed into a generative diffusion model, which gradually adds noise until each image is pure noise. The system tracks how the images become noisier and then runs the process in reverse, learning how to transform random pixels into clear images that correspond to fully novel proteins. Jin Sub (Michael) Lee, a doctoral student in the Kim lab and first author on the paper, explains that “a key idea was the proper image-like representation of protein structure, such that the diffusion model can learn how to generate novel proteins accurately.”
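For readers who want a concrete picture of the noising idea, the following is a minimal Python sketch, not the ProteinSGM implementation. It assumes a pairwise distance map as the image-like representation (the article does not say which representation the team used), and the toy coordinates, the function names and the `alpha_bar` parameter are all hypothetical choices made for illustration. The last function recovers the clean image only because it is handed the exact noise that was added; a trained diffusion model would have to predict that noise instead.

```python
import math
import random


def distance_map(coords):
    """Pairwise Euclidean distances between 3D points, as an L x L 'image'."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def forward_noise(image, alpha_bar, rng):
    """Blend an image with Gaussian noise.

    alpha_bar = 1.0 leaves the image unchanged; alpha_bar = 0.0 yields pure noise.
    Returns the noisy image and the noise that was added.
    """
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    noisy, noise = [], []
    for row in image:
        eps_row = [rng.gauss(0.0, 1.0) for _ in row]
        noise.append(eps_row)
        noisy.append([keep * x + spread * e for x, e in zip(row, eps_row)])
    return noisy, noise


def denoise_with_known_noise(noisy, noise, alpha_bar):
    """Invert forward_noise given the exact noise (requires alpha_bar > 0).

    Assumption for illustration: a real model would estimate the noise.
    """
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [
        [(y - spread * e) / keep for y, e in zip(noisy_row, eps_row)]
        for noisy_row, eps_row in zip(noisy, noise)
    ]


# Hypothetical four-point "backbone"; not taken from any real protein.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

rng = random.Random(0)
clean = distance_map(toy_coords)
noisy, noise = forward_noise(clean, alpha_bar=0.5, rng=rng)
recovered = denoise_with_known_noise(noisy, noise, alpha_bar=0.5)
```

In this sketch, lowering `alpha_bar` step by step toward zero mimics the gradual noising described above, and generation would amount to starting from pure noise and repeatedly applying a learned version of the reverse step.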