Ginkgo Bioworks wants to build a ChatGPT for DNA

ChatGPT may be good for drafting emails to your boss, but could a similar kind of AI write a DNA sequence for a life-saving drug? The developers of a new wave of AI foundation models are optimistic. One such effort is being led by Ginkgo Bioworks, which recently signed a five-year partnership with Google Cloud to build and host AI tailored for synthetic biology.

Ginkgo, known for its genetic engineering work on pharmaceuticals, fertilizers, fragrances, and cannabinoids, has two key advantages here: a large repository of proprietary data built up over its 15-year history and a streamlined, automated system for collecting more, according to Jason Kelly, Ginkgo's CEO and co-founder.

We spoke with Kelly about generative AI's potential in biotechnology, how the project will work, and what it might deliver.

This conversation has been edited for length and clarity.

How is Ginkgo working to bring generative AI to biotechnology?

The question is: Can we build foundation models in biology, and what would that look like? Start with the fact that DNA is a code, right? Take a gene, which is a single unit of DNA; a human has tens of thousands of genes. Think of a gene as a story that reads from start to finish, like a book. So you can apply the same kind of predictive training used for large language models, except the text is a gene from the natural world. And just as the internet reflects human language well enough that these models can genuinely learn English, DNA sequences aren't arbitrary strings of bases (A, T, C, and G); they are the product of 4 billion years of evolution. They have a grammar, they are a language, and they encode the design that makes a protein work well or poorly. If you scramble that sequence, it breaks, and that is exactly what a genetic disease is: a mutation that disrupts normal function.
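To make the "DNA as a language" analogy concrete, here is a minimal, illustrative sketch in Python (PyTorch). It is not Ginkgo's or Google's actual system: the sequence, the TinyDnaLM model, and its size are invented for illustration, and a real foundation model would be a far larger transformer trained on vast genomic corpora. The point is only the framing: each base is a token, and the training objective is next-token prediction, exactly as in GPT-style language models.

```python
# Toy sketch: treat DNA bases as tokens and train with a next-token objective.
import torch
import torch.nn as nn

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(seq: str) -> torch.Tensor:
    """Turn a DNA string into a tensor of token ids."""
    return torch.tensor([VOCAB[b] for b in seq], dtype=torch.long)

class TinyDnaLM(nn.Module):
    """Deliberately tiny causal model: embed each base, run it left to right,
    and predict a distribution over the next base at every position."""
    def __init__(self, vocab_size=4, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):              # tokens: (batch, length)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                 # logits: (batch, length, vocab)

# One hypothetical gene fragment as "training data" for the sketch.
seq = encode("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA").unsqueeze(0)
inputs, targets = seq[:, :-1], seq[:, 1:]   # predict base t+1 from bases <= t

model = TinyDnaLM()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, 4), targets.reshape(-1))
    optim.zero_grad()
    loss.backward()
    optim.step()
```

Given bases 1 through t, the model must predict base t+1, so it is pushed to absorb whatever "grammar" the sequences contain.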

So what nature gives us are these "books," written in the language of DNA and proteins. Here's the idea: run the same process, train a very large neural network, a foundation model, and teach it to write DNA the way ChatGPT or GPT-4 learned to write English by picking up the grammar. The real magic comes if that foundation model, combined with all of our antibody data, turns out to be better at drug discovery, at designing new crop protection, and at whatever else biotechnology can do, with the foundation model boosting the machine learning we have always used. That's what we're building with Google. Essentially, Ginkgo is taking on a massive neural network training run, using as many of these "genome books" as we can get access to, which we believe is more than anyone else has, and training this large model on them.
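Continuing the toy sketch above (again, purely illustrative and not the partnership's real system), "teaching it to compose DNA" corresponds to sampling one base at a time from the trained model, the same loop a chat model uses for words. The prompt "ATG" and the helper names here are invented for this sketch.

```python
# Illustrative only: sample new bases one at a time from the toy model above.
import torch

ID_TO_BASE = "ACGT"  # must match the VOCAB ordering used during training

@torch.no_grad()
def sample(model, prompt="ATG", length=30, temperature=1.0):
    tokens = encode(prompt).unsqueeze(0)                # seed with a short prompt sequence
    for _ in range(length):
        logits = model(tokens)[:, -1, :] / temperature  # distribution over the next base
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_id], dim=1)    # append and continue
    return "".join(ID_TO_BASE[i] for i in tokens[0].tolist())

print(sample(model))  # e.g. "ATG" followed by 30 model-chosen bases
```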

Source: EmergingTechBrew

Author: Neurologica