Malte Wagenbach

Biology Becomes Engineering: The De Novo Protein Revolution

March 19, 2026

In January 2025, a team at the University of Washington published a paper in Nature describing proteins they had designed from scratch - no natural template, no evolutionary starting point - that neutralize lethal snake venom. Mice given lethal doses of three-finger toxins survived at rates of 80-100%.

These proteins do not exist in nature. They were never selected by evolution. They were imagined by a computer, synthesized in a lab, and they worked on the first try.

This is not an isolated result. It is the current state of a field that has, in the last three years, crossed a threshold that changes everything downstream. Biology is becoming an engineering discipline. Not metaphorically. Literally.

What "de novo" means and why it matters

There are three things you can do with proteins computationally:

Predict structure: Given a natural amino acid sequence, figure out what 3D shape it folds into. This is what AlphaFold solved in 2020. It is extraordinary and it won the Nobel Prize. But it is a reading problem - understanding what nature already made.

Engineer existing proteins: Take a natural protein and mutate specific amino acids to improve it. This is traditional protein engineering. It is useful but constrained - you are always working within the design space that evolution explored.

De novo design: Specify a function you want - "bind this target," "catalyze this reaction," "form this shape" - and computationally generate both the 3D structure and the amino acid sequence from scratch. No natural template. No evolutionary precedent. A protein that has never existed in four billion years of life on Earth.

The third category is the revolution. And it went from theoretical curiosity to working technology in roughly 36 months.

The timeline that changed everything

David Baker founded the Rosetta software project at the University of Washington in 1998. For two decades, his lab worked on the problem: if you know the physics of how amino acids interact, can you design proteins that fold into shapes nature never made? The early results were modest - small, stable folds that proved the concept but had limited function.

The acceleration started in 2020-2022, when deep learning transformed the field:

November 2020 - AlphaFold2 achieves a median GDT score of 92.4 out of 100 at the CASP14 structure prediction assessment, effectively solving a 50-year-old problem. But more importantly for design: it provides a fast, reliable way to check whether a designed sequence will actually fold into the intended structure.

September 2022 - ProteinMPNN, a message-passing neural network for sequence design published in Science, achieves 52.4% sequence recovery on native protein backbones (versus 32.9% for Rosetta). In other words, it restores the native amino acid at just over half of all positions. This is the inverse folding tool: give it a 3D shape, and it generates an amino acid sequence predicted to fold into it.
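
Sequence recovery is simple to compute once you have a native sequence and a designed sequence for the same backbone. Here is a minimal sketch. It assumes both sequences are already aligned position by position, which holds when a sequence is designed onto a fixed backbone. The toy sequences are made up for illustration. The published 52.4% figure is an average over a benchmark set of native backbones, not a single protein:

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue matches the native one.

    Assumes both sequences are aligned position by position, which holds when
    a new sequence is designed onto a fixed native backbone.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)


# Toy, made-up sequences: 7 of 10 positions match.
native = "MKTAYIAKQR"
designed = "MKEAYLAKER"
print(sequence_recovery(native, designed))  # 0.7
```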

July 2023 - RFdiffusion published in Nature. This is the pivotal moment. Baker's team fine-tuned their RoseTTAFold structure prediction network on protein structure denoising tasks, creating a generative model. It starts with noise and iteratively denoises into a plausible protein structure, conditioned on desired properties - a binding target, a catalytic geometry, a scaffold constraint.
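
The "start with noise and iteratively denoise" loop fits in a few lines. This is a toy built on heavy assumptions, and it is not RFdiffusion. It treats a protein as bare 3D points, whereas RFdiffusion diffuses residue frames (positions and orientations). It uses a generic DDIM-style update with a cosine noise schedule. It also replaces the trained network with a hypothetical oracle that always predicts an ideal alpha-helix. Conditioning on a binding target or motif would enter as extra inputs to the denoiser:

```python
import math
import random


def ideal_helix_ca(n_res, radius=2.3, rise=1.5, twist_deg=100.0):
    """C-alpha trace of an idealized alpha-helix, in angstroms."""
    return [
        (radius * math.cos(math.radians(twist_deg * i)),
         radius * math.sin(math.radians(twist_deg * i)),
         rise * i)
        for i in range(n_res)
    ]


def alpha_bar(t, T):
    """Cosine schedule: 1.0 (clean) at t=0, ~0.0 (pure noise) at t=T."""
    return math.cos(0.5 * math.pi * t / T) ** 2


def sample(denoise, n_res, T=50, seed=0):
    """Deterministic DDIM-style reverse process over 3D points."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise.
    x = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(n_res)]
    for t in range(T, 0, -1):
        a_t, a_prev = alpha_bar(t, T), alpha_bar(t - 1, T)
        x0_hat = denoise(x, t)  # the model's current guess at the clean structure
        x_next = []
        for xt, x0 in zip(x, x0_hat):
            # Noise implied by the current point and the clean-structure guess.
            eps = [(a - math.sqrt(a_t) * b) / math.sqrt(1.0 - a_t) for a, b in zip(xt, x0)]
            # Step to the less-noisy level t-1 along the same noise direction.
            x_next.append(tuple(math.sqrt(a_prev) * b + math.sqrt(1.0 - a_prev) * e
                                for b, e in zip(x0, eps)))
        x = x_next
    return x


target = ideal_helix_ca(12)
oracle = lambda x, t: target  # hypothetical stand-in for a trained denoising network
result = sample(oracle, n_res=12)
```

With a perfect oracle, the loop lands exactly on the helix, so this only demonstrates the mechanics. In a real model, the clean-structure guess changes from step to step, and different starting noise yields different designs.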

The pipeline snapped together: RFdiffusion generates a structure. ProteinMPNN generates a sequence. AlphaFold verifies that the sequence folds correctly. The entire cycle runs in hours on a GPU cluster. What took nature billions of years of random mutation and selection now takes an afternoon of computation.
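
Structurally, the generate, design, verify loop is nested iteration plus filtering. Here is a minimal sketch of that control flow. The three stage functions are hypothetical stand-ins that return random values. They are not wrappers around the real RFdiffusion, ProteinMPNN, or AlphaFold tools. The cutoffs (predicted confidence, pLDDT, of at least 80, and at most 2 Å RMSD between design and prediction) are illustrative assumptions, not any lab's published protocol:

```python
import random
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class Candidate:
    backbone_id: int
    sequence: str
    plddt: float  # predicted confidence, 0-100 scale
    rmsd: float   # angstroms between designed backbone and predicted structure


# Hypothetical stand-ins: each returns random values instead of calling a real tool.
def generate_backbone(rng: random.Random, i: int) -> dict:
    """Stand-in for a structure generator such as RFdiffusion."""
    return {"id": i, "length": rng.randint(60, 100)}


def design_sequences(rng: random.Random, backbone: dict, n: int) -> list[str]:
    """Stand-in for an inverse-folding model such as ProteinMPNN."""
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(backbone["length"]))
            for _ in range(n)]


def predict_and_compare(rng: random.Random, backbone: dict, seq: str) -> tuple[float, float]:
    """Stand-in for structure prediction plus backbone comparison (pLDDT, RMSD)."""
    return rng.uniform(50.0, 95.0), rng.uniform(0.5, 6.0)


def design_campaign(n_backbones: int = 20, seqs_per_backbone: int = 8,
                    min_plddt: float = 80.0, max_rmsd: float = 2.0,
                    seed: int = 0) -> list[Candidate]:
    """Generate backbones, design sequences for each, keep only self-consistent ones."""
    rng = random.Random(seed)
    hits = []
    for i in range(n_backbones):
        backbone = generate_backbone(rng, i)
        for seq in design_sequences(rng, backbone, seqs_per_backbone):
            plddt, rmsd = predict_and_compare(rng, backbone, seq)
            if plddt >= min_plddt and rmsd <= max_rmsd:
                hits.append(Candidate(i, seq, plddt, rmsd))
    # Most confident first; ties broken by closer agreement with the design.
    return sorted(hits, key=lambda c: (-c.plddt, c.rmsd))


hits = design_campaign()
print(f"{len(hits)} of {20 * 8} candidates pass the in-silico filters")
```

Only the survivors of the filter would be ordered as synthetic genes. Running the whole loop in silico is what keeps the expensive lab step small.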

June 2024 - EvolutionaryScale launches with $142M in seed funding and ESM3, a 98-billion-parameter model trained on 2.78 billion proteins. ESM3 generates esmGFP - a functional green fluorescent protein with only 58% sequence identity to any known fluorescent protein. The ESM3 authors estimate that this 42% gap corresponds to roughly 500 million years of natural evolution, compressed into a single generation step.
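
The two esmGFP figures (96 mutations, 58% identity) can be cross-checked with simple arithmetic. For illustration, assume the 96 mutations are substitutions counted over the same gapless alignment that produced the 58% identity figure. The implied alignment length then follows directly:

```python
def percent_identity(mismatches: int, aligned_length: int) -> float:
    """Percent identity over a gapless alignment: matching positions / aligned positions."""
    return 100.0 * (aligned_length - mismatches) / aligned_length


def implied_aligned_length(mismatches: int, identity_pct: float) -> float:
    """Invert percent_identity: the alignment length consistent with both figures."""
    return mismatches / (1.0 - identity_pct / 100.0)


length = implied_aligned_length(96, 58.0)
print(round(length))                                   # 229
print(round(percent_identity(96, round(length)), 1))  # 58.1
```

About 229 aligned positions is a plausible size for a GFP-like protein (Aequorea victoria GFP is 238 residues). Under that assumption, the two figures are at least mutually consistent.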

October 2024 - David Baker, Demis Hassabis, and John Jumper receive the Nobel Prize in Chemistry. Baker for computational protein design. Hassabis and Jumper for structure prediction with AlphaFold.

The Nobel committee does not award prizes for potential. They award them for accomplished fact.

The numbers

The field has passed the point where you need to argue it works. The numbers speak:

Metric | Value
De novo proteins experimentally tested (across 11 studies) | 614 tested, 269 successful (43.8%)
Alpha-helical design success rate | ~88%
BindCraft binder design success rate | 10-100% per target
Designs needed for a hit (BindCraft) | As few as 10
ProteinMPNN sequence recovery | 52.4% (vs 32.9% for Rosetta)
ESM3 esmGFP divergence from nature | 96 mutations, 58% identity to nearest natural protein
Snake venom neutralization survival | 80-100% in mice given lethal doses
OpenCRISPR-1 mutations from natural Cas9 | 400+
Baker lab career output | 640+ papers, 100+ patents, 21 biotech spinouts
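
The headline 269-of-614 figure pools 11 studies, so it helps to know how far a pooled rate can be trusted. Here is a minimal sketch using the Wilson score interval for a binomial proportion. It assumes every design is an independent trial with the same success probability. That ignores differences between studies, so the real uncertainty is wider than this interval suggests:

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


rate = 269 / 614
low, high = wilson_interval(269, 614)
print(f"pooled success rate: {rate:.1%}")               # 43.8%
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")  # about 39.9% to 47.8%
```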

BindCraft deserves special attention. Released as an open-source pipeline from EPFL in late 2024, it achieves 10-100% experimental success rates per target for de novo protein binder design with nanomolar affinity. For some targets, as few as 10 designs are enough to get a working binder. No high-throughput screening. No library selection. You describe the target, run the computation, order around 10 synthetic genes, and test them. For many targets, several of them work.
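
Per-target success rates translate directly into how many designs you should order. Here is a minimal sketch. It assumes each design succeeds independently with the same probability p. Real campaigns violate this, since designs against one target tend to share failure modes:

```python
import math


def p_at_least_one_hit(p: float, n: int) -> float:
    """Chance that at least one of n independent designs works."""
    return 1.0 - (1.0 - p) ** n


def designs_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one hit) >= confidence."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


for p in (0.10, 0.30, 0.60):
    print(f"p={p:.0%}: P(hit in 10 designs)={p_at_least_one_hit(p, 10):.2f}, "
          f"designs for 95% confidence={designs_needed(p)}")
```

At the low end, a 10% hit rate, ten designs give roughly a 65% chance of at least one binder, and about 29 designs are needed for 95% confidence. At 60%, four designs suffice.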

For context: traditional antibody discovery through phage display or animal immunization takes months to years and costs hundreds of thousands of dollars per target. BindCraft takes days and costs the price of gene synthesis.

What is already working

This is not a technology waiting for its first application. It is a technology with applications shipping now.

Therapeutics:

  • Neo-2/15 (Neoleukin Therapeutics): A de novo designed protein that mimics IL-2 and IL-15, binding their receptor with higher affinity than the natural molecules. Superior anti-tumor activity in mice with reduced toxicity. Its clinical derivative, NL-201, entered Phase I trials for solid tumors.
  • Antibody Cages (Archon Biosciences, spun out October 2024): De novo designed nanocages that amplify antibody potency 20x. Their conatumumab nanocage triggers cancer cell death in the lab - an antibody that failed as a standalone drug succeeds when presented on a designed scaffold.
  • RSV/hMPV vaccine (Icosavax, acquired by AstraZeneca in 2024): Computationally designed virus-like particle vaccine with positive Phase II immunogenicity data.
  • RFantibody: Software that designs complete antibody variable regions - VHH nanobodies, single-chain variable fragments, full antibodies - that bind user-specified epitopes with atomic precision. Cryo-EM confirmed binding poses against influenza and C. difficile. Released freely for academic and commercial use.

Gene editing:

  • OpenCRISPR-1 (Profluent): The first AI-designed gene editor to successfully edit the human genome. Over 400 mutations from natural Cas9. Fully synthetic - never existed in nature. Shows reduced off-target effects and lower immunogenicity than the natural enzyme. Published in Nature in 2025.

Enzymes:

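  • FAST-PETase: A machine-learning-engineered variant of the natural PETase enzyme (an engineered protein rather than a de novo design) that almost completely degrades post-consumer PET plastic from 51 different product types in one week.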
  • De novo luciferases (Baker lab, Nature 2023): Entirely AI-designed bioluminescent enzymes. Small (13.9 kDa), thermostable (>95 °C), with catalytic activity comparable to natural luciferases. Monod Bio spun out to commercialize biosensors.
  • Riff-Diff enzymes (Nature 2025): De novo designed catalysts for retro-aldol and Morita-Baylis-Hillman reactions. Functional enzymes designed from catalytic motif geometry alone.

Biosensors:

  • Monod Bio: De novo bioluminescent sensors based on conformational switches. Already licensed to Bio-Techne.

The companies and the capital

The commercial ecosystem around de novo protein design is scaling fast:

Company | Focus | Capital
Xaira Therapeutics | AI-generated drugs (RFdiffusion team) | $1B at launch (Apr 2024)
Chai Discovery | De novo antibody design | $225M total, $1.3B valuation
Profluent | AI-designed gene editors | $150M total
EvolutionaryScale | ESM3 foundation model | $142M seed
Cradle Bio | ML-guided protein engineering | $103M
Arzeda | Designed enzymes (food, pharma, industrial) | $86M
Generate Biomedicines | Generative protein therapeutics | $50M Amgen deal (up to $1.9B)
Monod Bio | Bioluminescent biosensors | $25M
Archon Biosciences | Antibody Cages | $20M seed

Xaira's $1 billion launch is the signal. That is not a seed round for a speculative thesis. That is ARCH Venture Partners, Foresite Capital, and Sequoia betting that de novo protein design is ready for drug development at industrial scale. The team includes the principal architects of RFdiffusion and RFantibody.

Baker himself has co-founded 21 biotech companies from his lab. Icosavax was acquired by AstraZeneca. Neoleukin took a de novo designed protein into clinical trials. The translation pipeline from academic breakthrough to commercial product is well-established and accelerating.

The AI virtual cell: where this is heading

Several efforts are converging on building computational models of entire cells:

The Chan Zuckerberg Initiative is building virtual cell models with 1,000+ GPUs, in partnership with NVIDIA (announced October 2025). In August 2025, they released rBio - the first reasoning model trained on virtual cell simulations. Scientists can ask "Would suppressing gene A increase activity of gene B?" and get validated predictions.

The Arc Institute released State: a virtual cell model trained on single-cell perturbation data from over 100 million cells, predicting how stem cells, cancer cells, and immune cells respond to drugs, cytokines, or genetic perturbations.

The vision: integrate protein design, gene regulation, metabolic modeling, and cell signaling into a unified computational model. Simulate intracellular dynamics and predict biosafety endpoints before any in-cell or in-vivo testing.

We are not there yet. But the components now exist. De novo protein design gives us programmable molecular parts. Virtual cell models give us a way to predict how those parts behave in biological context. Together, they point toward biology as a fully programmable medium - not in some distant future, but on a trajectory measured in years.

The second and third order effects

I write about technologies that change the substrate on which civilization operates. Energy is one substrate. Computation is another. But proteins are the most fundamental substrate of all. Every living thing on Earth runs on proteins. Every disease involves proteins. Every industrial biological process - fermentation, enzyme catalysis, antibody production - depends on proteins that evolution happened to produce.

De novo design removes "that evolution happened to produce" from that sentence. The consequences cascade:

Medicine - First order: better drugs (designed cytokines, antibodies, enzymes as therapeutics). Second order: programmable therapeutics - drugs that sense disease state and activate conditionally. Baker's lab demonstrated this in October 2025: smart protein structures that control drug localization based on environmental cues, and switchable IL-2 variants that activate immune cells and then silence them on demand. Third order: the end of the antibody discovery bottleneck. The $200+ billion antibody market currently relies on animal immunization or phage display. De novo design makes antibodies to any target a software problem. The timeline collapses from years to days.

Materials - First order: protein-based materials with properties nature never produced. Second order: self-assembling nanomaterials for electronics, energy storage, filtration. Third order: biological manufacturing replacing petrochemical manufacturing. Enzymes operating at room temperature and atmospheric pressure replace industrial catalysts requiring hundreds of degrees and high pressure. The energy cost of chemical manufacturing drops by orders of magnitude.

Food - First order: better enzymes for food processing, alternative proteins engineered for taste and nutrition. Second order: crops with designed proteins for pest resistance, nitrogen fixation, drought tolerance. Third order: food production partially decoupled from land area through designed biosynthesis pathways.

Energy and environment - First order: designed enzymes for plastic degradation (FAST-PETase already works), carbon capture, biofuel production. Second order: industrial chemistry at ambient conditions. Third order: biological carbon capture at scale - designed organisms converting CO2 into useful materials.

The engineered protein sector is projected to exceed $500 billion by 2035. That number probably underestimates the real impact, because it only counts the direct market for designed proteins. It does not count the industries those proteins transform.

What still does not work

I am going to be honest about the limitations, because the hype cycle in biotech destroys credibility and this field deserves better.

Enzyme activity gap. Designed enzymes work, but they are generally less active than their natural counterparts. Nature has had four billion years to optimize. Deep-learning design has had about three. The gap is closing - GRACE-designed carbonic anhydrases achieved 400 Wilbur-Anderson units (WAU) per mL - but it is real.

Protein dynamics. AlphaFold and RFdiffusion produce static structures. Real proteins are dynamic - they flex, breathe, shift between conformational states. Designing proteins that move correctly (allosteric switches, molecular machines, motors) remains extremely hard. Nature builds rotary motors (ATP synthase), linear walkers (kinesin), and self-propelled rotors (the bacterial flagellum). We cannot yet design any of these from scratch.

Beta-sheet architectures. Alpha-helical designs succeed roughly 88% of the time experimentally. Beta-barrel designs succeed roughly 21%. That is more than a four-fold gap in reliability for an entire class of protein architecture.
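
The 88% versus 21% comparison is easier to grasp as designs tested per success. Here is a minimal sketch. It assumes independent trials, so the mean number of designs per success is 1/p (the mean of a geometric distribution):

```python
def fold_gap(p_high: float, p_low: float) -> float:
    """How many times more often the higher-success design class works."""
    return p_high / p_low


def designs_per_success(p: float) -> float:
    """Mean designs tested per success: the mean of a geometric distribution, 1/p."""
    return 1.0 / p


helix, barrel = 0.88, 0.21
print(f"{fold_gap(helix, barrel):.1f}x")  # 4.2x
print(f"{designs_per_success(helix):.2f} vs {designs_per_success(barrel):.2f}")  # 1.14 vs 4.76
```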

In vivo behavior. A protein that works in a test tube may aggregate in a cell, get degraded by proteases, trigger immune responses, or bind unintended targets. The gap between in vitro validation and therapeutic viability remains large and expensive to cross.

Membrane proteins. Designing functional channels, transporters, and receptors that span cell membranes is significantly harder than designing soluble proteins. Most design successes to date are soluble.

Validation bottleneck. Computation is fast. Synthesizing and testing designed proteins is slow and expensive. This bottleneck limits iteration speed and will continue to do so until lab automation catches up.

These limitations are real. They are also the kind of limitations that shrink over time as tools improve, data accumulates, and the field matures. The trajectory is clear even if the destination is not yet reached.

Why this is the most important technology you are not following

I have written tens of thousands of words on energy - thorium, geothermal, the metabolic map of civilization. Energy is the substrate on which industrial civilization runs. Change the energy substrate and you change everything built on top of it.

Proteins are the substrate on which biology runs. All of it. Every cell, every organism, every disease, every food chain, every metabolic pathway. For four billion years, the only way to get a new protein was to wait for random mutation and hope that natural selection found a use for it.

That constraint is gone. Not theoretically. Practically. Today, a graduate student with access to RFdiffusion, ProteinMPNN, and a gene synthesis service can design a protein that has never existed in the history of life on Earth, order it as synthetic DNA for a few hundred dollars, express it in bacteria, and test whether it works - all within a few weeks.

The design-build-test cycle for the fundamental molecular machinery of life has compressed from evolutionary timescales to human project timescales. From billions of years to weeks.

I do not think we have fully processed what this means. We are used to engineering things we invented - circuits, engines, software. We are not used to engineering the molecular machinery that predates us by four billion years. But that is where we are. Biology is no longer something that happens to us. It is something we design.

The age of discovering proteins is ending. The age of designing them has begun.

Enjoyed this?

I write about energy, AI, systems thinking, and building things that matter. Subscribe to get new posts.
