Why evolution designed us to die fast, & how we can change that – Jacob Kimmel

Is the answer already in our genome?

Jacob Kimmel, the Co-founder & President of NewLimit, thinks he can find the transcription factors to reverse aging. We do a deep dive on why this might be plausible and why evolution hasn’t already optimized for longevity. We also talk about why drug discovery has been getting exponentially harder, and what a new platform for biological understanding to speed up progress would look like. As a bonus, we get into the nitty gritty of gene delivery and Jacob’s controversial takes on CAR-T cells.

For full disclosure, I am an angel investor in NewLimit. This did not impact my decision to interview Jacob, nor the questions I asked him.

Watch on YouTube; listen on Apple Podcasts or Spotify.

Sponsors

  • Hudson River Trading uses deep learning to tackle one of the world's most complex systems: global capital allocation. They have a massive in-house GPU cluster, and they’re constantly adding new racks of B200s to ensure their researchers are never constrained by compute. Explore opportunities at hudsonrivertrading.com/dwarkesh

  • Google’s Gemini CLI turns ideas into working applications FAST, no coding required. It built a complete podcast post-production tool in 10 minutes, including fully functional backend logic, and the entire build used less than 10% of Gemini’s session context. Check it out on GitHub now!

To sponsor a future episode, visit dwarkesh.com/advertise.

Timestamps

(00:00:00) – Three reasons evolution didn’t optimize for longevity

(00:12:07) – Why didn't humans evolve their own antibiotics?

(00:25:26) – De-aging cells via epigenetic reprogramming

(00:44:43) – Viral vectors and other delivery mechanisms

(01:06:22) – Synthetic transcription factors

(01:09:31) – Can virtual cells break Eroom’s Law?

(01:31:32) – Economic models for pharma

Transcript

00:00:00 – Three reasons evolution didn’t optimize for longevity

Dwarkesh Patel 00:00:00

Today I have the pleasure of interviewing Jacob Kimmel, who is the president and co-founder of NewLimit, where they're trying to epigenetically reprogram cells to their younger states.

Jacob, thanks so much for coming on the podcast.

Jacob Kimmel 00:00:09

Thanks so much for having me. Looking forward to the conversation.

Dwarkesh Patel 00:00:11

All right, first question. What's the first principles argument for why evolution just discards us so easily? I know evolution cares about our kids. But if we have longer, healthier lifespans, we can have more kids, right? We can care for them longer, we can care for our grandkids. Is there some pleiotropic effect that an anti-aging medicine would have which actually selects against you staying young for longer?

Jacob Kimmel 00:00:37

I think there are a couple different ways one can tackle this. One is you have to think about what's the selective pressure that would make one live longer and encode for higher health over longer durations. Do you have that selective pressure present? There's another which is, are there any anti-selective pressures that are actually pushing against that? There's a third piece of this, which is something like the constraints of your optimizer. If we think about the genome as a set of parameters and the optimizer is natural selection, then you've got some constraints on how that actually works. You can only do so many mutations at a time. You have to spend your steps that update your genome in certain ways.

Tackling those from a few different directions, what would the positive possible selection be? As you highlighted, it might be something like, “If I'm able to extend the lifespan of an individual, they can have more children, they can care for those children more effectively. That genome should propagate more readily into the population.” One of the challenges then—if you're trying to think back in a thought experiment style of evolutionary simulation here—would be: What were the conditions under which a person would actually live long enough for that phenotype to be selected for, and how often would that occur?

This brings us back to some very hypothetical questions. Things like, what was the baseline hazard rate during the majority of human and primate evolution? The hazard rate is simply, “What is the likelihood you're going to die on any given day?” That integrates everything. That's diseases from aging, that's getting eaten by a tiger, that's falling off a cliff, that's scraping your foot on a rock and getting an infection and dying from that.

From the best evidence we have, the baseline hazard rate was very, very high. Even absent aging, you're unlikely to actually reach those outer limits of possible health where aging is one of the main limitations. The number of individuals in the population who make it far enough into the lifespan for alleles that push lifespan upward to be selected on is relatively limited. The amount of gradient signal flowing back to the genome then is not as high as one might intuitively think.
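
To make that hazard-rate point concrete, here is a toy survival calculation. The daily rates are illustrative assumptions, not figures from the conversation; the point is just how quickly a constant background hazard thins out the pool of individuals old enough for late-life alleles to matter.

```python
# Toy survival calculation under a constant daily hazard rate.
# The rates below are illustrative assumptions, not estimates from the episode.
def survival_to_age(age_years: float, daily_hazard: float) -> float:
    """Probability of surviving to a given age if the chance of dying on any
    given day (predation, infection, accident, aging) is constant."""
    days = age_years * 365
    return (1 - daily_hazard) ** days

for daily_hazard in (1e-4, 5e-4, 1e-3):
    p40 = survival_to_age(40, daily_hazard)
    p65 = survival_to_age(65, daily_hazard)
    print(f"daily hazard {daily_hazard:.0e}: "
          f"P(reach 40) = {p40:.2g}, P(reach 65) = {p65:.2g}")
```

Under even the mildest of these assumed hazards, only a small fraction of the population reaches 65, so there is very little selection signal for alleles whose benefits only show up late in life.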

Dwarkesh Patel 00:02:29

On that, people who are trying to forecast AI will often discuss how hard evolution tried to optimize for intelligence, and what optimizing for intelligence would have prevented evolution from selecting for at the same time. So even if intelligence were a relatively easy thing to build in this universe, it would have taken evolution a long time to get to human-level intelligence. And, potentially, if intelligence really is easy, that might imply we're going to get superintelligence and Jupiter-level intelligence, etc. The sky's the limit.

One argument is birth canal sizes, etc., or the fact that we had to spend most of our resources on the immune system. But what you just hinted at is an independent argument that if you have this high hazard rate, that would imply you can't be a kid for too long. Kids die all the time and you have to become an adult so that you can have your kids.

Jacob Kimmel 00:03:27

You’ve got to contribute resources back to the group. You can't just be a freeloader. You need to get calories, go out in the jungle, get some berries.

Dwarkesh Patel 00:03:33

If you're just hanging out learning stuff for 50 years, you're just going to die before you get to have kids yourself.

Obviously, humans have bigger brains than other primates. We also have longer adolescences, which help us make use potentially of the extra capacity our brain gives us. But if you made the adolescence too long, then you would just die before you get to have kids. If that's going to happen anyways, what's the point of making the brain bigger? AKA, maybe intelligence is easier than we think, and there's a bunch of contingent reasons evolution didn't churn as hard on this variable as it could have.

Jacob Kimmel 00:04:05

I entirely agree with that particular thesis. In biology in general, when you're trying to engineer a given property, be it being healthier longer or be it making something more intelligent… This is true even at the micro-level of trying to engineer a system to manufacture a protein at high efficiency. You always have to start by asking yourself, “Did evolution spend a lot of time optimizing this? If yes, my job is going to be insanely hard. If not, potentially there are some low-hanging fruit.” This is a good argument for why, potentially, intelligence wasn't strongly selected for.

The lifespan argument plays back into intelligence to a degree. You start to ask, “If I have intelligence that's able to compound over time and for instance, in some hypothetical universe, my fluid intelligence lasts much longer into my lifespan…” If the number of people who are reaching something like 65 is very small in a population, you're not necessarily going to select for alleles that lead to fluid intelligence preservation late into life.

This is part of my own pet hypothesis around some of the interesting phenomenology in when discoveries are made throughout lifespans. There are some famous results. For instance—and I'm going to get the exact age a little bit wrong—but in mathematics, most great discoveries happen roughly before 30. Why should that be true? That doesn't make sense.

You can put down a bunch of societal reasons for it. Maybe you become staid in your ways. Your teachers have caused you to restrict your thinking by that point. But really, that's true across centuries? Is that true across many different unique cultures around the world? That's true in both cultures from the East and cultures from the West? That seems unlikely to me.

A much simpler explanation is that for whatever reason, our fluid intelligence is roughly maximized at the time where the population size during human evolution was maximal. If you had to pick an age at which fluid intelligence was selected most strongly for, it's probably around 25 or 30. That's probably about the age of the adults in the large populations that were being selected for during most of evolution.

There's a lot of reason here to think that there's interplay between many features of modern humans and how long we were living, and how that dictates some of the features that occur that rise and fall throughout our lives.

Dwarkesh Patel 00:06:01

In one way, this is a very interesting RL problem. It's a long-horizon RL problem, a 20-year horizon length, and then there's a scalar reward of how many kids you have, I guess how many survive, etc. If you've heard from your friends how hard RL is on these models for even intermediate goals that last an hour or a couple of hours, it's surprising that any signal propagates across a 20-year horizon.

On the point about fluid intelligence peaking, it’s not only the case that in many fields achievement peaks before 30. In many cases, if you look at the greatest scientists ever, they had many of their greatest achievements in a single year.

Jacob Kimmel 00:06:42

Yeah, the annus mirabilis.

Dwarkesh Patel 00:06:44

Yeah, exactly.

Jacob Kimmel 00:06:44

Yeah, exactly.

Dwarkesh Patel 00:06:45

Newton, what is it? Optics, gravity, calculus at 21.

Jacob Kimmel 00:06:49

Do you know the Alexander von Humboldt story?

Dwarkesh Patel 00:06:51

No.

Jacob Kimmel 00:06:52

Alexander von Humboldt is one of the most famous scientists in history who is kind of forgotten now. He had this one expedition to South America where he climbed Mount Chimborazo at a time when very few Europeans had done that. He was able to observe various ecological layers that were repeated across latitudes and across altitudes. It caused him to formulate an understanding of how selection was operating on plants at different layers in the ecosystem. That one expedition was the basis of his entire career.

When you see something named Humboldt, just to give you a sense of how famous this guy is, it's usually Alexander von Humboldt. It's not like this is some massive, prosperous German family name that just happens to be really common. It's this one guy. So really it was like this singular year in which he conceived a lot of our modern understanding of botany and selective pressure.

Dwarkesh Patel 00:07:35

Interesting. So that's one out of three components of the evolutionary story.

Jacob Kimmel 00:07:39

The next piece of the evolutionary story is, “Is there anything selecting against longevity?” Let's just pretend everything I said was wrong. Can I still make an argument that maybe evolution hasn't maximally optimized for our longevity?

One argument comes up here, and I'll caveat it by saying I don't know how strong some of the mathematical models that people have put together for it are. You can find people using the same idea to argue for and against. But there's this notion of what's called kin selection.

If you take a selfish gene view of the world—that really this is the genome optimizing for the genome's propagation, it's not trying to optimize for any one individual—then actually optimizing for longevity is a pretty tricky problem because you have this nasty regularization term.

If you're able to make a member of the population live longer, but you don't also counteract the decrease in their fitness over time—meaning you maybe extend maximum lifespan but you haven't totally eliminated aging—then the number of net calories contributed to the genome as a function of that person's marginal year and their own calorie consumption is less than if you were to allow that individual to die and actually have two 20-year-olds, for instance, that follow behind them.

So there is a notion by which a population laden demographically with many aged individuals—even if they did have fecundity persisting some period later into life—is actually net negative for the genome's proliferation, and that really a genome should optimize for turnover and population size at max fitness.
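
As a toy version of that accounting—every number below is invented purely for illustration, not taken from the conversation—you can compare the net calories a group gains from keeping an aged member around for a marginal year versus turning that slot over to younger individuals at full fitness:

```python
# Toy kin-selection accounting: net calories contributed to the group per year.
# All numbers are made up for illustration; only the direction of the comparison matters.
def net_contribution(gathered_per_year: int, consumed_per_year: int) -> int:
    return gathered_per_year - consumed_per_year

# An aged individual whose lifespan was extended but whose decline in fitness was not reversed:
aged = net_contribution(gathered_per_year=600_000, consumed_per_year=700_000)

# Two 20-year-olds at peak fitness "following behind them":
young_pair = 2 * net_contribution(gathered_per_year=900_000, consumed_per_year=700_000)

print(f"aged member, marginal year:  {aged:+,} kcal to the group")
print(f"two 20-year-olds, same year: {young_pair:+,} kcal to the group")
# Under this made-up accounting, turnover beats longevity-without-fitness,
# which is the regularization-style pressure being described.
```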

Dwarkesh Patel 00:08:56

I love this idea of aging as a length regularizer. People might be familiar with the idea that when companies are training models, they'll have a regularizer for, “You can do chain of thought, but don't make the chain of thought too long.” You're saying that how many calories you consume over the course of your life is one such regularizer? That's interesting.

The third point was...

Jacob Kimmel 00:09:20

The third piece is basically optimization constraints. This is where another ML analogy is helpful. Well, actually, a two-layer neural network is technically a universal approximator, but we can never actually fit one that way in practice. Why does that occur? People will wave their hands, but it basically comes down to the fact that we don't really know how to optimize them, even if you can prove out in a formal sense that they are universal approximators.

I think we have similar optimization challenges with our genome as the parameters and evolution as the optimization algorithm. One of those is that your mutation rate basically bounds the step size you can take. So if you imagine that at each generation, you get some number of inputs, you can select for some number of alleles. The max number of variations in the genome is set by your mutation rate. If you dial your mutation rate up too high, you probably get a bunch of cancers, so you're selected against. If you have it too low, you can't really adapt to anything. You end up with this happy medium, but that limits your total step size. Then the number of variants you can screen in parallel is basically limited by your population size. So for most of evolution, there are lots of forces constraining population size as well.

One of the dominant sources of selection on the genome is really prevention of infectious disease. It seems like when you study the history of early modern man, infectious disease is actually what shaped a lot of our population demographics. There's a lot of pressure pushing for those step sizes, those updates to the genome, really to be optimizing for protection against infectious disease rather than other things.

Even if you set aside the first and second arguments—positive selection being absent for longevity and potentially some negative selection existing—you could construct a reasonable argument for why humans don't live forever and why the genome hasn't optimized for that, simply based on these optimization constraints. You have to imagine not only that the positive selection is there and the negative selection is absent, but that when you think about the weighted loss term of all the things the genome is optimizing for, the weight on longevity is high enough to matter. Even if you imagine it's there, if the lambdas are simply dialed more toward infectious disease resilience, then you can construct the argument for yourself.

And so I think when you start to ask, “Why don't we live forever? Why didn't evolution solve this?” you actually have to think about an incredibly contingent scenario—one where the positive selection is there, the negative selection is absent, and a lot of evolutionary pressure is going toward longevity to solve this incredibly hard problem—in order to construct the counterfactual in which longevity is selected for, does arise in modern man, and in which we are optimal.

So I think that puts human aging, longevity, and health in the category of problems that evolution has not optimized for. Ergo, relative to a problem evolution has worked hard on, it should be comparatively easy to intervene and provide health. In many ways, the fact that modern medicines, which are incredibly simplistic—we are targeting a single gene in the genome and turning it off everywhere at the same time—provide massive benefit to individuals is another piece of positive evidence.

00:12:07 – Why didn't humans evolve their own antibiotics?

Dwarkesh Patel 00:12:07

Antibiotics are an even more clear case of that because here's something that evolution actually cares a lot about. It feels like antibiotics should be…

Jacob Kimmel 00:12:15

Why didn't humans evolve their own antibiotics?

Dwarkesh Patel 00:12:16

Yeah.

Jacob Kimmel 00:12:17

It's an excellent question that I haven't heard posed before. Where do antibiotics come from? To your point, we could synthesize them. They're just metabolites, largely of other bacteria or other fungi. You think about the story of penicillin. What happens? Alexander Fleming finds some fungi growing on a dish. The fungi secrete this penicillin antibiotic compound, so there are no bacteria growing near the fungi. He has this light bulb moment of, “Oh my gosh, they're probably making something that kills bacteria.”

There's no prima facie reason that you couldn't imagine encoding an antibiotic cassette into a mammalian genome. Part of the challenge that you run into is that you're always in evolutionary competition. There's this notion of what's called the Red Queen hypothesis. It's an allusion to the story in Lewis Carroll's Through the Looking Glass, where the Red Queen is running really fast just to stay in place.

When you look at pathogen-host interactions or competition between bacteria and fungi that are all trying to compete for the same niche, what you find is they're evolving very rapidly in competition with one another. It's an arms race. Every time a bacterium evolves a new evasion mechanism, the fungus that occupies the niche will evolve some new antibiotic.

Part of why there is this competitiveness between the two is they both have very large population sizes in terms of number of genomes per unit resource they're consuming. There are trillions of bacteria in a drop of water that you might pick up. There's trillions of copies of the genome. Massive analog parallel computation. And at the same time, they can tolerate really high mutation rates because they're prokaryotic. They don't have multiple cells. If one cell manages to mutate too much and it isn't viable, or it grows too fast, it doesn't really compromise the population and the whole genome. Whereas for metazoans like you and me, if even one of our cells has too many mutations, it might turn into a cancer and eventually kill off the organism.

What I'm getting at, and this is a long-winded way of getting there, is that bacteria and other types of microorganisms are very well adapted to building these complex metabolic cascades that are necessary to make something like antibiotics. It's necessary to maintain that same mutation rate and population size in order to maintain the competition. Even if our human genome stumbled into making an antibiotic, most pathogens probably would have mutated around it pretty quickly.

Dwarkesh Patel 00:14:18

That should imply that through evolutionary history there were millions of “naive antibiotics” which could have acted as antibiotics, but by now, basically, all the bacteria have evolved around them.

Do we see evidence of these historical antibiotics that some fungi came up with and the bacteria evolved around, with remnants left in their DNA?

Jacob Kimmel 00:14:41

I'm going a bit beyond my own knowledge here, but my strong hypothesis would be yes. I can't point to direct evidence today. There are some examples of this. For instance, bacteria that fight off viruses that infect them, bacteriophages, have things like CRISPR systems. You can actually go and look at the spacers, the individual guide sequences that tell the CRISPR system, “Which genome do you go after? Where do you cut?”

And you find some of these guides that are very ancient. It seems like this bacterial genome might not have encountered that particular pathogen for quite a while. So you can actually get an evolutionary history of what the warfare was like, what the various conflicts were throughout this genomic history just by looking at those sequences.

In mammals where I do know a bit better, we do have examples of this where there is this co-evolution of pathogen and host. Imagine you have some antipathogen gene A fighting off some virus X. Well you then actually update. Now you have virus X’ and antipathogen gene A’. Now virus X’ goes away, but actually virus X still exists and we've lost our ability to fight it.

Those examples really do happen. There's a prominent one in the human genome. We have a gene called TRIM5alpha. It actually binds an endogenous retrovirus that is no longer present, but was at one point actually resurrected by a bunch of researchers. It was demonstrated that this is the case. We have this endogenous gene which basically fits around the capsid of the virus like a baseball in a glove and prevents it from infecting.

It turns out if you look at the evolutionary history of that gene and you trace it back through monkeys, you can actually find that a previous iteration inhibited SIV, which is the cousin of HIV in humans. Old World monkeys actually can't get SIV, whereas New World monkeys can and humans can, obviously.

So it seems like what happened—and you can actually make a few mutations in TRIM5alpha and find that this is true—is that TRIM5alpha once protected against an HIV-like pathogen in the primate genomes. And then there was this challenge from this massive endogenous retrovirus. It was so bad that the genome lost the ability to fight off these HIV-like viruses in order to restrict this endogenous retrovirus. You can see it because that retrovirus integrates into our genome. There are latent copies, like the half bodies of this virus all throughout our DNA code. Then this particular retrovirus went extinct. Reasons unknown, no one knows why. But we didn't re-update that piece of our host defense machinery to fight off HIV again.

So we're in a situation where you can go in and take human cells and make just a couple edits in that TRIM5alpha gene. It's currently protecting against a virus which no longer exists. You can edit it back to actually restrict HIV dramatically. So there are plenty of examples. You could imagine the same thing for antibiotics where like, “hey, this particular defense mechanism went away because the pathogen evolved its own defense to it.” Well, the pathogen might have lost that defense long ago. If you could extract that historical antibiotic, that historical antifungal, potentially it actually has efficacy.

Dwarkesh Patel 00:17:24

Isn't the mutation rate per base pair per generation like one in a billion or something?

Jacob Kimmel 00:17:27

It's quite low.

Dwarkesh Patel 00:17:29

You're saying that in our genomes we can find some extended sequence which encodes how to bind specifically to the kind of virus that SIV is. The amount of evolutionary signal you would need in order to have a multiple base pair sequence… So each nucleotide consecutively would have to mutate in order to finally get the sequence that binds to SIV. That seems almost implausible. I guess evolution works, so we can come up with new genes, but how would that even work?

Jacob Kimmel 00:18:00

A great explanation for understanding a lot of evolution and how you're able to actually adapt to new environments, new pathogens, is that gene duplication is possible. This explains a whole lot. If you look at most genes in the genome, they actually arise at least at some point in evolution from a duplication event.

That means you've got gene A, it's performing some job, and then some new environmental concern comes along. Maybe it's a lack of a particular source of nutrient, maybe it's a pathogen challenging you. Maybe gene A, if it were to dedicate all of its energies, so to speak and you were to mutate it to solve this new problem, could be adapted with a minimal number of mutations. But then you lose its original function.

So we have this nice feature of the genome which is that it can just copy and paste. So occasionally what will happen in evolution is you get a copy paste event. Now I've got two copies of gene A and I can preserve my original function in the original copy. Then this new copy can actually mutate pretty freely because it doesn't have a strong selective pressure on it. So most mutations might be null. I've got two copies of the gene, I can have lots of mutations in it accumulate. Nothing bad really happens because I've got my backup copy, my original. So you can end up with drift.

Dwarkesh Patel 00:19:04

You're saying that even though the per base pair mutation rate might be one in a billion, if you've got 100 copies of a gene, then the mutation rate on a gene, or on a low Hamming distance sequence to the one you're aiming for, might actually be quite high, and you can actually get the target sequence.

Jacob Kimmel 00:19:21

It's not that the base rate goes up. It's not like DNA polymerase is more erroneous or that you're just doubling it. That is true, but I don't think it's the main mechanism.

One of the main mechanisms that just makes it difficult for evolution to solve a problem is if a mutation breaks a gene. Somewhere along the path of edits, imagine there are three edits that take a host defense gene from restricting SIV to restricting this nasty new endogenous retrovirus, PtERV. Well, if one edit just breaks the gene, two edits just breaks the gene, three edits fixes it, it's really hard for evolution to find a path whereby you're actually able to make those first two edits because they're net negative for fitness. So you need some really weird contingent circumstances.

Through duplication, you can create a scenario where those first two edits are totally tolerated. They have no effect on fitness. You've got your backup copy, it's doing its job. Even though the mutation rate is low, the number of edits needed actually isn't that large. I'm going to forget the exact number of edits, for instance in TRIM5alpha, for this particular phenomenon, but it's in the tens. It's not that you need massive kilobase scale rearrangements. It's actually a fairly small number of edits. Basically you can just align the sequence of this gene in New World versus Old World monkeys and then for humans and you find there's a very high degree of conservation.
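
One way to see why duplication matters here is a tiny simulation. This is a sketch with invented parameters, not a calibrated population-genetics model: it just contrasts a case where the intermediate edits carry a fitness cost (a single, essential gene copy) against a case where they're neutral (a spare, duplicated copy free to drift).

```python
# A minimal Wright-Fisher-style toy with invented parameters (not calibrated to real
# mutation rates) showing why a duplicated, selectively unconstrained gene copy can
# cross a "fitness valley" that a single essential copy cannot: intermediate edits
# get purged when they hurt fitness, but accumulate by drift when they are neutral.
import numpy as np

rng = np.random.default_rng(0)

def generations_until_target(intermediate_fitness, n_pop=5_000, edit_rate=2e-4,
                             n_edits=3, max_gens=20_000):
    """Generations until any individual carries all n_edits target edits."""
    fitness = np.array([1.0] + [intermediate_fitness] * (n_edits - 1) + [1.0])
    counts = np.zeros(n_edits + 1, dtype=int)   # individuals per genotype (0..n_edits edits)
    counts[0] = n_pop
    for gen in range(max_gens):
        if counts[n_edits] > 0:
            return gen
        # selection: offspring pick parents proportional to genotype fitness
        weights = counts * fitness
        parents = rng.choice(n_edits + 1, size=n_pop, p=weights / weights.sum())
        # mutation: each offspring gains one further edit with probability edit_rate
        gained = rng.random(n_pop) < edit_rate
        counts = np.bincount(np.minimum(parents + gained, n_edits),
                             minlength=n_edits + 1)
    return max_gens  # the full set of edits never appeared within the horizon

print("single copy, deleterious intermediates:",
      generations_until_target(intermediate_fitness=0.95), "generations (or cap)")
print("duplicated copy, neutral intermediates: ",
      generations_until_target(intermediate_fitness=1.0), "generations")
```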

Dwarkesh Patel 00:20:36

Conceptually, is there some phylogenetic tree of gene families where you've got the transposons and you've got the gene itself, but then you've got the descendant genes which are low Hamming distance? Is there some conceptual way in which they're categorized?

Jacob Kimmel 00:20:51

You can arrange genes in the human genome by homology to one another. What you find is even in our current genome, even without having the full historical record, there are many, many genes which are likely resulting from duplication events.

One trivial way that you can check this for yourself is just go look at the names of genes. Very often you'll see something where it's like gene one, gene two, gene three or type one, type two, type three. If you then go look at the sequences, sometimes those names arise from the fact that they were discovered in a common pathway and they have nothing to do with each other.

A lot of the time it's because the sequences are actually quite darn similar. Really what probably happened is they evolved through a duplication event and then maybe did some swapping with some other genes. And you ended up with these quite similar, quite homologous genes that now have specialized functions.

So when evolution has a new problem to solve, it doesn't have to start from scratch. It starts from what was the last copy of the parameters for encoding a gene that is getting close to solving this. Okay, let's do a copy paste on that and then iterate and fine-tune on those parameters as opposed to having to start with “ab initio, some random stretch of sequence somewhere in the genome has to become a gene.”

Dwarkesh Patel 00:21:50

This is fascinating. Back to aging. You’ll have to cancel your evening plans. I've got so many questions for you.

Jacob Kimmel 00:21:58

Keep going man.

Dwarkesh Patel 00:22:00

So the second reason you gave was that there's selective pressure against people who get old but still keep living, but they're slightly less fit.

Jacob Kimmel 00:22:17

They’re suboptimal from a calorie-input perspective; the number of calories they can gather for the population is lower.

Dwarkesh Patel 00:22:21

That's how people love thinking about their grandpas.

Jacob Kimmel 00:22:25

Suboptimal calorie provider right there.

Dwarkesh Patel 00:22:28

A concern you might have about the effects of longevity treatments on your own body is that you will fix some part of the aging process, but not the whole thing. It seems like you're saying that you actually think this is the default way in which an anti-aging procedure would work, because that's the reason evolution didn't optimize for it. We're only fixing half of the aging process and not the whole thing. Whereas sometimes I hear longevity proponents be like, “No, we'll get the whole thing. There's going to be a source that explains all of aging and we'll get it.”

Whereas, your evolutionary argument for why evolution didn't optimize against aging relies on the fact that aging actually is not monocausal and evolution didn't bother to just fix one cause of aging.

Jacob Kimmel 00:23:17

That's correct. I don't think that there is a single monocausal explanation for aging. I think there are layers of molecular regulation that explain a lot. For instance, I have dedicated my career now to working on epigenetics and trying to change which genes cells use because I think that explains a lot of it. But it's not that there is some upstream “bad gene X” and all we have to do is turn that off and suddenly aging is solved.

The most likely outcome is that when we eventually develop medicines that prolong health in each of us, it's not going to fix everything all at once. There's not going to be a singular magic pill. Rather you're going to have medicines that add multiple healthy years to your life, years you can't otherwise get back. But it's not going to fix everything at the same time. You are still going to experience, for the first medicine, some amount of decline over time. This gives you an example, if you think about evolution as a medicine maker in this sort of anthropomorphic context, of why it might not have been selected for immediately.

00:25:26 – De-aging cells via epigenetic reprogramming

Dwarkesh Patel 00:25:26

So evolution didn't select against aging. What are you doing? What's your approach at NewLimit that you think is likely to find the true cause of aging?

Jacob Kimmel 00:25:38

We're working on something called epigenetic reprogramming, which very broadly is using genes called transcription factors. I like to think about these as the orchestra conductors of the genome. They don't perform many functions directly themselves, but they bind specific pieces of DNA and then tell the cell which genes to turn on and which genes to turn off.

They eventually put chemical marks on top of DNA and on some of the proteins that DNA wraps around. This particular layer of regulation is called the epigenome, and it's the answer to this fundamental biological question of how all my cells have the same genome but ultimately do very different things. Your eyeball and your kidney have the same code, and yet they're performing different functions. That may sound a little bit simplistic, but ultimately, it's kind of a profound realization.

That epigenetic code is really what's important for cells to define their functions. That's what's telling them which genes to evoke from your genome. What has now become relatively apparent is that the epigenome can degrade with age. It changes. The particular marks that tell your cells which genes to use can shift as you get older. This means that cells aren't able to use the right genetic programs at the right times to respond to their environment. You're then more susceptible to disease, and you have less resilience to the many insults you might experience.

Our hope is that by remodeling the epigenome back towards the state it was in when you were young, right after development, you'll be able to address myriad different diseases, one of whose strong contributing factors is that cells are less functional than they were at an earlier point in your life. We're going after this by trying to find combinations of these transcription factors that can actually remodel the epigenome—binding to just the right places in the DNA and then shifting the chemical marks back toward the state they were in when you were a young individual.

Dwarkesh Patel 00:27:13

If you're just making these broad changes to a cell state through these transcription factors, which have many effects, are there other aspects of the cell state that are likely to get modified at the same time in a way that would be deleterious? Or would it be a straightforward effect on cell state?

Jacob Kimmel 00:27:33

How I wish it were straightforward. No, it's very likely. Each of these transcription factors binds hundreds to thousands of places in the genome. One way of thinking about it is if you imagine the genome as the base components of cell function, then these transcription factors are kind of like the basis set in linear algebra. It's different combinations and different weights of each of the genes. Most of them are targeting pretty broad programs.

There are no guarantees that aging actually involves moving perfectly along any of the vectors in this particular basis set. And so it's probably going to be a little tricky to figure out a combination that actually takes you backward. There's, again, no guarantees from evolution that it's just a simple reset. It's actually a critical part of the process that we run through as we try to discover these medicinal combinations of transcription factors we can turn on, ensuring that they not only are making an aged cell revert to a younger state...

We measure that a couple different ways. One is simply measuring which genes those cells are using. They use different genes as they get older. You can measure that just by sequencing all of the mRNAs, which are really the expressed form of the genes being utilized in the genome at a given time. You see that aged cells use different genes. Can I revert them back to a younger state? Colloquially, we call this a "looks like" assay. Can I make an old cell look like a young one based on the genes it's using?

More importantly, we go down and drill to the functional level. We measure, “Can I actually make an aged cell perform its functions, its object roles within the body, the same way a young cell would?” These are the really critical things you care about for treating diseases. Can I make a hepatocyte, a liver cell in Greek, function better in your liver so it's able to process metabolites like the foods you eat, how it's able to process toxins like alcohol and caffeine? Can I make a T cell respond to pathogens and other antigens that are presented within your body?

These are the ways in which we measure age. We need to ensure not only that the combination of TFs we find actually has positive effects along those axes, but also that we measure any potential detrimental effects that emerge. There are canonical examples where you can seemingly reverse the age of a cell, for instance at the level of a transcriptome, but simultaneously you might be changing that cell's type or identity.

Shinya Yamanaka was a scientist who won the Nobel in 2012 for some work he did in about 2007, where he discovered that you could just take four transcription factors and actually, just by turning on these four genes, turn an adult cell all the way back into a young embryonic stem cell. It's a pretty amazing existence proof that shows that you can reprogram a cell's type and a cell's age simultaneously, just by turning on four genes. Out of the 20,000 genes in the genome, the tens of millions of biomolecular interactions, just four genes is enough. That's a shocking fact.

We actually have known for many years now that you can reprogram the age of a cell. The challenge is that simultaneously, you're doing a bunch of other stuff, as you alluded to. You're changing its type, and that might be pathological. If you did that in the body, it would probably cause a type of tumor called a teratoma.

So we measure not only age at the level of the genes a cell is using, but also identity. Do you still look like the right type of cell? Are you still a hepatocyte? Are you still a T cell? If not, that's probably pathological. You can also use that same information to check for a number of other pathologies that might develop. Did I make this T cell hyperinflammatory in a way that would be bad? Did I make this liver cell potentially neoplastic, proliferating too much even when the organism is healthy and undamaged?

You can check for each of those at the level of gene expression programs and likewise, functionally. Before you put these molecules in a human, you actually just functionally check in an animal. You make an itemized list of the possible risks you might run into. Here are the ways it might be toxic, here are the ways it might cause cancer. Are we able to measure deterministically and empirically that that doesn't actually occur?

Dwarkesh Patel 00:30:52

This is a dumb question, but it will help me understand why an AI model is necessary to do any of this work. You mentioned the Yamanaka factors. From my understanding, the way he identified these four transcription factors was that he found the 24 transcription factors that have high expression in embryonic cells, and then he just turned them all on in a somatic cell. Basically, he systematically removed from this set until he found the minimal set that still induces a cell to become a stem cell.

That doesn't require any fancy AI models. Why can't we do the same things for the transcription factors that are expressed more in younger cells as opposed to older cells, and then keep eliminating from them until we find the ones that are necessary to just make a cell young?

Jacob Kimmel 00:31:42

I wish it were so easy. You're entirely right. Shinya Yamanaka was able to do this with a relatively small team, with relatively few resources, and achieve this remarkable feat. It's entirely worth asking. Why can't a similar procedure work for arbitrary problems in reprogramming cell state? Whether it be trying to make an aged cell act like a young one, a disease cell act like a healthy one, why can't you just take 24 transcription factors and randomly sort through them?

There were two features of Shinya's problem that I think make it amenable to that sort of interrogation that aren't present for many other types of problems. This is why he's such a remarkable scientist. Most of science is problem selection. You don't actually get better at pipetting or running experiments after a certain age, but you do get better at picking what to do. He's amazing at this.

The first feature is that measuring your success criterion is trivial in the particular case he was investigating. He's starting with somatic cells that, in this case, were a type of fibroblast, which literally is defined as cells that stick to glass and grow in a dish when you grind up a tissue. It sounds fancy, but it's a very simplistic thing. He's starting with fibroblasts, you can look at them under a microscope, and you can see they’re fibroblasts just based on how they look.

Then the cells he's reprogramming toward are embryonic stem cells. These are tiny cells, they're mostly nucleus. They grow really fast. They look different, they detach from a dish, they grow up into a 3D structure. They express some genes that will just never be turned on in a fibroblast by definition.

How he ran the experiment was he just set up a simple reporter system. He took a gene that should never be on in a fibroblast, should only be on in the embryo, and he put a little reporter behind it so that these cells would actually turn blue when you dumped a chemical on them. Then he ran this experiment in many, many dishes with millions upon millions of cells.

The second really key feature of the problem is this notion that those cells he's converting into amplify. They divide and grow really quickly. In order for you to find a successful combination, you don't actually need it to be efficient almost at all. The original efficiency Yamanaka published, the number of cells in the dish that convert from somatic to an induced pluripotent state, back into a stem cell, is something like a basis point or a tenth of a basis point, so 0.01%, 0.001%. If these cells were not growing and they were not proliferating like mad, you probably would never be able to detect that you had actually found anything successful. It's only because success is easy to measure once you have it and—even being successful in very rare cases, one in a million—amplifies and you can detect it, that this was amenable to his particular approach.

In practice, what he would do is dump these factors, or this group of 24 minus some number, eventually whittling it down to four. He would dump these onto a group of cells and over the course of about 30 days, just a few cells in that dish, a number you could count on your fingers, would actually reprogram. But they would proliferate like mad. They form these big colonies. It's a single cell that just proliferates and forms a bunch of copies of itself. They form these colonies you can see with your eyeballs by holding the dish up to the light and looking for opaque little dots on the bottom. You don't need any fancy instruments. Then you could stain them with this particular stain and they would turn blue based on the genetic reporter he had.
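
Some back-of-the-envelope arithmetic makes the amplification point vivid. The numbers below are rough assumptions for illustration, not Yamanaka's actual protocol figures:

```python
# Illustrative arithmetic for why very rare reprogramming events are still detectable:
# tiny efficiency x huge starting population, then exponential amplification of successes.
# Numbers are rough assumptions for the sketch, not protocol values.
cells_in_dish = 1_000_000
efficiency = 1e-5             # ~0.001% of cells successfully reprogram
doubling_time_hours = 24      # assumed division rate of reprogrammed cells
days_of_growth = 30

founders = cells_in_dish * efficiency
doublings = days_of_growth * 24 / doubling_time_hours
print(f"reprogrammed founder cells: ~{founders:.0f}")
print(f"potential cells per colony after {days_of_growth} days: ~{2 ** doublings:.2e}")
# A handful of founders, each amplified exponentially (real colonies saturate long
# before 2**30 cells, but the point stands), gives dots you can see by eye.
```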

We look at those key features of the problem and we pick any other problem we're interested in. I'm interested in aging, so that's the one I'm going to pick for explanation. How difficult is it to measure the likelihood of success or whether you've achieved success for cell age? It turns out age is much more complicated in terms of discriminating function than actually just comparing two types of cells. An old liver cell and a young liver cell, prima facie, actually look pretty darn similar. It's actually quite nuanced the ways in which they're distinct. There isn't a simple, trivial system where you just label your one favorite gene or you can just…

Dwarkesh Patel 00:35:09

Give the young cells cancer. They'll grow.

Jacob Kimmel 00:35:13

Just make the old ones cancer, and then they'll grow. Dwarkesh, you've solved it for me. There's no trivial way that you can tell whether or not you've succeeded. You actually need a pretty complex molecular measurement. For us, a real key enabling technology—I don't think our approach would really have been possible until it emerged—was something called single-cell genomics. You now take a cell, rip it open, sequence all the mRNAs it's using.

At the level of individual cells, you can actually measure every gene that they're using at a given time and get this really complete picture of a cell's state, everything it's doing, lots of mutual information to other features. From that profile, you can train something like a model that discriminates young and aged cells with really high performance. It turns out there's no one gene that actually has that same characteristic. Unlike in Yamanaka's case, where a single gene on or off is an amazing binary classifier, you don't have that same feature of easy detection of success in aging. The second feature is, as you highlighted, we can't just turn these into cancer cells. Success doesn't amplify.
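
As a rough sketch of what "train a model that discriminates young and aged cells" could look like, here is a toy version on synthetic data standing in for a single-cell expression matrix. The cell and gene counts, the simulated age signal, and the choice of logistic regression are all illustrative assumptions, not a description of NewLimit's actual pipeline.

```python
# Sketch: classify cells as "young" vs "aged" from per-cell gene expression profiles.
# Synthetic data stands in for a real single-cell RNA-seq matrix (cells x genes);
# the shapes, effect sizes, and model choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cells, n_genes = 2_000, 500

# Simulate normalized expression; shift a small set of "age-associated" genes in aged cells.
X = rng.normal(size=(n_cells, n_genes))
y = rng.integers(0, 2, size=n_cells)                    # 0 = young, 1 = aged
age_genes = rng.choice(n_genes, size=25, replace=False)
X[np.ix_(y == 1, age_genes)] += 0.5                     # subtle, distributed age signal

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out young-vs-aged accuracy: {clf.score(X_test, y_test):.2f}")

# The fitted model doubles as a scoring function: how "aged" does a given profile look?
print("per-cell aged-ness scores:", np.round(clf.predict_proba(X_test[:5])[:, 1], 2))
```

The toy mirrors the point above: no single simulated gene is a clean separator on its own, but a model over the whole profile is.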

In some ways, the bar for a medicine is higher than what Yamanaka achieved in his laboratory discovery. You can't just have 0.001% success and then wait for the cells to grow a whole bunch in order to treat a patient's disease or make their liver younger, make their immune system younger, make their endothelium younger. You need to actually have it be fairly efficient across many cells at a time. Because of this, we don't have the same luxury Yamanaka did of taking a relatively small number of factors and finding a success case within there that was pretty low efficiency. We actually need to search a much broader portion of TF space in order to be successful.

And when you start playing that game, and you think “How many TFs are there?” Somewhere between 1000 and 2000, it depends on exactly where you draw the line. Developmental biologists love to argue about this over beer, but let's call it 2000 for now. You want to choose some combination. Let's say you guess somewhere between one and six factors might be required. The number of possible combinations is about 10^16. If you do any math on the back of a napkin, in order to just screen through all of those, you would need to do many orders of magnitude more single-cell sequencing than the entire world has done to date cumulatively across all experiments. It's just not tractable to do exhaustively.
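
The napkin math he's gesturing at is just a combinatorics sum. Taking the numbers as stated (roughly 2,000 TFs, combinations of one to six factors), a quick check lands in the 10^16-10^17 range:

```python
# Count the distinct combinations of 1-6 transcription factors from a ~2,000-TF library,
# taking the rough numbers from the conversation at face value.
from math import comb

n_tfs = 2_000
total = sum(comb(n_tfs, k) for k in range(1, 7))
print(f"combinations of 1-6 factors from {n_tfs} TFs: {total:.2e}")
# ~9e16 possible conditions - many orders of magnitude more than all the single-cell
# sequencing ever run, which is why exhaustive screening isn't tractable.
```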

That's where actually having models that can predict the effect of these interventions comes in. If I can do a sparse sampling, I can test a large number of these combinations. I can start to learn the relationship of what a given transcription factor is going to do to an aged cell. Is it going to make it look younger? Is it going to preserve the same type? I can learn that across combinations. I can start to learn their interaction terms. Now I can use those models to actually predict in silico for all the combinations I haven't seen, which are most likely to give me the state I want. You can actually treat that as a generative problem and start sampling and asking which of these combinations is most likely to take my cell to some target destination in state space. In our case, I want to take an old cell to a young state, but you could imagine some arbitrary mappings as well.
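
Here is a deliberately simplified sketch of that idea. Everything in it—the multi-hot encoding of a combination, the additive synthetic "youth score", the ridge regression—is an illustrative stand-in rather than a description of NewLimit's models: measure a sparse sample of combinations, fit a model over combination vectors, then rank the combinations you never measured.

```python
# Sketch: learn to predict a combination's effect from a sparse sample of measurements,
# then score unmeasured combinations in silico. Entirely synthetic and illustrative.
import numpy as np
from itertools import combinations
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_tfs = 50                            # toy library (the real space is ~2,000 TFs)

def encode(combo, n=n_tfs):
    """Multi-hot vector: which TFs are turned on in this combination."""
    v = np.zeros(n)
    v[list(combo)] = 1.0
    return v

all_pairs = list(combinations(range(n_tfs), 2))   # 1,225 possible 2-TF combinations
true_effect = rng.normal(size=n_tfs)              # hidden per-TF effect on a "youth score"

def measure(combo):
    """Stand-in for a screening experiment: additive effects plus assay noise."""
    return sum(true_effect[i] for i in combo) + rng.normal(scale=0.3)

# Sparsely sample ~10% of the space and fit a model on those measurements.
measured = rng.choice(len(all_pairs), size=120, replace=False)
X = np.array([encode(all_pairs[i]) for i in measured])
y = np.array([measure(all_pairs[i]) for i in measured])
model = Ridge(alpha=1.0).fit(X, y)

# Predict every unmeasured combination and surface the top candidates for testing.
unmeasured = [c for i, c in enumerate(all_pairs) if i not in set(measured)]
scores = model.predict(np.array([encode(c) for c in unmeasured]))
for score, combo in sorted(zip(scores, unmeasured), reverse=True)[:5]:
    print(f"predicted youth score {score:+.2f} for TF pair {combo}")
```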

As you get to these more complex problems, you don't have the same features that Shinya benefited from, which were, one, the ability to measure success really easily—you can see it with your bare eyes, you don't even need a microscope—and two, amplification. As you get into these more challenging problems, you're going to need to be able to search a larger fraction of the space to hit that higher bar.

Dwarkesh Patel 00:38:11

So we can think of these transcription factors as these basis directions, and you can get a little bit of this thing, a little bit of that thing and some combination.

And evolution has designed these transcription factors to…Is that your claim? They have relatively modular, self-contained effects that work in predictable ways with other transcription factors and so we can use that same handle to our own ends?

Jacob Kimmel 00:38:37

That would be very much my contention. One piece of evidence for this is that's the way development works. It's a crazy thing to think about, but you and I were both just a single cell. Then we were a bag of undifferentiated cells that were all exactly alike. Somehow we became humans with hundreds of different cell types all doing very different things.

When you look at how development specifies those unique fates of cells, it is through groups of these transcription factors that each identify a unique type. In many cases, the groups of transcription factors, the sets that specify very different fates, are actually pretty similar to one another. Evolution has optimized to just swap one TF in or swap one TF out of a combination and get pretty different effects. You have this sort of local change in sequence or gene set space leading to a pretty large global change in output.

Likewise, many of these TFs are duplicated in the genome. Because mutations are going to be random and they're inherently small changes at the level of sequence at a given time, evolution needs a substrate where, in order to function effectively, these small changes can give you relatively large changes in phenotype. Otherwise it would just take a very long time across evolutionary history for enough mutations to accumulate in some duplicated copy of the gene for you to evolve a new TF that does something interesting.

I think we're actually, in most cases in biology—due to that evolutionary constraint that small edits need to lead to meaningful phenotypic changes—in a relatively favorable regime for generic, gradient-like optimizers. It would be a little bit overstating it to say evolution is using the gradient, but there is a system. If you've heard of evolution strategies: when you can't take a gradient on your loss, you make a bunch of copies of your parameters, you randomly perturb them, you evaluate the loss for each copy, and you use those scores to estimate a gradient in parameter space. That's how I imagine evolution is working. So you need lots of those little edits to actually give you meaningful step sizes in terms of the ultimate output.
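
For readers who haven't seen evolution strategies, here is a minimal sketch of the estimator being described, with a toy quadratic loss and arbitrary hyperparameters: perturb copies of the parameters, score each copy, and combine the scores into a gradient-like update without ever differentiating the loss.

```python
# Minimal evolution-strategies sketch: estimate a search direction from the scores of
# randomly perturbed parameter copies, no gradient of the loss required.
# The quadratic loss and hyperparameters are toy choices for illustration.
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    """Toy objective: distance from a hidden optimum."""
    target = np.array([3.0, -2.0, 0.5])
    return np.sum((theta - target) ** 2)

theta = np.zeros(3)
sigma, lr, n_copies = 0.1, 0.05, 64

for step in range(200):
    noise = rng.normal(size=(n_copies, theta.size))            # one perturbation per copy
    scores = np.array([loss(theta + sigma * eps) for eps in noise])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize for stability
    grad_estimate = (noise.T @ scores) / (n_copies * sigma)    # score-weighted directions
    theta -= lr * grad_estimate                                # lower loss is better
    if step % 50 == 0:
        print(f"step {step:3d}: loss {loss(theta):.4f}")
print("final parameters:", np.round(theta, 2))
```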

Dwarkesh Patel 00:40:32

Interesting. You're just like designing a little LoRA that goes on top.

Jacob Kimmel 00:40:37

In a way. Maybe this is getting too giga-brained about it, but why does the genome even have transcription factors? What's the point? Why not just have it so every time you want a new cell type, you engineer some new cassette of genes or some new, totally de novo set of promoters or something like this?

One possible explanation for their existence, rather than just an appreciation for their presence, is that having transcription factors allows a very small number of base pair edits at the substrate of the genome to lead to very large phenotypic differences. If I break a transcription factor, I can delete a whole cell type in the body. If I retarget a transcription factor to different genes, I can dramatically change when cells respond and have hundreds of their downstream effector genes change their behavior in response to the environment.

It puts you in this regime where transcription factors are a really nice substrate to manipulate as targets for medicines. In some ways they might be evolution's levers upon the broader architecture of the genome. By pulling on those same levers that evolution has gifted us, there are probably many useful things we can engender upon biology.

Dwarkesh Patel 00:41:42

You're sort of hinting that if we analogize it to some code base, we're going to find a couple of lines that are commented out that's like, "de-aging," and then "un-hyphen" or "un-parenthesize."

Jacob Kimmel 00:41:53

I don't know about that, but I can give you a real cringe analogy that sometimes I deploy. It requires a very special audience. I think you'll probably be one who fits into it.

Dwarkesh Patel 00:42:01

You're flattering our listeners. “Only cringe listeners will appreciate it, but your audience will love this.”

Jacob Kimmel 00:42:09

I don't know about your audience, but you will. You can think about it like this. If you think about how attention works—queries, keys, values—TFs are like the queries. The genome sequences they bind to are like the keys. Genes are like the values. It turns out that that structure then allows you to very efficiently, in terms of editing space, change just one of those embedding vectors, in this case one of those sequences, and get dramatically different performances or total outputs.

So I do think it's interesting how these structures recur throughout biology, in the same way that the attention mechanism seems to exist in some neural structures. It's interesting that you can very easily see how that same sort of querying and information storage might exist in the genome.
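
To make the analogy concrete, here is a toy scaled dot-product attention computation with the biological mapping noted in the comments. The dimensions and vectors are arbitrary; this is the standard attention mechanism, not a model of real TF-DNA binding.

```python
# Toy scaled dot-product attention, annotated with the TF analogy from the conversation:
#   queries ~ transcription factors, keys ~ the DNA sequences they bind,
#   values ~ the downstream gene programs. All numbers are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # embedding dimension (arbitrary)
n_tfs, n_sites = 3, 5                   # "queries" and "keys/values"

Q = rng.normal(size=(n_tfs, d))         # one vector per transcription factor
K = rng.normal(size=(n_sites, d))       # one vector per genomic binding site
V = rng.normal(size=(n_sites, d))       # one vector per downstream gene program

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[1])                                # TF-site match
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax over sites
    return weights @ V                                                    # blended outputs

out_before = attention(Q, K, V)

# The point of the analogy: a small, local change to ONE key (one binding sequence)
# re-routes what every query reads out - a large global change in output.
K_edited = K.copy()
K_edited[2] += 2.0
out_after = attention(Q, K_edited, V)

print("output change per TF after editing one key:",
      np.round(np.linalg.norm(out_after - out_before, axis=1), 2))
```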

Dwarkesh Patel 00:42:50

Interesting. A previous guest and a mutual friend, Trenton Bricken, had a paper in grad school about how the brain implements attention.

Jacob Kimmel 00:42:58

If you haven't read these papers: Eddie Chang has found, using Neuropixels, that positional encodings probably exist in humans. He implants these Neuropixels probes into individuals and then he's able to talk to them and watch their neural activity as they read sentences. What he finds is that there seem to be certain representations which function as a positional encoding across sentences. They fire at a rate that just increases as the sentence goes on and then resets. It seems exactly like what we do when we train large language models.

Dwarkesh Patel 00:43:24

It's so funny the way we're going to learn how the brain works is just trying to first-principles engineer intelligence in AI. Then it just happens to be the case that each one of these things has a neural correlate.

00:44:43 – Viral vectors and other delivery mechanisms

Dwarkesh Patel 00:44:43

If you're right that transcription factors are the modality evolution has used to have complex phenotypic effects, optimize for different things...

Two-part question. One, why haven't pathogens, which have a strong interest in having complex phenotypic effects on your body, also utilized the transcription factors as the way to fuck you over and steal your resources?

Two, we've been trying to design drugs for centuries. Why aren't all the big drugs, the top-selling drugs, ones that just modulate transcription factors?

Jacob Kimmel 00:45:23

Why don't we have a million of these pills? I'll try and take those in stride. They're pretty different answers.

The first answer is that there are pathogens that utilize transcription factors as part of their life cycle. A famous example of this is HIV. HIV encodes a protein called Tat, and Tat actually activates NF-κB. HIV, to back up a little bit, is a retrovirus. It starts out as RNA, turns itself into DNA, and shoves itself into the genome of your CD4+ T cells.

It needs this ornate machinery to actually control when it makes more HIV and when it goes latent so it can hide and your immune system can't clear it out. This is why HIV is so pernicious. You can kill every single cell in the body that's actively making HIV with a really good drug. But then a few of them that have lingered and hunkered down just turn back on. People call this the latent reservoir.

Dwarkesh Patel 00:46:08

Similar to Hep B, right?

Jacob Kimmel 00:46:09

Hep B, Hep C, can both do this sort of latent behavior. HIV is probably the most pernicious of these. One way it does it is that this gene called Tat actually interacts with NF-κB. NF-κB is a master transcription factor within immune cells. Typically if I'm going to horribly reduce what it does, and some immunologists can crucify me later, it increases the inflammatory response of most cells. They become more likely to attack given pathogens around them on the margin. It'll turn on NF-κB activity and then use that to drive its own transcription and its own life cycle.

I can't remember quite all the details now of exactly how it works. But part of this circuitry is what allows it, in some subset of cells where some of that upstream transcription factor machinery in the host might be deactivated, to go latent. As long as the population of cells it's infecting always has a few that are turning off the transcription factors upstream that drive its own transcription, then HIV is able to persist in this latent reservoir within human cells. It's just one example offhand.

There are a number of other pathogens. Unfortunately, I don't have quite as much molecular detail in some of these. But they will interface with other parts of the cell that eventually result in transcription factor translocation to the nucleus and then transcription factors being active. This actually segues a little bit to your second question on why there aren’t more medicines targeting TFs.

In a way many of our medicines, ultimately downstream, are leading to changes in TF activity, but we haven't been able to directly target them due to their physical location within cells. So we go several layers upstream. If you think about how a cell works in sensing its environment, it has many receptors on the surface. It has the ability to sense mechanical tension and things like this. Ultimately, most of what these signaling pathways lead to is to tell the cell, "Use some different genes than you're using right now." That's often what's occurring. That ultimately leads to transcription factors being some of the final effectors in these signaling cascades.

A lot of the drugs we have that, for instance, inhibit a particular cytokine that might bind a receptor, or they block that receptor directly, or maybe they hit a certain signaling pathway… Ultimately, the way that they're exerting their effect is then downstream of that signaling pathway, some transcription factor is either being turned on or not turned on. You're using different genes in the cell.

We're kind of taking these crazy bank shots because we can't hit the TFs directly. That sort of begs the question, “Why can't you just go after the TF directly?” Traditionally, we use what are called small molecule drugs, where they're defined just by their size. The reason they have to be small is they need to be small enough to wiggle through the membrane of a cell and get inside.

Then you run into a challenge. If you want to actually stick a small molecule between two proteins that have a pretty big interface—meaning they've got big swaths on the side of them that all sort of line up and form a synapse with one another—then you would need a big molecule in order to inhibit that. It turns out that a TF binding DNA is a pretty darn big surface. Small molecules aren't great at disrupting that and are certainly even worse at activating it. Small molecules can get all the way into the nucleus, but they can't do much once they're there. They're just too small.

The other classic modalities we have are recombinant proteins. We make a protein like a hormone in a big vat. We grow it in some Chinese hamster ovary cells, we extract it, we inject it into you. This is how the human insulin we make today works, for instance. Or you make antibodies, the proteins produced by the immune system. These run around and find proteins that have a particular sequence, they bind to them, and often they just stop them from working by glomming a big thing onto the side. Those are too big to get through the cell membrane. Then they can't actually get to a TF or do anything directly. So we take these bank shots.

What changes that today, and why I think it's pretty exciting, is we now have new nucleic acid and genetic medicines where you can, for instance, deliver RNAs to a cell that can get through using tricks like lipid nanoparticles. You wrap them in a fat bubble. It looks kind of like a cell membrane. It can fuse with a cell, and put the mRNAs in the cytosol. You can make a copy of a transcription factor there, and then it translocates to the nucleus the same way a natural one would and exerts its effect.

Likewise, there are other ways to do this using things like viral vectors, but we've only very recently actually gotten the tools we need to start addressing transcription factors as first-class targets rather than treating them as maybe some ancillary third-order thing that's going to happen.

Dwarkesh Patel 00:50:07

Interesting. So the drugs we have can't target them, but your claim is that a lot of drugs actually do work by binding to the things we actually can target and those have some effect on transcription factors.

This brings us to questions about delivery, which is the next thing I want to ask you. You mentioned lipid nanoparticles. This is what the COVID vaccines were made of. The ultimate question if we're going to work on de-aging… Even if you identify what is the right transcription factor to de-age a cell, and even if they're shared across cell types, or you figure out the right one for every single cell type, how do you get it to every single cell in the body?

Jacob Kimmel 00:50:49

How do you deliver stuff? How do you get them in there?

There are many ways one could imagine solving it. I'll narrow the scope of the problem. Delivering nucleic acid is a pretty good first-order primitive. Ultimately, the genome is nucleic acids, the RNAs that come out of it are nucleic acids. If you can get nucleic acid into a cell, you can drug pretty much anything in the genome effectively. You can reduce this problem to asking, “How do I get nucleic acids wherever I want them to any cell type very specifically?”

Today, there are two main modalities that people use, both of which have some downsides. The first one that we've touched on already is lipid nanoparticles. These are basically fat bubbles. By default, they get taken up by tissues which take up fat, like the liver. They can be used like trojan horses. They can release some arbitrary nucleic acid—usually RNA, maybe encoding your favorite genes, in our case, transcription factors—into the cell types of interest. You can play with the fats, and you can also tie stuff onto the outside of the fat. You can attach part of an antibody, for example, to make it go to different cell types in the body. The field is making a lot of progress on being able to target various different cell types with lipid nanoparticles. Even if nothing else worked for the next several decades, companies like ours would have more than enough problems to solve with the cells that we can actually target.

Another prominent way people go after this is using viral vectors. The basic idea being viruses had a lot of evolutionary history and very large population sizes. They've evolved to get into our cells. Maybe we can learn something from them, even better than Trojan horses. One type of virus people use a lot is called an AAV. Those AAVs carry DNA genomes, so you can get whole genes into cells. They've got limited packaging sizes. You can think of it like a very small delivery truck, so you can't put everything you want into it. They can go to certain cell types as well. On top of where you actually get the nucleic acid in the first place, you can engineer the sequences a bit, and that basically allows you to add a NOT gate: you can make it turn off the nucleic acid in certain cell types. But you're never going to use sequence engineering to get nucleic acid into cells where it didn't get delivered in the first place. You can start broad with your delivery vector and then use sequence to narrow down and make it more specific, but not the other way around.
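A toy way to see that asymmetry, with made-up cell types and rules purely for illustration: sequence engineering can only act as a NOT gate on cells the vector already reaches, so payload expression is roughly "delivered AND NOT suppressed."

```python
# Toy sketch of the delivery asymmetry described above (illustrative only;
# the cell types and rules are made up). Sequence engineering can act as a
# NOT gate on cells the vector already reaches, but it can't add new ones.

def expresses_payload(cell_type: str, delivered_to: set, suppressed_in: set) -> bool:
    """A cell expresses the payload only if the vector got in AND the
    sequence-level 'off switch' is not active in that cell type."""
    return cell_type in delivered_to and cell_type not in suppressed_in

delivered_to = {"hepatocyte", "muscle"}   # where this hypothetical vector naturally goes
suppressed_in = {"muscle"}                # sequence engineering turns it OFF here

for cell in ["hepatocyte", "muscle", "neuron"]:
    print(cell, expresses_payload(cell, delivered_to, suppressed_in))
# hepatocyte True, muscle False, neuron False: no sequence trick can make the
# neuron True, because the vector never delivered there in the first place.
```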

Both of those methods are super promising. If nothing else emerged for decades, we'd still have tons and tons of problems as a therapeutic development community to solve, even using just those.

I have one very controversial opinion which people can roast me for later.

Dwarkesh Patel 00:53:04

You have just one? You're trying to solve aging and you have only one?

Jacob Kimmel 00:53:08

I have many controversial opinions. One of them is that both of these, in the limit, probably will not be the way that we're delivering medicines in the year 2100. If you think about viral vectors, no matter what, they're always going to be somewhat immunogenic. You're always going to have your immune system trying to fight them off. You can play tricks, you can try and cloak them, etc., but they're always going to have some toxicity risk. They also don't go everywhere. It's not that we have an example of a single viral species that infects every cell type in the body and we just need to engineer it to be safe; we would also have to engineer the virus to go to new cell types. There are some limitations there.

LNPs likewise have some problems. They can go to tons of cell types. That's largely what we're working on. We're super excited about it. But there are some physical constraints. They just have a certain size. They have to get out of your bloodstream toward a given target cell, and they have to not fuse into any of the other cells along the way. There's a whole gauntlet they have to run.

Ultimately, we're probably going to have to solve delivery the way that our own genome solved delivery. The same problem arose during evolution. How do I patrol the body, find arbitrary signals in the environment, and then deliver some important cargo there when some set of events happens? How do I find a specific place and release my cargo only near those cell types? The problem was solved by the immune system. We have cell types in our body, T cells and B cells, which are effectively engineered by evolution to run around and infiltrate whatever tissues they need to. They can climb almost anywhere in the body; there's almost nowhere they can't get access to. Once they sense a particular set of signals—and they've got very ornate circuitry to do this, they run basically an AND gate logic—they can release a specified payload.

Right now, the way our genome sets them up, the payload they release is largely either enzymes that will kill some cell that they're targeting or kill some pathogen, or some signal flares that call in other parts of the immune system to do the same thing. So that's super cool. But you can think about it as a modular system that evolution's already gifted us. We've got some signal and environmental recognition systems so we can find particular areas of the body that we want to find. Then we have some sort of payload delivery system. I can deliver some arbitrary set of things.

I imagine if we were to Rip Van Winkle ourselves into 2100 and wake up, the way we will be delivering these nucleic acid payloads is actually by engineering cells to do it, to perform this very ornate function. Those cells might actually live with you. You probably will get engrafted with them, and they might persist with you for many years. They deliver the medicine only when the environment within your body actually dictates that you need it. You actually won't be seeing a physician every time this medicine is active. Rather, you'll have a more ornate, responsive circuit.

The other exciting thing about cells is that they're big and they have big genomes. You actually have a large palette to encode complex infrastructure and complex circuitry. You don't need to limit yourself to the very small RNAs you can get in that might encode a gene or two, or in our case, a few transcription factors. You don't have to limit yourself to this tiny AAV genome that's only a few kilobases. You've got billions of base pairs to play with in terms of encoding all your logic. So I think that's ultimately how delivery will get solved. We've got many, many stepping stones along the way. But if I could clone myself and work on an even riskier endeavor, that's probably what I would do.

Dwarkesh Patel 00:56:12

In a way, we treat cancer this way with CAR-T therapy, right? We take the T cells out and then we tell them to go find a cancer with this receptor and kill it. Is the reason that works that the cancer cells we're trying to target are also free floating in the blood? Is that what it targets?

Basically, could this deliver to literally every single cell in the body?

Jacob Kimmel 00:56:34

Not literally every single cell. I'll asterisk it there. For example, T cells don't go into your brain. They can, but it's generally a pathology when they get in there. It's not literally every cell, but almost every cell in your body is surveilled by the immune system. There are very, very few what we call immune-privileged compartments in your body. It's things like the joints of your knees and your shoulders, your eyeball, and your brain. There might be a couple of others. The ear probably falls into that category.

A funny way of thinking about this is that all the gene-therapy people using viruses, they want to deliver to the immune-privileged compartments because their drugs are immunogenic, and they're limited to a very, very small set of diseases. In a way, it's like the shadow of all the diseases you can't address with viruses is what you can address with cells. Given the complementarity between them, you can probably cover the entire body.

They can't literally go everywhere. But your analogy to the CAR-T work is very apt as well. You can think about that two-component system. I've got some detection mechanism for the environment I want to sense to perform some function, and then I have some sort of payload that I deliver. CAR-Ts engineer the first of those and leave the second exactly the same as the immune system does. They engineer the recognition piece—go recognize this other antigen that you wouldn't usually target, some protein on the surface of a cell, for instance—and then deliver the payload you would usually deliver if the cell was infected by a virus or looked foreign in some way. Whereas cancer cells usually don't actually look that foreign. Most of their genes are the same genes that are in your normal genome, and that's why it's hard for the immune system to surveil them.

Dwarkesh Patel 00:57:57

Interesting. Interesting. It's funny that whenever we're trying to cure infectious diseases, we just have to deal with, "Fuck, viruses have been evolving for billions of years with our oldest common ancestor, and they know exactly what they're doing, and it's so hard." Then whenever we're trying to do something else, we're like, "Fuck, the immune system has been evolving for billions of years, and it knows what it's doing, and how do we get past it?"

Jacob Kimmel 00:58:20

The Red Queen race is quite sophisticated. If you want to just throw a new tool into biology, you somehow have to get around one side of that equation.

Dwarkesh Patel 00:58:27

Given the fact that it's somewhere between impossible and very far away, but it's necessary for fully curing aging, does that mean that in the short run, in the next few decades, we'll have some parts of our body which will have these amazing therapies, and then other parts which will just be stuck the way they are?

You mentioned hepatocytes are some of the cells that you're actually able to study and deliver to. These are our liver cells. So you're saying, “Look, I can get drunk as much as I want and it's not going to have an impact on my long-run liver health, because then you'll just inject me with this therapy.” But for the rest of my body, it's going to age as normal? What is the implication of the fact that delivery seems to be lagging so far behind your understanding of aging?

Jacob Kimmel 00:59:19

Just to give the delivery folks credit, they're currently ahead. There are currently no reprogramming medicines for aging, and there are medicines that deliver nucleic acids. They're still winning the race against us right now, but to your point, I hope the lines cross. I hope we outcompete them.

Even if you were able to only target some subsets of cells, it's not that you would see this strange, Frankensteinian benefit in health in some aspects and lack of benefit entirely in others. What we've found across the history of medicine is that the body's an incredibly interconnected, complex system. If you're able to rescue function even in one cell type in one tissue, you often have knock-on benefits in many places that you didn't initially anticipate.

One way we can get examples of this is through transplant experiments. Both in bone marrow and in liver, for example, we have fairly common transplant procedures that occur in humans. We can compare old humans who get livers from young people or old people and ask a pretty controlled question. What occurs as a function of just having a young liver? Is it that, for example, you can eat a lot of fatty food and drink a lot and be fine? Or is it that you see broader benefits?

The latter seems to be true. They have reduced risk of several other diseases and overall better survival as a function of having a younger liver rather than an older one. This suggests that these tissues are deeply interconnected. Many of these organs, like the liver and your adipose tissue, are endocrine organs. They're also sending out signals to many other places in your body, helping coordinate your health across multiple tissue systems. Rejuvenating even just one tissue can benefit other tissue systems in your body at the same time.

HSCs, hematopoietic stem cells, are another example. These are mostly examples taken from a wonderful book by Frederick Appelbaum, who trained with Don Thomas, the physician who invented human bone marrow transplants. There are many circumstances where patients got a bone marrow transplant and it actually cured another disease they had, maybe unanticipated, where even just the replacement of this one special cell type, HSCs, had knock-on effects throughout the body. There were symptoms of these diseases that presented in myriad ways throughout the patients' systems, but ultimately, the root cause was just a single cell type.

There are counterexamples as well where you can go into animals and break even just one gene in one specific subset of T cells. You can break a gene in there that encodes for a transcription factor in their mitochondria called TFAM, and you dramatically shorten the lifespan of mice. One gene in one special type of T cells can give you that type of pathology. So it implies the inverse may also exist.

Dwarkesh Patel 01:01:50

Is this related to why Ozempic has so many downstream positive effects that seem not entirely related to just making you leaner?

Jacob Kimmel 01:02:01

I think it's one example. It is a hormone, and your endocrine system coordinates a lot of the complex interplay between your tissues. I don't think the story is fully written yet on exactly why GLP-1 and GIP, broadly the incretin mimetic medicines like Ozempic, have so many knock-on benefits, but they're a great example of this phenomenon.

If someone told you, "I'm going to find a single molecule, and I'm going to drug it, and it's not only going to have benefits for weight loss but also for cardiovascular disease, also possibly for addictive behavior, and maybe even preventing neurodegeneration," you would have told them they were crazy. Yet, just by acting on the small number of cells in your body which are receiving this signal, the interplay and the communication between those cells and the rest of your body seems to have many of these knock-on benefits.

It's just one existence proof that very small numbers of cells in your body can have health benefits everywhere. Even if cellular delivery does not emerge by 2100, as I imagine it will, I still think that you're going to have the ability to add decades of healthy life to individuals by reprogramming the age of individual cell types and individual tissues.

Dwarkesh Patel 01:03:04

How big will the payload have to be?

Jacob Kimmel 01:03:06

How many transcription factors? I think it's just a countable number. Some of the combinations we've found today that have efficacy contain somewhere between one and five. That's a small enough number that you can encapsulate it in current mRNA medicines. Already in the clinic today, there are medicines that deliver many different genes as RNA. There are medicines where, for instance, a vaccine is a combination of flu and COVID proteins, and they're delivering 20 different unique transcripts all at the same time. When you think about that as a medicine that's already being injected into people in trials, the idea of delivering just a few transcription factors seems quotidian. Thankfully, I don't think we'll be limited by the size of the payloads that one can deliver.
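Rough back-of-the-envelope arithmetic on payload size; the per-transcript lengths below are assumptions for illustration, not figures from the conversation.

```python
# Back-of-the-envelope payload arithmetic (all numbers are illustrative
# assumptions): a typical transcription factor coding sequence is on the
# order of ~1-1.5 kb, and mRNA medicines already deliver transcripts of a
# few kb each, including several transcripts co-formulated in one dose.
avg_tf_coding_kb = 1.3    # assumed average TF coding sequence length
utr_and_tail_kb = 0.3     # assumed UTR + poly(A) overhead per transcript

for n_tfs in (1, 3, 5):
    total_kb = n_tfs * (avg_tf_coding_kb + utr_and_tail_kb)
    print(f"{n_tfs} TFs: ~{total_kb:.1f} kb of mRNA payload in total")
# 1 -> ~1.6 kb, 3 -> ~4.8 kb, 5 -> ~8.0 kb: in the range of what
# multi-transcript mRNA medicines already deliver.
```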

One other really cool thing about transcription factors is that the endogenous biology is very favorable for drug development. The expression level of transcription factors in your genome relative to other genes is incredibly low. If you just look at the rank-ordered list of what are the most frequently expressed genes in the genome by the count of how many mRNAs are in the cell, transcription factors are near the bottom.

That means you don't actually need to get that many copies of a transcription factor into a cell in order to have benefits. What we've seen so far, and what I imagine will continue to play out, is that even fairly low doses of these medicines, well within the realm of what folks have been taking for more than a decade, are able to induce really strong efficacy. We're hopeful that not only will the actual size of the payload in terms of number of base pairs not be limiting, but the dose shouldn't be limiting either.

Dwarkesh Patel 01:04:36

Would it have to be a chronic treatment, or could it just be a one-time dose?

Jacob Kimmel 01:04:41

In principle, it could be one time. I think that would be an overstatement for today. I can talk you through the evidence, from first principles back down to the hardest evidence we actually have in hand.

Epigenetic reprogramming is basically how the cell types in our bodies right now are able to adopt the identities that they have. The existence proof that those epigenetic reprogramming events can last decades is that my tongue doesn't spontaneously turn into a kidney. These epigenetic marks can persist for decades throughout a human life, or hundreds of years if you want to take the example of a bowhead whale which uses the same mechanism.

We also know from very targeted edits, which other groups have done, that these marks can be extremely durable. Folks like Luke Gilbert, now at the Arc Institute, who I think of as one of the great unsung scientists of our time, have been able to make a targeted edit at a single locus and then show that it persists while cells divide 400-plus times over multiple years in an incubator in the lab. Imagine a hothouse where you're trying as hard as you can to break this mark down, and it can actually persist for many years. Other companies have now dosed editors similar to the ones that Luke developed in his lab in monkeys and shown they last at least a couple of years.

In principle, the upper bound here is really long. You could potentially have one dose and it lasts a very long time, potentially decades, maybe as long as it took you to age the first time. We don't have data like that today; I don't want to overstate. We do have data that these positive effects can last several weeks after a dose. Even without many leaps of faith toward that upper bound, you could imagine getting doses every month or every few months and actually having really dramatic benefits that persist over time, rather than needing to get an IV every day, which might not be tractable.

01:06:22 – Synthetic transcription factors

Dwarkesh Patel 01:06:22

We've got 1600 transcription factors in the human genome. Is it worth looking at non-human TFs and seeing what effects they might have, or are they unlikely to be the right search space?

Jacob Kimmel 01:06:34

I think it's less likely. I think you have a prior that evolution has given you a reasonable basis set for navigating the states that human cells might want to occupy. In our case, we know that the state we're trying to access is encoded by some combination of these TFs. It does arise in development obviously. We're trying to make an old cell look young, not look like some Frankenstein cell that's never been seen before.

That said, we don't have any guarantees that the way aging progresses is by following the same basis set of these transcription factor programs in the genome that are encoded during development. I don't think it's unreasonable to ask, “Would your eventual ideal reprogramming medicine necessarily be a composition of the natural TFs, or would it include something like TFs from other organisms, as you posit, or even entirely synthetic transcription factors as well?”

Things like Super-SOX. Super-SOX is a particular publication from Sergiy Velychko where they mutated the SOX2 gene and made iPSC reprogramming more efficient. They could take somatic cells and turn them into pluripotent stem cells more effectively than you could with just the canonical Yamanaka factors, which are Oct4, Sox2, Klf4, and Myc.

iPSC reprogramming never happens in nature, so there's no reason to necessarily believe that the natural TFs are optimal.

So even really simple optimizations, like just mutagenizing one of the four Yamanaka factors we already know about or swapping some domains between a few TFs, seem to improve things dramatically. I think that's a pretty good signal that actually there's a lot of gradient to climb here and that potentially for us, the end-state products we're developing in 2100 are more like synthetic genes that have never existed, rather than just compositions of the natural set.

Dwarkesh Patel 01:08:10

What about the effects of aging? Your skin starts to sag because of the effects of gravity over the course of decades. Is that a cellular process? How would some cellular therapy deal with that?

Jacob Kimmel 01:08:23

The best evidence is that it's probably not cellular. The reason your skin sags is there's a protein in your skin called elastin, which does exactly what you'd think it would based on the name. It kind of keeps your skin elastic-y, like a waistband, and holds it to your face. You have these big polymerized fibers of elastin in your face. As far as we understand it, you only polymerize it and form a long fiber during development. Then for the rest of your life, you make the individual units of the polymer, but for reasons that, as far as I can tell, no one understands, they fail to polymerize. You can't make new long cords to hold your skin up to your face.

So the eventual solution for something like that is likely that you need to program cells to states that are extra-physiological. There might not be a cell in your body. It's not just like a young skin cell from a 20-year-old is better at making these fibers. As far as we can tell, they don't. But you could probably program a cell to be able to reinvigorate that polymerization process, to run along the fiber and repair it in places where it's damaged. Obviously these things get made during development, so it's totally physically feasible for this to occur. Maybe there's even a developmental state which would be sufficient to achieve this. I don't think anyone knows. But that would be the kind of state that one might have to engineer de novo, even if our genome doesn't necessarily encode for it explicitly.

01:09:31 – Can virtual cells break Eroom's Law?

Dwarkesh Patel 01:09:31

Interesting. Okay, what is Eroom's Law?

Jacob Kimmel 01:09:35

Eroom's Law is a funny coinage from a friend of mine, Jack Scannell: it's Moore's Law spelled backwards. He inverted the notion of Moore's Law, which is the doubling of compute density on silicon chips every few years. Moore's Law has graciously given us massive increases in compute performance over several decades. Eroom's Law is the inverse of that. In biopharma, what we're actually seeing is a very consistent decrease in the number of new molecular entities, new medicines that we're able to invent, per billion dollars invested.

This trend actually starts way back in the 1950s and persists through many different technological transitions along the way. It seems to be an incredibly consistent feature of trying to make new medicines.
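A minimal sketch of what that trend looks like as a formula; the commonly cited figure (Scannell et al., 2012) is that new drugs approved per inflation-adjusted billion dollars of R&D roughly halved every nine years since 1950, and the starting value below is illustrative, not a quoted statistic.

```python
# Eroom's Law as a simple inverse-exponential: output per R&D dollar
# roughly halves on a fixed timescale.
def drugs_per_billion(year: int, start_year: int = 1950,
                      start_value: float = 30.0, halving_years: float = 9.0) -> float:
    """Illustrative new molecular entities per inflation-adjusted $1B of R&D."""
    return start_value * 0.5 ** ((year - start_year) / halving_years)

for year in (1950, 1980, 2010):
    print(year, round(drugs_per_billion(year), 2))
# 1950 30.0, 1980 ~2.98, 2010 ~0.3 -- roughly two orders of magnitude
# of decline over sixty years.
```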

Dwarkesh Patel 01:10:13

In a weird way, Eroom's Law is actually very similar to the scaling laws you have in ML, where you have this very consistent logarithmic relationship. You throw in more inputs and you get consistently diminishing outputs. The difference, of course, is that this trend in ML has been used to raise exponentially more investment and to drive more hype towards AI. Whereas in biotech, modulo NewLimit's new round, it has driven down valuations, driven down excitement and energy.

With AI at least you can internalize the extra cost and the extra benefits because there's this general purpose model you're training. This year you spend $100 million training a model, next year $1 billion, the year after that, $10 billion. But it's one general purpose model, unlike, “We made money on this drug and now we're going to use that money to invest in 10 different drugs in 10 different bespoke ways.”

I was gearing up to ask you, what would a general purpose platform—where even if you had diminishing returns, at least you can have this less bespoke way of designing drugs—look like for biotech?

Jacob Kimmel 01:11:15

I'm going to slightly dodge your question first to maybe analyze something really interesting that you highlighted. You have these two phenomena: ML scaling and then scaling in terms of the cost for new drug discovery. Why is it that the patterns of investment have been so different? There are probably two key features that might explain this difference.

One is that the returns to the scaled output in the case of ML are actually expected to increase super-exponentially. If you actually reach AGI, it's going to be a much larger value than even a few logs back on the performance curve that people are following. Whereas in the life sciences thus far, the products we're generating further and further out on the Eroom's Law curve haven't necessarily scaled in their potential revenue and returns quite so much. You're seeing these increased costs not counterbalanced by increased ROI.

The other piece that you highlighted is that it's unlike building a general model, where by making larger investments you can potentially address a broader market, moving from solving very narrow tasks to eventually replacing large fractions of white-collar intelligence. In biotech, when you're traditionally able to develop a medicine in a given indication—”I was able to treat disease X”—it doesn't necessarily enable you to then treat disease Y more readily.

Typically where these biotech firms in general have been able to develop unique expertise is on making molecules to target particular genes, so “I'm really good at making a molecule that intervenes on gene X or gene Y.” It turns out that the ability to make those molecules more rapidly isn't actually reducing the largest risk in the process. This means that the ability to go from one or two outputs one year to then four the next is much more limited.

This brings us then to the question of what the general model would be in biology. I think it reduces down to how do you actually imbue those two properties that create the ML scaling law curve of hope and bring those over to biology so that you can take the Eroom's law curve and potentially give it the same sort of potential beneficial spin.

There are a few different versions of this you could imagine. But I'll address the first point. How do you get to a place where you're actually able to generate more revenue per medicine, so that the outputs you're generating are more valuable even if each output might cost a bit more? Traditionally, when we've developed medicines, we go after fairly narrow indications, meaning diseases that fairly small numbers of people get. And the scope of what new medicines address has actually narrowed further as we've gone forward in time.

This is sort of an ironic situation where we've gone from addressing pretty broad categories of disease, like infectious disease, to narrower and narrower genetically-defined diseases that have small patient populations. Because these only affect a few people—if you think about the value function of a medicine as how many years of healthy life it gives how many people—if the “how many people” is pretty small, it just really bounds the amount of value you're able to generate.

You need to then be able to find medicines that treat most people. All of us will one day get sick and die. So arguably, the TAM for any really successful medicine could be everybody on planet Earth. We need to find a way to be able to route toward medicines that address these very large populations.

The second piece then is, how do we actually build models that enable us to take the success in one medicine we've developed and lead that to an increased probability of success on the next medicine? Traditionally, we haven't been able to do that. Maybe you're better at making an antibody for gene Y because you made one for gene X five years ago. But it turns out making an antibody isn't really the hard part of drug discovery.

Figuring out what to make an antibody to target is the hard thing about drug discovery. What gene do I intervene upon in order to actually treat a disease in a given patient? Most of the time, we just don't know. That's why even if a given drug firm becomes very good at making antibodies to gene X and they have a successful approval, when they then go to treat disease Y they don't necessarily know what gene to go after. Most of the risk is not in how to make an antibody to treat my particular target, it's in figuring out what to target in the first place.

Dwarkesh Patel 01:14:59

I'm not sure how to understand this claim that we know how to engage with the right hook; we just don't know what that hook is supposed to do in the body. I don't know if that's the way you'd describe it. Another claim that I've seen is that with small molecules we have this Goldilocks problem. They have to be small enough to percolate through the body and through cell membranes, etc., but big enough to interfere with protein-protein interactions that transcription factors might have.

There it seems like getting the hook is the big problem.

Jacob Kimmel 01:15:32

In this particular case, if we bound ourselves to, "We must use small molecules as our modality," then there are lots of targets which are very difficult to drug.

There are many other modalities by which you can drug some of these genes. I would say, though I don't have a formal way of explaining this, that if you were to write out a list of well-known targets that many, many folks would agree are the correct genes to go after and try to inhibit or activate in order to treat a given set of diseases—and the only reason we don't have medicines is that we can't figure out a trick to drug them—it's a fairly small list. It would probably fit on a single page. Whereas the number of possible indications one could go after, and the number of possible genes one could intervene upon, especially when you consider their combinations, is astronomical.

The experiment you could run here is to lock 10 really smart drug developers in a room. You tell them to write down some incredibly high-conviction target-disease pairs where they're sure that if they modulate this biology, these patients are going to benefit. All they need is some molecular hook, as you put it, in order to do this. It's a relatively short list. What you're not going to get is anything approximating the panoply of human pathologies that develop.

You can actually look for this. There are some existence proofs you can look for out in the universe. If the only problem was that we didn't have the ability to drug something using current therapeutics that we can put in humans, we should still be able to treat it in the best animal models of that disease, because we can use things like transgenic systems. You can go in and engineer the genome of that animal. This gives you all sorts of superpowers that you don't have in patients: you can, for instance, turn on arbitrarily complex groups of genes in arbitrarily specific or broad groups of cells in the organism, at any time you want, at any dose you want, in the animal.

For the majority of pathologies, we just don't have many of those examples.

Dwarkesh Patel 01:17:13

So what is the answer, then? What is the general-purpose thing where every marginal discovery increases the odds you make the next discovery?

Jacob Kimmel 01:17:23

There are multiple ways one might approach this problem. The most common today is what people are often describing when they talk about a virtual cell. This is a very nebulous idea, sometimes numinous, if you'll let me describe it that way as well. Concretely, what most people are trying to do is measure some number of molecules, or observable readouts like the morphology of a cell, and then perturb it many times, turn some genes on, turn some genes off, and measure how that molecular or morphological state changes.

The notion is that there's a lot of mutual information in biology. If I measure something, most commonly all the genes the cell is using at a given moment, which you can get by RNA sequencing, I get a decent enough picture of most of the other complexity going on. I can take a bunch of healthy cells and a bunch of cells that are in a diseased or aged state. I'm able then to compare those profiles and say, “Okay, my diseased cells use these genes, my healthy cells use these. Are there any interventions that I'm able to experimentally find in the lab that shift one toward the other?”

You're never going to be able to scan combinatorially all the possible groups of genes. To make it concrete, there are something like 20,000 genes in the genome. You can then choose however many genes you want in your combination. It's not crazy to think of hundreds at a time; that's what transcription factors control, and that's how development works. So the number of possible combinations is truly astronomical. You just can't test it all.
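To make the combinatorics concrete, a quick calculation of how many ways there are to pick small groups of genes out of roughly 20,000 (the group sizes are illustrative).

```python
# Quick arithmetic on the search space: choosing k genes out of ~20,000.
from math import comb

n_genes = 20_000
for k in (1, 2, 3, 5, 10):
    print(f"{k} genes at a time: {comb(n_genes, k):.3e} possible combinations")
# 1 -> 2.0e4, 3 -> ~1.3e12, 5 -> ~2.7e19, 10 -> ~2.8e36: far beyond what any
# lab could test exhaustively, hence sparse sampling plus a predictive model.
```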

The hope would be that by doing some sparse sampling of those pairs—here's what the cell looked like beforehand, here are the particular genes I perturbed—you have some measurement of the state that the cell resulted in. Here's which genes went up, here's which went down. Once I've trained a model to predict the resulting cell state from the perturbations, I can start to ask what would happen for some arbitrary combination of genes.

Now in silico I can search all possible things that one might do and potentially discover targets that take my diseased cells back to something like healthy cells. So that's another version of what an all-encompassing model would look like where you actually have compounding returns in drug discovery.

Dwarkesh Patel 01:19:21

You basically described one of the models you guys are working on at NewLimit. You're training this model based on this data where you're taking the entire transcriptome and just labeling it based on how old that cell actually is.

If you've got all this data you're collecting on how different perturbations are having different phenotypic effects on a cell, why only record whether that effect correlates with more or less aging? Why can't you also label it with all the other effects that we might eventually care about and eventually get the full virtual cell? That's a more general purpose model, not just the one that predicts whether a cell looks old or not.

Jacob Kimmel 01:20:08

Absolutely, we actually do both today. We can train these models where the inputs are a notion of what that cell looked like at the starting place, here's what a generic old cell looked like, and then representations of the transcription factors themselves. We derive those from protein foundation models. They're language models trained on protein sequences. It turns out that gives you a really good base level understanding of biology. The model's starting from a pretty smart place.

Then you can predict a number of different targets from some learned embedding, the same way you could have multiple heads on a language model. One of those for us is actually just predicting every gene the cell is expressing. Can I recapitulate the entire state and guess what effect these transcription factors will have on every given gene? You can think about that as an objective description rather than a value judgment on the cell. I'm not asking whether or not I want this particular transcriptome; I'm just asking what it will look like.

Then we also have something more like value judgments. I believe that that transcriptome looks like a younger cell. I'm going to select on that and train a head to predict it, where I can denoise across genes and then select for younger cells. But you could do that for arbitrary numbers of additional heads. What are some other states you might want? Do I want to polarize T cells to a less inflammatory state in somebody with an autoimmune disease? Do I want to make liver cells more functional in a patient who's suffering from certain types of metabolic syndrome, maybe even orthogonal to the way that they age? Do I want to go in and change the way a neuron is functioning to a different state to treat a particular type of neurodegenerative disease? These are all questions you can ask. They're not the ones we're going after, but that is the more general, broader vision.
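A minimal sketch of the kind of two-headed setup described here; the layer sizes, the way the inputs are combined, and the scalar "age score" head are assumptions for illustration, not NewLimit's actual architecture.

```python
# Toy two-headed perturbation-response model: predict the full expression
# profile after a TF perturbation, plus a scalar "value judgment" score.
import torch
import torch.nn as nn

class PerturbationResponseModel(nn.Module):
    def __init__(self, n_genes: int = 20_000, tf_embed_dim: int = 1280, hidden: int = 512):
        super().__init__()
        # Encode the starting cell state together with an embedding of the
        # delivered TFs (e.g. from a protein language model).
        self.encoder = nn.Sequential(
            nn.Linear(n_genes + tf_embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        # Head 1: predict the full post-perturbation expression profile.
        self.expression_head = nn.Linear(hidden, n_genes)
        # Head 2: a scalar judgment, e.g. how young the predicted state looks.
        self.age_score_head = nn.Linear(hidden, 1)

    def forward(self, baseline_expression, tf_embedding):
        z = self.encoder(torch.cat([baseline_expression, tf_embedding], dim=-1))
        return self.expression_head(z), self.age_score_head(z)

model = PerturbationResponseModel()
baseline = torch.randn(8, 20_000)   # 8 cells' starting profiles (toy data)
tf_embed = torch.randn(8, 1280)     # embeddings of the TF combination (toy data)
predicted_expression, predicted_age_score = model(baseline, tf_embed)
print(predicted_expression.shape, predicted_age_score.shape)  # (8, 20000) (8, 1)
```

With a trained model of this shape, you could then score arbitrary TF combinations in silico and rank them by the predicted age score, which is the search step described above.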

Dwarkesh Patel 01:21:36

This is so similar to LLMs: you first have imitation learning with pre-training that builds a general-purpose representation of the world. Then you do RL on a particular objective in math or coding or whatever you care about. You're describing an extremely similar procedure, where first you just learn to predict how perturbations to genes map onto broad effects on the cell. That's the pre-training, just learning how cells work. Then there's another layer afterward of these value judgments: “How would we have to perturb it to have effect X?” That actually seems very similar to “How do we get the base model to answer this math problem or this coding problem?”

I don't know if people usually put it this way, but it actually just seems extremely similar. That makes me more optimistic on this. LLMs work and RL works.

Jacob Kimmel 01:22:34

Yeah, they do. I think the conceptual analogy is very apt. We don't actually use RL at the moment, so I don't want to overstate the level of sophistication we've got. But I think the general problem reduces down in a similar way.

You can think about your earlier question of what the general model looks like that enables you to actually have compounding returns in drug discovery. You might have something like this base model, which, as you said, just predicts this objective function of, “How are these perturbations hitting these targets going to change which genes are turned on and off in this cell?” Then there's an entirely different task, which is, well, which genes do you want to turn on and off? What state do I want the cell to adopt?

Our lens on that is that across many different diseases people have, age is one of the strongest predictors of how they're going to progress, whether that disease arises. In many, many circumstances you have evidence in humans where you can say, “Ah, if I could make the cell younger, maybe that's not a perfect fix, but that's going to dramatically benefit not only patients who have a diagnosed disease, but it might actually help most of us stay healthier longer, even subclinically before anyone would formally say that we're sick.”

Now that's another more general function. The same way that in LLMs, you might have to create these particular RLHF environments, you need to have places where you can state a value function of the particular task that you're trying to optimize for. In drug discovery, you would then need to know, “Well, what are the cell states I want to engineer for?”

That's kind of the next generation of what a target might be. Beyond just which genes do I want to move up and down, and which gene perturbations do I put in, you then need to know what cell state am I engineering for? What do I want this T cell to do?

Dwarkesh Patel 01:24:01

You’ll have a bunch of labelers in Nigeria clicking different pictures of cells. Like, “Oh, this one looks young. This one looks old.”

Jacob Kimmel 01:24:06

“This one looks really great. I love that one.” Potentially. Potentially. It's more like developmental biologists locked in a room, as my friend Cole Trapnell would say.

Dwarkesh Patel 01:24:15

It seems like what you're describing seems quite similar to Perturb-seq. I don't know when it was done, what year was it?

Jacob Kimmel 01:24:22

There were three papers almost simultaneously in 2016.

Dwarkesh Patel 01:24:25

Okay, so almost a decade. We're still waiting, I guess, for the big breakthroughs it's supposed to cause. This is the same procedure, so why is this going to have an effect?

Jacob Kimmel 01:24:37

Why has this taken so long? Good question. The original procedure was created by a bunch of brilliant folks. There was a group in Ido Amit's lab at the Weizmann Institute, Aviv Regev's lab at the Broad, where Atray Dixit, a friend of mine, helped work on this, and then Jonathan Weissman's lab at UCSF, where Britt Adamson did a lot of the early work. They all constructed this idea where you can go in and you label a perturbation that you're delivering to a cell.

This is typically a transgenic perturbation, meaning you're integrating some new gene into the genome of a cell. That turns another gene on or off. They used CRISPR, but there's lots of ways to do it and the concept's pretty general. Then you attach on that new transgene, that new gene you put into the genome of the cell, some barcode that you can read out by DNA sequencing. Now when you rip the cells open, you're able to not only measure every gene they're using, but you also sequence these barcodes, and you know which genes you turned on and which are off. You can then start to ask questions like, “Well, I've turned on genes A, B, and C, what did it do to the rest of the cell?”

That's the general premise of the technology. It's useful to just set that up because it explains why this didn't all happen earlier. The actual readout, ripping the cells open and sequencing them, used to be pretty bad and it used to be really expensive. It's gotten much better over time. The metric people often think about here is like cost per cell to sequence. It used to be measured in dollars and now it's measured in cents, and down to the fractions of cents, because that cost curve has improved dramatically. The cost of sequencing has likewise come down. So even beyond the actual reagents necessary to rip the cell open and turn its mRNAs into DNAs that are ready for the sequencer, now the sequencer is cheaper.

The other piece is, actually getting these genes in and then figuring out which ones are there, it started out pretty bad. When we started with this technology, it was a beautiful proof of concept, but I don't think anyone would tell you it was 100% ready for prime time. When you sequenced a cell, only about 50% of the time could you even tell which perturbation you put in. Sometimes you just wouldn't detect the barcode and you'd have to throw the cell away. Or you detect the wrong barcode and now you've mislabeled your data point.

This might sound like a trivial sort of technical piece, but imagine you're running this experiment the old-fashioned way. You test different groups of genes in different test tubes on a bench. Now imagine you hired someone who every other tube labels it wrong. When you then collect data from your experiment, you basically have no idea what happened, because you've just randomized all your data labels. You wouldn't do much science. You wouldn't get very far that way.

A lot of those technologies have improved over time. You had a number of processes which were each pretty inefficient; multiply them all together and you ended up with a very small fraction of cells you could actually successfully sequence. They've all improved to the degree where now you can actually operate at scale.

Groups like ours have had to do a bunch of work in order to actually enable combinatorial perturbations, turning on more than just one gene at a time, which it turns out is much, much harder for the same reason we were just alluding to. Imagine you're having trouble figuring out which one gene you put in this cell and turned on or off. Now imagine you have to do that five times correctly in a row. Well, if you start out with the original performance where you could detect roughly 50% of them, then the fraction of cells that would be correctly labeled is like 1/2^n, where n is the number of genes you're trying to detect. Very quickly more of your data is mislabeled than is labeled.
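The arithmetic here is simple enough to spell out, assuming each barcode is detected correctly about half the time.

```python
# The back-of-the-envelope above: if each barcode is detected correctly only
# ~50% of the time, the fraction of cells with all n perturbations labeled
# correctly is roughly 0.5**n.
for n in range(1, 6):
    print(f"{n} perturbations: ~{0.5 ** n:.0%} of cells correctly labeled")
# 1: ~50%, 2: ~25%, 3: ~12%, 4: ~6%, 5: ~3% -- most of the data is mislabeled
# well before you reach five genes at a time.
```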

There are lots of technical reasons like this that have gotten worked out over time. Only now are we really able to scale up, running experiments with millions of cells in a single day at, for instance, a small company like NewLimit. There was a point even just six or seven years ago when the companies that made these reagents were publishing the very first million-cell data set just as a proof of concept, and only they could do it as the constructors of the technology. Now two scientists in our labs can generate that in an afternoon.

Dwarkesh Patel 01:28:02

If it actually is the case that this is very similar to the way LLM dynamics work, then once this technology is mature and you get the GPT-3 equivalent of the virtual cell, what you would expect to happen is you get many different companies, at least a couple, that are doing these cheap, Perturb-seq-like experiments and building their own virtual cells. Then they're leasing this out to other people who then have their own ideas about, "We want to see if we can come up with the labels for this particular thing we care about and test for that."

What it seems like is happening right now is, at least at NewLimit, you are like, "We know the end use case we're going after." It would be as if Cursor in 2018 was like, "We're going to build our own LLM from scratch so that we can enable our application," rather than some foundation model company being like, "We don't care what you use it for, we're going to build this."

Does that make sense? It seems like you're combining two different layers of the stack. Because nobody else is doing the other layer, you're just doing both of them. I don't know to what extent this analogy maps on.

Jacob Kimmel 01:29:18

To play with the analogy a bit, imagine that you think about NewLimit as an LLM company. If I'm going to put us in the shoes of Cursor, which oh I so wish, imagine we're trying to, in 2018, create Cursor Tab, but we're not trying to create a full LLM. I don't know enough about the underlying mechanics to know if that would have been feasible, but it's a much more feasible problem than trying to create the most recent Cursor agent or compete with modern Claude Code.

That's roughly the equivalent. The problem we're breaking off is a subset of the more general virtual cell problem. We're trying to predict, “What do groups of transcription factors do to the age of very specific types of cells?” We only work on a few cell types at NewLimit because those are some of the only cell types today with which we believe we can get really effective delivery of medicines. We think they're just more important because we can act on them today. If we solve the problem of what TFs to use, we can make a medicine pretty quickly.

In a way, we're carving out a region of this massive parameter space and saying, "If we can learn the distribution of effects even just in this small region, it's going to be really effective for us, and we can make really amazing products unlike anything the world has ever seen." Over time, we can expand to the corpus of predicting every possible gene perturbation in every possible cell type.

I think that's maybe the way the analogy maps on, but it is true that we are vertically integrating here. We're generating our own data in a way that's proprietary. We think we have a much, much larger data set for this particular regime than the rest of the world combined. That enables us to build what we think are the best models.

In many cases, what we found is that unlike with LLMs, where a lot of the data that was necessary to build these was a common good—it was produced as a function of the internet and shared across everyone, it's pretty common across all the domains everyone wants to use it for—this biological data is still in its infancy. Imagine we're in the early 1980s and we are just now thinking about trying to create some of the first web pages. That's the era we're in.

We're going after and generating some of our own data in this very niche circumstance, building the very high-quality corpus, the Wikipedia that you might train your overly analogized LLM on, and then building the first products based on that and expanding from there. We think that's necessary because of where we are today. There isn't an Internet-like equivalent of data that everyone can go out and reap rewards from.

01:31:32 – Economic models for pharma

Dwarkesh Patel 01:31:32

Interesting. This is more a question about the broader pharma industry rather than just NewLimit. In the future, how are people going to make money? With the GLPs, we've got peptides from China that are just a gray market that people can easily consume. Presumably, with these future AI models, even if you have a patent on a molecule, finding an isomorphic molecule or an isomorphic treatment is relatively easy.

If you do come up with these crazy treatments and if pharma in general is able to come up with these crazy treatments, will they be able to make money?

Jacob Kimmel 01:32:07

The gray market piece I'll put aside; that's IP enforcement at a geostrategic level that I'm not qualified to speak to. Beyond enforcement, another reason the traditional pharmaceutical industry will still continue to reap the majority of rewards here is that most of the payment in the United States, which provides most of the revenue for drug discovery in the world, goes through a payment system that is not just direct-to-consumer. It goes through payers.

If you have the opportunity to either order a sketchy vial off of some website from some company in Shenzhen, or go through your doctor and get a prescription with a relatively low co-pay for tirzepatide, the real thing, most patients will go for tirzepatide. You and I probably live in a milieu of people who are much more comfortable with ordering the vials from Shenzhen than most people might be. I don't consider that to be a tremendous concern writ large.

The broader point is, if you have medicines with very long-term durability, how do you reimburse them? If the benefits are very long term and accrue in the out-years… A challenge we have in the US system is that the average person churns insurers every three to four years. That number fluctuates around, but that's the right order of magnitude. That means that if you had a medicine which dramatically reduced the cost of all other healthcare incidents, but it happened exactly five years after you got dosed with it, no insurer is technically economically incentivized to cover that.

I think there are a couple of models here that can make sense. One is something called pay-for-performance, where rather than reimbursing the full cost of the drug upfront, you reimburse it over time. Say you get a medicine that just makes you generically healthier, and you can measure the reduced rates of heart attack, reduced rates of obesity, and various other things, and this one dose lasts for 10 years. Each year you would pay something like a tenth of the cost of the medicine, contingent on the idea that it was actually still working for you and you had some way of measuring that.
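A toy sketch of that pay-for-performance idea, with an illustrative price, benefit window, and insurer churn (none of these numbers come from the conversation).

```python
# Toy pay-for-performance model: whichever insurer covers the patient in a
# given year pays a slice of the price, contingent on the drug still working.
def annual_payment(total_price: float, benefit_years: int,
                   year: int, still_working: bool) -> float:
    """Payment owed in a given year of the benefit window."""
    if year > benefit_years or not still_working:
        return 0.0
    return total_price / benefit_years

total_price, benefit_years = 500_000, 10             # hypothetical durable medicine
insurer_by_year = ["A"] * 4 + ["B"] * 3 + ["C"] * 3  # patient churns insurers

for year, insurer in enumerate(insurer_by_year, start=1):
    payment = annual_payment(total_price, benefit_years, year, still_working=True)
    print(f"year {year}: insurer {insurer} pays ${payment:,.0f}")
# Each insurer only pays for the years it actually covers the patient, so
# nobody is stuck funding benefits that accrue after the patient leaves.
```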

That's a big challenge in this industry. How would you demonstrate that any one of these medicines is still working for the patient? In the few examples we have today, these are things like gene therapies where you can just measure the expression of the gene and it’s like, “Okay, the drug is still there.” It gets more complicated when you have some of these longer-term net benefits.

The idea would be that each insurer is then incentivized to pay only for the time of coverage that you're on their plan. We already have a framework for this post-Affordable Care Act in the US, where pre-existing condition exclusions no longer really exist. Patients are able to freely move between payers, and you could treat the presence of one of these therapeutics, which lowers the patient's overall healthcare costs, somewhat the same way we treat a pre-existing condition. This is something the system is still figuring out overall. What I'm saying here is one hypothesis about what the future might look like, but there are alternative clever approaches people might think about for reimbursement.

I think over time we're going to move more toward a direct-to-consumer model for many of these medicines which preserve and promote health rather than just fixing disease. You're seeing some of the most innovative examples of this right now from Lilly around the incretin mimetics, where they launched LillyDirect. For the first time, rather than going through a pharmacy, which interacts with a PBM, which interacts with your primary care physician, you can get a prescription from your doctor, go straight to Lilly, the source of the good stuff, order high-quality drugs from them, and not involve some intermediary compounder in the middle that might not even make your molecules properly.

As these medicines develop that have actual consumer demand—because you feel it in your daily life and you're actually seeing a benefit from it, it's not just something that your physician is trying to get you to take—that model will start to dominate. That means that this payment over time for some of these long-term benefits might be able to be abstracted away from our current payer system where it churns every few years. A payment-over-time plan, the same way we finance other large purchases in life, seems very feasible.

Dwarkesh Patel 01:36:01

The reason I'm interested in this is that healthcare is already around 20% of GDP, and that fraction has grown noticeably in the last few years and is still climbing. The overwhelming majority of it goes towards administering treatments that have already been invented. That's good, but nowhere near as good as spending this enormous sum of resources on coming up with new treatments that will improve the lives of the people who will have these ailments in the future.

One question is just how do we make it so that more… If we're going to spend 20% of GDP on healthcare, it should at least go towards coming up with new treatments rather than just paying nurses and doctors to keep administering stuff that kind of works now.

Two, from the payer's perspective, the cost of a drug ends up including everything around it: you need a doctor to give you some scan before they can write you a prescription, then the therapy needs to be administered, then they need to make sure you're doing okay, and so on. Even if manufacturing this therapy might cost tens of dollars per patient, for the healthcare system overall it might be tens of thousands of dollars per patient. I'm curious if you agree with those orders of magnitude.

Jacob Kimmel 01:37:29

I think that's correct. I think the stat is something like drugs are roughly 7% of healthcare spend. I could be a little bit wrong on that, but the OOM is right.

Dwarkesh Patel 01:37:36

Basically, even if we invent de-aging technology, or especially if we invent de-aging technology, how should we think about the way it will net out in the fraction of GDP that we have to spend on healthcare? Will that increase because everybody's lining up at the doctor's office to get a prescription and you gotta go into the clinic every week? Or will that decrease because the other downstream ailments from aging aren't coming about?

Jacob Kimmel 01:37:59

I think the latter is much more likely to be the case. Here are some quick heuristics. There are many reasons that healthcare costs so much in the US. One of them is something like Baumol's cost disease, which is largely unrelated to pharmaceutical discoveries but is something that we will have to solve in the system. Part of it is that the actual customer is separated from the actual provider by layers of intermediaries. These are things that biotech probably isn't going to be able to solve as an industry alone. That's probably a larger economic problem.

But think about how this will affect the total amount of healthcare that will need to be delivered. If you have more of these medicines for everyone, medicines that keep you healthier for longer rather than medicines that only fix a problem once you're already very sick, I think you actually avoid a lot of those administration costs. It's not just administration in the sense of admins at hospitals, but the cost of administering existing medicines and therapies to you. That's going down.

One stat on why I think that's true: something like a third of all Medicare costs are spent in the final year of life, which is shocking when you realize that the average person on Medicare is probably covered by it for a decade-plus. There's an incredible concentration of the actual expenses once someone is already terribly sick. The goal is to help prevent you from ever having to access the intensive healthcare system, something like an inpatient hospital visit. If you can prevent even just a couple of those visits over a long period of someone's life with a medicine like an incretin mimetic, or a reprogramming medicine that keeps your liver and your immune system younger, then on net that actually starts to drive healthcare spend down, because you're shifting some of that burden from the administration system to the pharmaceutical system.
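[Editor's note: a back-of-the-envelope sketch of that trade-off in Python. All figures below are hypothetical placeholders, not numbers from the conversation; the point is only that total spend falls whenever the avoided inpatient episodes cost more than the preventive medicine does.]

```python
# Back-of-the-envelope version of the argument above: a preventive medicine
# lowers total healthcare spend on net if the inpatient episodes it avoids
# cost more than the medicine does. All inputs are hypothetical placeholders.

def net_spend_change(drug_cost, avoided_episodes, cost_per_episode):
    """Negative result means total healthcare spend goes down."""
    return drug_cost - avoided_episodes * cost_per_episode

# Example: a preventive medicine priced at $50k over a patient's later life
# that prevents two intensive inpatient stays at roughly $40k each.
print(net_spend_change(50_000, 2, 40_000))  # -30000: spend shifts down on net
```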

The pharmaceutical system is the only piece of healthcare where technology has made us more efficient. As drugs go generic, the cost of administering a given unit of healthcare is going down.

The grand social contract is that they eventually go generic. That's the way our current IP system works. So if you were to get the question of, “When would you like to be born as a patient?” you always want to be born as close to today as possible. Because for a given unit in terms of pharmaceuticals, for a given dollar unit of expense, you can access more pharmaceutical technology today than has ever been possible in history, even as healthcare costs everywhere else in the system have shot up. Pharmaceuticals are the one place where, because of the mechanism of things going generic and the fact that our old medicines continue to work and persist over time, you're able to get more benefit per dollar.

Dwarkesh Patel 01:40:21

Okay, final question. Pharma is spending billions of dollars per new drug it comes up with. Surely they have noticed that the lack of some general platform or some general model has made it more and more expensive and difficult to come up with new drugs. You say Perturb-seq has existed since 2016. As far as you can tell, you have the most of that kind of data, which would feed into a general-purpose model.

What is the traditional pharma industry on the other coast up to? If I went to the head of R&D at Eli Lilly or Pfizer or something, would they have some different idea of the platform that needs to be built, or are they like, “No, we're all in on the bespoke game, bespoke for each drug?”

Jacob Kimmel 01:41:06

I'll just correct one thing to make sure I'm not overstating. We have way more data for the limited subproblem we're tackling, which is overexpressing TFs in combinations. We have way more data than anyone, full stop, there. But even more specifically, I feel very, very confident we have more data than anyone looking at trying to reprogram a cell's age. That's where we're way larger than the rest of the world.

When we think about just general single-cell perturbation data of various flavors, there are other groups which have very large data sets as well. We're still differentiated because we do everything in human cells with the right number of chromosomes, whereas it's very common to do things in cancer cell lines which have 200 chromosomes. Is that human? I don't know. Depends on how you actually quantify these things.

So then, if you’re going to go ask the leaders of some of the traditional pharmaceutical firms, "Are you trying to build a general model?" I think some of them have in-house AI innovation teams that are working on this. There are really smart people there. But as a general trend, you can think about some of the modern pharmas a bit like venture capital firms. Over time, they've externalized a lot of their R&D. They often have divisions of external innovation, which you can think of as the corp dev version of venture capital. They work with the biotech ecosystem to have a number of smaller, nimble firms explore really pioneering ideas, the types of things we're working on, and then eventually partner with them once they have assets that are further downstream.

The industry has sort of bifurcated, where smaller biotechs like ours take on most of the early discovery. I'm going to get it a little bit wrong from memory, but it's something like 70% of molecules approved in a given year originally come from small biotechs rather than large pharmas, even though, if you look at the actual dollars of R&D spend on the balance sheet, it's largely in big pharma.

Dwarkesh Patel 01:42:47

Another level of disintermediation.

Jacob Kimmel 01:42:52

Part of the reason for that difference in cost is they're running most of the trials. Most people partner with pharma to run trials where a lot of the costs are incurred. It's not just that all large pharmas are horribly inefficient or anything like that. Some of them would tell you, “These ideas are really exciting. We have an external innovation department, if we don't have one internally, or we're collaborating with a startup that's doing something similar.”

You can think of the market structure like you have a bunch of biotechs, which are the startups in your ecosystem, and then they're working with something like an oligopsony of pharmas. It's a limited number of buyers for this particular type of product, which is a therapeutic asset that is ready for a phase one, phase two trial. There's a very liquid market for the phase one, phase two assets, and that's the point at which these partnerships can come to fruition. That's what a lot of those leaders would say.

By contrast, for instance, Roche bought Genentech back in 2009. Genentech's research and early development is currently run by Aviv Regev, one of the scientists I admire most in the world, who's like a thousand times smarter than me. She's one of the people who invented this technology and has a big group doing this sort of work there. So it's not like every pharma takes that view, but that's the general trend.

Dwarkesh Patel 01:43:56

Full disclosure, I am a small angel investor in NewLimit now, but that did not influence the decision to have Jacob on. This is super fascinating. Thanks so much for coming on the podcast.

Jacob Kimmel 01:44:08

Awesome. Thanks, Dwarkesh.
