Updated March 5, 2026
0:00 Welcome to the deep dive brought to you by Colaberry AI Research Labs and Carl Foundation. Today, let's unpack a challenge that honestly has historically felt a bit like trying to find a needle in a haystack, maybe blindfolded: bringing transformative new drugs to market. I mean, we're talking about a process that classically spans ten to fifteen years. It burns through an average of, what, $2.6 billion? That's the average. 0:23 Yeah. It's staggering. And the success rate, it's incredibly low. Less than ten percent of candidates even make it to patients. It's a process just fraught with difficulty. 0:32 Yeah. And what's really telling, I think, is where most failures happen. It's relatively late, usually during clinical development. Right. Often, it's down to unexpected safety issues or, maybe even more fundamentally, the drug just doesn't show the activity you need in humans. 0:47 Which kind of tells you something isn't quite right much earlier in the pipeline, doesn't it? Like, in those preclinical stages. Finding the right target, discovering the right molecule. Exactly. That's the core issue. 0:57 Traditional methods, I mean, they've been the foundation for decades, but they really face inherent limitations here. Just think about the sheer time and cost involved in experimental methods. Like the preclinical characterization of chemical entities or these huge high-throughput screening campaigns, often paired with SAR studies. Super labor intensive and expensive. Very. 1:18 Yeah. And they often just bump up against the sheer complexity of biology itself. Yeah. Diseases like cancer, for instance, they rarely boil down to hitting just one single target. Right? 1:28 You've got these incredibly intricate interconnected pathways. And the early clinical trials, the classical phase I designs like the three plus three dose escalation, they were slow. They didn't really account for how different patients can be. Not really. No. 
1:41 And they forced critical decisions about safety and efficacy based on, frankly, quite limited early data. So okay. Bring AI into this picture. What changes? Well, fundamentally, you introduce the capacity to process, analyze, and actually interpret these truly massive, complex, high-dimensional datasets, the kind of omics data we get now. 2:01 Data that traditional stats just couldn't handle. Exactly. Genomics, proteomics, transcriptomics. AI can handle that scale and complexity. And that enables a real shift, a fundamental shift towards more optimized, more data-driven, and, frankly, more innovative approaches at pretty much every stage. 2:19 It's about capturing those hidden connections. Right? The nonlinear stuff, the variability that simpler models would just miss. Precisely. It potentially streamlines everything from how you process the initial data right through to predicting outcomes much more reliably. 2:33 Okay. So that's the promise. Our mission for this deep dive then is to really explore how AI is being woven into drug discovery and early clinical development. We'll lean on insights from recent analysis in the field, get specific about the technical methods, the results they're enabling. And importantly, the challenges that are still very much there. 2:51 Yeah. Because it's not magic. Definitely not. Okay. Let's start at ground zero. 2:55 Finding the right starting point, the therapeutic target. If you get that wrong, everything downstream is, well, potentially wasted effort. You're building on shaky ground. So how is AI revolutionizing target ID, especially with those huge datasets you mentioned? Right. 3:11 The power here is its ability to ingest and integrate these massive multiomics datasets. So bringing together genomics, proteomics, transcriptomics, maybe metabolomics, all of it. Machine learning and especially deep learning algorithms are just uniquely equipped for this. 
They can identify key biomarkers, potential druggable targets from this incredibly high-dimensional, often noisy, nonlinear data. Mhmm. 3:38 They overcome the limits of classical stats, which really struggle with that kind of complexity and sheer scale. So it's not just looking at genes or proteins in isolation. It's finding the patterns across them. Exactly right. And that naturally leads you towards network-based approaches. Mhmm. 3:52 Because diseases don't happen in a vacuum. They involve these intricate biological networks. AI analyzes these networks to find the critical nodes, whether it be proteins, genes, specific pathways that are central to the disease progressing. Those are your promising targets. Mhmm. 4:07 And specific deep learning models like convolutional neural networks, CNNs, or recurrent neural networks, RNNs, they can be trained on known drug-target interaction data. They learn the really complex features that govern these relationships and then can predict entirely new interactions. So CNNs are good for patterns in structured data, maybe like molecular graphs. Yeah. And RNNs for sequences. That's a good way to think about it. 4:34 Yeah. CNNs are great at pattern recognition in grid-like data, while RNNs excel with sequential information, like amino acid or genetic sequences. It's like mapping the whole biological subway system, like you said, and finding those crucial transfer hubs. If you disrupt them, you might halt the disease. Exactly. 4:51 Can AI go further? Can it actually suggest new molecules for these hubs? Absolutely. That's where generative models really shine. Things like generative adversarial networks or GANs. 4:59 Okay. They're powerful for generating new data that looks like the training data. So in drug discovery, you can use them to design novel molecules that are optimized to bind to specific nodes in that network. And reinforcement learning, RL models, are also fascinating here. 
They're really good at exploring unknown chemical spaces, sort of through trial and error. 5:18 It aligns perfectly with this systems pharmacology idea, focusing on gene-gene interactions, the whole network, not just hitting one target as hard as possible. How does RL work there? Like, rewards for good molecules? Pretty much. The RL model gets rewards for generating molecules that meet certain criteria, maybe predicted binding affinity or good ADMET properties, you know, absorption, metabolism, toxicity predictions. 5:44 And it learns iteratively, getting better over time. Sounds incredibly powerful, exploring chemical possibilities intelligently, but I bet implementing these isn't trivial. What are the big technical hurdles? Oh, certainly. Getting these complex models to generalize well, what we call transferability between different datasets or tasks, 6:02 can be tricky. And optimizing those reward functions in RL, making sure they truly capture what makes a good drug molecule, that's really nontrivial. Yeah. I can imagine. And just integrating the sheer volume of multimodal omics data requires pretty sophisticated data handling and model architectures. 6:19 And what's next? Even more data? Yeah. Integrating even more granular data, like from single-cell analysis or spatial transcriptomics, is a major next step. We're seeing AI like CNNs being used for automated pattern recognition in the image data from spatial transcriptomics. 6:36 So you can link where a cell is in the tissue to what genes it's expressing. Exactly. Linking spatial context to gene expression. Very powerful. Okay. 6:45 So you found a potential target node. Now you need to know if you can actually design a drug for it. That's druggability. Right? What does that mean technically, and how does AI predict it? 6:55 Right. Druggability. 
It basically asks: does this target, usually a protein, have the right biochemical and structural features for a drug, like a small molecule or maybe an antibody, to bind to it effectively and safely? Like, does it have a good pocket for the drug to sit in? Is it stable? 7:11 Precisely. Things like that. So supervised machine learning algorithms are trained on large datasets of known druggable and nondruggable targets. These models learn complex patterns from the target's amino acid sequence, its predicted 3D structure, functional annotations, all that stuff, to predict the likelihood that a new target is actually druggable. And speaking of structure, we absolutely have to talk about AlphaFold. 7:34 That felt like a real game changer in just seeing these targets. It was truly unprecedented. Yeah. AlphaFold, coming out of DeepMind, used these incredibly sophisticated neural networks, and it just revolutionized protein structure prediction. 7:48 Suddenly, we had highly accurate 3D structures predicted directly from amino acid sequences. The database now has, what, over 200 million structures? Mhmm. Pretty much all known proteins. Which is incredible. Before that, getting a structure could take years of hard experimental work, X-ray crystallography, cryo-EM. 8:05 Exactly. Difficult, slow, expensive work. Now you potentially have a high-quality blueprint for almost any protein almost instantly. How does that accelerate structure-based drug design, SBDD? Dramatically. 8:17 Having these structures lets researchers use other AI models to identify potential binding sites, map out crucial structural features, see where proteins might interact, all on the target. That knowledge is just essential for rational SBDD campaigns. Yeah. And we now have neural network models specifically designed to find these binding pockets at different scales, looking at the whole 3D shape, the atoms involved, even just features on the protein surface. 8:42 Okay. 
So target ID, druggability confirmed. Step one done. Step two, find or design the actual molecule. How's AI transforming that core drug discovery process? 8:53 Well, it significantly enhances virtual screening. That's a crucial early step where you computationally sift through these enormous libraries of chemical compounds looking for potential hits. Yeah. Classical computer-aided drug design, CADD, methods like traditional docking simulations, they have limitations in speed and accuracy, especially when you're talking about libraries with billions of compounds. And AI helps speed that up and make it more accurate? 9:15 Yes. On both fronts. Take ligand-based virtual screening, LBVS. That relies on knowing molecules that already work. AI, especially using what's called deep QSAR, with modern ways to represent molecules like chemical graphs and advanced deep learning architectures, allows much faster, much more accurate screening of these ultra-large chemical spaces. 9:38 Billions of compounds. Easily billions now. Yeah. And then in structure-based virtual screening, SBVS, which uses the target's 3D structure, AI improves everything: identifying the binding pockets better, classifying potential binders versus nonbinders, and, crucially, improving the scoring functions. Scoring functions. 9:56 They rank the candidates. Right? Tell you which ones are most promising. Exactly. They predict the binding affinity, essentially how tightly the molecule sticks to the target. 10:04 And getting that right is absolutely critical for prioritizing what you actually test in the lab. So better scoring means fewer duds tested experimentally. Hopefully. Yes. And these emerging deep-learning-based scoring functions, particularly ones using CNN models that look at the whole 3D protein-ligand complex, 10:22 they've shown superior performance compared to the older traditional scoring methods in predicting that true binding affinity. That's a huge step for predicting potency. 
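To make the ligand-based screening idea above concrete, here is a minimal sketch of ranking a library by similarity to a molecule that is already known to work. It uses plain Tanimoto similarity over fingerprint bit sets, which is the classical baseline rather than the deep QSAR models discussed in the episode. The fingerprints, compound names, and the `screen` helper are all hypothetical and chosen only for illustration.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(query: frozenset, library: dict, top_k: int = 2) -> list:
    """Rank library compounds by similarity to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical fingerprints: each integer stands for one "on" structural bit.
KNOWN_ACTIVE = frozenset({1, 4, 7, 9, 12})
LIBRARY = {
    "cmpd_A": frozenset({1, 4, 7, 9, 13}),   # shares 4 bits with the query
    "cmpd_B": frozenset({2, 3, 5}),          # shares none
    "cmpd_C": frozenset({1, 4, 7, 9, 12}),   # identical to the query
}

HITS = screen(KNOWN_ACTIVE, LIBRARY)
for name, score in HITS:
    print(f"{name}: {score:.2f}")
```

A learned scoring model would replace `tanimoto` with a trained predictor, but the ranking and top-k selection around it would look much the same.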
Okay. Screening existing libraries is one thing. But what about designing molecules completely from scratch? 10:37 Tailoring them. That's de novo drug design, and it's probably one of the most transformative applications. Here, AI models generate entirely new molecular structures optimized for specific properties you want. Like binding affinity or maybe making sure it's easy to synthesize or has a good safety profile. All of the above. 10:55 We mentioned reinforcement learning earlier; that's used for iterative refinement. The model proposes a molecule, gets feedback, the reward, based on how well it scores on your target criteria, and then it learns to generate better molecules next time. Constantly improving. Right. And other generative models like GANs and variational autoencoders, VAEs, these are DL architectures that learn the underlying patterns in chemical structures. They're used to create totally novel chemical space, exploring possibilities that might not exist in any current library. 11:24 It's like having an incredibly smart chemist who can just dream up perfect molecules for your specific problem. Kind of. Yeah. Uh-huh. Although a key challenge, always, is making sure these cool in silico designs are actually synthesizable in the lab and, you know, actually work biologically. 11:42 That's still a hurdle. Right. The real world check. Exactly. And once you have a promising initial candidate, a lead, it usually needs refining to improve its properties. 11:50 That's lead optimization. And AI helps there too? Definitely. For instance, AI enhances and accelerates molecular dynamics or MD simulations. MD simulations. 11:59 That's watching how the atoms move. Yeah. It simulates the physical movements of atoms and molecules over time. It's essential for understanding how a drug actually behaves in a dynamic biological environment: how it binds, its stability, how it interacts with cell membranes or enzymes or transporters in the body. 
AI can help predict energy landscapes faster or analyze complex simulation results more efficiently. 12:22 It lets researchers simulate for longer times or look at more complex systems than they could before. And predicting how the body handles the drug? ADMET: absorption, distribution, metabolism, excretion, toxicity. That seems absolutely critical for avoiding those late-stage clinical failures. Hugely critical. And AI models are proving really valuable here. 12:41 They predict ADMET properties, potential drug-drug interactions, adverse reactions by analyzing molecular structures, maybe using graph neural networks, and integrating data on known metabolic pathways, transporters, toxicity endpoints. AI builds these predictive models. So you can flag problematic candidates early. That's the goal. Identify candidates with likely unfavorable ADMET profiles or toxicity issues much, much earlier in the discovery phase. 13:07 That lets you discontinue them before sinking huge amounts of money into preclinical or clinical testing. Mhmm. It directly reduces those costly late-stage failures. It really sounds like AI is becoming essential for just integrating and making sense of all these incredibly diverse data types: chemical structure, bioactivity, ADMET. Which brings us nicely to multimodal models and AI-enhanced QSAR, quantitative structure-activity relationship. 13:31 Okay. Multimodal frameworks are specifically designed to integrate all these different data types, chemical structure, biological assay results, omics data, clinical observations, perhaps, into a more holistic predictive model. And AI fundamentally enhances QSAR. QSAR predicts activity from structure. Right? Right. 13:47 How does AI make that better? Traditional QSAR often relied on predefined, sort of hand-engineered features or descriptors of molecules. 
AI-driven QSAR uses much more sophisticated machine learning algorithms: random forests, support vector machines, SVMs, and especially deep neural networks, DNNs. These models learn complex nonlinear relationships directly from large datasets of structures and their measured activities. DNNs can automatically learn hierarchical features, basically discovering the relevant molecular descriptors themselves from the raw data. 14:19 So you don't have to guess the important features beforehand? Less so. Yes. And graph-based QSAR is particularly powerful here. It applies things like CNNs or graph neural networks, GNNs, directly to the molecular graph, you know, atoms are nodes, bonds are edges. 14:33 It naturally captures the spatial and topological information baked into the molecule's structure, and that often leads to more accurate predictions of properties like binding affinity, solubility, toxicity. Can you give us a concrete example? How does this integrated AI approach look in practice? Yeah. The review we looked at highlights the PaccMann framework and its extension, PaccMann RL. 14:53 It's a good example. PaccMann basically integrates molecular structures with biological data, things like cancer cell gene expression profiles, protein interaction networks, using some clever attention-based neural networks. And it predicts how sensitive a specific cancer cell line will be to a specific compound. Okay. 15:11 Predicting sensitivity. But PaccMann RL takes it a step further. It uses reinforcement learning to actually generate new personalized anticancer compounds conditioned on a specific cancer cell's transcriptomic profile. Woah. Generate new drugs tailored to one cell's genetics. 15:29 That's the idea. The model gets rewarded for generating molecules predicted to be highly effective against cells with that particular gene expression pattern. It's aiming right towards personalized therapy, right from the molecular design stage. 
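The "atoms are nodes, bonds are edges" picture behind graph-based QSAR can be sketched in a few lines. The example below encodes the heavy atoms of ethanol as a graph and runs one round of neighbor aggregation, the basic step that graph neural networks repeat. It is a toy under stated assumptions: real GNNs use learned weight matrices and vector-valued atom features, whereas here each atom's only feature is its atomic number, and the names `ATOMS`, `BONDS`, and `message_pass` are illustrative.

```python
# Heavy atoms of ethanol (C-C-O): atom index -> atomic number (assumed feature).
ATOMS = {0: 6, 1: 6, 2: 8}
# Bonds as undirected edges between atom indices.
BONDS = [(0, 1), (1, 2)]


def neighbors(bonds: list) -> dict:
    """Build an adjacency list from an undirected bond list."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def message_pass(features: dict, adj: dict) -> dict:
    """One aggregation round: each atom adds up its neighbors' features."""
    return {
        i: features[i] + sum(features[j] for j in adj.get(i, []))
        for i in features
    }


ADJ = neighbors(BONDS)
UPDATED = message_pass(ATOMS, ADJ)   # each atom now "sees" its bonded partners
READOUT = sum(UPDATED.values())      # a single molecule-level number

print(UPDATED)
print(READOUT)
```

After one round the middle carbon carries information about both of its neighbors, which is how topology ends up in the features without anyone hand-engineering descriptors.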
Designing a drug not just for a disease, but tailored to an individual's unique molecular landscape, that really does feel like the frontier. It is. So, okay. 15:50 You've discovered and optimized a molecule using all this AI magic. Now it has to move into testing in humans, clinical trials. How's AI impacting those critical early clinical development stages? Well, it's helping early trials move beyond just finding the maximum tolerated dose, the MTD. It's enabling the design of much more efficient, more informative clinical development strategies right from the get-go. 16:14 Including maybe finding the right patients faster. Recruitment can be such a bottleneck. Absolutely. That's a huge area. AI algorithms, often using natural language processing, NLP, can analyze vast amounts of unstructured text in electronic health records, EHRs. 16:29 They can quickly assess patient eligibility against often very complex trial inclusion and exclusion criteria. This can dramatically speed up patient screening and recruitment. Finding those needles in the haystack. Exactly. Making sure suitable candidates are identified efficiently. 16:44 AI also helps optimize site selection, figuring out where the patients are, considering site capacity and performance. And can AI predict how a trial might actually play out before you even enroll the first patient? Predictive modeling is a key application. Yes. AI can simulate different trial designs. 16:59 It can predict potential outcomes, things like dose escalation behavior, dose-limiting toxicity profiles, even early efficacy signals. It does this based on historical trial data, the parameters of the trial you're proposing, and characteristics of the target patient population. So you can sort of stress test different designs in the computer first. That's a great way to put it. You run simulations in silico and focus your resources on designs that show the highest statistical power and the best likelihood of success. 
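As a rough illustration of stress testing a design in the computer first, the sketch below simulates the classical three plus three dose escalation rule many times and tallies which dose level gets declared the MTD. This is plain Monte Carlo simulation, not an AI model: the per-dose toxicity probabilities are made-up inputs that, in the setting described above, a predictive model might supply. Function names and numbers are assumptions for illustration.

```python
import random


def simulate_3plus3(dlt_probs: list, rng: random.Random) -> int:
    """Run one 3+3 trial; return the declared MTD index (-1 if none is safe)."""
    level = 0
    while level < len(dlt_probs):
        # First cohort of three patients at this dose level.
        dlts = sum(rng.random() < dlt_probs[level] for _ in range(3))
        if dlts == 1:
            # Exactly one dose-limiting toxicity: expand with three more.
            dlts += sum(rng.random() < dlt_probs[level] for _ in range(3))
        if dlts >= 2:
            # Too toxic: the previous level is declared the MTD.
            return level - 1
        level += 1
    # Escalated past the top dose without stopping.
    return len(dlt_probs) - 1


def mtd_distribution(dlt_probs: list, n_trials: int = 2000, seed: int = 0) -> dict:
    """Repeat the simulated trial and count how often each level is chosen."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_trials):
        mtd = simulate_3plus3(dlt_probs, rng)
        counts[mtd] = counts.get(mtd, 0) + 1
    return counts


# Hypothetical true toxicity probabilities for four dose levels.
ASSUMED_DLT_PROBS = [0.05, 0.10, 0.30, 0.60]
DISTRIBUTION = mtd_distribution(ASSUMED_DLT_PROBS)
for level in sorted(DISTRIBUTION):
    print(level, DISTRIBUTION[level])
```

Swapping in a different escalation rule and comparing the resulting distributions is the kind of design comparison the conversation is pointing at.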
17:28 It helps reduce those painful late-stage trial failures. It sounds like running the trial virtually multiple times to find the best way. Precisely. And this extends into protocol optimization too. AI tools can simulate different scenarios to fine-tune the details, dosage regimens, treatment duration, specific patient subgroups within the trial protocol. 17:48 And adaptive trials, making changes on the fly? AI strongly supports adaptive trial designs. It enables that real-time analysis of accumulating data during the trial, allowing for preplanned adjustments. Maybe you need to adjust the sample size or drop an ineffective treatment arm. It makes trials more efficient. 18:05 It's also a key enabler for decentralized clinical trials, DCTs, helping process all the data coming in from remote patient monitoring devices. Now you hear about some really almost futuristic concepts, synthetic control arms, digital twins. Yeah. These are significant innovations really pushing the envelope. Synthetic control arms, or SCAs, are largely enabled by AI and having access to large real-world data sources, EHRs, insurance claims data. 18:34 Instead of randomizing patients to a placebo, AI uses sophisticated matching and modeling techniques to simulate a control group based on historical data from similar patients treated in the past. Why do that? It addresses ethical concerns, especially in diseases where giving a placebo isn't appropriate. It tackles logistical hurdles, and it can significantly reduce trial costs and timelines. Okay. 18:53 That makes sense. And digital twins? Digital twins take it even further. The ambition is to create detailed virtual replicas of individual patients or maybe patient groups using all their multimodal data: genomic, clinical, imaging. The idea is you could then test treatments in silico on the virtual twin before administering them to the real patient or use them to optimize trial designs for specific populations. 
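The matching idea behind a synthetic control arm can be sketched with simple nearest-neighbor matching: for each treated patient, pick the most similar historical patient and use that group as the comparator. This is a deliberately simplified stand-in for the "sophisticated matching and modeling" mentioned above. The patient records, covariates (age and a binary biomarker), and outcome values are invented, and a real analysis would at least standardize covariates and typically use propensity scores.

```python
def distance(a: tuple, b: tuple) -> float:
    """Euclidean distance between two covariate vectors (unscaled, for brevity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_synthetic_control(treated: list, historical: list) -> list:
    """Match each treated patient to the closest unused historical patient."""
    pool = list(historical)
    matched = []
    for covariates, _outcome in treated:
        best = min(pool, key=lambda record: distance(covariates, record[0]))
        pool.remove(best)          # match without replacement
        matched.append(best)
    return matched


def mean_outcome(records: list) -> float:
    return sum(outcome for _covs, outcome in records) / len(records)


# Hypothetical records: ((age, biomarker_positive), outcome in months).
TREATED = [((60, 1.0), 14.0), ((45, 0.0), 20.0)]
HISTORICAL = [
    ((61, 1.0), 10.0),
    ((30, 0.0), 25.0),
    ((44, 0.0), 17.0),
    ((75, 1.0), 6.0),
]

CONTROL = build_synthetic_control(TREATED, HISTORICAL)
EFFECT = mean_outcome(TREATED) - mean_outcome(CONTROL)
print(CONTROL)
print(EFFECT)
```

Comparing the treated group against all historical patients instead of the matched ones would mix in people who look nothing like the trial population, which is the bias matching is meant to reduce.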
19:17 A virtual patient to test on first. Yeah. The implications for personalized medicine, for trial efficiency, they're enormous. How does this all tie into that broader goal of precision medicine? AI is absolutely fundamental to precision medicine. 19:32 It's key for identifying biomarkers, genetic markers, protein levels, imaging features that predict how a patient will respond to a specific therapy. So it's predictive biomarkers. It helps distinguish them from prognostic biomarkers, which just tell you about the likely course of the disease. Right. Machine learning algorithms analyze all that complex data, genomic, proteomic, clinical, to find patterns linked to differential drug response. 19:54 Yeah. That's what enables truly personalized treatment strategies. Any examples? Sure. AI applied to analyzing histology images, pathology slides, can help identify the likely origin of a metastatic cancer when the primary site is unknown. 20:07 That's crucial for guiding treatment. And we're seeing explainable AI methods, often using NLP to analyze doctors' notes alongside trial criteria, to improve patient matching for complex trials, like phase I oncology studies where eligibility can be very nuanced. And how are the regulators viewing all this? Are these AI approaches aligning with what bodies like the FDA wanna see? There's actually good alignment with certain initiatives. 20:32 Look at the FDA's Project Optimus in oncology. That's pushing for better dose optimization early on. Moving beyond just finding the maximum tolerated dose, the MTD. Towards finding the optimal dose. Exactly. 20:43 The optimal biological dose, the OBD. The dose that gives the best balance of efficacy and safety. AI really supports the intensive data analysis you need to characterize those dose-response relationships and safety profiles much more thoroughly right from the early stages. Okay. 
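The difference between the MTD and the OBD is easy to see with a small worked example. The sketch below picks the highest dose under a toxicity cap (an MTD-style choice) and, separately, the dose with the best efficacy-minus-toxicity trade-off (an OBD-style choice). The dose table, the 30 percent toxicity cap, and the simple linear utility are all assumptions for illustration, not a regulatory definition.

```python
# Hypothetical dose-response estimates (response rate, toxicity rate).
DOSES = [
    {"mg": 50, "efficacy": 0.20, "toxicity": 0.02},
    {"mg": 100, "efficacy": 0.45, "toxicity": 0.08},
    {"mg": 200, "efficacy": 0.50, "toxicity": 0.25},
    {"mg": 400, "efficacy": 0.52, "toxicity": 0.45},
]


def max_tolerated_dose(doses: list, tox_cap: float = 0.30):
    """Highest dose whose estimated toxicity stays under the cap."""
    admissible = [d for d in doses if d["toxicity"] <= tox_cap]
    if not admissible:
        return None
    return max(admissible, key=lambda d: d["mg"])


def optimal_biological_dose(doses: list, tox_cap: float = 0.30,
                            tox_weight: float = 1.0):
    """Admissible dose with the best efficacy-minus-weighted-toxicity score."""
    admissible = [d for d in doses if d["toxicity"] <= tox_cap]
    if not admissible:
        return None
    return max(admissible,
               key=lambda d: d["efficacy"] - tox_weight * d["toxicity"])


MTD = max_tolerated_dose(DOSES)
OBD = optimal_biological_dose(DOSES)
print("MTD:", MTD["mg"], "mg")
print("OBD:", OBD["mg"], "mg")
```

With these made-up numbers the efficacy curve flattens above 100 mg while toxicity keeps climbing, so the two rules disagree, which is exactly the situation dose optimization is meant to catch.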
So AI is touching pretty much every critical step here, from the very beginning, finding the target, all the way to refining trials and personalizing treatment. But what about the limitations, the challenges AI can't fix, at least right now? 21:14 Absolutely. It's a powerful tool, but definitely not a panacea. There are significant challenges. Computational resources are one thing, but maybe more fundamentally, it's about data quality, ethical considerations, and fitting into the established regulatory pathways. Data quality seems like it would be a huge one. 21:31 Garbage in, garbage out. It's paramount. Mhmm. And the major source of potential problems is bias. AI models are only as good as the data they learn from. 21:40 If your training datasets are unrepresentative, maybe historical trials mostly enrolled certain demographics, lacked diversity, then the models trained on that data can inherit and even amplify those biases. Meaning the predictions might not work well for everyone. Exactly. They might be less accurate or less generalizable, especially for underrepresented patient groups. And just relying on historical data means dealing with stuff that might be incomplete, inconsistent, collected differently over time. 22:08 It's messy. I've heard about synthetic data, SD generation. Is that a potential fix for some of the data imbalance issues? How does it work? What are the risks? 22:17 Yeah. SD generation uses AI models, GANs, VAEs, transformers, diffusion models, to create artificial data that statistically mimics real-world data. It can be useful for augmenting datasets, maybe creating more data points for rare conditions or underrepresented groups to help balance the training data. Examples? You see things like CSC for generating single-cell RNA-seq data, DeepNovo for synthetic peptide spectra in proteomics, DeepDoc using synthetic interaction data for binding prediction, things like that. 22:47 Mhmm. But there's a definite risk of overfitting. 
If this synthetic data is too perfect, too idealized, if it doesn't capture the real-world messiness and variability, then the model trained on it might look great on paper, but fail in the real world. Precisely. It might not generalize well when it sees actual noisy patient data. 23:05 Ultimately, whether your data is real or synthetic, the AI model's output depends critically on the sensitivity and specificity of the input. High-quality input is nonnegotiable. Makes sense. Beyond data and computation, what about the ethical and regulatory side? Big considerations there. 23:22 Using large amounts of patient data, even anonymized, for AI training raises significant privacy concerns. You have to navigate regulations like HIPAA in the US, GDPR in Europe very carefully. Mhmm. And then there's the regulatory pathway itself. The way AI models learn and iterate, making predictions, it doesn't always fit neatly into the established process for drug approval. 23:43 Regulators need substantial verifiable evidence of safety and efficacy from rigorous trials. Bridging that gap between AI's predictions and regulatory requirements sounds challenging. It is. Gaining regulatory confidence in AI-driven insights and decisions is definitely an ongoing process. Okay. 23:58 Thinking about the core science, are there fundamental limits? Things AI just can't fix right now, no matter how smart the algorithm? Yes. And this is crucial. AI cannot fix fundamental problems with the underlying preclinical models themselves. 24:13 The lab models. Exactly. If the in vitro cell lines you're using or the animal models or the tissue assays, if they don't accurately reflect the complex biology of human disease, especially really complex heterogeneous diseases like many cancers, then even if a drug looks fantastic in those models, it's still likely to fail when it gets into human clinical trials. AI can help analyze the data from those models better, maybe even help select better models. 
But it can't make a bad model relevant. 24:41 It can't magically imbue a flawed preclinical model with clinical relevance it simply doesn't possess. That remains a huge bottleneck in the whole process, the model problem. That's a really critical reminder. The garbage in, garbage out principle applies to the biology too, not just the data. So looking ahead then, wrapping this up, what's the path forward for AI in this space? 25:01 Well, it's pretty clear. AI offers really profound advantages across the whole pipeline. It tackles inefficiencies, boosts our ability to find novel targets, helps optimize molecules, and fundamentally improves how we design and run early clinical trials. It's a key driver for precision medicine. But it's not just about buying some AI software, is it? 25:20 Not at all. Successfully integrating these sophisticated AI tools and methods into the existing R&D processes, that's a huge challenge. It requires significant changes in infrastructure, in workflows, and honestly, in mindset within pharma and biotech. So what's needed to make that integration work, to make it truly transformative? Collaboration and interdisciplinary expertise. 25:43 That's absolutely essential. You need these multidisciplinary teams, as the review called them. Chemists, biologists, clinicians, data scientists, AI specialists, all working together seamlessly. Breaking down silos. Exactly. 25:55 And going back to the limitations, it all hinges on generating, managing, and utilizing high-quality, diverse, robust data across that entire pipeline. It's really that combination, integrated AI tools, diverse expert teams, and top-notch data, that holds the promise to truly accelerate and derisk drug development, and ultimately get better treatments to patients faster. Okay. Final thought then. 
Given the sheer speed we're seeing, AlphaFold predicting structures we couldn't before, digital twins simulating trials, how does this fundamentally change the nature of scientific discovery itself? 26:29 And what new skills, what new approaches are gonna be most critical for researchers and clinicians trying to navigate this landscape as it keeps evolving so rapidly? Thank you for listening in. Subscribe and follow Colaberry and CRL on social media, links in the description, and check out our website www.colaberry.ai backslash podcast for more insights like this.