Updated March 5, 2026
0:00 Welcome to Colaberry AI podcast brought to you by Colaberry AI Research Labs and CARL Foundation. Welcome to the deep dive where we plunge into the most critical insights from cutting-edge research. Today, we're taking a deep dive into the accelerating impact of artificial intelligence in dermatology. It's a field really ripe for this kind of disruption. Given how much dermatology relies on visual pattern recognition, AI's capabilities in image analysis are just such a natural fit. 0:26 Our journey today is anchored in a pretty comprehensive systematic review. It's gonna guide us through the technical methods, the real-world clinical applications, and also the regulatory landscape shaping this, well, this rapidly evolving domain. Exactly. And our mission for this deep dive is really to extract the most important insights and, maybe more importantly, the practical implications of AI in dermatological care. We'll be focusing specifically on the technical methods being used, the results people are actually getting, and, critically, what things to try are emerging from the latest research. 0:59 For both practitioners and, you know, developers, we're here to understand not just what AI can do, but how it functions and, yeah, the challenges that still remain in its safe and effective deployment. Alright. Let's unpack this then. AI's presence in health care isn't, like, totally new. Right? 1:13 It goes back to the fifties, apparently, with a bigger presence since the eighties through neural networks. But why dermatology specifically? Why is it such an ideal candidate, as you say, for AI-assisted diagnosis today? It really is a perfect match. I mean, think about it. 1:27 The field is so inherently visual. Dermatology relies heavily on identifying very specific morphological characteristics, visual patterns on the skin. You're looking at lesions, rashes, other conditions, all presenting with distinct visual cues. 
So this reliance on image-based diagnosis makes it, well, uniquely suited for AI's computer vision capabilities. Back in the early 2000s, we saw the first explorations of machine learning, especially artificial neural networks, ANNs, for tasks like distinguishing benign versus malignant pigmented lesions. 1:58 But the really significant leap came around 2006. That's when deep learning really entered the picture. This allowed computational models, particularly convolutional neural networks, CNNs, to learn incredibly intricate patterns directly from huge image datasets. You didn't need manual feature engineering anymore. And since about 2016, the growth in deep learning research within dermatology has just been, well, exponential. 2:19 It's fundamentally reshaping the landscape. And it's not just stuck in research labs, is it? We're actually seeing AI in tools that patients themselves can use. Can you give us a sense of that reach? Oh, absolutely. 2:30 The review highlights examples like patient self-checking applications. Quite striking, actually. One case showed nearly a million AI dermatologist users performing millions of online checks, and that led to identifying tens of thousands of skin diseases. Wow. Yeah. 2:46 It really illustrates the immense potential for AI to increase accessibility, at least for preliminary dermatological screening. That is a staggering reach. So given that kind of practical impact, let's dig into the core technical methods. How are these systems actually built? What are the fundamental machine learning approaches at play here? 3:05 Okay. Okay. So to get under the hood, we start with the basics of machine learning. Broadly, you can think of three main streams. The most common one in clinical medicine is supervised learning. 3:14 Right. Here, the AI learns from data that's been meticulously labeled. Imagine giving the AI thousands of skin lesion images, each one precisely tagged by a dermatologist: melanoma, benign, etcetera. 
The model learns to map images to those labels. A typical example might be a study titled something like "skin disease detection using deep learning." 3:33 Pretty straightforward concept. Labeled data. Got it. What else? Then there's unsupervised learning. 3:39 Here, the model looks for patterns in data without any predefined labels. So it might group similar-looking skin lesions based purely on their visual characteristics. This can be really useful for discovering maybe new classifications or phenotypes. There was a study mentioned characterizing epidermal necrolysis using this approach. Interesting. 3:57 Finding patterns we didn't predefine. And the third? The third is reinforcement learning. This is a bit more complex. You have an AI agent that learns through trial and error, essentially. 4:07 It gets rewards or punishments based on its actions, guiding it towards optimal strategies over time. While it's less common right now in dermatology diagnostics, it holds, well, significant promise for things like long-term diagnostic support or treatment planning systems that adapt based on outcomes. Okay. Supervised, unsupervised, reinforcement. And within that, you mentioned deep learning as the big leap. 4:31 What makes it so uniquely powerful for images in this field? Right. So deep learning is a specific subset of machine learning built on those artificial neural networks, ANNs. What makes deep neural networks, DNNs, different is their depth. They have many, many hidden layers. 4:47 We're talking hundreds, sometimes thousands in practice. Wow. Okay. And this layered structure lets them learn what we call hierarchical feature representations directly from the raw image pixels. So they automatically figure out simple things like edges first, then textures, and then more complex structures all the way up to intricate lesion patterns. 
5:07 That ability is why DNNs, and especially these convolutional neural networks, CNNs, have become the workhorse for advanced skin image analysis. Makes sense. The trade-off, though, is computational cost. Training these models takes a lot of power. Usually, you need specialized hardware like GPUs, and it can take days, sometimes weeks, to train a complex model. 5:27 Right. Intensive stuff. So if CNNs are the workhorse, tailor-made for computer vision, what are their specific superpowers in dermatology? Where do they really shine in practice? They really are the engine here. 5:38 Their core application, the foundation really, is image classification, just telling different types of skin lesions apart. Okay. Basic categorization. But it goes beyond that. They're also great at object detection. 5:50 Models like YOLO or SSD can precisely locate and draw bounding boxes around lesions within a larger image. So they pinpoint the area of concern. So finding the spot. Exactly. And even more granular is image segmentation. 6:07 This is where the AI outlines the exact region of the lesion, pixel by pixel, separating it from the surrounding healthy skin. Architectures like U-Net are particularly good at this. Pixel-perfect outlines. Precisely. And another really significant development, maybe a key thing to try for boosting performance, especially when you don't have enough data or the data is skewed, is using generative adversarial networks, or GANs. 6:29 GANs. How do they help? Well, take StyleGAN2, for instance. It can generate remarkably realistic synthetic images of things like melanocytic nevi. This is incredibly valuable when your real dataset lacks enough examples of certain rare but really important lesion types. 6:44 So you can make more data for the rare stuff. Essentially, yes, to augment the training set. And similarly, another type, CycleGAN, can do things like translate a regular clinical photo into a dermatoscopic-style image or vice versa. 
This helps make the main diagnostic model better at handling images from different sources or devices, makes it more robust, more generalizable. That's fascinating, using AI to generate its own training data, essentially. 7:11 Now, so what's really cutting edge? Looking at the last couple of years, maybe 2023 to now, what new architectures or approaches are pushing the boundaries? Yeah. The field moves fast. 7:20 We're definitely seeing some exciting progress. For instance, improved segmentation models. People are taking established architectures like U-Net and modifying them, say, using advanced encoders like EfficientNet-B3. These have shown pretty significant jumps in segmentation accuracy, especially for detailed histopathological images, sometimes boosting combined accuracy by nearly 10 percentage points. A big jump. 7:42 Definitely. Another trend is hybrid architectures. Combining models like ResUNet, which is good for segmentation, with optimization algorithms. There was an example using ant colony optimization to fine-tune the model's parameters. That approach achieved really high accuracy and Dice coefficients, meaning great segmentation and classification. 8:02 So mixing and matching techniques. Exactly. We're also seeing the rise of CNN-transformer hybrids, combining the strengths of, say, a ResNet-50, which is a classic CNN, with a vision transformer, or ViT, component. These hybrids have shown they can outperform earlier classifiers on dermoscopic images, especially when you use clever techniques like focal loss to handle those tricky imbalanced datasets where you have way more benign than malignant samples. 8:26 Transformers are popping up everywhere now. They are. And generative models keep evolving too. There's Efficient-GAN, developed in 2023, which can generate really high-quality skin lesion segmentation masks. It even has lightweight mobile versions that perform surprisingly well. 8:42 These really represent the leading edge. Okay. 
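Two quantities named in this stretch, the Dice coefficient used to score segmentation and the focal loss used for imbalanced classification, are easy to state in code. The following is a minimal sketch in plain Python, not the implementation from any study in the review. The function names are hypothetical, and the default values `gamma=2.0` and `alpha=0.25` are assumptions taken from common practice, not from the episode.

```python
import math

def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    # Two empty masks agree perfectly, so treat that case as a score of 1.
    return 1.0 if total == 0 else 2.0 * intersection / total

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample.

    p is the predicted probability of the positive class, y is the label (0 or 1).
    The (1 - p_t) ** gamma factor shrinks the loss of easy, well-classified samples,
    so the rare, hard samples dominate training.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less than a confident wrong one.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

With `gamma=0` the focal loss reduces to a class-weighted cross-entropy, which is a handy sanity check when tuning it.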
So we've got these powerful, evolving tools. Let's shift to the clinical process. How does this automated decision making actually work in practice? 8:53 Can you walk us through that journey from snapping a picture to potentially getting an AI-assisted diagnosis? Sure. It typically starts, obviously, with digital image acquisition. This usually means either dermoscopic images, those are the ones taken with special magnifying devices for lesions, or dermatopathologic images, which are the high-res scans of histology slides from biopsies. The way these images are represented digitally, including different color models like RGB or CIELAB, is actually quite important because color is such a key diagnostic feature in dermatology. 9:27 Right. The color tells you a lot. Absolutely. Then comes dermoscopy preprocessing. This is a really critical sequence of steps to clean up the image and optimize it for the AI. 9:36 It involves things like checking image quality, trying to correct for variable lighting using gamma correction or other algorithms, maybe transforming the image into a different color space to make the lesion stand out more. Like enhancing the contrast. Yeah. Contrast correction is common too. And, crucially, there's often a dedicated hair removal step. 9:55 Hair can really obscure lesions and throw off the analysis, so digitally removing it is often necessary. I can imagine. For dermatopathology, the histology slides are usually much more structured and detailed, so they tend to need less preprocessing. Okay. So image cleaned up, maybe hair removed. 10:11 What next? Then we move to the inference and decision stage. This might involve some postprocessing, like merging small regions together or smoothing boundaries. Then the AI measures key characteristics of the lesion: its structure, colors, geometric features like perimeter, area, that sort of thing. Extracting the important features. 10:30 Exactly. 
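Two of the steps just described, gamma correction for uneven lighting and measuring geometric features such as area and perimeter, can be sketched in a few lines. This is an illustrative toy in plain Python on nested lists, assuming 8-bit grayscale intensities and a binary lesion mask. The function names are hypothetical, and a real pipeline would typically use an image library instead.

```python
def gamma_correct(pixels, gamma):
    """Apply out = 255 * (in / 255) ** gamma to a 2D list of 8-bit intensities.

    With this convention, gamma < 1 brightens dark regions and gamma > 1 darkens them.
    """
    return [[round(255 * (v / 255) ** gamma) for v in row] for row in pixels]

def mask_area(mask):
    """Area in pixels: the number of lesion pixels (value 1) in a binary mask."""
    return sum(sum(row) for row in mask)

def mask_perimeter(mask):
    """Perimeter as the count of lesion-pixel edges that touch background
    or the image border (4-connectivity)."""
    height, width = len(mask), len(mask[0])
    edges = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = rr < 0 or rr >= height or cc < 0 or cc >= width
                if outside or not mask[rr][cc]:
                    edges += 1
    return edges

# A 2x2 lesion in the middle of a 4x4 image.
lesion = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_area(lesion), mask_perimeter(lesion))
print(gamma_correct([[0, 128, 255]], 0.5))
```

Features like these are what a classical classifier would consume; a CNN usually learns its own features from the pixels directly.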
And, finally, these processed images and extracted features are fed into the AI models we talked about, ANNs, SVMs, decision trees, or often directly into the CNNs, to get a diagnostic suggestion. Increasingly, people use ensemble classifiers. That means combining the predictions from several different models to get a more robust and hopefully more accurate final assessment. Combining strengths. 10:52 Makes sense. So with all this processing and modeling, how do clinicians and developers actually know if these AI systems are any good? Are they reliable? What are the key technical metrics you use to measure that? Yes. 11:06 The classification quality measures. These are absolutely vital for objectively assessing how well the model performs compared to the ground truth, which is usually the diagnosis from expert dermatologists. It all starts with the confusion matrix. It's a simple table that categorizes every prediction the AI makes. Okay. 11:23 You have true positives, TP: the AI correctly identified a condition, like calling a melanoma melanoma. False positives, FP: the AI flagged something incorrectly, like calling a benign mole melanoma. That's a type I error. Right. A false alarm. 11:38 Then false negatives, FN. This is the dangerous one, where the AI missed a condition, like calling an actual melanoma benign, a type II error. And finally, true negatives, TN: the AI correctly identified something as not the condition, like calling a benign mole benign. 11:53 TP, FP, FN, TN. Got it. From that matrix, we derive the key metrics. Sensitivity, also called recall or true positive rate, measures how well the model finds all the actual positive cases. You want high sensitivity for dangerous conditions like melanoma. 12:08 Don't wanna miss any. Exactly. Specificity, or true negative rate, measures how well the model correctly identifies the negative cases. High specificity avoids unnecessary biopsies or anxiety from false positives. 
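The four confusion-matrix counts translate directly into the metrics being defined here, along with several that come up as the discussion continues. Below is a small sketch in plain Python. The function name and the example counts (1,000 lesions, 10 of them melanomas) are made up for illustration and do not come from the review.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    ppv = ratio(tp, tp + fp)           # precision
    npv = ratio(tn, tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }

# Hypothetical imbalanced test set: 10 melanomas among 1,000 lesions.
metrics = classification_metrics(tp=8, fp=10, fn=2, tn=980)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On these made-up counts, accuracy comes out near 0.99 while sensitivity is 0.8 and precision is below 0.5, which shows why a single headline number can hide the clinically important picture.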
Then there's positive predictive value, PPV, or precision: of all the times the AI predicted positive, how many were actually positive? And negative predictive value, NPV: of all the times it predicted negative, how many were actually negative? 12:36 And, of course, overall accuracy: what percentage of all decisions were correct? Seems straightforward enough. Well, there's a catch. Accuracy alone can be misleading, especially in medicine, where datasets are often imbalanced. You might have way, way more benign moles than melanomas. 12:50 Ah, right. So high accuracy could just mean it's good at spotting the common benign stuff. Precisely. That's why we rely heavily on metrics like balanced accuracy, BA, which averages the accuracy for each class separately, giving equal weight to rare and common conditions. And the F1 score, which is the harmonic mean of precision (PPV) and recall (sensitivity), also good for imbalance. 13:14 Finally, a really important one is the area under the ROC curve, AUC. The ROC curve plots sensitivity against one minus specificity at various thresholds. The AUC gives a single number summarizing the model's overall ability to discriminate between positive and negative cases across all thresholds. Bigger AUC is generally better. Okay. 13:35 AUC is a key one to look for then. It seems understanding these metrics properly is crucial for anyone using or evaluating these tools. Absolutely crucial. You can't interpret the results meaningfully without grasping what sensitivity, specificity, AUC, and these other measures actually tell you about the model's performance in a clinical context. Okay. 13:51 So the tech is sophisticated. The metrics are complex. But what about putting these tools into actual clinics? What's the real-world regulatory situation? Are there guardrails? 13:59 That's a huge piece of the puzzle. Regulations on AI in medicine are evolving rapidly, but they are definitely critical. 
In the European Union, for example, under the new AI Act, diagnostic AI systems like these are typically classified as high risk. That means they face stringent requirements for evaluation, documentation, safety, and effectiveness before they can be marketed. High risk makes sense. 14:22 What about the US? In the US, the FDA regulates AI primarily as software as a medical device, or SaMD. They often use the 510(k) premarket notification pathway. Manufacturers have to demonstrate that their device is substantially equivalent to an existing legally marketed device, and, crucially, they need processes for managing updates, ensuring safety, and improving effectiveness. So similar goals, different frameworks. 14:45 Are there challenges in applying these rules? Oh, definitely. Implementation isn't always smooth. The review mentions cases like a UK teledermatology app that got suspended because it shared data without proper consent. Or in Germany, some AI tools had to be temporarily withdrawn pending audits under the new AI Act rules. 15:05 Even in the US, the FDA has scrutinized apps, for instance, a melanoma risk assessment tool, over a lack of transparency in how it worked. It highlights this ongoing tension between innovation and ensuring patient safety and data privacy. So it's still a work in progress? Very much so. But there are milestones, like the FDA's approval in January 2024 of DermaSensor. 15:26 That was a landmark: the first AI skin cancer detection device cleared for use by non-specialists, like primary care physicians. That's big. It is. It validates the potential, but also really underscores the need for continued vigilance and adaptation in how we regulate these powerful technologies. Right. 15:41 And given these challenges, how are the AI models themselves verified? How do we know they're robust, especially against subtle tricks or manipulations? That brings us to model verification, which includes dealing with things like adversarial attacks. Adversarial attacks. 
Sounds ominous. 15:56 Well, they can be. These are deliberate, often tiny, calculated changes made to an input image, maybe adding a specific pattern of noise to a photo of a mole that's totally imperceptible to the human eye. Okay. But that tiny change can completely fool the AI model, causing it to misclassify the image. For example, it might make the AI label a malignant lesion as benign. 16:19 Methods like FGSM or PGD are used to generate these attacks. Wow. That's a serious vulnerability. It is. But these attacks have a dual role. 16:28 Yes, they represent a potential security risk that needs to be addressed, but they are also used proactively by developers for robustness testing. You intentionally attack your own model to find its weaknesses and then try to make it more resilient. Testing its limits. What about understanding why the AI makes a certain decision? 16:45 You can't just have a black box in medicine. Exactly. That's where explainability, or XAI, comes in. It's all about making the AI's decision-making process transparent and understandable to humans. Why is that so crucial? 16:57 Several reasons. It builds trust. Doctors and patients are more likely to accept AI recommendations if they understand the reasoning. It allows clinicians to potentially spot errors if the AI seems to be focusing on the wrong thing. It helps meet regulatory requirements, since both the FDA and the EU AI Act emphasize transparency, and it can help identify and mitigate biases in the model. 17:18 Makes sense. How do you achieve explainability? What are the methods? There are several popular approaches. One common technique is generating attention heatmaps, like using Grad-CAM. 17:28 These visually highlight the specific areas within the image that the AI model paid the most attention to when making its classification. So you can see if it's focusing on the actual lesion or something irrelevant. Like a visual clue to its thinking. Precisely. 
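To make the adversarial attack idea from a moment ago concrete, here is a toy sketch of FGSM (the fast gradient sign method) in plain Python. It attacks a tiny logistic-regression classifier, not a CNN, and the weights, input features, and step size `eps` are all invented for illustration. The update itself is the standard one: move each input feature by `eps` in the direction of the sign of the loss gradient.

```python
import math

def predict(weights, bias, x):
    """Probability of the positive ("malignant") class from a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def sign(value):
    return (value > 0) - (value < 0)

def fgsm(weights, bias, x, y, eps):
    """x_adv = x + eps * sign(dL/dx) for the cross-entropy loss L.

    For logistic regression the input gradient is (p - y) * w.
    """
    p = predict(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and input; the true label is 1 (malignant).
weights, bias = [2.0, -1.0, 0.5], 0.0
x = [0.2, 0.1, 0.2]
x_adv = fgsm(weights, bias, x, y=1, eps=0.15)

print("clean prediction:", predict(weights, bias, x))
print("adversarial prediction:", predict(weights, bias, x_adv))
```

In this toy case the prediction crosses the 0.5 threshold from malignant to benign even though no feature moved by more than `eps`. The same routine doubles as a robustness test: run it against your own model and see how small `eps` can be before predictions flip.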
There are also model-agnostic methods like LIME or SHAP. 17:47 These work by analyzing how changing different input features affects the AI's output, giving you a sense of which features were most influential. There was even a clinical trial mentioned where providing XAI explanations actually improved dermatologists' accuracy in diagnosing melanoma by about 2.8 percentage points, and it increased their confidence in the diagnosis. So it demonstrably helps the human user? It does. The downside is that implementing XAI can add computational overhead, and there aren't really uniform standards yet for what constitutes a good explanation, but its importance for clinical adoption is undeniable. 18:23 Okay. After all this tech, regulation, and explainability, what about the doctors? How do they actually interact with these AI tools day to day? Is it AI versus doctor or AI with doctor? 18:33 The overwhelming evidence points towards collaboration. AI as a supporting tool, absolutely not a replacement. Systematic reviews and meta-analyses consistently show that while some CNNs can achieve diagnostic performance (sensitivity, specificity) that's comparable to, and even slightly higher than, individual dermatologists on specific tasks, the real power comes from human-AI interaction. The studies find that AI support consistently improves the diagnostic accuracy of clinicians, especially those with less experience. 19:01 For instance, one study showed diagnostic rates improving for both dermatologists and residents when they used an AI decision support tool. The goal is clearly augmentation, enhancing the doctor's capabilities, not automation. That collaborative vision feels like a really key takeaway. So let's bring it down to earth. Where is AI making the biggest waves right now in actual clinical practice? 19:25 And looking ahead, what are the most exciting things to try for researchers and practitioners wanting to move this field forward? Right. 
So in terms of current impact, AI is really enhancing precision and efficiency in several core areas. Detecting skin cancer, obviously, is a major one, but also analyzing chronic conditions like ulcers or psoriasis and even helping select optimal treatments or assisting in surgical robotics. Okay. 19:48 Cancer detection is a big one. Any specific examples? Yeah. In melanoma detection, numerous studies compare AI, usually CNNs, to human experts. You see AI models achieving high AUCs, maybe 0.87 or higher, distinguishing melanoma from nevi and other lesions. 20:04 Some studies show CNNs with higher specificity than dermatologists on average, meaning fewer false positives. And, critically, as we just discussed, other studies demonstrate AI improving physician accuracy when used as an assistant. Sometimes it picks up on subtle image variations like slight zooms or rotations that a human might easily overlook, but which can impact an AI's consistency. Interesting. What about outside the specialist clinic? 20:27 Mobile apps, telemedicine? That's another huge area. Mobile and telemedicine apps are leveraging AI to broaden access. You have symptom-checking apps like ISA showing decent sensitivity. Tools like PanDerm have even reported outperforming clinicians in early melanoma detection in some studies and significantly improving doctor accuracy when used collaboratively. 20:47 Other models provide hierarchical results, like likely benign versus likely malignant, with high sensitivity, and general smartphone apps like SkinVision claim very high sensitivity, around 95 percent, for detecting potential skin cancers, empowering users for self-assessment. So AI is definitely out there in various forms. Now for the people building and using these systems, the researchers, the clinicians, what are the most important tactical takeaways and specific recommendations? What are the key things to try based on this systematic review? Okay. 
21:18 This is where we get really practical for those in the field. Let's distill the actionable insights. First, some key observations. CNNs are still dominant. No question. 21:26 Architectures like ResNet and DenseNet, often used with transfer learning, are widespread for diagnostic imaging. Still the workhorses. But there's a clear emergence of transformer-based models. Things like Swin Transformer and various vision transformer flavors are hitting state-of-the-art accuracy levels. The caveat is they typically need more computational power and more data. 21:45 Ensemble learning is definitely a trend worth noting. Combining predictions from multiple models, like the SkinNet example fusing MobileNetV2, ResNet-18, and VGG-11, seems to consistently boost performance, getting impressive AUCs. It shows complementary architectures work well together. So teamwork makes the dream work, even for AIs. You could say that. 22:06 Also, the growing role of data augmentation is clear. Both traditional methods, rotation, scaling, and using GAN-generated images from a StyleGAN2 or CycleGAN are consistently shown to improve accuracy, often by 3 to 7%. This is crucial for tackling data scarcity and imbalance. Despite that, persistent class imbalance remains a major challenge. Datasets just have far fewer examples of rare conditions like melanoma compared to common ones like nevi. 22:31 Specialized strategies like those used in a model called DSCC_Net are needed to really push up AUC and sensitivity for those minority classes. Still need to focus on the rare but critical cases. Absolutely. Two other observations. A general lack of standardization in how different studies evaluate their models makes direct comparisons difficult. 22:52 And as we discussed, limited interpretability in many published models is still a barrier to clinical trust. Okay. Those are the observations. What about the concrete recommendations, the things to try? Right. 
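The ensemble idea in these observations can be illustrated with soft voting, where the class probabilities from several models are averaged before picking the top class. This is one common way to combine predictions and is only an assumption here: the episode does not say how the ensemble it mentions fuses its models. The class names and probability values below are invented.

```python
def soft_vote(prob_vectors, weights=None):
    """Weighted average of per-model class-probability vectors."""
    if weights is None:
        weights = [1.0] * len(prob_vectors)
    total_weight = sum(weights)
    n_classes = len(prob_vectors[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, prob_vectors)) / total_weight
        for k in range(n_classes)
    ]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

classes = ["nevus", "melanoma", "other"]
# Hypothetical outputs of three different CNNs for the same image.
model_outputs = [
    [0.6, 0.3, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.7, 0.1],
]
fused = soft_vote(model_outputs)
print(fused, "->", classes[argmax(fused)])
```

The optional `weights` argument lets a stronger model count for more, for example weights proportional to each model's validation AUC.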
23:03 The actionable stuff. For architecture selection, if you're focusing on melanoma classification, the review suggests DenseNet-121 is often a good choice using transfer learning. It balances network depth with avoiding overfitting. For precise skin lesion segmentation, especially if data is limited, Attention U-Net variants are recommended for their ability to capture fine boundary details. Okay. 23:26 If you're tackling large-scale classification, say, over 10,000 images, and you have the computational resources, then vision transformers like Swin Transformer are definitely worth trying for potentially higher accuracy. But be aware, they are resource hungry. Go big with transformers if you can. And if your goal is maximizing overall diagnostic performance, building ensembles of complementary CNN architectures, similar to that SkinNet approach, seems to be a very effective strategy. Combine different strengths. 23:53 What else? Strongly recommend standardizing experimental protocols. Always use k-fold cross-validation for robust evaluation, and benchmark your models on publicly available standard datasets like ISIC or HAM10000. That's essential for results to be comparable across the field. Use the common yardsticks. 24:12 Exactly. For data enrichment, the advice is to combine traditional augmentation with GANs like StyleGAN2 or CycleGAN to generate synthetic images, especially for those rare lesion types. But always verify their impact. Make sure you're boosting sensitivity without sending false positives through the roof. Augment wisely. 24:29 And critically, ensure interpretability. The recommendation is to publish saliency maps, Grad-CAM, LIME, alongside your model results. Show that your network is focusing on the right spots. Consider using tools like SHAP for deeper feature analysis. It builds confidence. 24:45 Show your work, basically. Pretty much. And finally, keep up with regular bibliometric and statistical analysis. 
Track trends, see what new architectures are emerging, and always use proper statistical tests like ANOVA when comparing performance between different models to ensure the differences are actually significant. Stay current and be rigorous. 25:04 That's a very solid set of directives. So bringing it all together, these comparative analyses of architectures and datasets, what do they really tell us about the trade-offs? It really boils down to strategic choices based on your specific needs and resources. When you compare architectures, yes, models like the separable vision transformer or Swin Transformer might achieve the absolute highest accuracy, getting into the mid-nineties percent range, but they need serious compute power. High performance, high cost. 25:32 Right. Then you have models like DSCC_Net that maybe didn't have the top accuracy, but achieved an outstanding AUC, nearly 0.995, because they specifically tackled class imbalance so well. That's vital if detecting that rare class is paramount. And the SkinNet ensemble showed you can get great results, an AUC around 0.96, just by cleverly combining several lighter CNNs. So the key thing to try is balancing that accuracy target against your resource constraints and your specific clinical goal. 25:59 Is it peak accuracy or robust detection of rare events? Got it. And the datasets: ISIC, HAM10000, BCN20000. Comparing them is also revealing. ISIC is huge, but suffers from significant class imbalance and lacks diversity in skin phototypes. 26:15 HAM10000 is widely used, but also imbalanced, and has limited clinical variety, not many images from tricky locations like palms or soles. BCN20000 is newer and tries to address some of this, including lesions from difficult anatomical sites and more pigmented lesion types with good annotations. But its main limitation is being from a single center, which might limit its broad population diversity. So no perfect dataset. 
Choose carefully based on your model's intended use. 26:39 Exactly. Understanding these dataset characteristics, the number of images, classes, evaluation metrics used in benchmark studies, and their inherent biases or limitations, is crucial if you want your model to generalize well to new unseen data. Ultimately, these comparisons, along with resources provided in the review like a detailed machine learning flowchart and a method comparison table, act as practical guides. They help practitioners select the best methods based on their task, their data, even the complexity of image backgrounds. It provides a kind of road map for best practices. 27:06 This deep dive has certainly illuminated the immense technical power and potential of AI in dermatology. We've seen how it can speed up diagnosis, potentially improve accuracy, and even help democratize access to screening. It feels like more than just a new tool. 27:25 It's a catalyst changing how we approach this area of medicine. I think that's right. And, ultimately, the power of AI isn't just in its ability to diagnose, is it? It's in its potential to genuinely enhance clinical decision making, to augment the expert, and to expand access to specialized care. It really pushes us, I think, to redefine the boundaries of medical expertise and think hard about ethical responsibility in this increasingly data-driven world. 27:51 So the question for you listening is, what possibilities does this spark for your understanding of the future of health care? Thank you for listening in. Subscribe and follow Colaberry on social media, links in the description, and check out our website, www.colaberry.ai backslash podcast for more insights like this.