Aiding Medical Diagnosis through Image Synthesis and Classification

Choudhary, Kanishk

arXiv:2506.00786v1 (cs)

[Submitted on 1 June 2025]

Title: Aiding Medical Diagnosis through Image Synthesis and Classification

Authors: Kanishk Choudhary

Abstract: Medical professionals, especially those in training, often depend on visual reference materials to support accurate diagnosis and to develop pattern-recognition skills. However, existing resources may lack the diversity and accessibility needed for broad and effective clinical learning. This paper presents a system that generates realistic medical images from textual descriptions and validates them with a classification model. A pretrained Stable Diffusion model was fine-tuned using Low-Rank Adaptation (LoRA) on the PathMNIST dataset, which consists of nine colorectal histopathology tissue types. The generative model was trained multiple times under different training parameter configurations, guided by domain-specific prompts to capture meaningful features. For quality control, a ResNet-18 classification model was trained on the same dataset and achieved 99.76% accuracy in predicting the correct label of a colorectal histopathology image. Generated images were then filtered through an iterative process using the trained classifier: misclassified outputs were discarded and regenerated until they were classified correctly. The best-performing version of the generative model achieved an F1 score of 0.6727, with precision and recall of 0.6817 and 0.7111, respectively. Some tissue types, such as adipose tissue and lymphocytes, reached perfect classification scores, while others proved more challenging due to structural complexity. The resulting self-validating approach demonstrates a reliable method for synthesizing domain-specific medical images, owing to high accuracy in both the generation and classification portions of the system, with potential applications in diagnostic support and clinical education. Future work includes improving prompt-specific accuracy and extending the system to other areas of medical imaging.
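The filtering step described in the abstract (discard a generated image when the classifier disagrees with the requested tissue type, then regenerate) can be illustrated with a minimal sketch. This is not the paper's code: the function names `generate_until_accepted`, `make_toy_generator`, and `toy_classify`, and the `max_attempts` cap, are hypothetical. In the described system, `generate` would presumably wrap the LoRA-fine-tuned Stable Diffusion model and `classify` the trained ResNet-18; here both are toy stand-ins so the loop runs without model weights.

```python
import random
from typing import Callable, Tuple, TypeVar

Image = TypeVar("Image")


def generate_until_accepted(
    target_label: str,
    generate: Callable[[str], Image],
    classify: Callable[[Image], str],
    max_attempts: int = 20,  # hypothetical cap; the abstract states no limit
) -> Tuple[Image, int]:
    """Regenerate until the classifier's prediction matches the target label.

    Returns the accepted image and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        image = generate(target_label)
        if classify(image) == target_label:
            return image, attempt
        # Prediction disagrees with the requested label: discard and retry.
    raise RuntimeError(
        f"no accepted image for {target_label!r} after {max_attempts} attempts"
    )


def make_toy_generator(hit_rate: float, seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for the generative model.

    The "image" is just a string naming the tissue it depicts; with
    probability 1 - hit_rate the generator produces "debris" instead.
    """
    rng = random.Random(seed)

    def generate(label: str) -> str:
        return label if rng.random() < hit_rate else "debris"

    return generate


def toy_classify(image: str) -> str:
    """Hypothetical stand-in for the classifier: reads the toy label back."""
    return image


if __name__ == "__main__":
    image, attempts = generate_until_accepted(
        "adipose", make_toy_generator(hit_rate=0.7), toy_classify
    )
    print(f"accepted {image!r} after {attempts} attempt(s)")
```

One consequence of this design is that every accepted image agrees with the classifier by construction, so the quality of the filtered set depends on how reliable the classifier is on synthetic images, not only on its accuracy on the real dataset.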

Comments:	8 pages, 6 figures. Submitted and under review.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.00786 [cs.CV]
	(or arXiv:2506.00786v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.00786

Submission history

From: Kanishk Choudhary
[v1] Sun, 1 Jun 2025 02:25:43 UTC (5,806 KB)
