Focus on Texture: Rethinking Pre-training in Masked Autoencoders for Medical Image Classification

Madan, Chetan; Satia, Aarjav; Basu, Soumen; Gupta, Pankaj; Dutta, Usha; Arora, Chetan


arXiv:2507.10869 (eess)

[Submitted on 15 Jul 2025]


Authors: Chetan Madan, Aarjav Satia, Soumen Basu, Pankaj Gupta, Usha Dutta, Chetan Arora

Abstract: Masked Autoencoders (MAEs) have emerged as a dominant strategy for self-supervised representation learning in natural images, where models are pre-trained to reconstruct masked patches with a pixel-wise mean squared error (MSE) between original and reconstructed RGB values as the loss. We observe that MSE encourages blurred image reconstruction, but still works for natural images because it preserves dominant edges. However, in medical imaging, where texture cues are more important for classifying a visual abnormality, the strategy fails. Taking inspiration from Gray Level Co-occurrence Matrix (GLCM) features in radiomics studies, we propose a novel MAE-based pre-training framework, GLCM-MAE, using a reconstruction loss based on matching GLCMs. A GLCM captures intensity and spatial relationships in an image, hence the proposed loss helps preserve morphological features. Further, we propose a novel formulation to convert GLCM matching into a differentiable loss function. We demonstrate that unsupervised pre-training on medical images with the proposed GLCM loss improves representations for downstream tasks. GLCM-MAE outperforms the current state-of-the-art across four tasks: gallbladder cancer detection from ultrasound images by 2.1%, breast cancer detection from ultrasound by 3.1%, pneumonia detection from X-rays by 0.5%, and COVID detection from CT by 0.6%. Source code and pre-trained models are available at: https://github.com/ChetanMadan/GLCM-MAE.
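To make the idea of a GLCM-matching loss concrete, the following is a minimal NumPy sketch, not the authors' formulation (the abstract does not give it). The function names `glcm`, `soft_glcm`, and `glcm_loss`, the Gaussian soft-binning with width `sigma`, and the single pixel offset `(dx, dy)` are all assumptions made for illustration. The hard GLCM counts how often gray level i sits next to gray level j; the soft variant replaces hard counts with smooth memberships, which is one common way such a matrix could be made differentiable.

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Hard GLCM for a non-negative offset (dx, dy), normalized to sum to 1.

    img must hold integer gray levels in [0, levels).
    """
    h, w = img.shape
    a = img[0:h - dy, 0:w - dx]   # reference pixels
    b = img[dy:h, dx:w]           # neighbours at the given offset
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)  # count each (i, j) pair
    return m / m.sum()

def soft_glcm(img, levels, dx=1, dy=0, sigma=0.5):
    """Soft GLCM (illustrative assumption): Gaussian membership per gray level.

    Each pixel contributes fractionally to every level, so the matrix varies
    smoothly with pixel intensity instead of jumping between bins.
    """
    h, w = img.shape
    centers = np.arange(levels, dtype=np.float64)
    x = img.astype(np.float64)
    mem = np.exp(-0.5 * ((x[..., None] - centers) / sigma) ** 2)
    mem /= mem.sum(axis=-1, keepdims=True)     # memberships sum to 1 per pixel
    a = mem[0:h - dy, 0:w - dx].reshape(-1, levels)
    b = mem[dy:h, dx:w].reshape(-1, levels)
    m = a.T @ b                                # sum of outer products over pairs
    return m / m.sum()

def glcm_loss(pred, target, levels):
    """Mean squared difference between the soft GLCMs of two images."""
    diff = soft_glcm(pred, levels) - soft_glcm(target, levels)
    return float(np.mean(diff ** 2))
```

As a usage example under the same assumptions, `glcm_loss(checker, flat, 2)` with a 4x4 checkerboard `checker` and an all-zero image `flat` is positive, while `glcm_loss(checker, checker, 2)` is zero. In an actual pre-training setup the same computation would be written in an autodiff framework so gradients can flow to the decoder.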

Comments:	To appear at MICCAI 2025
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2507.10869 [eess.IV]
	(or arXiv:2507.10869v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2507.10869

Submission history

From: Soumen Basu
[v1] Tue, 15 Jul 2025 00:12:26 UTC (634 KB)
