Enhanced Lung Cancer Survival Prediction using Semi-Supervised Pseudo-Labeling and Learning from Diverse PET/CT Datasets

Salmanpour, Mohammad R.; Gorji, Arman; Mousavi, Amin; Jouzdani, Ali Fathi; Sanati, Nima; Maghsudi, Mehdi; Leung, Bonnie; Ho, Cheryl; Yuan, Ren; Rahmim, Arman

arXiv:2412.00068 (cs)

[Submitted on 25 Nov 2024]

Authors: Mohammad R. Salmanpour, Arman Gorji, Amin Mousavi, Ali Fathi Jouzdani, Nima Sanati, Mehdi Maghsudi, Bonnie Leung, Cheryl Ho, Ren Yuan, Arman Rahmim

Abstract: Objective: This study explores a semi-supervised learning (SSL) pseudo-labeling strategy that uses diverse datasets to enhance lung cancer (LCa) survival prediction, analyzing handcrafted and deep radiomic features (HRF/DRF) from PET/CT scans with hybrid machine learning systems (HMLSs). Methods: We collected 199 LCa patients with both PET and CT images, obtained from The Cancer Imaging Archive (TCIA) and our local database, alongside 408 head and neck cancer (HNCa) PET/CT images from TCIA. We extracted 215 HRFs and 1024 DRFs from segmented primary tumors using PySERA and a 3D autoencoder, respectively, within the ViSERA software. The supervised learning (SL) strategy employed HMLSs: Principal Component Analysis (PCA) connected with 4 classifiers, applied to both HRFs and DRFs. The SSL strategy expanded the dataset by adding 408 pseudo-labeled HNCa cases (labeled by a Random Forest algorithm) to the 199 LCa cases, using the same HMLSs. Furthermore, PCA linked with 4 survival prediction algorithms was used for survival hazard ratio analysis. Results: The SSL strategy outperformed the SL strategy (p-value < 0.05), achieving an average accuracy of 0.85 with DRFs from PET and PCA + Multi-Layer Perceptron (MLP), compared to 0.65 for the SL strategy using DRFs from CT and PCA + K-Nearest Neighbor (KNN). Additionally, PCA linked with Component-wise Gradient Boosting Survival Analysis on both HRFs and DRFs extracted from CT had an average c-index of 0.80 with a log-rank p-value << 0.001, confirmed by external testing. Conclusions: Shifting from HRFs and SL to DRFs and SSL strategies, particularly in contexts with limited data points, enables CT or PET alone to achieve high predictive performance.
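
The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, a binary outcome label, synthetic random matrices standing in for the radiomic features, and a hypothetical choice of 10 principal components and one hidden layer. The feature standardization step is likewise an assumption that the abstract does not mention. The sketch only shows the general pattern: a Random Forest trained on the labeled cases assigns labels to the unlabeled cases, and a PCA + MLP pipeline is then fitted on the combined set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for feature matrices (NOT the study's data).
# Case counts follow the abstract; 64 features is an arbitrary small
# number chosen for speed (the abstract reports 1024 DRFs).
n_labeled, n_unlabeled, n_features = 199, 408, 64
X_lca = rng.normal(size=(n_labeled, n_features))    # labeled LCa cases
# Hypothetical binary outcome driven by the first feature plus noise.
y_lca = (X_lca[:, 0] + 0.5 * rng.normal(size=n_labeled) > 0).astype(int)
X_hnca = rng.normal(size=(n_unlabeled, n_features))  # unlabeled HNCa cases


def pseudo_label(X_labeled, y_labeled, X_unlabeled, seed=0):
    """Train a Random Forest on labeled cases and label the unlabeled ones."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X_labeled, y_labeled)
    return rf.predict(X_unlabeled)


def fit_ssl_model(X_labeled, y_labeled, X_unlabeled, n_components=10, seed=0):
    """Fit PCA + MLP on labeled cases augmented with pseudo-labeled cases."""
    y_pseudo = pseudo_label(X_labeled, y_labeled, X_unlabeled, seed=seed)
    X_all = np.vstack([X_labeled, X_unlabeled])
    y_all = np.concatenate([y_labeled, y_pseudo])
    model = make_pipeline(
        StandardScaler(),                 # assumption: features are standardized
        PCA(n_components=n_components),   # assumption: 10 components
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed),
    )
    model.fit(X_all, y_all)
    return model, y_pseudo


model, y_pseudo = fit_ssl_model(X_lca, y_lca, X_hnca)
print("training set size:", n_labeled + len(y_pseudo))
print("pseudo-label counts:", np.bincount(y_pseudo, minlength=2))
```

In a real evaluation, performance would presumably be measured only on held-out LCa cases with true labels, since pseudo-labels are model predictions rather than ground truth.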

Comments:	12 pages, 7 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an)
Cite as:	arXiv:2412.00068 [cs.CV]
	(or arXiv:2412.00068v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.00068

Submission history

From: Mohammad R. Salmanpour
[v1] Mon, 25 Nov 2024 23:58:37 UTC (853 KB)
