Electrical Engineering and Systems Science > Image and Video Processing

arXiv:1911.07372 (eess)
[Submitted on 18 Nov 2019]

Title: Deep Learning for the Digital Pathologic Diagnosis of Cholangiocarcinoma and Hepatocellular Carcinoma: Evaluating the Impact of a Web-based Diagnostic Assistant


Authors: Bora Uyumazturk, Amirhossein Kiani, Pranav Rajpurkar, Alex Wang, Robyn L. Ball, Rebecca Gao, Yifan Yu, Erik Jones, Curtis P. Langlotz, Brock Martin, Gerald J. Berry, Michael G. Ozawa, Florette K. Hazard, Ryanne A. Brown, Simon B. Chen, Mona Wood, Libby S. Allard, Lourdes Ylagan, Andrew Y. Ng, Jeanne Shen
Abstract: While artificial intelligence (AI) algorithms continue to rival human performance on a variety of clinical tasks, the question of how best to incorporate these algorithms into clinical workflows remains relatively unexplored. We investigated how AI can affect pathologist performance on the task of differentiating between two subtypes of primary liver cancer, hepatocellular carcinoma (HCC) and cholangiocarcinoma (CC). We developed an AI diagnostic assistant using a deep learning model and evaluated its effect on the diagnostic performance of eleven pathologists with varying levels of expertise. Our deep learning model achieved an accuracy of 0.885 on an internal validation set of 26 slides and an accuracy of 0.842 on an independent test set of 80 slides. Despite having high accuracy on a held-out test set, the diagnostic assistant did not significantly improve performance across pathologists (p-value: 0.184, OR: 1.287 (95% CI 0.886, 1.871)). Model correctness significantly biased the pathologists' decisions. When the model was correct, assistance significantly improved accuracy across all pathologist experience levels and for all case difficulty levels (p-value < 0.001, OR: 4.289 (95% CI 2.360, 7.794)). When the model was incorrect, assistance significantly decreased accuracy across all 11 pathologists and for all case difficulty levels (p-value < 0.001, OR: 0.253 (95% CI 0.126, 0.507)). Our results highlight the challenges of translating AI models to the clinical setting, especially for difficult subspecialty tasks such as tumor classification. In particular, they suggest that incorrect model predictions could strongly bias an expert's diagnosis, an important factor to consider when designing medical AI-assistance systems.
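The abstract reports effect sizes as odds ratios with 95% confidence intervals (e.g., OR 1.287, 95% CI 0.886–1.871). As an illustrative sketch only (not the paper's actual analysis, which would need to account for repeated reads per pathologist), an odds ratio and its Woolf log-odds confidence interval can be computed from a 2x2 table of assisted vs. unassisted correct/incorrect diagnoses:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a normal-approximation CI on the log scale.

    a, b: correct / incorrect diagnoses with assistance
    c, d: correct / incorrect diagnoses without assistance
    Returns (odds_ratio, ci_lower, ci_upper).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts, for illustration only:
or_, lo, hi = odds_ratio_ci(40, 10, 20, 30)
print(f"OR = {or_:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

A CI that excludes 1.0 indicates a statistically significant effect at the 5% level, which is how the reported intervals (e.g., 0.886–1.871 spanning 1.0 for overall assistance) map onto the stated p-values.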
Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract
Subjects: Image and Video Processing (eess.IV)
Cite as: arXiv:1911.07372 [eess.IV]
  (or arXiv:1911.07372v1 [eess.IV] for this version)
  https://doi.org/10.48550/arXiv.1911.07372
arXiv-issued DOI via DataCite

Submission history

From: Bora Uyumazturk [view email]
[v1] Mon, 18 Nov 2019 00:14:54 UTC (1,222 KB)
