Chinese Journal of Parasitology and Parasitic Diseases ›› 2024, Vol. 42 ›› Issue (5): 582-593. doi: 10.12140/j.issn.1000-7423.2024.05.004

• Original Article •

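Identification of lesion activity in hepatic cystic echinococcosis using machine learning models based on radiomics and clinical features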

汪占金1, 陈志恒1, 李富源1, 蔡俊杰1, 薛张佗1, 周瀛2, 曹云太3, 王展4,*

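    1 Clinical Medical School, Qinghai University, Xining 810000, Qinghai, China
    2 Second Department of Hepatobiliary and Pancreatic Surgery, Qinghai University Affiliated Hospital, Xining 810000, Qinghai, China
    3 Imaging Center, Qinghai University Affiliated Hospital, Xining 810000, Qinghai, China
    4 Department of Medical Engineering and Translational Applications, Qinghai University Affiliated Hospital, Xining 810000, Qinghai, China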
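  • Received: 2024-05-16 Revised: 2024-09-04 Published: 2024-10-30 Released online: 2024-10-24
  • Corresponding author: * WANG Zhan (王展, born 1985), male, PhD, associate chief physician, works on artificial-intelligence-based diagnosis and treatment of echinococcosis. E-mail: ufofu01@163.com
  • About the author: WANG Zhanjin (汪占金, born 1999), male, master's student, works on artificial-intelligence-based diagnosis and treatment of echinococcosis. E-mail: 18197256027@163.com
  • Funding:
    National Natural Science Foundation of China (82160131); Youth Fund of the Qinghai Provincial Department of Science and Technology (2021-ZJ-963Q)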

Identification of lesion activity in hepatic cystic echinococcosis using machine learning models based on radiomics and clinical features

WANG Zhanjin1, CHEN Zhiheng1, LI Fuyuan1, CAI Junjie1, XUE Zhangtuo1, ZHOU Ying2, CAO Yuntai3, WANG Zhan4,*

    1 Clinical Medical School, Qinghai University, Xining 810000, Qinghai, China
    2 Department of Hepatobiliary and Pancreatic Surgery, Qinghai University Affiliated Hospital, Xining 810000, Qinghai, China
    3 Imaging Center, Qinghai University Affiliated Hospital, Xining 810000, Qinghai, China
    4 Department of Medical Engineering and Translational Applications, Qinghai University Affiliated Hospital, Xining 810000, Qinghai, China
  • Received: 2024-05-16 Revised: 2024-09-04 Published: 2024-10-30 Online: 2024-10-24
  • Contact: * E-mail: ufofu01@163.com
  • Supported by:
    National Natural Science Foundation of China (82160131); Youth Fund of the Qinghai Provincial Department of Science and Technology (2021-ZJ-963Q)

Abstract:

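Objective To develop machine learning models that combine radiomics and clinical features to accurately identify the biological activity of hepatic cystic echinococcosis (HCE) lesions. Methods CT images and clinical data were collected from 521 HCE patients seen at the Department of Hepatobiliary and Pancreatic Surgery of Qinghai University Affiliated Hospital and from 236 HCE patients seen at the General Surgery Departments of Guoluo Prefectural People's Hospital and Yushu Prefectural People's Hospital from 2018 to 2022, and radiomics features were extracted and screened. Univariate and multivariate logistic regression analyses of the clinical data were used to select features for model construction. Seven machine learning algorithms, namely logistic regression (LR), support vector machine (SVM), K-nearest neighbors (KNN), random forest (RandomForest), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and extremely randomized trees (ExtraTrees), were used to construct radiomics models and clinical models. The predictions of the radiomics and clinical models were then combined by soft voting to build a combined model. The DeLong test was used to compare the performance of the radiomics, clinical, and combined clinical-radiomics models, and model performance was assessed by external validation. Results A total of 430 patients were included for model development and training and 171 patients for external validation; 51 radiomics features and 5 clinical features were selected for model construction. Of the seven algorithms, XGBoost performed best. Its clinical model had the highest AUC on both the training and external validation sets, 0.977 [95% confidence interval (95% CI): 0.964-0.990] and 0.839 (95% CI: 0.776-0.901), respectively; its radiomics model had the highest AUC on both sets, 0.998 (95% CI: 0.997-1.000) and 0.874 (95% CI: 0.822-0.927), respectively; and its combined model had the highest AUC on both sets, 1.000 (95% CI: 0.999-1.000) and 0.931 (95% CI: 0.894-0.968), respectively. The DeLong test showed that, on the training set, the combined model outperformed the clinical model (Z = 2.154, P < 0.05) and did not differ significantly from the radiomics model (Z = 0.562, P > 0.05); on the external validation set, it outperformed both the clinical model and the radiomics model (Z = 3.338 and 3.331, P < 0.05). Calibration curves and decision curve analysis (DCA) showed that the combined model had the best calibration and the highest net benefit on both the training and external validation sets, performed stably across datasets, and showed good generalizability and reliability in external validation. Conclusion Machine learning models developed from radiomics and clinical data can accurately identify the biological activity of HCE lesions. The combined model has higher diagnostic accuracy and greater potential for clinical application, and can inform treatment planning for HCE patients.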
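The soft-voting step described in the Methods can be illustrated with a minimal sketch. It assumes equal weights for the two models and uses made-up probabilities for six hypothetical lesions; the function names `soft_vote` and `auc` are illustrative and are not taken from the study, whose actual weighting and implementation are not stated in the abstract.

```python
from typing import List, Optional, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-lesion 'active' probabilities across models."""
    if not prob_sets:
        raise ValueError("need at least one model's probabilities")
    n = len(prob_sets[0])
    if any(len(p) != n for p in prob_sets):
        raise ValueError("all models must score the same lesions")
    if weights is None:
        weights = [1.0] * len(prob_sets)  # equal weights: an assumption
    total = float(sum(weights))
    return [
        sum(w * p[i] for w, p in zip(weights, prob_sets)) / total
        for i in range(n)
    ]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (active, inactive) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to compute AUC")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up values for six hypothetical lesions (1 = active, 0 = inactive).
labels = [1, 1, 1, 0, 0, 0]
radiomics_prob = [0.90, 0.40, 0.70, 0.50, 0.20, 0.10]
clinical_prob = [0.60, 0.80, 0.30, 0.20, 0.40, 0.30]
combined_prob = soft_vote([radiomics_prob, clinical_prob])

for name, probs in [("radiomics", radiomics_prob),
                    ("clinical", clinical_prob),
                    ("combined", combined_prob)]:
    print(f"{name}: AUC = {auc(labels, probs):.3f}")
```

Averaging probabilities rather than hard class labels lets a confident model outweigh an uncertain one on a given lesion, which is one plausible reason a combined model can rank cases better than either component alone.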

Keywords: Hepatic cystic echinococcosis, Lesion activity, Machine learning model, Radiomics, Clinical features

Abstract:

Objective To develop machine learning models based on radiomic and clinical features to accurately identify the biological activity of hepatic cystic echinococcosis (HCE) lesions. Methods CT images and clinical data were collected from 521 HCE patients treated at the Hepatobiliary and Pancreatic Surgery Department of Qinghai University Affiliated Hospital and from 236 HCE patients treated at the General Surgery Departments of Guoluo Prefectural People’s Hospital and Yushu Prefectural People’s Hospital between 2018 and 2022. Radiomics features were extracted and screened. Univariate and multivariate logistic regression analyses were performed on the clinical data to select features for model construction. Seven machine learning algorithms were used to construct radiomics and clinical models: Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RandomForest), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Extremely Randomized Trees (ExtraTrees). A combined clinical-radiomics model was constructed by soft voting over the predictions of the radiomics and clinical models. DeLong’s test was used to compare the performance of the radiomics, clinical, and combined models, and external validation was used to assess model performance. Results A total of 430 patients were included for model development and training, and 171 patients were used for external validation. Fifty-one radiomics features and five clinical features were selected for model construction. Among the seven machine learning algorithms, XGBoost performed best. Its clinical model achieved the highest area under the curve (AUC) values on the training and external validation sets, 0.977 [95% confidence interval (CI): 0.964-0.990] and 0.839 (95% CI: 0.776-0.901), respectively. Its radiomics model achieved AUC values of 0.998 (95% CI: 0.997-1.000) and 0.874 (95% CI: 0.822-0.927), and its combined model achieved AUC values of 1.000 (95% CI: 0.999-1.000) and 0.931 (95% CI: 0.894-0.968). DeLong’s test showed that, on the training set, the combined model outperformed the clinical model (Z = 2.154, P < 0.05) and did not differ significantly from the radiomics model (Z = 0.562, P > 0.05); on the external validation set, it outperformed both the clinical and radiomics models (Z = 3.338 and 3.331, respectively; both P < 0.05). Calibration curves and decision curve analysis (DCA) showed that the combined model had the best calibration and the highest net benefit in both the training and external validation sets, with stable performance across datasets and good generalizability and reliability in external validation. Conclusion Machine learning models developed from radiomic and clinical data can accurately identify the biological activity of HCE lesions. The combined model offers higher diagnostic accuracy and greater potential for clinical application, and may inform treatment planning for HCE patients.
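As a rough check on how the reported DeLong Z statistics relate to the stated significance levels, the sketch below converts each Z value to a p-value. It assumes a two-sided test against a standard normal distribution, and it assumes the two external-validation values are listed in the order clinical, then radiomics; neither assumption is stated explicitly in the abstract, and the helper name `two_sided_p_from_z` is illustrative.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic z."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z statistics as reported in the abstract for the combined model.
# The pairing of 3.338 and 3.331 with each comparator is an assumption.
reported_z = {
    "training, combined vs clinical": 2.154,
    "training, combined vs radiomics": 0.562,
    "external, combined vs clinical": 3.338,
    "external, combined vs radiomics": 3.331,
}

for comparison, z in reported_z.items():
    p = two_sided_p_from_z(z)
    verdict = "P < 0.05" if p < 0.05 else "P > 0.05"
    print(f"{comparison}: Z = {z:.3f}, p = {p:.4f} ({verdict})")
```

Under these assumptions, each computed p-value falls on the same side of 0.05 as the abstract reports.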

Key words: Hepatic cystic echinococcosis, Lesion activity, Machine learning model, Radiomics, Clinical features
