Curation of myeloma observational study MALIMAR using XNAT: solving the challenges posed by real-world data.
Data annotation
Data curation
Magnetic resonance imaging
Myeloma
Journal
Insights into Imaging
ISSN: 1869-4101
Abbreviated title: Insights Imaging
Country: Germany
NLM ID: 101532453
Publication information
Publication date:
16 Feb 2024
History:
received: 25 Aug 2023
accepted: 6 Dec 2023
medline: 16 Feb 2024
pubmed: 16 Feb 2024
entrez: 16 Feb 2024
Status:
epublish
Abstract
MAchine Learning In MyelomA Response (MALIMAR) is an observational clinical study combining "real-world" and clinical trial data, both retrospective and prospective. Images were acquired on three MRI scanners over a 10-year window at two institutions, leading to a need for extensive curation. Curation involved image aggregation, pseudonymisation, allocation between project phases, data cleaning, upload to an XNAT repository visible from multiple sites, annotation, incorporation of machine learning research outputs and quality assurance using programmatic methods. A total of 796 whole-body MR imaging sessions from 462 subjects were curated. A major change in scan protocol part way through the retrospective window meant that approximately 30% of available imaging sessions had properties that differed significantly from the remainder of the data. Issues were found with a vendor-supplied clinical algorithm for "composing" whole-body images from multiple imaging stations. Historic weaknesses in a digital video disk (DVD) research archive (already addressed by the mid-2010s) were highlighted by incomplete datasets, some of which could not be completely recovered. The final dataset contained 736 imaging sessions for 432 subjects. Software was written to clean and harmonise data. Implications for the subsequent machine learning activity are considered. MALIMAR exemplifies the vital role that curation plays in machine learning studies that use real-world data. A research repository such as XNAT facilitates day-to-day management, ensures robustness and consistency and enhances the value of the final dataset. The types of process described here will be vital for future large-scale multi-institutional and multi-national imaging projects. 
This article showcases innovative data curation methods using a state-of-the-art image repository platform; such tools will be vital for managing the large multi-institutional datasets required to train and validate generalisable ML algorithms and future foundation models in medical imaging.
• Heterogeneous data in the MALIMAR study required the development of novel curation strategies.
• Correction of multiple problems affecting the real-world data was successful, but implications for machine learning are still being evaluated.
• Modern image repositories have rich application programming interfaces enabling data enrichment and programmatic QA, making them much more than simple "image marts".
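The programmatic QA mentioned above could look something like the following minimal sketch. It is not the MALIMAR software: it assumes session metadata has already been exported from the repository into plain Python dictionaries, and the series labels in `REQUIRED_SERIES`, the record layout, and the function names `check_session` and `qa_report` are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical series labels that every whole-body session is expected to
# contain; the real study protocol may use different names and sequences.
REQUIRED_SERIES = {"dixon_in_phase", "dixon_fat", "dwi_b50", "dwi_b900", "adc"}


def check_session(session):
    """Return a list of human-readable problems found in one session record."""
    problems = []
    labels = [series["label"] for series in session["series"]]

    # Completeness: every expected series must be present.
    missing = REQUIRED_SERIES - set(labels)
    if missing:
        problems.append("missing series: " + ", ".join(sorted(missing)))

    # Uniqueness: the same series should not appear twice in one session.
    duplicates = sorted(label for label, n in Counter(labels).items() if n > 1)
    if duplicates:
        problems.append("duplicate series: " + ", ".join(duplicates))

    # Integrity: a series with no slices suggests a failed transfer or archive read.
    for series in session["series"]:
        if series["n_slices"] <= 0:
            problems.append("empty series: " + series["label"])

    return problems


def qa_report(sessions):
    """Map session ID to its problems, keeping only sessions that fail a check."""
    report = {}
    for session in sessions:
        problems = check_session(session)
        if problems:
            report[session["id"]] = problems
    return report
```

Running `qa_report` over every session after each upload or cleaning step gives a repeatable list of sessions that still need attention, which is the kind of consistency check that manual inspection of several hundred sessions would struggle to provide.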
Identifiers
pubmed: 38361108
doi: 10.1186/s13244-023-01591-7
pii: 10.1186/s13244-023-01591-7
Publication types
Journal Article
Languages
eng
Pagination
47
Grants
Agency: Cancer Research UK
ID: C7273/A28677
Country: United Kingdom
Copyright information
© 2024. The Author(s).