Test Data Set: UPENN-GBM
Upon request, we provide access to a KHEOPS album (UPENN_GBM_OS-MGMT) that contains a subset of 200 patients from the UPENN-GBM¹ imaging collection (multi-parametric magnetic resonance imaging (mpMRI) scans of de novo glioblastoma (GBM) patients from the University of Pennsylvania Health System), which is hosted by The Cancer Imaging Archive (TCIA)². Please note that by requesting access, you agree to abide by the TCIA Data Usage Policies and Restrictions.
Available Information
The UPENN_GBM_OS-MGMT KHEOPS album contains:
- MR T1, T1c, T2, FLAIR imaging of a subset of 200 Glioblastoma (GBM) patients from the UPENN-GBM data collection.
- Segmentation masks (DCM-SEG) for individual tumor (sub)regions (ET: enhancing tumor, NET: non-enhancing tumor, edema) as well as their combinations (ET-NET, ET-NET-Edema).
- Segmentation masks for the entire brain, as well as for the brain without tumor.
Segmentations were performed using the DeepBraTumIA GBM segmentation tool.
Additionally, we provide outcome and clinical covariates (subset from UPENN-GBM) in a format ready for upload to QuantImage:
- Outcomes:
- Survival modeling: patient_id, time, event. (save as csv file)
- Classification: patient_id, mgmt_status. (save as csv file)
- Clinical covariates:
- patient_id, mgmt_status, sex, age_y, idh_status, gross_total_resection_over_90_percent (save as csv file)
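Before uploading an outcome file, it can help to check that it has the columns listed above. The following is a minimal sketch using only the Python standard library. The function name `check_outcome_csv` is hypothetical, the example rows are made up, and the exact header spelling that QuantImage expects is an assumption that should be verified against the provided files.

```python
import csv
import io

# Column layouts as listed above. The exact header spelling expected by
# QuantImage is an assumption; compare with the provided csv files.
EXPECTED_COLUMNS = {
    "survival": ["patient_id", "time", "event"],
    "classification": ["patient_id", "mgmt_status"],
}


def check_outcome_csv(text, task):
    """Return a list of problems found in an outcome CSV given as text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS[task] if c not in header]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    seen = set()
    for line_number, row in enumerate(reader, start=2):  # header is line 1
        patient_id = row["patient_id"]
        if patient_id in seen:
            problems.append(f"line {line_number}: duplicate patient_id {patient_id}")
        seen.add(patient_id)
        for column in EXPECTED_COLUMNS[task]:
            if not (row[column] or "").strip():
                problems.append(f"line {line_number}: empty {column}")
    return problems


# Made-up rows for illustration only, not taken from UPENN-GBM.
example = "patient_id,time,event\nP001,412,1\nP002,,0\nP001,90,1\n"
for problem in check_outcome_csv(example, "survival"):
    print(problem)
```

An empty list means that the header and all rows passed these basic checks; it does not guarantee that QuantImage will accept the file.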
Feature Extraction
We provide an example QuantImage configuration file (save as yaml file) to perform the following image pre-processing and feature extraction steps:
- Pre-processing:
- Resampling: Images and masks are resampled to 1 mm x 1 mm x 1 mm spacing.
- Standardization: Prior to feature extraction, image intensities of each MR sequence type (T1, T1c, T2, FLAIR) are z-score standardized, using the mean and variance of intensity values in the healthy brain (brain mask, excluding ET, NET and edema regions) as reference.
- Radiomics feature extraction:
- Selection of pyradiomics as feature extractor, with the following configuration:
- Extraction of shape, firstorder and texture (GLCM, GLSZM, GLRLM, NGTDM, GLDM) features from the original image.
- Configuration of feature extraction settings follows standard pyradiomics syntax and can be adjusted accordingly.
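The standardization step described above can be illustrated with a minimal sketch. It is not QuantImage's implementation: it assumes flattened voxel lists instead of 3D volumes, a single tumor mask that is the union of ET, NET and edema, and the population standard deviation of the reference region. The function name `zscore_with_reference` is hypothetical.

```python
from statistics import fmean, pstdev


def zscore_with_reference(intensities, brain_mask, tumor_mask):
    """Standardize intensities using healthy-brain voxels as reference.

    All arguments are flat sequences of equal length; masks hold 0 or 1.
    Healthy brain is taken as "inside the brain mask and outside the tumor
    mask" (assumption: tumor_mask is the union of ET, NET and edema).
    """
    reference = [
        value
        for value, in_brain, in_tumor in zip(intensities, brain_mask, tumor_mask)
        if in_brain and not in_tumor
    ]
    mean = fmean(reference)
    std = pstdev(reference)  # assumption: population standard deviation
    if std == 0:
        raise ValueError("reference region has zero variance")
    # Every voxel is rescaled, but only healthy voxels define the scale.
    return [(value - mean) / std for value in intensities]


# Toy values for illustration: three healthy voxels, one tumor voxel,
# and one background voxel outside the brain.
intensities = [10.0, 20.0, 30.0, 100.0, 0.0]
brain_mask = [1, 1, 1, 1, 0]
tumor_mask = [0, 0, 0, 1, 0]
standardized = zscore_with_reference(intensities, brain_mask, tumor_mask)
print([round(value, 2) for value in standardized])
```

Using the healthy brain as reference keeps the tumor's own intensities from influencing the scale, so a bright lesion remains bright after standardization.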
Step-by-step tutorial
- On the QuantImage v2 dashboard, locate the KHEOPS album ‘UPENN_GBM_OS-MGMT’ and click on ‘Extract Features’.
- In the ‘Feature Extraction’ pop-up:
- Select ‘Import Configuration (Expert)’ and navigate to the path where you have saved the provided example configuration file.
- Select the ROIs from which you would like to extract radiomics features for subsequent modeling. For a first quick trial, we advise selecting only one or two ROIs (e.g. Edema and ET-NET).
- Click on the ‘Extract Features’ button.
- QuantImage now extracts radiomics features from the selected ROIs and indicates the current status (‘Extraction in Progress (XX%)’). Extraction from 2 ROIs with the default settings takes about 15 minutes, which leaves time to prepare your favourite beverage before continuing this tutorial.
- Once extraction has finished, the ‘Explore Features’ and ‘Re-extract Features’ buttons become available on the QI Dashboard for the respective KHEOPS album.
- Open the Feature Explorer by clicking on ‘Explore Features’ and navigate to the ‘Outcomes’ tab.
- Create a new outcome by providing a name (e.g. ‘Overall Survival Prediction’) and an outcome type (e.g. ‘Survival’).
- Import an outcome csv file by clicking on ‘Import Labels’ and selecting an outcome file from your file system. We provide outcome files for a classification task (MGMT status prediction (here)) and for a survival prediction task (overall survival (here)). Save one of these as csv and import it.
- Click on ‘Save labels’ and wait until QI has finished saving them.
- Navigate to the ‘Visualization’ tab. You should see a heatmap of the values of all features (y axis) and patients (x axis). The filter tree on the left lets you remove or add features extracted from a specific modality, a specific ROI, or a specific feature group or feature. At the bottom, you find tools for quick semi-automatic feature selection.
- Adjust the ‘Correlation Threshold’ to e.g. 0.8 and select ‘Drop correlated features’ to remove redundant features that are highly correlated.
- Click on ‘Rank by F-value’ to order the appearance of features along the heatmap’s y-axis.
- Adjust the slider to e.g. 20 and select ‘Keep 20 best-ranked features’.
- Now create a new collection with the selected features by clicking on the green button above the heatmap and providing a name that allows you to remember the selection steps you have applied to obtain this collection.
- Now, we train and evaluate a model on this collection for the selected outcome. Navigate to the ‘Model Training’ tab and click on ‘Train & Test Model’. After a short computation you will be presented with a results page that provides information about training and test performance, as well as model-specific details (e.g. which algorithm, how many and which features, how many and which patients in the training and test sets, etc.).
- The model’s performance may be sensitive to the data subset used for model training and testing. To investigate the overall predictive value and robustness of your feature set, you may want to repeat model training and evaluation for the same feature set, but with different compositions of the training and test sets. To reshuffle cohort membership in QuantImage, navigate to the ‘Data Splitting’ tab and move the slider that regulates the percentage of patients in the training and test sets away from and back to its target value (e.g. 80% training, 20% testing). Now, return to the ‘Model Training’ tab and click on ‘Train a new XXX model using current outcome YYY’. Once computation has finished, a new row will be added to the results table. Repeat as often as needed.
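The idea behind repeating training and evaluation on reshuffled cohorts can be sketched outside QuantImage as follows. This is a minimal illustration under stated assumptions: plain random shuffling without stratification (QuantImage may split differently), placeholder patient ids, and the hypothetical helper names `repeated_splits` and `summarize`.

```python
import random
from statistics import fmean, stdev


def repeated_splits(patient_ids, train_fraction=0.8, repeats=5, seed=0):
    """Yield (train_ids, test_ids) pairs, reshuffled for every repeat."""
    rng = random.Random(seed)  # fixed seed makes the splits reproducible
    n_train = round(len(patient_ids) * train_fraction)
    for _ in range(repeats):
        shuffled = list(patient_ids)
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


def summarize(scores):
    """Mean and sample standard deviation of per-split test scores."""
    return fmean(scores), stdev(scores)


# Placeholder ids for a 200-patient cohort, not actual UPENN-GBM ids.
patient_ids = [f"P{i:03d}" for i in range(200)]
splits = list(repeated_splits(patient_ids))
for train_ids, test_ids in splits:
    print(len(train_ids), len(test_ids))
```

The test scores collected from the rows of the results table could then be passed to `summarize`; a large spread across splits suggests that a single train/test result should not be over-interpreted.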
- Remarks:
- The number of radiomics features can grow rapidly when extracting features from multiple imaging modalities and ROIs, particularly when filter-based features (e.g. Laplacian of Gaussian, or wavelet) are extracted. For very large feature sets, visualization loses its value and computing the correlation across all feature pairs becomes intractable. In this case, the ‘Visualization’ tab proposes a rapid preselection mechanism based on univariate predictiveness. If using this mechanism is not desired, a subset of features needs to be selected using the filter mechanism to enable feature visualization.
- QuantImage supports inclusion of additional clinical covariates in the model training and evaluation process. The ‘Clinical Feature’ tab provides the entry point for providing such non-imaging features. You may test this functionality using the clinical covariate file provided (here). However, please note that this functionality is still under development: clinical features selected in the ‘Visualization’ tab are not visualized and may not be listed among the selected features in the ‘Model Training’ section.
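The semi-automatic selection steps of the ‘Visualization’ tab (drop correlated features, rank by F-value, keep the best-ranked features) can be approximated with the following minimal sketch. It assumes a categorical outcome, Pearson correlation, a greedy drop order and a one-way ANOVA F statistic; how QuantImage implements these steps internally is not described here, and all function names are hypothetical. Note that n features yield n(n-1)/2 pairs (499,500 pairs for 1,000 features), which is why pairwise correlation becomes costly for large feature sets.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.8):
    """Greedily keep features with |r| <= threshold to all kept features."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def f_value(values, labels):
    """One-way ANOVA F statistic of one feature across the label groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = fmean(values)
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (between / df_between) / (within / df_within)


def select_features(features, labels, threshold=0.8, keep=20):
    """Drop correlated features, rank the rest by F-value, keep the best."""
    kept = drop_correlated(features, threshold)
    ranked = sorted(kept, key=lambda n: f_value(features[n], labels), reverse=True)
    return ranked[:keep]


# Toy feature table (feature name -> one value per patient), made up
# for illustration; "a_copy" is nearly proportional to "a".
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "a_copy": [2.1, 4.0, 6.2, 8.1, 9.9, 12.0],
    "b": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
labels = [0, 0, 0, 1, 1, 1]
print(select_features(features, labels, threshold=0.8, keep=2))
```

In this toy example the redundant feature is removed first and the remaining features are ordered by how well they separate the two label groups. For very large feature sets, ranking by F-value before the correlation step reduces the number of pairs that need to be compared, which mirrors the univariate preselection mentioned in the remarks.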
References
1. Bakas, S., Sako, C., Akbari, H., Bilello, M., Sotiras, A., Shukla, G., Rudie, J. D., Flores Santamaria, N., Fathi Kazerooni, A., Pati, S., Rathore, S., Mamourian, E., Ha, S. M., Parker, W., Doshi, J., Baid, U., Bergman, M., Binder, Z. A., Verma, R., … Davatzikos, C. (2021). Multi-parametric magnetic resonance imaging (mpMRI) scans for de novo Glioblastoma (GBM) patients from the University of Pennsylvania Health System (UPENN-GBM) (Version 2) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.709X-DN49
2. Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7