Dataset Download

Below is a list of cohorts and their clinical datasets. Click the links to download the clinical data. Links to the original sources are provided for each cohort; if you use these datasets, please also cite the original publications.

For access to the corresponding pre-processed gene expression data (mRNA TPM from pre-treatment samples), please send a formal data request to marinka@hms.harvard.edu and cc wanxiang_shen@u.nus.edu.

| Cohort | Cancer Type | Patients (R/NR) | Group | Reference | Download |
|---|---|---|---|---|---|
| Choueiri | KIRC | 16(3/13) | Small cohort | Choueiri et al. Clinical Cancer Research, 2016 | Clinical data, mRNA data |
| Miao | KIRC | 17(5/12) | Small cohort | Miao et al. Science, 2018 | Clinical data, mRNA data |
| Snyder | BLCA | 21(7/14) | Small cohort | Snyder et al. PLoS Med., 2017 | Clinical data, mRNA data |
| Zhao | GBM | 25(11/14) | Small cohort | Zhao et al. Nature Medicine, 2019 | Clinical data, mRNA data |
| SU2CLC2 | LUSC | 25(8/17) | Small cohort | Ravi et al. Nature Genetics, 2023 | Clinical data, mRNA data |
| Hugo | SKCM | 26(14/12) | Small cohort | Hugo et al. Cell, 2016 | Clinical data, mRNA data |
| Allen | SKCM | 39(13/26) | Medium cohort | Van Allen et al. Science, 2015 | Clinical data, mRNA data |
| MGH | SKCM | 34(12/22) | Medium cohort | Freeman et al. Cell Rep. Med., 2022 | Clinical data, mRNA data |
| Kim | STAD | 45(12/33) | Medium cohort | Kim et al. Nature Medicine, 2018 | Clinical data, mRNA data |
| Riaz | SKCM | 51(10/41) | Medium cohort | Riaz et al. Cell, 2017 | Clinical data, mRNA data |
| Rose | BLCA | 89(16/73) | Medium cohort | Rose et al. BJC, 2021 | Clinical data, mRNA data |
| Gide | SKCM | 73(40/33) | Medium cohort | Gide et al. Cancer Cell, 2019 | Clinical data, mRNA data |
| SU2CLC1 | LUAD | 102(38/64) | Large cohort | Ravi et al. Nature Genetics, 2023 | Clinical data, mRNA data |
| Liu | SKCM | 107(41/66) | Large cohort | Liu et al. Nature Medicine, 2019 | Clinical data, mRNA data |
| IMmotion150 | KIRC | 165(48/117) | Large cohort | McDermott et al. Nature Medicine, 2018 | Clinical data, mRNA data |
| IMVigor210 | BLCA | 298(68/230) | Large cohort | IMvigor210 Study Group. The Lancet, 2017 | Clinical data, mRNA data |
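
The "Patients (R/NR)" entries pack three numbers into one field. The snippet below is a minimal sketch of how such a field could be parsed and sanity-checked. It reads R/NR as responders/non-responders, which the table does not spell out, and the helper names `parse_patients` and `response_rate` are hypothetical, not part of Compass.

```python
import re

# Matches entries such as "16(3/13)": total, then R and NR in parentheses.
PATIENTS_RE = re.compile(r"^(\d+)\s*\((\d+)/(\d+)\)$")


def parse_patients(field):
    """Return (total, responders, non_responders) from a 'Patients (R/NR)' cell."""
    match = PATIENTS_RE.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognised patients field: {field!r}")
    total, responders, non_responders = (int(x) for x in match.groups())
    # The two groups should add up to the cohort size.
    if responders + non_responders != total:
        raise ValueError(f"R + NR does not equal the total in {field!r}")
    return total, responders, non_responders


def response_rate(field):
    """Fraction of responders in a cohort (assumes R means responders)."""
    total, responders, _ = parse_patients(field)
    return responders / total


for cohort, field in [("Choueiri", "16(3/13)"), ("Gide", "73(40/33)")]:
    print(cohort, parse_patients(field), round(response_rate(field), 3))
```

The consistency check is there because a mismatch between the total and R + NR usually signals a transcription error rather than missing patients.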

We also provide paired patient samples with pre- and post-ICI treatment mRNA expression data. Because some patients have multiple post-treatment samples, the dataset contains 86 pre-post treatment pairs from 78 patients across three cohorts: Riaz (n=43), MGH (n=27), and Gide (n=16). Counted by pair, treatments were anti-PD-1 (n=71), anti-CTLA-4 + anti-PD-1 (n=9), or anti-CTLA-4 (n=6).

| Cohort | Cancer Type | Patients (R/NR) | Group | Pre-treatment data | Post-treatment data |
|---|---|---|---|---|---|
| Riaz (n=43), MGH (n=27), Gide (n=16) | SKCM | 86(22/64) | Pre-post treatment pairs | Clinical data, mRNA data | Clinical data, mRNA data |

Model Download

We provide pre-trained and fine-tuned Compass models for specific use cases. Click the links below to download.

| No. | Model | Description | Download |
|---|---|---|---|
| 1 | PT Model | Base model pre-trained on pan-cancer TCGA transcriptomic datasets (33 cancer types), used for concept feature extraction. | Download |
| 2 | PFT Model | Partially fine-tuned model (PFT) on all ICI patients (n = 1,133) for response prediction. | Download |
| 3 | LFT Model | Linear-probing fine-tuned model (LFT) on all ICI patients (n = 1,133) for response prediction. | Download |
| 4 | Atezo Model | Multi-stage fine-tuned model (PFT->PFT) developed on bladder cancer patients (n = 354) for Atezolizumab response prediction. | Download |
| 5 | Ipi Model | Multi-stage fine-tuned model (PFT->LFT) developed on melanoma patients (n = 57) for Ipilimumab response prediction. | Download |
| 6 | Nivo Model | Multi-stage fine-tuned model (PFT->PFT) developed on melanoma patients (n = 105) for Nivolumab response prediction. | Download |
| 7 | Pembro Model | Multi-stage fine-tuned model (PFT->PFT) developed on melanoma patients (n = 120) for Pembrolizumab response prediction. | Download |
| 8 | Leave-Choueiri | PFT Model trained on 1,117 patients excluding the Choueiri cohort (16 patients). | Download |
| 9 | Leave-Miao | PFT Model trained on 1,116 patients excluding the Miao cohort (17 patients). | Download |
| 10 | Leave-Snyder | PFT Model trained on 1,112 patients excluding the Snyder cohort (21 patients). | Download |
| 11 | Leave-Zhao | PFT Model trained on 1,108 patients excluding the Zhao cohort (25 patients). | Download |
| 12 | Leave-SU2CLC2 | PFT Model trained on 1,108 patients excluding the SU2CLC2 cohort (25 patients). | Download |
| 13 | Leave-Hugo | PFT Model trained on 1,107 patients excluding the Hugo cohort (26 patients). | Download |
| 14 | Leave-Allen | PFT Model trained on 1,094 patients excluding the Allen cohort (39 patients). | Download |
| 15 | Leave-MGH | PFT Model trained on 1,099 patients excluding the MGH cohort (34 patients). | Download |
| 16 | Leave-Kim | PFT Model trained on 1,088 patients excluding the Kim cohort (45 patients). | Download |
| 17 | Leave-Riaz | PFT Model trained on 1,082 patients excluding the Riaz cohort (51 patients). | Download |
| 18 | Leave-Rose | PFT Model trained on 1,044 patients excluding the Rose cohort (89 patients). | Download |
| 19 | Leave-Gide | PFT Model trained on 1,060 patients excluding the Gide cohort (73 patients). | Download |
| 20 | Leave-SU2CLC1 | PFT Model trained on 1,031 patients excluding the SU2CLC1 cohort (102 patients). | Download |
| 21 | Leave-Liu | PFT Model trained on 1,026 patients excluding the Liu cohort (107 patients). | Download |
| 22 | Leave-IMmotion150 | PFT Model trained on 968 patients excluding the IMmotion150 cohort (165 patients). | Download |
| 23 | Leave-IMVigor210 | PFT Model trained on 835 patients excluding the IMVigor210 cohort (298 patients). | Download |
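
The training-set sizes quoted for models 8 to 23 follow from holding out one cohort at a time from the 1,133 ICI patients. The sketch below reproduces that arithmetic from the cohort sizes in the dataset table; it only illustrates the bookkeeping and is not the authors' training code. The function name `leave_one_cohort_out_sizes` is hypothetical.

```python
# Cohort sizes copied from the dataset table above.
COHORT_SIZES = {
    "Choueiri": 16, "Miao": 17, "Snyder": 21, "Zhao": 25,
    "SU2CLC2": 25, "Hugo": 26, "Allen": 39, "MGH": 34,
    "Kim": 45, "Riaz": 51, "Rose": 89, "Gide": 73,
    "SU2CLC1": 102, "Liu": 107, "IMmotion150": 165, "IMVigor210": 298,
}


def leave_one_cohort_out_sizes(cohort_sizes):
    """Map each held-out cohort to the number of patients left for training."""
    total = sum(cohort_sizes.values())
    return {name: total - size for name, size in cohort_sizes.items()}


total_patients = sum(COHORT_SIZES.values())
train_sizes = leave_one_cohort_out_sizes(COHORT_SIZES)

print("all ICI patients:", total_patients)
for name, n_train in train_sizes.items():
    print(f"Leave-{name}: train on {n_train}, hold out {COHORT_SIZES[name]}")
```

Each Leave-X model can then be evaluated on cohort X without having seen any of its patients during fine-tuning.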

Other Materials

Below is a list of additional datasets, including gene ID mapping, cancer type encoding, input examples, high-level concepts, and more.

| Data | Description | Download |
|---|---|---|
| Cancer Code | Encoding for 33 cancer types. | Download |
| Gene Code | Encoding for 15,672 genes. | Download |
| Concepts | Details of 44 high-level concepts, including their corresponding gene sets, genes, and references. | Download |
| Gene ID Map | A comprehensive gene ID mapping file, including ENS IDs, gene names, gene types, and Entrez gene IDs. | Download |
| Input TPM Example | An example Compass input dataset from the Gide cohort, used for response prediction. The first column holds the cancer type; the remaining columns hold gene expression TPM values. | Download |
| Input Clinical Data | Clinical information for the Gide cohort samples in the Input TPM Example. | Download |
| PT-Training Example | Sample dataset for Compass model pre-training. | Download |
| PT-Test Example | Sample dataset for testing model performance during pre-training. | Download |
| Toy Raw Counts | Example raw count data, used to illustrate the conversion from raw counts to TPM values. | Download |
| Toy TPM | Example TPM data derived from the toy raw counts, used to illustrate the same conversion. | Download |
| Gencode v36 Annotation | The Gencode version 36 annotation file. | Download |
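
For readers unfamiliar with the raw-count-to-TPM conversion that the toy files illustrate, here is a minimal sketch of the standard TPM formula for a single sample. It is not necessarily the exact procedure used to build the toy TPM file: the gene names, counts, and lengths are made up, and taking gene lengths in base pairs from an annotation such as Gencode v36 is an assumption.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts for one sample to TPM.

    counts:     dict mapping gene ID -> raw read count
    lengths_bp: dict mapping gene ID -> gene length in base pairs
    """
    # Step 1: reads per kilobase (RPK), which corrects for gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Step 2: rescale so the values in the sample sum to one million.
    scale = sum(rpk.values())
    if scale == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return {gene: value / scale * 1e6 for gene, value in rpk.items()}


# Hypothetical toy sample: GENE_B has 3x the reads of GENE_A but is 3x longer.
toy_counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 0}
toy_lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}

tpm = counts_to_tpm(toy_counts, toy_lengths)
for gene, value in tpm.items():
    print(gene, value)
```

Because length normalisation happens before the per-sample rescaling, GENE_A and GENE_B end up with the same TPM here despite their different raw counts.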

Code Download

Access our code from GitHub repositories: