Dataset Download

Access Note: Some cohorts require formal data access approval from their original repositories (EGA or dbGaP). For these, download raw data from the original repository and process it using our mRNA TPM processing pipeline.

Open-access cohorts can be downloaded directly using the links below. To obtain the pre-processed gene expression data (pre-treatment mRNA TPM), complete our data request form and send it as a formal request to marinka@hms.harvard.edu (cc: wanxiang_shen@u.nus.edu).

| Cohort | Cancer Type | Patients (R/NR) | Group | Reference | Accession ID | Download |
|---|---|---|---|---|---|---|
| IMmotion150 | KIRC | 165 (48/117) | Large cohort | McDermott et al. Nat Med, 2018 | EGA: EGAS00001002928 | Request from repository |
| IMvigor210 | BLCA | 298 (68/230) | Large cohort | IMvigor210 Study Group. Lancet, 2017 | EGA: EGAS00001002556 | Request from repository |
| Miao et al. | KIRC | 17 (5/12) | Small cohort | Miao et al. Science, 2018 | dbGaP: phs001493.v1.p1 | Request from repository |
| Ravi et al. (SU2C-MARK) | NSCLC | 102 (38/64) LUAD; 25 (8/17) LUSC | Large & Small cohorts | Ravi et al. Nat Genet, 2023 | dbGaP: phs002822.v1.p1 | Request from repository |
| Liu et al. | SKCM | 107 (41/66) | Large cohort | Liu et al. Nat Med, 2019 | dbGaP: phs000452.v3.p1 | Request from repository |
| Van Allen et al. | SKCM | 39 (13/26) | Medium cohort | Van Allen et al. Science, 2015 | dbGaP: phs000452.v3.p1 | Request from repository |
| Freeman et al. (MGH) | SKCM | 34 (12/22) | Medium cohort | Freeman et al. Cell Rep. Med, 2022 | dbGaP: phs002683.v1.p1 | Request from repository |
| Zhao et al. | GBM | 25 (11/14) | Small cohort | Zhao et al. Nat Med, 2019 | SRA: PRJNA482620 | Clinical data, mRNA data |
| Kim et al. | STAD | 45 (12/33) | Medium cohort | Kim et al. Nat Med, 2018 | ENA: PRJEB25780 | Clinical data, mRNA data |
| Gide et al. | SKCM | 73 (40/33) | Medium cohort | Gide et al. Cancer Cell, 2019 | ENA: PRJEB23709 | Clinical data, mRNA data |
| Riaz et al. | SKCM | 51 (10/41) | Medium cohort | Riaz et al. Cell, 2017 | BioProject: PRJNA356761 | Clinical data, mRNA data |
| Hugo et al. | SKCM | 26 (14/12) | Small cohort | Hugo et al. Cell, 2016 | GEO: GSE78220 | Clinical data, mRNA data |
| Rose et al. | BLCA | 89 (16/73) | Medium cohort | Rose et al. BJC, 2021 | GEO: GSE176307 | Clinical data, mRNA data |
| Snyder et al. | BLCA | 21 (7/14) | Small cohort | Snyder et al. PLoS Med, 2017 | Zenodo: 10.5281/zenodo.546110 | Clinical data, mRNA data |
| Choueiri et al. | KIRC | 16 (3/13) | Small cohort | Choueiri et al. Clin Cancer Res, 2016 | CRI iAtlas (Open) | Clinical data, mRNA data |
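
The "Patients (R/NR)" cells above pack three numbers into one string: total patients, responders (R), and non-responders (NR). The sketch below shows one way to split such a cell and derive a response rate. The function name `parse_patients` is hypothetical and is not part of the Compass code; the three example cells are copied from the table.

```python
import re


def parse_patients(cell: str) -> tuple[int, int, int]:
    """Parse an 'N (R/NR)' cell into (total, responders, non_responders)."""
    match = re.fullmatch(r"\s*(\d+)\s*\((\d+)/(\d+)\)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised cell format: {cell!r}")
    total, responders, non_responders = map(int, match.groups())
    # The two groups must add up to the reported cohort size.
    if responders + non_responders != total:
        raise ValueError(f"R + NR does not equal total in {cell!r}")
    return total, responders, non_responders


# Example cells copied from the cohort table.
cells = {
    "IMmotion150": "165 (48/117)",
    "Gide et al.": "73 (40/33)",
    "Choueiri et al.": "16 (3/13)",
}

for cohort, cell in cells.items():
    total, responders, _ = parse_patients(cell)
    print(f"{cohort}: {responders}/{total} responders ({responders / total:.1%})")
```

The Ravi et al. row holds two such entries (LUAD and LUSC), so it would need to be split on the semicolon before parsing.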

Additionally, we provide paired patient samples consisting of pre- and post-ICI treatment mRNA expression data. Because some patients have multiple post-treatment samples, the dataset contains 86 pre-post treatment pairs from 78 patients across three cohorts: Riaz (n=43), Freeman (n=27), and Gide (n=16). Treatments were anti-PD-1 (n=71), anti-CTLA-4 + anti-PD-1 (n=9), or anti-CTLA-4 (n=6). All n values here count pairs rather than patients, since each set sums to 86.

| Cohort | Cancer Type | Patients (R/NR) | Group | Pre-treatment data | Post-treatment data |
|---|---|---|---|---|---|
| Riaz (n=43), Freeman (n=27), Gide (n=16) | SKCM | 86 (22/64) | Pre-Post treatment Pairs | Clinical data, mRNA data | Clinical data, mRNA data |

Model Download

We provide pre-trained and fine-tuned Compass models for specific use cases. Click the links below to download.

| No. | Model | Description | Download |
|---|---|---|---|
| 1 | PT Model | Base model pre-trained on pan-cancer TCGA transcriptomic datasets (33 cancer types), used for concept feature extraction. | Download |
| 2 | PFT Model | Partially fine-tuned model (PFT) on all ICI patients (n = 1,133) for response prediction. | Download |
| 3 | LFT Model | Linear-probing fine-tuned model (LFT) on all ICI patients (n = 1,133) for response prediction. | Download |
| 4 | Atezo Model | Multi-stage fine-tuned model (PFT->PFT) developed on bladder cancer patients (n = 354) for Atezolizumab response prediction. | Download |
| 5 | Ipi Model | Multi-stage fine-tuned model (PFT->LFT) developed on melanoma patients (n = 57) for Ipilimumab response prediction. | Download |
| 6 | Nivo Model | Multi-stage fine-tuned model (PFT->PFT) developed on melanoma patients (n = 105) for Nivolumab response prediction. | Download |
| 7 | Pembro Model | Multi-stage fine-tuned model (PFT->PFT) developed on melanoma patients (n = 120) for Pembrolizumab response prediction. | Download |
| 8 | Leave-Choueiri | PFT Model trained on 1,117 patients, excluding the Choueiri cohort (16 patients). | Download |
| 9 | Leave-Miao | PFT Model trained on 1,116 patients, excluding the Miao cohort (17 patients). | Download |
| 10 | Leave-Snyder | PFT Model trained on 1,112 patients, excluding the Snyder cohort (21 patients). | Download |
| 11 | Leave-Zhao | PFT Model trained on 1,108 patients, excluding the Zhao cohort (25 patients). | Download |
| 12 | Leave-SU2CLC2 | PFT Model trained on 1,108 patients, excluding the Ravi-2 cohort (25 patients). | Download |
| 13 | Leave-Hugo | PFT Model trained on 1,107 patients, excluding the Hugo cohort (26 patients). | Download |
| 14 | Leave-Allen | PFT Model trained on 1,094 patients, excluding the Van Allen cohort (39 patients). | Download |
| 15 | Leave-MGH | PFT Model trained on 1,099 patients, excluding the Freeman (MGH) cohort (34 patients). | Download |
| 16 | Leave-Kim | PFT Model trained on 1,088 patients, excluding the Kim cohort (45 patients). | Download |
| 17 | Leave-Riaz | PFT Model trained on 1,082 patients, excluding the Riaz cohort (51 patients). | Download |
| 18 | Leave-Rose | PFT Model trained on 1,044 patients, excluding the Rose cohort (89 patients). | Download |
| 19 | Leave-Gide | PFT Model trained on 1,060 patients, excluding the Gide cohort (73 patients). | Download |
| 20 | Leave-SU2CLC1 | PFT Model trained on 1,031 patients, excluding the Ravi-1 cohort (102 patients). | Download |
| 21 | Leave-Liu | PFT Model trained on 1,026 patients, excluding the Liu cohort (107 patients). | Download |
| 22 | Leave-IMmotion150 | PFT Model trained on 968 patients, excluding the IMmotion150 cohort (165 patients). | Download |
| 23 | Leave-IMVigor210 | PFT Model trained on 835 patients, excluding the IMvigor210 cohort (298 patients). | Download |
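
The training-set sizes for models 8 to 23 follow from subtracting each held-out cohort from the 1,133 ICI patients. The sketch below reproduces that arithmetic as a quick consistency check. The cohort sizes are copied from the tables on this page; the dictionary keys reuse the model-name suffixes, and the variable names are illustrative only.

```python
# Cohort sizes copied from the dataset table (keys follow the Leave-* model names).
cohort_sizes = {
    "Choueiri": 16,
    "Miao": 17,
    "Snyder": 21,
    "Zhao": 25,
    "SU2CLC2": 25,
    "Hugo": 26,
    "MGH": 34,
    "Allen": 39,
    "Kim": 45,
    "Riaz": 51,
    "Gide": 73,
    "Rose": 89,
    "SU2CLC1": 102,
    "Liu": 107,
    "IMmotion150": 165,
    "IMVigor210": 298,
}

# All ICI patients across the 16 cohorts.
total = sum(cohort_sizes.values())

# Leave-one-cohort-out: train on every patient except the held-out cohort.
train_sizes = {f"Leave-{name}": total - n for name, n in cohort_sizes.items()}

print(f"total ICI patients: {total}")
for model, size in train_sizes.items():
    print(f"{model}: {size} training patients")
```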

Other Materials

Below is a list of additional datasets, including gene ID mapping, cancer type encoding, input examples, high-level concepts, and more.

| Data | Description | Download |
|---|---|---|
| Cancer Code | Encoding for 33 cancer types. | Download |
| Gene Code | Encoding for 15,672 genes. | Download |
| Concepts | Details of 44 high-level concepts, including their corresponding gene sets, genes, and references. | Download |
| Gene ID Map | A comprehensive gene ID mapping file, including ENS IDs, gene names, gene types, and Entrez gene IDs. | Download |
| Input TPM | An example dataset from the Gide cohort for Compass input used in response prediction. The first column represents cancer types, while the remaining columns contain gene expression TPM values. | Download |
| Input Clinical Data | Clinical information corresponding to the Compass input example dataset from the Gide cohort. | Download |
| PT-Training Example | Sample dataset for Compass model pre-training. | Download |
| PT-Test Example | Sample dataset for testing model performance during pre-training. | Download |
| Toy Raw Counts | Example raw count data, used for illustrating the conversion from raw counts to TPM values. | Download |
| Toy TPM | Example TPM data derived from raw counts, used for illustrating the conversion from raw counts to TPM values. | Download |
| Gencode v36 Annotation | The version 36 Gencode annotation file. | Download |
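
To illustrate the raw-count to TPM conversion that the toy files refer to, here is a minimal sketch of the standard TPM formula: divide each count by its gene length, then scale every sample so its values sum to one million. This is not the Compass processing pipeline itself; the function name `counts_to_tpm`, the samples-by-genes layout, and the toy gene lengths are assumptions made for this example.

```python
import pandas as pd


def counts_to_tpm(counts: pd.DataFrame, gene_length_bp: pd.Series) -> pd.DataFrame:
    """Convert raw counts to TPM.

    counts: rows are samples, columns are gene IDs (assumed layout).
    gene_length_bp: gene length in base pairs, indexed by gene ID.
    """
    lengths = gene_length_bp.reindex(counts.columns)
    if lengths.isna().any():
        missing = lengths[lengths.isna()].index.tolist()
        raise ValueError(f"missing gene lengths for: {missing}")
    # Length-normalised read rate per gene.
    rate = counts.div(lengths, axis=1)
    # Scale each sample (row) so that its values sum to one million.
    return rate.div(rate.sum(axis=1), axis=0) * 1e6


# Toy data, invented for illustration only.
toy_counts = pd.DataFrame(
    {"G1": [10, 0], "G2": [20, 30]},
    index=["s1", "s2"],
)
toy_lengths = pd.Series({"G1": 1000, "G2": 2000})

tpm = counts_to_tpm(toy_counts, toy_lengths)
print(tpm)
```

A sample whose counts are all zero yields NaN values here because its row sum is zero, so such samples should be filtered out or handled before conversion.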

Code Download

Access our code from GitHub repositories: