Dataset Download
The cohorts and their corresponding clinical datasets are listed below; click the links to download the clinical data. Source links for each cohort are also provided, so please cite the original data if you use these datasets.
For access to the corresponding pre-processed gene expression data (mRNA TPM from pre-treatment samples), please send a formal data request to marinka@hms.harvard.edu and cc wanxiang_shen@u.nus.edu.
Additionally, we provide paired patient samples consisting of pre- and post-ICI treatment mRNA expression data. Because some patients have multiple post-treatment samples, the dataset contains 86 pre-post treatment pairs from 78 patients across three cohorts: Riaz (n=43), MGH (n=27), and Gide (n=16). By treatment, the pairs break down as PD-1 (n=71), CTLA-4 + PD-1 (n=9), and CTLA-4 (n=6).
Cohort | Cancer Type | Patients (R/NR) | Group | Pre-treatment data | Post-treatment data |
---|---|---|---|---|---|
Riaz(n=43), MGH(n=27), Gide(n=16) | SKCM | 86(22/64) | Pre-Post treatment Pairs | Clinical data, mRNA data | Clinical data, mRNA data |
Model Download
We provide pre-trained and fine-tuned Compass models for specific use cases. Click the links below to download.
No. | Model | Description | Download |
---|---|---|---|
1 | PT Model | Base model pre-trained on pan-cancer TCGA transcriptomic datasets (33 cancer types), used for concept feature extraction. | Download |
2 | PFT Model | Partially fine-tuned (PFT) model trained on all ICI patients (n = 1,133) for response prediction. | Download |
3 | LFT Model | Linear-probing fine-tuned (LFT) model trained on all ICI patients (n = 1,133) for response prediction. | Download |
4 | Atezo Model | Multi-stage fine-tuned model (PFT->PFT) developed on bladder cancer patients (n = 354) for Atezolizumab response prediction. | Download |
5 | Ipi Model | Multi-stage fine-tuned model (PFT->LFT) developed on melanoma patients (n = 57) for Ipilimumab response prediction. | Download |
6 | Nivo Model | Multi-stage fine-tuned model (PFT->PFT) developed on melanoma patients (n = 105) for Nivolumab response prediction. | Download |
7 | Pembro Model | Multi-stage fine-tuned model (PFT->PFT) developed on melanoma patients (n = 120) for Pembrolizumab response prediction. | Download |
8 | Leave-Choueiri | PFT Model trained on 1,117 patients excluding the Choueiri cohort (16 patients). | Download |
9 | Leave-Miao | PFT Model trained on 1,116 patients excluding the Miao cohort (17 patients). | Download |
10 | Leave-Snyder | PFT Model trained on 1,112 patients excluding the Snyder cohort (21 patients). | Download |
11 | Leave-Zhao | PFT Model trained on 1,108 patients excluding the Zhao cohort (25 patients). | Download |
12 | Leave-SU2CLC2 | PFT Model trained on 1,108 patients excluding the SU2CLC2 cohort (25 patients). | Download |
13 | Leave-Hugo | PFT Model trained on 1,107 patients excluding the Hugo cohort (26 patients). | Download |
14 | Leave-Allen | PFT Model trained on 1,094 patients excluding the Allen cohort (39 patients). | Download |
15 | Leave-MGH | PFT Model trained on 1,099 patients excluding the MGH cohort (34 patients). | Download |
16 | Leave-Kim | PFT Model trained on 1,088 patients excluding the Kim cohort (45 patients). | Download |
17 | Leave-Riaz | PFT Model trained on 1,082 patients excluding the Riaz cohort (51 patients). | Download |
18 | Leave-Rose | PFT Model trained on 1,044 patients excluding the Rose cohort (89 patients). | Download |
19 | Leave-Gide | PFT Model trained on 1,060 patients excluding the Gide cohort (73 patients). | Download |
20 | Leave-SU2CLC1 | PFT Model trained on 1,031 patients excluding the SU2CLC1 cohort (102 patients). | Download |
21 | Leave-Liu | PFT Model trained on 1,026 patients excluding the Liu cohort (107 patients). | Download |
22 | Leave-IMmotion150 | PFT Model trained on 968 patients excluding the IMmotion150 cohort (165 patients). | Download |
23 | Leave-IMVigor210 | PFT Model trained on 835 patients excluding the IMVigor210 cohort (298 patients). | Download |
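
Each leave-one-cohort-out training size in the table above is the full ICI patient count (n = 1,133) minus the size of the held-out cohort. The short Python sketch below reproduces that arithmetic from the cohort sizes listed in rows 8 to 23. The names `COHORT_SIZES` and `leave_one_cohort_out_sizes` are illustrative only and are not part of the Compass code base.

```python
# Held-out cohort sizes, copied from the table above (rows 8-23).
COHORT_SIZES = {
    "Choueiri": 16, "Miao": 17, "Snyder": 21, "Zhao": 25,
    "SU2CLC2": 25, "Hugo": 26, "Allen": 39, "MGH": 34,
    "Kim": 45, "Riaz": 51, "Rose": 89, "Gide": 73,
    "SU2CLC1": 102, "Liu": 107, "IMmotion150": 165, "IMVigor210": 298,
}


def leave_one_cohort_out_sizes(cohort_sizes):
    """Return {cohort: number of training patients when that cohort is held out}."""
    total = sum(cohort_sizes.values())
    return {name: total - n for name, n in cohort_sizes.items()}


total_patients = sum(COHORT_SIZES.values())
train_sizes = leave_one_cohort_out_sizes(COHORT_SIZES)

print(f"All ICI patients: {total_patients}")
for name, n_train in train_sizes.items():
    print(f"Leave-{name}: train on {n_train}, hold out {COHORT_SIZES[name]}")
```

The 16 cohort sizes sum to the 1,133 patients used for the PFT and LFT models, so each Leave-X model can be evaluated on a cohort it never saw during fine-tuning.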
Other Materials
Below is a list of additional datasets, including gene ID mapping, cancer type encoding, input examples, high-level concepts, and more.
Data | Description | Download |
---|---|---|
Cancer Code | Encoding for 33 cancer types. | Download |
Gene Code | Encoding for 15,672 genes. | Download |
Concepts | Details of 44 high-level concepts, including their corresponding gene sets, genes, and references. | Download |
Gene ID Map | A comprehensive gene ID mapping file, including ENS IDs, gene names, gene types, and Entrez gene IDs. | Download |
Input TPM Example | An example dataset from the Gide cohort, formatted as Compass input for response prediction. The first column contains cancer types; the remaining columns contain gene expression TPM values. | Download |
Input Clinical Data | Clinical information corresponding to the Compass input example dataset from the Gide cohort. | Download |
PT-Training Example | Sample training dataset for Compass model pre-training. | Download |
PT-Test Example | Sample dataset for testing model performance during pre-training. | Download |
Toy Raw Counts | Example raw count data, used for illustrating the conversion from raw counts to TPM values. | Download |
Toy TPM | Example TPM data derived from raw counts, used for illustrating raw count to TPM value conversion. | Download |
Gencode v36 Annotation | The version 36 Gencode annotation file. | Download |
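
The Toy Raw Counts and Toy TPM files illustrate the conversion from raw counts to TPM. As a rough guide, the standard TPM calculation can be sketched in plain Python as below. This is a minimal sketch of the textbook formula, not necessarily the exact pipeline used to produce the Compass inputs. The gene names, counts, and lengths are made up for illustration, and it assumes per-gene lengths in base pairs are available (for example, derived from an annotation such as Gencode v36).

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM for a single sample.

    counts:     dict mapping gene -> raw read count
    lengths_bp: dict mapping gene -> gene length in base pairs
    """
    # Step 1: reads per kilobase (RPK) normalizes each count by gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Step 2: the per-million scaling factor normalizes for sequencing depth.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        # No reads at all: return zeros rather than dividing by zero.
        return {gene: 0.0 for gene in counts}
    # Step 3: divide each RPK by the scaling factor to obtain TPM.
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical toy sample: counts proportional to gene length.
toy_counts = {"GENE_A": 100, "GENE_B": 200, "GENE_C": 300}
toy_lengths = {"GENE_A": 1000, "GENE_B": 2000, "GENE_C": 3000}

toy_tpm = counts_to_tpm(toy_counts, toy_lengths)
for gene, value in toy_tpm.items():
    print(f"{gene}: {value:.2f}")
```

Because the toy counts scale with gene length, all three genes receive the same TPM, and the values for a sample always sum to one million.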
Code Download
Access our code from GitHub repositories: