Compass - Download

Dataset Download

Access Note: Some cohorts require formal data access approval from their original repositories (EGA or dbGaP). For these, download raw data from the original repository and process it using our mRNA TPM processing pipeline.

Open-access cohorts can be downloaded directly using the links below. To obtain pre-processed gene expression data (pre-treatment mRNA TPM), please send a formal request to marinka@hms.harvard.edu (cc: wanxiang_shen@u.nus.edu) by completing our data request form.

Cohort	Cancer Type	Patients (R/NR)	Group	Reference	Accession ID	Download
IMmotion150	KIRC	165 (48/117)	Large cohort	McDermott et al. Nat Med, 2018	EGA: EGAS00001002928	Request from repository
IMvigor210	BLCA	298 (68/230)	Large cohort	IMvigor210 Study Group. Lancet, 2017	EGA: EGAS00001002556	Request from repository
Miao et al.	KIRC	17 (5/12)	Small cohort	Miao et al. Science, 2018	dbGaP: phs001493.v1.p1	Request from repository
Ravi et al. (SU2C-MARK)	NSCLC	102 (38/64) LUAD; 25 (8/17) LUSC	Large & Small cohorts	Ravi et al. Nat Genet, 2023	dbGaP: phs002822.v1.p1	Request from repository
Liu et al.	SKCM	107 (41/66)	Large cohort	Liu et al. Nat Med, 2019	dbGaP: phs000452.v3.p1	Request from repository
Van Allen et al.	SKCM	39 (13/26)	Medium cohort	Van Allen et al. Science, 2015	dbGaP: phs000452.v3.p1	Request from repository
Freeman et al. (MGH)	SKCM	34 (12/22)	Medium cohort	Freeman et al. Cell Rep. Med, 2022	dbGaP: phs002683.v1.p1	Request from repository
Zhao et al.	GBM	25 (11/14)	Small cohort	Zhao et al. Nat Med, 2019	SRA: PRJNA482620	Clinical data, mRNA data
Kim et al.	STAD	45 (12/33)	Medium cohort	Kim et al. Nat Med, 2018	ENA: PRJEB25780	Clinical data, mRNA data
Gide et al.	SKCM	73 (40/33)	Medium cohort	Gide et al. Cancer Cell, 2019	ENA: PRJEB23709	Clinical data, mRNA data
Riaz et al.	SKCM	51 (10/41)	Medium cohort	Riaz et al. Cell, 2017	BioProject: PRJNA356761	Clinical data, mRNA data
Hugo et al.	SKCM	26 (14/12)	Small cohort	Hugo et al. Cell, 2016	GEO: GSE78220	Clinical data, mRNA data
Rose et al.	BLCA	89 (16/73)	Medium cohort	Rose et al. BJC, 2021	GEO: GSE176307	Clinical data, mRNA data
Snyder et al.	BLCA	21 (7/14)	Small cohort	Snyder et al. PLoS Med, 2017	Zenodo: 10.5281/zenodo.546110	Clinical data, mRNA data
Choueiri et al.	KIRC	16 (3/13)	Small cohort	Choueiri et al. Clin Cancer Res, 2016	CRI iAtlas (Open)	Clinical data, mRNA data

Additionally, we provide paired pre- and post-ICI treatment mRNA expression data. Because some patients have multiple post-treatment samples, the dataset contains 86 pre-post treatment pairs from 78 patients across three cohorts: Riaz (n=43), Freeman (n=27), and Gide (n=16). Counted by pair, treatments were anti-PD-1 (n=71), anti-CTLA-4 + anti-PD-1 (n=9), or anti-CTLA-4 (n=6).

Cohort	Cancer Type	Pairs (R/NR)	Group	Pre-treatment data	Post-treatment data
Riaz (n=43), Freeman (n=27), Gide (n=16)	SKCM	86 (22/64)	Pre-post treatment pairs	Clinical data, mRNA data	Clinical data, mRNA data
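
The paired-sample counts can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the numbers are copied from this page, and the variable names are illustrative rather than part of any Compass API.

```python
# Counts copied from the paired-sample description and table on this page.
pairs_by_cohort = {"Riaz": 43, "Freeman": 27, "Gide": 16}
pairs_by_treatment = {"PD-1": 71, "CTLA-4 + PD-1": 9, "CTLA-4": 6}
pairs_by_response = {"R": 22, "NR": 64}

total_pairs = 86
n_patients = 78

# Every breakdown should add up to the same number of pre-post pairs.
for breakdown in (pairs_by_cohort, pairs_by_treatment, pairs_by_response):
    assert sum(breakdown.values()) == total_pairs

# Pairs beyond one per patient come from patients with several
# post-treatment samples.
extra_pairs = total_pairs - n_patients
print(extra_pairs)
```

All three breakdowns sum to the number of pairs rather than the number of patients, which is why the per-cohort and per-treatment counts are best read as pair counts.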

Model Download

We provide pre-trained and fine-tuned Compass models for specific use cases. Click the links below to download.

No.	Model	Description	Download
1	PT Model	Base model pre-trained on pan-cancer TCGA transcriptomic datasets (33 cancer types), used for concept feature extraction.	Download
2	PFT Model	Partially fine-tuned (PFT) model trained on all ICI-treated patients (n = 1,133) for response prediction.	Download
3	LFT Model	Linear-probing fine-tuned (LFT) model trained on all ICI-treated patients (n = 1,133) for response prediction.	Download
4	Atezo Model	Multi-stage fine-tuned model (PFT->PFT) developed on bladder cancer patients (n = 354) for Atezolizumab response prediction.	Download
5	Ipi Model	Multi-stage fine-tuned model (PFT->LFT) developed on melanoma patients (n = 57) for Ipilimumab response prediction.	Download
6	Nivo Model	Multi-stage fine-tuned model (PFT->PFT) developed on melanoma patients (n = 105) for Nivolumab response prediction.	Download
7	Pembro Model	Multi-stage fine-tuned model (PFT->PFT) developed on melanoma patients (n = 120) for Pembrolizumab response prediction.	Download
8	Leave-Choueiri	PFT Model trained on 1,117 patients excluding the Choueiri cohort (16 patients).	Download
9	Leave-Miao	PFT Model trained on 1,116 patients excluding the Miao cohort (17 patients).	Download
10	Leave-Snyder	PFT Model trained on 1,112 patients excluding the Snyder cohort (21 patients).	Download
11	Leave-Zhao	PFT Model trained on 1,108 patients excluding the Zhao cohort (25 patients).	Download
12	Leave-SU2CLC2	PFT Model trained on 1,108 patients excluding the Ravi-2 cohort (SU2C-MARK LUSC, 25 patients).	Download
13	Leave-Hugo	PFT Model trained on 1,107 patients excluding the Hugo cohort (26 patients).	Download
14	Leave-Allen	PFT Model trained on 1,094 patients excluding the Van Allen cohort (39 patients).	Download
15	Leave-MGH	PFT Model trained on 1,099 patients excluding the Freeman (MGH) cohort (34 patients).	Download
16	Leave-Kim	PFT Model trained on 1,088 patients excluding the Kim cohort (45 patients).	Download
17	Leave-Riaz	PFT Model trained on 1,082 patients excluding the Riaz cohort (51 patients).	Download
18	Leave-Rose	PFT Model trained on 1,044 patients excluding the Rose cohort (89 patients).	Download
19	Leave-Gide	PFT Model trained on 1,060 patients excluding the Gide cohort (73 patients).	Download
20	Leave-SU2CLC1	PFT Model trained on 1,031 patients excluding the Ravi-1 cohort (SU2C-MARK LUAD, 102 patients).	Download
21	Leave-Liu	PFT Model trained on 1,026 patients excluding the Liu cohort (107 patients).	Download
22	Leave-IMmotion150	PFT Model trained on 968 patients excluding the IMmotion150 cohort (165 patients).	Download
23	Leave-IMVigor210	PFT Model trained on 835 patients excluding the IMvigor210 cohort (298 patients).	Download
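
The training-set sizes in the leave-one-cohort-out rows follow from subtracting each held-out cohort from the 1,133 ICI patients. The sketch below reproduces that arithmetic from the cohort sizes in the dataset table. The dictionary keys are illustrative labels, and matching Ravi-1 to the LUAD subset and Ravi-2 to the LUSC subset is an assumption based on the patient counts alone.

```python
# Cohort sizes copied from the dataset table on this page.
cohort_sizes = {
    "IMmotion150": 165,
    "IMvigor210": 298,
    "Miao": 17,
    "Ravi-1 (LUAD, assumed)": 102,
    "Ravi-2 (LUSC, assumed)": 25,
    "Liu": 107,
    "Van Allen": 39,
    "Freeman (MGH)": 34,
    "Zhao": 25,
    "Kim": 45,
    "Gide": 73,
    "Riaz": 51,
    "Hugo": 26,
    "Rose": 89,
    "Snyder": 21,
    "Choueiri": 16,
}

total_patients = sum(cohort_sizes.values())

# Training-set size when one cohort is held out for evaluation.
leave_one_out = {name: total_patients - n for name, n in cohort_sizes.items()}

for name, n_train in sorted(leave_one_out.items(), key=lambda kv: -kv[1]):
    print(f"Leave-{name}: {n_train} training patients")
```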

Other Materials

Below is a list of additional datasets, including gene ID mapping, cancer type encoding, input examples, high-level concepts, and more.

Data	Description	Download
Cancer Code	Encoding for 33 cancer types.	Download
Gene Code	Encoding for 15,672 genes.	Download
Concepts	Details of 44 high-level concepts, including their corresponding gene sets, genes, and references.	Download
Gene ID Map	A comprehensive gene ID mapping file, including ENS IDs, gene names, gene types, and Entrez gene IDs.	Download
Input TPM Example	An example Compass input dataset from the Gide cohort, used for response prediction. The first column contains the cancer type; the remaining columns contain gene expression TPM values.	Download
Input Clinical Data	Clinical information corresponding to the Input TPM Example dataset from the Gide cohort.	Download
PT-Training Example	Sample training dataset for Compass model pre-training.	Download
PT-Test Example	Sample dataset for testing model performance during pre-training.	Download
Toy Raw Counts	Example raw count data, used for illustrating the conversion from raw counts to TPM values.	Download
Toy TPM	Example TPM data derived from raw counts, used for illustrating raw count to TPM value conversion.	Download
GENCODE v36 Annotation	The GENCODE release 36 annotation file.	Download
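
For orientation, the raw-count to TPM conversion that the toy files illustrate can be sketched with the standard TPM formula: divide each count by the gene length in kilobases, then rescale so the sample sums to one million. This is a minimal sketch and not necessarily the exact procedure in the data pre-processing code. The function name, the gene names, and the gene lengths are hypothetical, and in practice the lengths would have to be derived from an annotation file.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts for one sample to TPM.

    counts     -- dict mapping gene -> raw read count
    lengths_bp -- dict mapping gene -> gene length in base pairs
                  (assumed to be supplied by the caller)
    """
    # Reads per kilobase: normalise each count by gene length.
    rpk = {gene: n / (lengths_bp[gene] / 1000) for gene, n in counts.items()}
    scale = sum(rpk.values())
    if scale == 0:
        # No reads at all: return zeros rather than dividing by zero.
        return {gene: 0.0 for gene in counts}
    # Rescale so that the values of one sample sum to one million.
    return {gene: value / scale * 1e6 for gene, value in rpk.items()}


# Toy values invented for illustration; they are not taken from the toy files.
toy_counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 0}
toy_lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}

tpm = counts_to_tpm(toy_counts, toy_lengths)
print(tpm)
```

GENE_B has three times the reads of GENE_A but is also three times as long, so both end up with the same TPM. This length correction is what distinguishes TPM from a plain counts-per-million scaling.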

Code Download

Access our code from GitHub repositories:

GitHub: Immune-compass code

GitHub: Data pre-processing code