CPTAC Pan-Cancer Data
Welcome to the CPTAC Pan-Cancer Analysis Page! This page describes the data generated by the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC), which applies large-scale proteomic and genomic analysis to build a comprehensive, interconnected proteogenomic characterization of the most prevalent types of cancer.

Along with clinical and imaging data, this proteogenomic resource will further our understanding of the molecular changes that drive cancer development and progression, and support efforts to identify commonalities across different types of cancer and develop new strategies for diagnosis, treatment, and prevention.

On this page, you will find a collection of papers presenting a first analysis of the data, along with links to the data and to the supplementary materials from these publications.
Publications
The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) investigates tumors from a proteogenomic perspective, creating rich multi-omics datasets connecting genomic aberrations to cancer phenotypes. To facilitate pan-cancer investigations, we have generated harmonized genomic, transcriptomic, proteomic, and clinical data for >1000 tumors in 10 cohorts to create a cohesive and powerful dataset for scientific discovery. We outline efforts by the CPTAC pan-cancer working group in data harmonization, data dissemination, and computational resources for aiding biological discoveries. We also discuss challenges for multi-omics data integration and analysis, specifically the unique challenges of working with both nucleotide sequencing and mass spectrometry proteomics data.
Cancer Cell 2023
Li, Y., Dou, Y., da Veiga Leprevost, F., Geffen, Y., et al. (2023). Proteogenomic data and resources for pan-cancer analysis. Cancer Cell, 41, 1397-1406.
Large-scale omics profiling has uncovered a vast array of somatic mutations and cancer-associated proteins, posing substantial challenges for their functional interpretation. Here we present a network-based approach centered on FunMap, a pan-cancer functional network constructed using supervised machine learning on extensive proteomics and RNA sequencing data from 1,194 individuals spanning 11 cancer types. Comprising 10,525 protein-coding genes, FunMap connects functionally associated genes with unprecedented precision, surpassing traditional protein-protein interaction maps. Network analysis identifies functional protein modules, reveals a hierarchical structure linked to cancer hallmarks and clinical phenotypes, provides deeper insights into established cancer drivers and predicts functions for understudied cancer-associated proteins. Additionally, applying graph-neural-network-based deep learning to FunMap uncovers drivers with low mutation frequency. This study establishes FunMap as a powerful and unbiased tool for interpreting somatic mutations and understudied proteins, with broad implications for advancing cancer biology and informing therapeutic strategies.
Nature Cancer 2024
Shi, Z., Lei, J. T., Elizarraras, J. M., & Zhang, B. (2024). Mapping the functional network of human cancer through machine learning and pan-cancer proteogenomics. Nature Cancer, 10.1038/s43018-024-00869-z.
Fewer than 200 proteins are targeted by cancer drugs approved by the Food and Drug Administration (FDA). We integrate Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteogenomics data from 1,043 patients across 10 cancer types with additional public datasets to identify potential therapeutic targets. Pan-cancer analysis of 2,863 druggable proteins reveals a wide abundance range and identifies biological factors that affect mRNA-protein correlation. Integration of proteomic data from tumors and genetic screen data from cell lines identifies protein overexpression- or hyperactivation-driven druggable dependencies, enabling accurate predictions of effective drug targets. Proteogenomic identification of synthetic lethality provides a strategy to target tumor suppressor gene loss. Combining proteogenomic analysis and MHC binding prediction prioritizes mutant KRAS peptides as promising public neoantigens. Computational identification of shared tumor-associated antigens followed by experimental confirmation nominates peptides as immunotherapy targets. These analyses, summarized at https://targets.linkedomics.org, form a comprehensive landscape of protein and peptide targets for companion diagnostics, drug repurposing, and therapy development.
Cell 2024
Savage, S.R., Yi, X., Lei, J.T., Wen, B., et al. (2024). Pan-cancer proteogenomics expands the landscape of therapeutic targets. Cell, S0092-8674(24)00583-X.
By combining mass-spectrometry-based proteomics and phosphoproteomics with genomics, epi-genomics, and transcriptomics, proteogenomics provides comprehensive molecular characterization of cancer. Using this approach, the Clinical Proteomic Tumor Analysis Consortium (CPTAC) has characterized over 1,000 primary tumors spanning 10 cancer types, many with matched normal tissues. Here, we present LinkedOmicsKB, a proteogenomics data-driven knowledge base that makes consistently processed and systematically precomputed CPTAC pan-cancer proteogenomics data available to the public through ∼40,000 gene-, protein-, mutation-, and phenotype-centric web pages. Visualization techniques facilitate efficient exploration and reasoning of complex, interconnected data. Using three case studies, we illustrate the practical utility of LinkedOmicsKB in providing new insights into genes, phosphorylation sites, somatic mutations, and cancer phenotypes. With precomputed results of 19,701 coding genes, 125,969 phosphosites, and 256 genotypes and phenotypes, LinkedOmicsKB provides a comprehensive resource to accelerate proteogenomics data-driven discoveries to improve our understanding and treatment of human cancer. A record of this paper’s transparent peer review process is included in the supplemental information.
Cell Systems 2023
Liao, Y., Savage, S.R., Dou, Y., Shi, Z., Yi, X., Jiang, W., Lei, J.T., & Zhang, B. (2023). A proteogenomics data-driven knowledge base of human cancer. Cell Systems, S2405-4712(23)00214-4.
Despite the successes of immunotherapy in cancer treatment over recent decades, fewer than 10%–20% of cancer cases have demonstrated durable responses from immune checkpoint blockade. To enhance the efficacy of immunotherapies, combination therapies suppressing multiple immune evasion mechanisms are increasingly contemplated. To better understand immune cell surveillance and diverse immune evasion responses in tumor tissues, we comprehensively characterized the immune landscape of more than 1,000 tumors across ten different cancers using CPTAC pan-cancer proteogenomic data. We identified seven distinct immune subtypes based on integrative learning of cell type compositions and pathway activities. We then thoroughly categorized unique genomic, epigenetic, transcriptomic, and proteomic changes associated with each subtype. Further leveraging the deep phosphoproteomic data, we studied kinase activities in different immune subtypes, which revealed potential subtype-specific therapeutic targets. Insights from this work will facilitate the development of future immunotherapy strategies and enhance precision targeting with existing agents.
Cell 2024
Petralia, F., Ma, W., Yaron, T.M., Caruso, F.P., et al. (2024). Pan-Cancer Proteogenomics Characterization of Tumor Immunity. Cell, 187, 1–23.
The availability of data from profiling of cancer patients with multiomics is rapidly increasing. However, integrative analysis of such data for personalized target identification is not trivial. Multiomics2Targets is a platform that enables users to upload transcriptomics, proteomics, and phosphoproteomics data matrices collected from the same cohort of cancer patients. After uploading the data, Multiomics2Targets produces a report that resembles a research publication. The uploaded matrices are processed, analyzed, and visualized using the tools Enrichr, KEA3, ChEA3, Expression2Kinases, and TargetRanger to identify and prioritize proteins, genes, and transcripts as potential targets. Figures and tables, as well as descriptions of the methods and results, are automatically generated. Reports include an abstract, introduction, methods, results, discussion, conclusions, and references and are exportable as citable PDFs and Jupyter Notebooks. Multiomics2Targets is applied to analyze version 3 of the Clinical Proteomic Tumor Analysis Consortium (CPTAC3) pan-cancer cohort, identifying potential targets for each CPTAC3 cancer subtype. Multiomics2Targets is available from https://multiomics2targets.maayanlab.cloud/.
Cell Reports Methods 2024
Deng, E. Z., Marino, G. B., Clarke, D. J. B., Diamant, I., et al. (2024). Multiomics2Targets identifies targets from cancer cohorts profiled with transcriptomics, proteomics, and phosphoproteomics. Cell Reports Methods, 100839, S2667-2375(24)00212-1.
Cancer driver events refer to key genetic aberrations that drive oncogenesis; however, their exact molecular mechanisms remain insufficiently understood. Here, our multi-omics pan-cancer analysis uncovers insights into the impacts of cancer drivers by identifying their significant cis-effects and distal trans-effects quantified at the RNA, protein, and phosphoprotein levels. Salient observations include the association of point mutations and copy-number alterations with the rewiring of protein interaction networks, and notably, most cancer genes converge toward similar molecular states denoted by sequence-based kinase activity profiles. A correlation between predicted neoantigen burden and measured T cell infiltration suggests potential vulnerabilities for immunotherapies. Patterns of cancer hallmarks vary by polygenic protein abundance ranging from uniform to heterogeneous. Overall, our work demonstrates the value of comprehensive proteogenomics in understanding the functional states of oncogenic drivers and their links to cancer development, surpassing the limitations of studying individual cancer types.
Cell 2023
Li, Y., Porta-Pardo, E., Tokheim, C., Bailey, M.H., et al. (2023). Pan-cancer proteogenomics connects oncogenic drivers to functional states. Cell, 186, 1–24.
Post-translational modifications (PTMs) play key roles in regulating cell signaling and physiology in both normal and cancer cells. Advances in mass spectrometry enable high-throughput, accurate, and sensitive measurement of PTM levels to better understand their role, prevalence, and crosstalk. Here, we analyze the largest collection of proteogenomics data from 1,110 patients with PTM profiles across 11 cancer types (10 from the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium [CPTAC]). Our study reveals pan-cancer patterns of changes in protein acetylation and phosphorylation involved in hallmark cancer processes. These patterns revealed subsets of tumors, from different cancer types, including those with dysregulated DNA repair driven by phosphorylation, altered metabolic regulation associated with immune response driven by acetylation, affected kinase specificity by crosstalk between acetylation and phosphorylation, and modified histone regulation. Overall, this resource highlights the rich biology governed by PTMs and exposes potential new therapeutic avenues.
Cell 2023
Geffen, Y., Anand, S., Akiyama, Y., Yaron, T.M., et al. (2023). Pan-cancer analysis of post-translational modifications reveals shared patterns of protein regulation. Cell, 186, 1–23.e1–e14.
We introduce a pioneering approach that integrates pathology imaging with transcriptomics and proteomics to identify predictive histology features associated with critical clinical outcomes in cancer. We utilize 2,755 H&E-stained histopathological slides from 657 patients across 6 cancer types from CPTAC. Our models effectively recapitulate distinctions readily made by human pathologists: tumor vs. normal (AUROC = 0.995) and tissue-of-origin (AUROC = 0.979). We further investigate predictive power on tasks not normally performed from H&E alone, including TP53 prediction and pathologic stage. Importantly, we describe predictive morphologies not previously utilized in a clinical setting. The incorporation of transcriptomics and proteomics identifies pathway-level signatures and cellular processes driving predictive histology features. Model generalizability and interpretability are confirmed using TCGA. We propose a classification system for these tasks and suggest potential clinical applications for this integrated human and machine learning approach. A publicly available web-based platform implements these models.
Cell Reports Medicine 2023
Wang, J.M., Hong, R., Demicco, E.G., Tan, J., et al. (2023). Deep learning integrates histopathology and proteogenomics at a pan-cancer level. Cell Reports Medicine, 4, 101173.
DNA methylation plays a critical role in establishing and maintaining cellular identity. However, it is frequently dysregulated during tumor development and is closely intertwined with other genetic alterations. Here, we leveraged multi-omic profiling of 687 tumors and matched non-involved adjacent tissues from the kidney, brain, pancreas, lung, head and neck, and endometrium to identify aberrant methylation associated with RNA and protein abundance changes and build a Pan-Cancer catalog. We uncovered lineage-specific epigenetic drivers including hypomethylated FGFR2 in endometrial cancer. We showed that hypermethylated STAT5A is associated with pervasive regulon downregulation and immune cell depletion, suggesting that epigenetic regulation of STAT5A expression constitutes a molecular switch for immunosuppression in squamous tumors. We further demonstrated that methylation subtype-enrichment information can explain cell-of-origin, intra-tumor heterogeneity, and tumor phenotypes. Overall, we identified cis-acting DNA methylation events that drive transcriptional and translational changes, shedding light on the tumor’s epigenetic landscape and the role of its cell-of-origin.
Cancer Cell 2023
Liang, W.W., Lu, R.J., Jayasinghe, R.G., Foltz, S.M., et al. (2023). Integrative multi-omic cancer profiling reveals DNA methylation patterns associated with therapeutic vulnerability and cell-of-origin. Cancer Cell, 41, 1-19.
Supplementary Information
Additional Resources
Cancer Data Service
Description - A reharmonized genomic data freeze corresponding to a Pan-Cancer analysis of 10 tumor types is available via the NCI Cancer Data Service (CDS). More information about the NCI CDS is available here: https://datacommons.cancer.gov/repository/cancer-data-service. The primary study is registered at dbGaP Study Accession: phs001287.v16.p6 and the genomic raw sequencing files for this study are available at the Genomic Data Commons (GDC).
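For readers who record the study accession in scripts or data manifests, the short sketch below splits an accession such as phs001287.v16.p6 into its study, version, and participant-set parts. It assumes the common `phs<6 digits>.v<version>.p<participant set>` layout; the names `DbGapAccession` and `parse_dbgap_accession` are hypothetical helpers written for this illustration and are not part of any CDS, dbGaP, or GDC tool.

```python
import re
from typing import NamedTuple


class DbGapAccession(NamedTuple):
    """Parts of a dbGaP study accession (hypothetical helper type)."""
    study: str            # e.g. "phs001287"
    version: int          # number after ".v"
    participant_set: int  # number after ".p"


# Assumed layout: "phs" + six digits, then ".v<int>", then ".p<int>".
_ACCESSION_RE = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)$")


def parse_dbgap_accession(accession: str) -> DbGapAccession:
    """Split an accession string into study, version, and participant set."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a recognized dbGaP accession: {accession!r}")
    study, version, participant_set = match.groups()
    return DbGapAccession(study, int(version), int(participant_set))


if __name__ == "__main__":
    # The accession cited on this page.
    print(parse_dbgap_accession("phs001287.v16.p6"))
```

Keeping the version and participant-set numbers as integers makes it easy to check which version of the data freeze a given analysis was run against.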