Overview
Non-canonical proteins translated from noncoding regions or alternative open reading frames (ORFs) can contribute to critical and diverse cellular processes. In cancer, they also represent an under-appreciated source of immunotherapy targets, either through tumor-enriched expression or by harboring somatic mutations that produce neoantigens. We introduce the largest database and proteogenomic analysis of searchable peptides to assess the prevalence of non-canonical ORFs (ncORFs) in cancer proteomes, using a pan-cancer approach across more than 900 patient proteomes and 18,991,018 MS/MS spectra from 26 immunopeptidome datasets in fourteen cancer types. The integrative proteogenomic analysis of whole-cell proteomes and immunopeptidomes revealed peptide support for a nonredundant set of 9,760 upstream, downstream, and out-of-frame ncORFs, as well as 12,811 long noncoding RNAs, pseudogenes, and miscellaneous RNAs. Notably, 6,486 ncORFs were derived from differentially expressed protein-coding genes, and 340 were ubiquitously translated across eight or more cancer types. Collectively, we combined bottom-up proteogenomic search with peptide-centric search to identify translated non-canonical proteins with putatively important regulatory roles in cancer.