Overview
We have developed an open-source database to showcase translatable circular RNAs (circRNAs). We performed a comprehensive proteogenomic analysis integrating mass spectrometry-based proteomics from three proteomes in the CPTAC portal with transcriptomic data from both short-read and long-read sequencing of metastatic colorectal cancer (mCRC) cell lines and patient tissues. We detected open reading frames (ORFs) unique to the backspliced region of circRNAs to highlight peptides not encoded by their linear transcripts. This enabled us to identify 6,848 novel peptides derived from 4,583 circRNAs, including 5,854 from short-read sequencing, 994 from long-read sequencing, and 304 from both approaches. We hope this database will serve as a resource for evaluating the coding potential of circRNAs and will aid future mechanistic studies exploring their function in cancer, especially in mCRC. Please use this website, PepCircDB (Peptide Encoding CircRNA DataBase), to explore the results or download the data.
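To illustrate what an ORF "unique to the backspliced region" means, here is a minimal Python sketch. It is not the pipeline used to build PepCircDB. It assumes ORFs begin at ATG and end at a standard stop codon, and it scans the forward strand only. The function name `junction_spanning_orfs` and the parameters `min_codons` and `max_laps` are hypothetical. The circRNA sequence is assumed to be written so that the backsplice junction lies between its last and first base.

```python
# Illustrative sketch only; NOT the PepCircDB or CHRIS implementation.
# Assumptions: ATG start codon, standard stop codons, forward strand only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def junction_spanning_orfs(circ_seq, min_codons=20, max_laps=4):
    """Find ORFs whose coding region crosses the backsplice junction.

    circ_seq   : circRNA sequence; the junction sits between the last
                 and the first base of this string.
    min_codons : minimum number of coding codons (stop codon excluded).
    max_laps   : how many times the circle is unrolled; ORFs with no stop
                 codon within this window are ignored in this sketch.

    Returns a list of (start_index, orf_length_nt) tuples, where the
    length includes the stop codon.
    """
    seq = circ_seq.upper().replace("U", "T")
    n = len(seq)
    # Unroll the circle so a reading frame can continue past the junction.
    unrolled = seq * max_laps
    hits = []
    for start in range(n):
        if unrolled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start, len(unrolled) - 2, 3):
            if unrolled[pos:pos + 3] in STOP_CODONS:
                coding_codons = (pos - start) // 3
                # The coding region crosses the junction only if at least
                # one coding base lies beyond index n - 1.
                if pos > n and coding_codons >= min_codons:
                    hits.append((start, pos + 3 - start))
                break
    return hits


if __name__ == "__main__":
    # Toy sequence: ATG near the 3' end, stop codon reached after wrapping.
    print(junction_spanning_orfs("CCCTAAGGGATGAAA", min_codons=3))
```

An ORF that ends before reaching the junction could equally be encoded by the linear transcript, so the sketch discards it. Only reading frames that continue across the junction yield peptide sequence specific to the circular form.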
Types of Peptides

This database is based on our study, which leveraged both short- and long-read sequencing to maximize the number of novel, bona fide circRNA isoforms. We devised an open-source bioinformatics pipeline, CHRIS (CHaracterizing CircRNAs by Integrative Sequencing, available at https://github.com/ChrisMaherLab/CHRIS), to rescue circRNAs that previously eluded short-read-based approaches. CHRIS uses a two-pass approach: circRNAs that pass the first-pass analysis are labeled "rescued", and circRNAs that pass the second-pass analysis are labeled "chimeric read support". Please refer to the paper for more details.
