We have developed an open-source database of translatable circular RNAs (circRNAs). We performed a comprehensive proteogenomic analysis that integrated mass spectrometry-based proteomics from three proteomic datasets in the CPTAC portal with transcriptomic data from short-read and long-read sequencing of metastatic colorectal cancer (mCRC) cell lines and patient tissues. We restricted our search to open reading frames (ORFs) unique to the backspliced region of circRNAs, which highlights peptides that are not encoded by their linear transcripts. In total, we identified 8,004 novel peptides derived from 5,462 circRNAs: 5,694 detected by short-read sequencing alone, 1,960 by long-read sequencing alone, and 350 by both approaches. We hope this database will serve as a resource for evaluating the coding potential of circRNAs and will aid future mechanistic studies of their function in cancer, especially mCRC. Please use this website, PepCircDB (Peptides in CircRNAs Database), to explore the results or download the data.
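To illustrate why an ORF in a circRNA can encode a peptide that its linear transcript does not, here is a minimal, hypothetical sketch; it is not the pipeline used to build PepCircDB. It scans a circRNA sequence for ATG-initiated ORFs whose reading passes through the backsplice junction. Assumptions (all ours, not from the study): the circRNA is given as a DNA-alphabet string whose last base is joined to its first base, translation uses the standard genetic code, an ORF must reach an in-frame stop codon, and "rolling" ORFs with no stop are skipped. The function names `codon_at` and `junction_spanning_orfs` are illustrative only.

```python
# Hypothetical sketch: find ORFs that read through a circRNA's backsplice junction.
# Not the CHRIS/PepCircDB implementation; standard genetic code assumed.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard codon table, indexed by first/second/third base in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_at(seq, pos):
    """Return the codon starting at pos, wrapping around the circle."""
    n = len(seq)
    return "".join(seq[(pos + offset) % n] for offset in range(3))


def junction_spanning_orfs(circ_seq, min_aa=3):
    """Return (start, peptide) pairs for ORFs that cross the backsplice junction.

    Positions are tracked in unwrapped coordinates, so an ORF crosses the
    junction when its stop codon ends past the last base (index n - 1).
    """
    seq = circ_seq.upper()
    n = len(seq)
    results = []
    for start in range(n):
        if codon_at(seq, start) != "ATG":
            continue
        peptide = []
        pos = start
        # After 3 * n bases the reading frame repeats, so stop there.
        while pos - start < 3 * n:
            aa = CODON_TABLE.get(codon_at(seq, pos), "X")  # "X" for ambiguous bases
            if aa == "*":
                break
            peptide.append(aa)
            pos += 3
        else:
            continue  # no in-frame stop codon: rolling ORF, skipped here
        end = pos + 3  # one past the stop codon (unwrapped)
        if end > n and len(peptide) >= min_aa:
            results.append((start, "".join(peptide)))
    return results


print(junction_spanning_orfs("AAATAGCCCATGGGG"))
```

In the usage line, the ORF starting at index 9 (`ATG GGG`) continues across the junction into `AAA` and stops at `TAG`, so the script prints `[(9, 'MGK')]`. An ORF that starts and stops without crossing the junction would be ignored, because its peptide could in principle also be encoded by the linear transcript.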
Types of Peptides
This database is based on our study, which leveraged both short- and long-read sequencing to maximize the discovery of novel, bona fide circRNA isoforms. We developed an open-source bioinformatics pipeline, CHRIS (CHaracterizing CircRNAs by Integrative Sequencing, available at https://github.com/ChrisMaherLab/CHRIS), to rescue circRNAs that previously eluded short-read-based approaches. CHRIS uses a two-pass approach: circRNAs that pass the first-pass analysis are labeled "rescued", and circRNAs that pass the second-pass analysis are labeled "chimeric read support". Please refer to the paper for more details.