
Sample-Wise and Gene-Wise Comparisons Confirm a Greater Similarity of RNA and Protein Expression Data at the Level of Molecular Pathways and Suggest an Approach for the Data Quality Check in High-Throughput Expression Databases


Mikhail Raevskiy1,a, Maxim Sorokin2,3,4,b, Aleksandra Emelianova1,c, Galina Zakharova1,d, Elena Poddubskaya1,e, Marianna Zolotovskaia4,5,f, and Anton Buzdin1,5,6,g*

1Digital Biodesign and Personalized Healthcare Research Center, Sechenov First Moscow State Medical University, 119991 Moscow, Russia

2Omicsway Corp., 340 S Lemon Ave, 6040, Walnut, 91789 CA, USA

3Oncobox Ltd., 121205 Moscow, Russia

4Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Moscow Region, Russia

5Sechenov First Moscow State Medical University, 119991 Moscow, Russia

6Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry, 117997 Moscow, Russia

Received October 31, 2023; Revised March 13, 2024; Accepted March 13, 2024
Identification of genes and molecular pathways with congruent profiles in proteomic and transcriptomic datasets may lead to the discovery of promising transcriptomic biomarkers that are more relevant to phenotypic changes. In this study, we conducted a comparative analysis of 943 paired RNA and proteomic profiles obtained for the same samples of seven human cancer types from The Cancer Genome Atlas (TCGA) and the NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC), two major open human cancer proteomic and transcriptomic databases. The analysis included 15,112 protein-coding genes and 1611 molecular pathways. Overall, our findings demonstrated a statistically significant improvement in the congruence between RNA and proteomic profiles when the analysis was performed at the level of molecular pathways rather than at the level of individual gene products. Transition to the molecular pathway level of data analysis increased the correlation to 0.19-0.57 (Pearson) and 0.14-0.57 (Spearman), or 2- to 3-fold for some cancer types. Evaluating the gain in correlation upon transition to data analysis at the pathway level can be used to refine omics data by identifying outliers that can be excluded from the comparison of RNA and proteomic profiles. We suggest using sample-wise and gene-wise correlations for individual genes and molecular pathways as a measure of the quality of paired RNA/protein molecular data. We also provide a database of human genes, molecular pathways, and samples related to the correlation between RNA and protein products to facilitate the exploration of new cancer transcriptomic biomarkers and molecular mechanisms at different levels of human gene expression.
KEY WORDS: transcriptomics, proteomics, high-throughput analysis of human gene expression, cancer genomics, pathway activation level
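
The distinction drawn in the abstract between gene-wise and sample-wise correlations, and the gain in correlation at the pathway level, can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the matrices are simulated, the helper names `rowwise_pearson` and `pathway_means` are hypothetical, and the pathway score is a plain mean over member genes, which stands in for (and is not) the pathway activation level used in the study. Only the Pearson coefficient is shown.

```python
import numpy as np

def rowwise_pearson(a, b):
    """Pearson correlation between matching rows of two equally shaped matrices."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    denom = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return (a * b).sum(axis=1) / denom

def pathway_means(expr, pathway_of_gene, n_pathways):
    """Average the rows (genes) of expr within each pathway.
    Assumption: a simple mean, not the pathway activation level of the paper."""
    return np.vstack([expr[pathway_of_gene == p].mean(axis=0)
                      for p in range(n_pathways)])

# Synthetic paired data: rows are genes, columns are samples.
rng = np.random.default_rng(0)
n_pathways, genes_per_pathway, n_samples = 10, 10, 40
pathway_of_gene = np.repeat(np.arange(n_pathways), genes_per_pathway)

# Genes of one pathway share a signal; RNA and protein carry independent noise.
signal = rng.normal(size=(n_pathways, n_samples))[pathway_of_gene]
rna = signal + rng.normal(size=signal.shape)
protein = signal + rng.normal(size=signal.shape)

# Gene-wise: one coefficient per gene, computed across samples.
gene_r = rowwise_pearson(rna, protein)
# Sample-wise: one coefficient per sample, computed across genes.
sample_r = rowwise_pearson(rna.T, protein.T)

# Same gene-wise comparison after aggregating genes into pathways.
rna_pw = pathway_means(rna, pathway_of_gene, n_pathways)
protein_pw = pathway_means(protein, pathway_of_gene, n_pathways)
pathway_r = rowwise_pearson(rna_pw, protein_pw)

# Gain in mean correlation upon moving from genes to pathways.
gain = pathway_r.mean() - gene_r.mean()
print("mean gene-level r:   ", round(float(gene_r.mean()), 3))
print("mean pathway-level r:", round(float(pathway_r.mean()), 3))
print("mean sample-wise r:  ", round(float(sample_r.mean()), 3))
print("gain:                ", round(float(gain), 3))
```

In this toy setting the gain is positive because averaging over member genes suppresses gene-specific noise while keeping the shared signal. On real data, a sample or gene whose correlation does not improve at the pathway level would be a candidate for closer inspection, in the spirit of the quality check proposed above.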

DOI: 10.1134/S0006297924040126

Publisher’s Note. Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.