CancerProteome

Database content

Overview of CancerProteome

CancerProteome provides a user-friendly search interface.

Exploring the quantitative and modification proteomes helps reveal alterations at the protein level, yielding new biological and diagnostic knowledge with the potential to improve our understanding of malignant transformation and therapeutic outcomes. Beyond canonical proteins, a growing number of novel functional microproteins have been identified as oncogenic drivers or tumour suppressors in cancer. Here, we developed the CancerProteome database to comprehensively decipher the proteome landscape across various cancer types.

Collection of MS data

We collected all available proteome datasets of clinical samples and cell lines from widely used proteome repositories, such as CPTAC (https://pdc.cancer.gov), PRIDE (1), MassIVE.quant (2), PeptideAtlas (3), jPOST (4), Panorama (5) and iProX (6). The raw data and clinical information were downloaded for in-depth analysis. Both quantitative and post-translational modification (PTM) proteome data across 21 cancer types were re-analyzed. Additionally, we acquired transcriptome datasets from the corresponding patients for 10 cancer types, comprising 1,560 tumor and 368 control samples.

Information on MS-based datasets

Omics	Cancer abbreviation	Cancer	Resource	Clinical characteristic (sample number)	Sample type

Construction of the protein and microprotein theoretical library

We integrated all ribo-seq-supported open reading frames (ORFs) from RPFdb (7), nuORFdb (8), TransLnc (9) and IEAtlas (10), along with their basic annotations. Only ORFs with ‘NTG’ start codons and ‘TAA/TGA/TAG’ stop codons were retained for subsequent analysis. Based on the genomic coordinates of the ORFs and the corresponding annotation files, we obtained ORF sequences using the ‘getfasta’ function in the R ‘bedtoolsr’ package with default parameters. Only ORFs encoding peptides longer than 8 amino acids (aa) and shorter than 100 aa were retained and defined as microproteins, following previous studies (11,12). Microproteins whose sequences were entirely contained within other (micro-)proteins were removed. After integration with known protein sequences from the human UniProt database (13), a benchmark theoretical library of proteins and microproteins was constructed for further analysis.
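The ORF filtering steps above can be illustrated with a minimal Python sketch. It assumes ORF nucleotide sequences have already been extracted (for example with bedtools getfasta) as plain strings, uses the standard genetic code, and decodes the initiator codon as methionine; the function names `translate_orf` and `build_microprotein_library` are hypothetical and do not describe the actual CancerProteome pipeline.

```python
from itertools import product
from typing import Iterable, List, Optional

# Standard genetic code: codons enumerated in TCAG order map onto this string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TGA", "TAG"}


def translate_orf(orf: str) -> Optional[str]:
    """Return the peptide encoded by an NTG-initiated ORF, or None if it fails the checks."""
    orf = orf.upper()
    if len(orf) % 3 != 0 or len(orf) < 6:
        return None
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    start, stop, body = codons[0], codons[-1], codons[1:-1]
    # 'NTG' start codon: any nucleotide followed by TG (ATG, CTG, GTG, TTG).
    if start[0] not in "ACGT" or start[1:] != "TG" or stop not in STOP_CODONS:
        return None
    # Reject in-frame internal stops and codons containing ambiguous bases.
    if any(c in STOP_CODONS or c not in CODON_TABLE for c in body):
        return None
    # Assumption: the start codon is decoded as methionine, including near-cognate starts.
    return "M" + "".join(CODON_TABLE[c] for c in body)


def build_microprotein_library(orfs: Iterable[str],
                               known_proteins: Iterable[str] = (),
                               min_len: int = 8,
                               max_len: int = 100) -> List[str]:
    """Keep peptides with min_len < length < max_len aa, then drop any peptide
    contained in another microprotein or in a known protein sequence."""
    peptides = set()  # identical peptides from different ORFs collapse here
    for orf in orfs:
        peptide = translate_orf(orf)
        if peptide is not None and min_len < len(peptide) < max_len:
            peptides.add(peptide)
    proteins = list(known_proteins)
    kept = []
    for peptide in sorted(peptides):
        nested = any(peptide != other and peptide in other for other in peptides)
        nested = nested or any(peptide in protein for protein in proteins)
        if not nested:
            kept.append(peptide)
    return kept
```

The containment check is quadratic in the number of peptides, which is fine for illustration; a library-scale run would likely need an index such as a suffix array or an Aho-Corasick automaton.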

Web interface

Quick start in CancerProteome

1. The main functions of the database are accessible from the menu bar.

2. Click the mouse icon to start a quick search.

Three search modes in CancerProteome

1. Choose the cancer you are interested in.

2. Input gene symbol.

3. Choose the cancer you are interested in.

4. Choose the type of protein post-translational modification you are interested in.

5. Input gene symbol.

6. Choose the resource you are interested in.

Browse in CancerProteome

On the Browse page, we provide the differentially expressed proteins identified from proteome data across cancers in CancerProteome (default significance thresholds: FDR < 0.05 and fold change (FC) < 1/1.5 or FC > 1.5).
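The default thresholds can be written as a small Python sketch. It assumes each protein already has a tumor-versus-control fold change on the linear scale and an FDR value; the source does not specify how FDR is computed, so the Benjamini-Hochberg adjustment shown here is an assumption, and `benjamini_hochberg` and `classify_de` are hypothetical names.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify_de(fold_change: float, fdr: float,
                fc_cutoff: float = 1.5, fdr_cutoff: float = 0.05) -> str:
    """Label a protein 'up', 'down' or 'ns' (FDR < 0.05 and FC > 1.5 or FC < 1/1.5)."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive (linear scale)")
    if fdr >= fdr_cutoff:
        return "ns"
    if fold_change > fc_cutoff:
        return "up"
    if fold_change < 1 / fc_cutoff:
        return "down"
    return "ns"
```

On a log2 scale the two fold-change conditions collapse into |log2 FC| > log2(1.5) ≈ 0.585.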

1. The size of each dot indicates the number of differentially expressed proteins in the corresponding cancer.

2. The color of the dot indicates the type of omics data.

3. Click to browse differentially expressed proteins in specific omics data.

4. Mouse over the text to display the full name of the cancer.

Search result in CancerProteome

The result page for a protein is organized as follows.

1. Cancer.

2. Gene basic information.

3. Protein basic information.

4. Expression basic information.

5. Resource.

6. Click to view detailed information.

7. Protein post-translational modification basic information.

Basic Information

For each protein or modification site, we provide basic information, organized as follows.

1. You can jump directly to a module of interest by clicking the axis.

2. The basic information of the protein/site.

3. Click the protein sequence to check the details.

Quantitative Proteome

1. Browse the hidden information.

2. Boxplot with a comparison of protein levels between tumor and control samples.

3. The number of tumor/control samples in the resource.

4. The correlation scatter plot of RNA and protein levels. Correlation coefficients are Spearman’s rho values calculated across cancer samples.

5. The protein abundances across different cancer proteomes.

6. Boxplot with a comparison of RNA levels between tumor and control samples.
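The Spearman correlation between RNA and protein levels can be sketched in plain Python as the Pearson correlation of average ranks, the standard definition that handles ties. Dropping samples with a missing RNA or protein value before ranking is an assumption, as the source does not say how missing values are treated; `spearman_rho` is a hypothetical helper, not the database's code.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks where tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(rna: Sequence[Optional[float]], protein: Sequence[Optional[float]]) -> float:
    """Spearman's rho across samples, skipping samples missing either value."""
    pairs = [(r, p) for r, p in zip(rna, protein) if r is not None and p is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete samples")
    rx = average_ranks([r for r, _ in pairs])
    ry = average_ranks([p for _, p in pairs])
    mean = (len(pairs) + 1) / 2  # mean of ranks 1..n, unchanged by tie averaging
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mean) ** 2 for a in rx))
    sy = math.sqrt(sum((b - mean) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant variable has no defined rank correlation
    return cov / (sx * sy)
```

`scipy.stats.spearmanr` computes the same coefficient and also returns a p-value.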

Post-translational Modifications

1. Click to browse the sequence near the modification site.

2. Boxplot with a comparison of protein modification levels between tumor and control samples.

3. We applied linear regression to test the association between each modification site and its corresponding protein across cancer samples.

4. The abundances of modification sites derived from the same protein across cancers.
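The site-versus-protein regression can be sketched as an ordinary least-squares fit of site abundance on protein abundance. The model direction, the pairwise removal of missing values and the helper name `fit_site_on_protein` are assumptions; the slope's t statistic has n - 2 degrees of freedom and could be turned into a p-value with, for example, `scipy.stats.t.sf`.

```python
import math
from typing import Optional, Sequence


def fit_site_on_protein(protein: Sequence[Optional[float]],
                        site: Sequence[Optional[float]]) -> dict:
    """OLS fit: site = intercept + slope * protein, across cancer samples."""
    pairs = [(x, y) for x, y in zip(protein, site) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete samples")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0:
        raise ValueError("protein abundance is constant; slope is undefined")
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
    ss_tot = sum((y - my) ** 2 for _, y in pairs)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    # t statistic for H0: slope == 0, with n - 2 degrees of freedom.
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    if se_slope > 0:
        t_stat = slope / se_slope
    else:
        t_stat = math.copysign(math.inf, slope) if slope != 0 else float("nan")
    return {"slope": slope, "intercept": intercept, "r_squared": r_squared,
            "t_stat": t_stat, "n": n}
```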

Protein-Protein Association

1. Choose the cancer and the protein association pairs you want to display.

2. The network displays the proteins associated with the selected protein.

3. Click a point to display detailed information on the pairwise correlation. Correlation coefficients are Spearman’s rho values calculated across cancer samples.

4. Validation information: "Yes" indicates that the association is supported by the corresponding databases.

5. The correlation scatter plot of the selected and associated protein levels. Correlation coefficients are Spearman’s rho values calculated across cancer samples.
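The association network and its validation flag can be sketched as a filter over precomputed correlations. The cut-offs `min_abs_rho` and `max_fdr`, the input layout and the `Edge`/`build_association_network` names are hypothetical; the source states only that the correlations are Spearman's rho and that "Yes" marks pairs supported by external databases.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    rho: float
    fdr: float
    valid: str  # "Yes" if the pair is listed in a reference interaction database


def build_association_network(selected: str,
                              correlations: Dict[str, Tuple[float, float]],
                              known_pairs: Set[FrozenSet[str]],
                              min_abs_rho: float = 0.3,
                              max_fdr: float = 0.05) -> List[Edge]:
    """Keep partners whose (rho, FDR) with `selected` pass both hypothetical cut-offs."""
    edges = []
    for partner, (rho, fdr) in correlations.items():
        if partner == selected or abs(rho) < min_abs_rho or fdr >= max_fdr:
            continue
        supported = frozenset((selected, partner)) in known_pairs
        edges.append(Edge(selected, partner, rho, fdr, "Yes" if supported else "No"))
    # Strongest associations first, regardless of sign.
    edges.sort(key=lambda e: abs(e.rho), reverse=True)
    return edges
```

Storing known pairs as frozensets makes the lookup independent of which protein is listed first.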

Functional Enrichment

1. Choose the type of omics data you are interested in.

2. Choose the cancer you are interested in.

3. Choose the threshold of fold change you are interested in.

4. Choose the threshold of FDR you are interested in.

5. The result table of differentially expressed proteins/sites/genes.

6. The KEGG terms enriched for differentially expressed proteins/sites/genes.

7. The GO:BP terms enriched for differentially expressed proteins/sites/genes.

8. The Hallmark terms enriched for differentially expressed proteins/sites/genes.

9. The Reactome terms enriched for differentially expressed proteins/sites/genes.
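The source does not state which enrichment test is used. One common choice, shown here only as an assumed illustration, is over-representation analysis with a one-sided hypergeometric test: for each KEGG, GO:BP, Hallmark or Reactome term it asks whether the differentially expressed set overlaps the term more than expected by chance. `hypergeom_sf` and `enrich` are hypothetical names, and multiple-testing correction of the resulting p-values is left out for brevity.

```python
from math import comb
from typing import Dict, Iterable, List, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, min(successes, draws) + 1))
    return hits / total


def enrich(de_genes: Iterable[str],
           term_to_genes: Dict[str, Iterable[str]],
           background: Iterable[str]) -> List[Tuple[str, int, int, float]]:
    """Return (term, overlap, term size, p-value) tuples sorted by p-value."""
    universe = set(background)
    de = set(de_genes) & universe
    results = []
    for term, genes in term_to_genes.items():
        term_genes = set(genes) & universe
        overlap = len(de & term_genes)
        if overlap == 0:
            continue
        p = hypergeom_sf(overlap, len(universe), len(term_genes), len(de))
        results.append((term, overlap, len(term_genes), p))
    results.sort(key=lambda r: r[3])
    return results
```

`scipy.stats.hypergeom.sf(k - 1, M, n, N)`, with M the population size, n the term size and N the number of drawn genes, gives the same tail probability.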

Drug Sensitivity

1. The network displays the drugs associated with the selected protein.

2. Click a point to display detailed information on the pairwise correlation.

3. Drug information.

4. Correlation information.

5. Cancer information for cell lines.

6. The correlation scatter plot of drug sensitivity and protein abundance. Correlation coefficients are Spearman’s rho values calculated across cancer cell lines.
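The drug-protein correlation can be sketched by first matching cell lines present in both the proteome and the drug-response data, then computing Spearman's rho on the shared cell lines. The dictionary layout, the `min_cell_lines` cut-off and the function names are hypothetical, and the sensitivity metric is not specified here; if it were an IC50 value, where lower means more sensitive, a negative rho would indicate that higher protein abundance accompanies greater sensitivity.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def _spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of average ranks (Spearman's rho)."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mean = (len(x) + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    norm = math.sqrt(sum((a - mean) ** 2 for a in rx) * sum((b - mean) ** 2 for b in ry))
    return cov / norm if norm > 0 else float("nan")


def drug_protein_correlations(protein_by_cell_line: Dict[str, float],
                              sensitivity_by_drug: Dict[str, Dict[str, float]],
                              min_cell_lines: int = 10) -> Dict[str, Tuple[float, int]]:
    """Map each drug to (Spearman's rho, number of shared cell lines)."""
    results = {}
    for drug, sensitivity in sensitivity_by_drug.items():
        shared = sorted(set(protein_by_cell_line) & set(sensitivity))
        if len(shared) < min_cell_lines:
            continue  # too few cell lines measured in both datasets
        x = [protein_by_cell_line[c] for c in shared]
        y = [sensitivity[c] for c in shared]
        results[drug] = (_spearman(x, y), len(shared))
    return results
```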