In 2000, Hanahan and Weinberg first proposed six hallmarks of cancer (Sustaining Proliferative Signaling, Evading Growth Suppressors, Resisting Cell Death, Enabling Replicative Immortality, Inducing Angiogenesis, and Activating Invasion and Metastasis), which provide a logical framework for conceptualizing the variety of neoplastic diseases. In 2011, to characterize cancers more fully, they added four more hallmarks: Genome Instability and Mutation, Tumor-Promoting Inflammation, Reprogramming Energy Metabolism, and Evading Immune Destruction. The framework built on these hallmarks has undoubtedly improved the analysis of cancer diversity. So far, however, no gene sets have been collected for the individual hallmarks, which makes it difficult to characterize the gene alteration features (including mutation, DNA methylation and copy number variation) within each hallmark. Recently, many studies have integrated various cancer-related pathways or genes for analysis and obtained significant results, underscoring the importance of constructing gene sets for the hallmarks.
The hallmarks of cancer provide a logical framework for conceptualizing the variety of neoplastic diseases. Over the past decades, this framework has greatly advanced the analysis of cancer diversity. Analyzing cancer data and recognizing its features within the hallmark framework has several potential advantages: it reduces the dimensionality of cancer features, it is interpretable, and it offers a potential avenue for exploring the mechanisms of carcinogenesis. However, hallmark gene sets reported in current studies overlap little because they were extracted with different methods, and no gene sets have yet been collected systematically for each hallmark. To this end, we developed a dedicated database, Cancer Hallmark Genes (CHG), which focuses on integrating and annotating the potential roles of hallmark features in human cancer processes. CHG collects 2838 genes from 55 hallmark-related pathways and statistically analyzes them across 22697 cancer samples, including the frequencies of mutation, methylation and copy number variation (CNV) of these genes in 34 cancer types. Moreover, CHG was used to evaluate the relationships among the ten hallmarks, which were divided into four classes using Fisher's exact test and unsupervised hierarchical clustering. CHG also contains hallmark networks: we calculated the degree and betweenness centrality of the genes in each network and found a positive correlation between these two centralities. Furthermore, CHG was used to predict the prognosis of breast cancer patients, and the predictions were validated on independent data sets. By combining hallmark-specific gene sets with DNA mutation, CNV and epigenetic information, this comprehensive database offers a new perspective for analyzing the diversity and development of cancers.
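The network analysis described above (degree and betweenness of the genes in a hallmark network, and the correlation between the two) can be illustrated with a minimal Python sketch. It assumes an unweighted, undirected gene network, computes unnormalized betweenness with Brandes' algorithm, and uses Pearson's correlation coefficient. The abstract does not say which normalization or correlation measure CHG uses, so these are assumptions. The toy network and the gene names (GENE_A to GENE_F) are hypothetical placeholders, not CHG data.

```python
from collections import deque
from math import sqrt


def degree_centrality(adj):
    """Degree of each gene: the number of neighbours it is linked to."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) to every node.
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy hallmark network: placeholder gene names, not CHG data.
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_D"),
         ("GENE_A", "GENE_E"), ("GENE_B", "GENE_C"), ("GENE_E", "GENE_F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

deg = degree_centrality(adj)
btw = betweenness_centrality(adj)
genes = sorted(adj)
r = pearson([deg[g] for g in genes], [btw[g] for g in genes])
print(deg, btw, round(r, 3))
```

In this toy network the hub gene has both the highest degree and the highest betweenness, so the two centralities correlate positively. For real networks, a graph library such as NetworkX (`degree_centrality`, `betweenness_centrality`) would normally be used; note that those functions return normalized values by default.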