GMHCC: High-throughput Analysis of Biomolecular Data using Graph-based Multiple Hierarchical Consensus Clustering.

MOTIVATION: Thanks to the development of high-throughput sequencing technologies, massive amounts of diverse biomolecular data have accumulated, revolutionizing the study of genomics and molecular biology. One of the main challenges in analyzing these data is clustering them into subtypes and subpopulations to facilitate downstream analysis. Many clustering methods have recently been developed for biomolecular data. However, these computational methods are often limited by high dimensionality, data heterogeneity, and noise.

RESULTS: We develop a novel Graph-based Multiple Hierarchical Consensus Clustering (GMHCC) method, which combines an unsupervised graph-based feature ranking with a graph-based linking method to explore the multiple hierarchical information in the underlying partitions of the consensus clustering for multiple types of biomolecular data. First, a graph-based unsupervised feature ranking model scores each feature by building a graph over pairwise features and then assigning each feature a rank. Second, to maintain the diversity and robustness of the basic partitions, we use multiple diverse feature subsets to generate several basic partitions and then explore the hierarchical structures of these partitions by refining the global consensus function. Finally, we develop a new graph-based linking method, which explicitly considers the relationships between clusters, to generate the final partition. Experiments on multiple types of biomolecular data, including thirty-five cancer gene expression datasets and eight single-cell RNA-seq datasets, validate the effectiveness of our method over several state-of-the-art consensus clustering approaches. Furthermore, differential gene analysis, gene ontology enrichment analysis, and KEGG pathway analysis are conducted, providing novel insights into cell developmental lineages and characterization mechanisms.

AVAILABILITY: The source code is available on GitHub: https://github.com/yifuLu/GMHCC. The software and the supporting data can be downloaded from: https://figshare.com/articles/software/GMHCC/17111291.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
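The three stages described in the abstract (rank features on a feature graph, build basic partitions from diverse feature subsets, link them into a final partition) can be illustrated with a minimal Python sketch. It is not the GMHCC implementation; every function name and modelling choice here is an assumption made for illustration. The feature graph is weighted by absolute Pearson correlation and features are ranked by weighted degree, the basic partitions come from plain k-means on nested top-ranked feature subsets, and the final partition is taken as the connected components of a thresholded co-association graph, which stands in for the paper's hierarchical consensus function and linking method.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def rank_features(data):
    """Assumed ranking: weighted degree in a feature graph with |correlation| edge weights."""
    cols = list(zip(*data))
    degree = [
        sum(abs(pearson(cols[i], cols[j])) for j in range(len(cols)) if j != i)
        for i in range(len(cols))
    ]
    return sorted(range(len(cols)), key=lambda i: -degree[i])


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means with farthest-first initialisation; returns one label per point."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def co_association(partitions, n):
    """Fraction of basic partitions in which samples i and j share a cluster."""
    return [
        [sum(p[i] == p[j] for p in partitions) / len(partitions) for j in range(n)]
        for i in range(n)
    ]


def link_components(coassoc, threshold=0.5):
    """Simplified linking: connected components of the graph with edges where coassoc > threshold."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def consensus_cluster(data, k, subset_sizes, seed=0):
    """One basic partition per top-ranked feature subset, merged by co-association."""
    rng = random.Random(seed)
    ranked = rank_features(data)
    partitions = []
    for size in subset_sizes:
        subset = ranked[:size]
        points = [[row[f] for f in subset] for row in data]
        partitions.append(kmeans(points, k, rng))
    return link_components(co_association(partitions, len(data)))


# Toy data (hypothetical): features 0 and 1 separate two groups, features 2 and 3 are noise.
data = [
    [0.0, 0.3, 0.1, 0.2], [0.5, 0.1, 0.3, 0.2], [1.0, 0.8, 0.1, 0.0], [0.2, 0.6, 0.3, 0.0],
    [10.0, 9.8, 0.1, 0.2], [10.5, 10.4, 0.3, 0.2], [9.5, 10.1, 0.1, 0.0], [10.2, 9.6, 0.3, 0.0],
]
labels = consensus_cluster(data, k=2, subset_sizes=[1, 2, 3])
print(labels)
```

On this toy input the first four samples and the last four samples are built to form two well-separated groups, so the sketch is expected to give each group its own label. Real expression matrices would need normalisation, many more feature subsets, and a less brittle consensus step than a fixed 0.5 threshold.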

Keywords: consensus clustering; graph based; multiple hierarchical; biomolecular data; analysis

Journal Title: Bioinformatics
Year Published: 2022
