Integration of biological data via NMF for identification of human disease-associated gene modules through multi-label classification

Syed Alberuni; Sumanta Ray

doi:10.1371/journal.pone.0305503

Abstract

Proteins associated with multiple diseases often interact, forming disease modules that are critical for understanding disease mechanisms. This study integrates protein-protein interactions (PPIs) and Gene Ontology data using non-negative matrix factorization (NMF) to identify gene modules associated with human diseases. We leverage two biological sources of information, protein-protein interactions (PPIs) and Gene Ontology data, to find connections between novel genes and diseases. The data sources are first converted into networks, which are then clustered to obtain modules. Two types of modules are then integrated through an NMF-based technique to obtain a set of meta-modules that preserve the essential characteristics of interaction patterns and functional similarity information among the proteins/genes. Each meta-module is labeled based on its statistical and biological properties, and a multi-label classification technique is employed to assign new disease labels to genes. We identified 3,131 gene-disease associations, validated through a literature review, Gene Ontology, and pathway analysis.

Citation: Alberuni S, Ray S (2024) Integration of biological data via NMF for identification of human disease-associated gene modules through multi-label classification. PLoS ONE 19(12): e0305503. https://doi.org/10.1371/journal.pone.0305503

Editor: Hilary A. Coller, UC Los Angeles: University of California Los Angeles, UNITED STATES OF AMERICA

Received: May 31, 2024; Accepted: November 21, 2024; Published: December 12, 2024

Copyright: © 2024 Alberuni, Ray. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The author declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Introduction

One of the most challenging areas of computational biology research is analyzing the complex structure of Human Disease Networks (HDN), which represent an interconnected network of intricate relationships among human diseases [1, 2]. HDNs are formed by mapping human diseases or disorders based on their genetic origins or other characteristics. In this network, two diseases are linked by an edge if they share at least one associated gene. This network typically originates from bipartite networks containing information on both genes and diseases [3]. Understanding the phenotype similarities between different diseases is important for comprehending the underlying mechanisms of these complex networks, as most human diseases are linked to groups of genes rather than a single gene [4]. Genetic heterogeneity is a major challenge in the identification of the genetic causes of complex diseases [5, 6]. Unlike Mendelian diseases, which are caused by single gene mutations, complex diseases are influenced by an association of multiple genetic, environmental, and lifestyle factors. Gene modules are sets of genes that function in specific biological operations or pathways [7]. The identification of gene modules associated with diseases provides insights into the molecular mechanisms underlying them. Understanding the interactions between genes within these modules can help uncover drug targets, regulatory pathways, and biomarkers for specific complex diseases [8, 9]. Studying disease similarities is crucial for understanding underlying causes and developing treatments for complex diseases. Researchers use data mining and text mining techniques to predict and analyze disease-associated gene modules. The initial concept of the human disease network structure was proposed by Goh et al. [10]. Using the disease/disorder gene association information gathered from the Online Mendelian Inheritance in Man (OMIM: https://www.ncbi.nlm.nih.gov/omim) database, Goh et al. created a bipartite network consisting of disorder and disease genes connected by shared gene-disease relationships [11].

There are several related studies in the literature that have focused on the detection of disease-associated gene modules. In a study by Bandyopadhyay et al. [12], the authors developed a multi-objective framework to identify human protein complexes. These complexes were found to be linked to several disorder classes, including endocrine disorders, cancer, and others. Qiu et al. [13] introduced a novel network-based method that integrates gene expression and protein-protein interaction data to identify disease-associated gene modules. This approach is based on a regression model with a diffusion kernel, termed RegMOD. Wang et al. [14] proposed a novel algorithm, N2V-HC, to predict disease-associated modules based on deep learning of biological complex networks. The method involved three steps: integrating GWAS, eQTL summaries, and human interactomes; learning the node representation features in the network; and detecting modules using hierarchical clustering. Bhattacharjee et al. [15] introduced a new framework for analyzing the physical layout of disease-related PPI networks using graphlet distribution. In another study, Liu et al. [16] systematically analyzed the properties of disease proteins across 22 disorder classes. They developed two new measurement techniques, the enrichment ratio and P-value, for investigating disease proteins throughout cells/tissues and identified relationships between tissues or cells and disorder classes. Schulte-Sasse, Roman, et al. [17] proposed a machine learning method to predict 165 new cancer genes by integrating different omics datasets such as DNA methylation, gene expression and mutation with PPI networks. This EMOGI method was used to identify complex diseases and precision oncology. Hong-Dong Li et al. [18] introduced a new path for predicting genes linked to disease through the integration of various types of annotated protein sets from the molecular signature database (MSigDB) using the signal matrix as a gene feature to train the predicted disease gene model. Yang et al. [19] proposed a novel framework to identify functionally homogeneous gene modules and candidate disease-associated genes using a network-based semi-supervised learning, non-negative matrix factorization model which is called MapGene, a disease protein-prioritization algorithm. In the study, they predict the top 10 new disease genes linked with Parkinson’s disease and diabetes mellitus.

In this work, we present a framework for integrating two different biological datasets aimed at predicting disease-associated modules. Unlike most previous studies that used a single dataset, the current trend in understanding cellular processes and molecular interactions involves the integration of multiple biological datasets. The integration process involves disease-associated human PPIs and Gene Ontology-based similarity, using an NMF-based clustering technique as part of our proposed framework. We detect meta-modules, which are then utilized to predict disease modules. To ensure consistency, all data sources are converted into their respective biological networks for integration purposes. Disease-associated PPI information and Gene Ontology information are converted into the disease-associated human PPI network and the GO semantic similarity network, respectively. We create two sets of clusters or modules from these networks using the K-means clustering algorithm, which are then combined using the NMF-based clustering technique. The resulting meta-modules are classified into 22 disease or disorder classes based on their Z-scores using multi-label classification. The proposed framework for integrating biological datasets and utilizing non-negative matrix factorization (NMF), is illustrated in Fig 1. Additionally, GO and KEGG pathway enrichment analyses were conducted to evaluate the biological significance of these meta-modules. The uniqueness and motivation of the study:

Instead of many earlier studies that used a single dataset, our study integrates three biological data sources namely gene-disease association data, human protein-protein interaction data, and Gene Ontology data.
A non-negative matrix factorization (NMF) based clustering technique is applied here to integrate multiple biological data sources and obtain a set of meta-modules that preserve the essential characteristics of interaction patterns and functional similarity information among proteins/genes. These meta-modules are more robust and biologically relevant compared to typical single dataset analyses.
The meta-modules are labeled based on the z-score calculated using the expected number of disease-associated genes within each obtained meta-module. We applied multi-label classification to classify and annotate the unknown genes within the meta-modules. As each meta-module takes multiple labels corresponding to one of the 22 disease classes, we first determine which genes are not linked to any of the classes. Based on the labels of genes that share the same meta-module, these genes are predicted to be associated with a disease class.

Download:

Fig 1. Workflow describing the step-by-step process of the proposed framework.

Panel A: This panel illustrates the pre-processing of the raw datasets. The Human Disease Network consists of 22 disease/disorder classes. Their associated genes are mapped into the HPRD and STRING database to construct a PPI network and to create a semantic similarity network from Gene Ontology datasets. Panel B: the two networks (PPI Network and semantic similarity network) are partitioned using a simple k-means clustering. As a result, we got two categories of modules, which are integrated using a non-negative matrix factorization-based technique to identify meta-modules. Panel C: illustrates the labeling of meta-modules using the Z-Scores and the identification of human disease-associated gene modules through multi-label classification.

https://doi.org/10.1371/journal.pone.0305503.g001

Method

This section first discusses the workflow of the proposed method. Next, the background and formal details are described.

Workflow

Fig 1 discusses the workflow of the proposed method. The steps are described below.

Preprocessing of raw datasets: See panel-A of the Fig 1. The study leverages three biological data sources: namely (i) gene-disease association data, (ii) human protein-protein interaction (PPI) data, and (iii) Gene Ontology (GO) data. We downloaded the first one from the human disease network constructed by Goh et. al. which consists of 22 diseases/disorders classes (for a more detailed view of the network see S1 Fig) and their associated genes. For the human PPI data, we downloaded the interaction information from the Human Protein Reference Database (HPRD) [20] and STRING database [21]. The gene ontology dataset was obtained from the Gene Ontology Consortium(https://geneontology.org/).

Network construction from HDN: See panel-A of the Fig 1. The Human Disease Network consists of 22 disease/disorder classes and 7,070 genes associated with the disease/disorders classes. For each of the disease/disorder classes, we mapped the associated genes into the HPRD and STRING databases [https://string-db.org/] [21] to construct a network. Consequently, 22 PPI networks were constructed and subsequently merged to get a large network consisting of 15,246 interactions and 1,806 unique human proteins. For the construction and merging of the network, we considered the first neighbour of each protein/gene from the HPRD and STRING databases. The network is represented as an adjacency matrix of dimensions 1806 × 1806, where each entry in the matrix is represented by 0 or 1, indicating the presence or absence of a PPI connection. To construct the second network we calculate the semantic similarity score between the gene ontology terms associated with proteins/genes. Here the semantic similarity network was created using the relevance similarity measure proposed in [22]. As the score is between 0 to 1, each score was treated as the weight of an edge of the network. A high score indicates the presence of strong functional similarity between proteins/genes. For each score that was treated as the weight of an edge computation of semantic similarity, we used an R package called Csbl.go. [23]. Both of the network has the same dimensions.

Clustering of networks and integration: See panel-B of the Fig 1. The two networks (PPI network and semantic similarity network) are partitioned using a simple k-means clustering, with the number of clusters (k) determined by Silhouette Analysis (For more information, see S2 Fig). From the PPI network, four clusters are identified and from the semantic similarity network, 6 clusters are identified by the clustering analysis. Thus, two categories of modules are obtained, one derived from interaction information and the other from ontology information. The two categories of modules are then integrated by using a matrix factorization-based framework. Here, we use a non-negative matrix factorization technique [24] to fuse these two information obtaining a set of meta-modules (see section [Formal details and background] for a detailed description of nmf-based clustering). Using silhouette analysis, seven meta-modules are identified (details information, see S3 Fig).

Labeling meta-modules: The obtained meta-modules are labeled based on the Z–Score calculated using the expected number of disease-associated genes within each obtained meta-module. We calculate Z − Score of each meta module m_i and for each disease class d_j as: , where is the total number of genes of module m_i associated with disease d_j, represents expected number of genes of module m_i associated with the disease d_j. is obtained by computing the intersection between the genes of modules m_i with the genes associated with disease d_j. For , we take the expected number of genes associated with a specific disease d_j that should be present within a module m_i. This is written as |, where is number of genes within a meta-module m_i, whereas is the number of genes associated with disease d_j and N represents total genes associated with all diseases.

Identification of novel associations: See the panel-C of the Fig 1. We utilized multi-label classification to classify and annotate the unidentified genes within the meta-modules. Since each meta-module receives multiple labels corresponding to one of the 22 disease classes, we first identify genes that are not associated with any of these classes. These genes are then predicted to be associated with the disease classes corresponding to the labels of genes sharing the same meta-module.

Formal details and background

Here we have discussed the formal details and background of the developed methods.

Gene ontology-based semantic similarity

Gene Ontology(GO)-based semantic similarity (SS) is a pivotal concept in bioinformatics, allowing the comparison of GO terms or entities annotated with these terms by leveraging the ontology’s structure, properties, and annotation corpora. Over the past decade, the diversity and number of SS measures based on GO have significantly increased. These measures are applied in various domains, including functional coherence evaluation, protein interaction prediction, and disease gene prioritization [25–27]. A semantic similarity measure is characterized as a function which, when provided with two ontology terms or collections of terms annotating two entities, produces a numeric outcome indicating the proximity in meaning between them. This measure can be calculated for a pair of GO terms, referred to as term similarity, or for two gene products, each annotated with a set of GO terms, referred to as gene product similarity.

In the context of Gene Ontology (GO), two primary approaches are used to measure Semantic Similarity (SS): internal methods based on ontology structure and external methods based on external corpora. The simplest structural methods calculate the distance between two nodes as the number of edges in the path between them. This edge-counting approach is intuitive but assumes that all semantic links have the same weight, thereby defining SS as the inverse score of the semantic distance.

External methods frequently employ information-theoretic principles, which are less affected by variations in link density within the ontology graph. Measures based on information content (IC) stem from the concept that the similarity between two concepts can be gauged by how much information they share.

When dealing with gene products annotated with multiple GO terms, measuring gene product semantic similarity (SS) involves comparing sets of terms. Two primary strategies are commonly employed: pairwise and group-wise. Pairwise methods aggregate individual similarities between all terms annotating two gene products to derive an overall measure of functional similarity. Group-wise methods, on the other hand, directly compute gene product similarity using one of three approaches: set-based, graph-based, or vector-based. Set-based approaches consider only direct annotations and may overlook shared ancestry among GO terms. Graph-based approaches represent gene products as sub-graphs of the GO, incorporating both direct and inherited annotations for a more comprehensive model. Vector-based approaches represent gene products in vector space and determine functional similarity using vector similarity measures. Group-wise methods can also incorporate the Information Content (IC) of terms to weigh set similarity effectively.

This comprehensive understanding and selection of SS measures are crucial for biomedical researchers to choose the most appropriate approaches for their specific applications, especially given the increased complexity of GO and the need for efficient computation in the future generation of SS measures.

NMF based clustering

NMF.

Non-negative Matrix Factorization (NMF) is a dimensionality reduction technique widely used in various fields such as signal processing, image analysis, and topic modelling [28, 29]. It involves decomposing a given non-negative matrix into the product of two lower-dimensional matrices, both of which contain only non-negative values. NMF is particularly useful when dealing with data where the non-negativity constraint is meaningful, such as in image or text data.

The general idea of NMF can be expressed using the following equation for a given matrix Z of size A × B: Z ≈ PQ^T. Here, Z is the original matrix, P is an A × K matrix, Q is a B × K matrix, and K is the reduced dimensionality. The aim is to find the matrices P and Q such that their product approximates Z almost approximately, and K is typically chosen to be much smaller than both A and B.

The matrices P and Q are subject to the constraint that all their elements must be non-negative (P, Q ≥ 0)). This non-negativity constraint is crucial in applications where negative values do not have a meaningful interpretation.

The approximation error, or the difference between the original matrix Z and its approximation PQ^T, can be minimized using various optimization techniques. One common approach is to use iterative algorithms, such as multiplicative updates or gradient descent, to adjust the values in P and Q until the approximation is satisfactory.

The NMF process can be mathematically formulated as an optimization problem. Given a cost function C, the objective is to minimize C with respect to P and Q, subject to the non-negativity constraint:

The cost function C typically measures the difference between Z and PQ^T, such as the Frobenius norm or Kullback-Leibler divergence. The choice of K determines the reduced dimensionality of the factorization. The optimization process involves minimizing a cost function that quantifies the approximation error.

Integration of clustering using NMF.

The integration process aimed to select a set of meta-modules comprising genes/proteins co-occurring in two distinct categories of modules, thereby preserving the characteristics of two different data resources within these meta-modules. Fig 2 illustrates the integration process of modules from different categories. Generally, when dealing with multiple networks offering diverse perspectives, it becomes possible to extract modules that retain the unique characteristics of each view. The process begins with the clustering of networks corresponding to each view. The resulting clusters are then combined to generate a co-occurrence matrix that preserves the occurrence of nodes within modules of different categories. In Fig 2, genes g_i and g_j share the same module three times, resulting in a co-occurrence score of three in the matrix. The co-occurrence matrix (Z) is subsequently decomposed using NMF-based factorization techniques, denoted as Z ≈ PQ^T. Here, represents the projection of the original data onto a new set of basis vectors, also referred to as meta-cluster centroids, where k is the number of meta-clusters. Here, the silhouette Analysis determines the value of k, and seven meta-modules are identified by the clustering analysis. These meta-clusters can be additively combined using the columns of matrix . To measure the reconstruction error between the original data and the factors P and Q, the Frobenius norm is utilized.

Download:

Fig 2. Illustrates the integration process of meta-modules from different biological sources of information.

In the first step, the protein-protein interaction (PPI) data and GO-based semantic similarity data are clustered using the K-means clustering technique to obtain two categories of modules or clusters. Subsequently, these two types of modules are integrated through NMF-based techniques to generate a set of meta-modules.

https://doi.org/10.1371/journal.pone.0305503.g002

In this study, two different views of the gene nodes were considered: interaction information and gene ontology-based semantic similarity information (see Fig 1 for details).

Multi-label classification

Multi-label classification is a machine learning paradigm where instances can be simultaneously associated with multiple labels, in contrast to the traditional single-label classification [30]. In the latter, each instance is exclusively linked to a single class label whether multi-class or binary. The multi-label paradigm has garnered increasing attention and finds applicability across diverse domains, including text, audio data, still images, video, bioinformatics, and others.

The prevailing method in multi-label classification involves training independent classifiers for each label, commonly referred to as the binary relevance (BR) transformation. Essentially, this approach transforms a multi-label problem into multiple binary problems, with each label treated independently. Existing multi-label literature acknowledges the limitation of this method due to its failure to explicitly model dependencies between labels. Consequently, various algorithms have been proposed to address this limitation by incorporating and modelling label dependencies explicitly.

The label space is denoted as , encompassing all possible labels. For a given instance, the output is a subset signifying the presence of specific labels. The decision function f_l(X) assigns a binary value to each label based on the input features X, with 1 indicating presence and 0 indicating absence. The relationship between instances and labels is modelled by a binary indicator matrix Y, where Y_ij = 1 if label j is present for instance i, and Y_ij = 0 otherwise. A common strategy, known as Binary Relevance, involves training separate binary classifiers for each label independently. The final prediction Y_i for an instance is obtained as the union of individual label predictions ().

Hamming Loss is a commonly used evaluation metric in multi-label classification tasks. It measures the fraction of labels that are incorrectly predicted for a given set of instances. In the context of multi-label classification, where an instance can belong to multiple classes simultaneously, Hamming Loss is particularly relevant as it considers each label independently.

The Hamming Loss (H) is calculated as the average fraction of incorrect labels over all instances. If Y_ij represents the true binary label for instance i and label j, and represents the predicted binary label, then the Hamming Loss for a set of instances is given by: Here, N is the total number of instances, is the total number of labels, and ≠ denotes the inequality operator.

The Hamming Loss ranges from 0 to 1, where lower values indicate better performance. A Hamming Loss of 0 means perfect predictions, while a value of 1 indicates that all labels are incorrectly predicted.

Multi-label classification finds diverse applications in domains such as image classification (assigning multiple object labels to an image), text categorization (assigning multiple topics to a document), and bioinformatics (predicting multiple gene functions). This approach provides a flexible and nuanced understanding of complex data relationships by allowing instances to be associated with multiple labels simultaneously.

Results and discussions

In this section, experimental results are described.

Datasets used

The study leverages three biological data sources, namely (i) gene-disease association data, (ii) human protein-protein interaction data and (iii) Gene Ontology data. We have compiled a comprehensive list of human PPIs from two datasets: (1) CCSB human Interactome database consisting of 7,000 proteins and 13944 high-quality binary interactions [31–33], (2) The Human Protein Reference Database [34] consisting of 8920 proteins and 53184 PPIs. We collected Gene Ontology information of those selected proteins from the GO Consortium (http://www.geneontology.org/) [35]. However, the protein IDs in the PPI dataset are different from those in the GO dataset, which requires mapping between the two. We used the David Bio-informatics resource (https://david.ncifcrf.gov/home.jsp) [36] tool for protein IDs conversion. We then took the entire gene IDs in the HPRD dataset that match with the UniProt IDs in the GO database for a specific gene symbol.

HDN was created by Goh et al. with 22 disease/disorder classes and their associated genes mapped into the HPRD and STRING databases to create 22 PPI networks, which were then merged to get a network containing 15,246 interactions and 1806 unique human genes.

Labeling of meta-modules with disease class

The GO database was used to collect gene ontology information on these selected genes, and a semantic similarity network was created based on them. An adjacency and semantic similarity matrix were constructed and partitioned into four and six clusters using the K-means clustering technique. Thereafter, two modules or clusters are integrated to form meta-modules by using the NMF clustering technique.

From the PPI network four clusters and from the semantic similarity network 6 clusters are identified by the clustering analysis. Thus two categories of modules are obtained, one from interaction information and the other from ontology information. The two categories of modules are then integrated by using a matrix factorization-based framework.

The obtained meta-modules are labeled based on the z-score calculated using the expected number of disease-associated genes within each obtained meta-modules. Thus each meta module gets a z-score based on the expected number of disease-associated genes within it. Now a threshold is selected to label a meta module to one of the disease classes. The figure and table (see S4 Fig and S1 Table in supplementary for details) show the labeling of the meta-modules in the different disease classes. For example, meta-module-2 is predicted to be included in seven of the disease classes (Neurological, Ophthalmological, Skeletal, Cardiovascular, Cancer, and Endocrine). The genes within the meta-module are also labeled with the same disease class. Fig 3, shows a heatmap representing the membership of the meta-modules within each of the disease classes.

Download:

Fig 3. Heatmap representing the membership of meta-modules within the disease classes.

For example, meta-module-2 is associated with seven of the disease classes (Neurological, Ophthalmological, Skeletal, Cardiovascular, Cancer, and Endocrine).

https://doi.org/10.1371/journal.pone.0305503.g003

Prediction of gene-disease association

The labeled network was then trained using multi-label classification. The input was the concatenation of the adjacency and semantic similarity matrix of all the nodes. We have utilized the ‘makeMultilabelClassifierChainsWrapper’ function of mlr R [37] package for performing the classification. The Table 1 shows the multi-label Hamming loss of the classification results. The trained model is utilized to predict the class label of unknown genes. We have obtained 3131 genes associated with multiple disease classes. Fig 4 depicts the predicted gene-disease interaction network, illustrating how genes are associated with various disease classes. This network helps visualize potential novel interactions and supports further validation and exploration of these relationships. Table 2 lists some of the predicted gene-disease associations along with their supporting PubMed IDs, demonstrating the validity of these associations based on existing literature. The last column of Table 2 displays the PubMed IDs of the literature supporting the predicted associations. For instance, the association of the gene “MTHFD1” with cancer is supported by PubMed ID 35798877, indicating consistency with recent studies in cancer biology.

Download:

Fig 4. Predicted disease-gene interaction network consisting of genes and their associated disease/disorder classes.

https://doi.org/10.1371/journal.pone.0305503.g004

Download:

Table 1. The results of the classifiers for multi-label, multi-class dataset.

https://doi.org/10.1371/journal.pone.0305503.t001

Download:

Table 2. Predicted interactions between disease/disorder classes and human proteins.

PUBMED ID is shown as validation of the identified predicted interactions.

https://doi.org/10.1371/journal.pone.0305503.t002

Meta-modules are enriched with GO terms and pathways

To know the efficacy of the prediction of the genes within the meta-modules which are predicted to be associated with different disease classes we have performed a gene ontology-based analysis of all the meta-modules. The most significant and practical techniques for understanding the biological significance of the observed expression change are gene ontology and pathway-based analysis. This analysis reveals that meta-modules are significantly enriched with terms related to various biological processes and pathways, further supporting their relevance in disease contexts. Table 3 presents the enriched GO terms and pathways associated with each identified meta-module, providing insights into the biological significance of these modules. For example, meta-module 1 is enriched with the term “visual perception” (GO:0007601) with a p − value of 1.35E − 59, suggesting its potential role in ophthalmological diseases.

Download:

Table 3. Table shows significant Gene Ontology (GO), P − values and KEGG Pathway associated with the identified meta-modules.

https://doi.org/10.1371/journal.pone.0305503.t003

A short discussion of key findings and their theoretical implications

The study’s findings indicate that the integration of diverse biological data sources using NMF-based clustering can effectively identify novel gene-disease associations. The multi-label classification approach demonstrated strong predictive capabilities, with a high level of agreement between predicted associations and existing literature. These results support the theory that multiple biological datasets when integrated, can provide a more comprehensive understanding of disease mechanisms and gene functions.

The identification of meta-modules enriched with significant GO terms and pathways suggests that these modules represent functionally coherent groups of genes that may play critical roles in the development and progression of various diseases. This challenges the traditional view that single datasets are sufficient for understanding complex disease biology and supports the growing consensus in the field that integrative approaches are essential for uncovering hidden biological insights.

Conclusions

In this paper, we introduce a novel framework for predicting disease-associated modules by integrating multiple biological data sources. The study leverages disease-associated protein-protein interactions (PPIs) and Gene Ontology-based similarity as information sources, which are integrated using non-negative matrix factorization (NMF) based clustering. The data sources are converted into respective biological networks and integrated through the NMF-based clustering method. The resulting clusters, termed as meta-modules, are further classified into 22 diseases/disorders using a multi-label classification based on their Z-Scores.

To address the multi-label nature of the data, we employ the binary-relevance method, transforming the problem into multiple binary classification tasks. We evaluate various classifiers and identify the combination of binary relevance and the K-nearest neighbor (KNN) classifier as the most effective in predicting disease-associated modules. This study demonstrates the development of innovative computational techniques for analyzing large biological networks.

By integrating diverse biological datasets and employing advanced clustering and classification methods, the proposed framework provides valuable insights into disease-gene associations and lays a strong foundation for future research in precision medicine and disease diagnosis. This approach holds significant potential for enhancing our understanding of complex biological interactions and identifying key factors contributing to various diseases and disorders.

However, it is important to acknowledge the limitations of this study, including the quality and completeness of available data and the need for further experimental validation of the predicted associations. Future research should focus on expanding the data integration to include additional omics datasets, improving the methods for capturing dynamic interactions over time, and validating the findings through experimental studies. This will help refine the framework, extend its applications in personalized medicine, improve disease management strategies, and pave the way for novel therapeutic interventions, ultimately advancing the field of network and precision medicine.

Supporting information

S1 Fig. The Human Disease Network.

The figure illustrates a graphical representation of the Human Disease Network (HDN with disease or disorder classes represented by one set of nodes and interacted genes represented by another set of nodes.

https://doi.org/10.1371/journal.pone.0305503.s001

(TIF)

S2 Fig. Finding the optimal number of clusters using the average silhouette method.

The two networks (PPI Network and semantic similarity network) are partitioned using a simple k-means clustering with k determined by the Silhouette Analysis.

https://doi.org/10.1371/journal.pone.0305503.s002

(TIF)

S3 Fig. The silhouette analysis determines the value of k, and seven meta-modules are identified by the clustering analysis.

https://doi.org/10.1371/journal.pone.0305503.s003

(TIF)

S4 Fig. Z_score based labeling of meta–modules across different disease classes.

https://doi.org/10.1371/journal.pone.0305503.s004

(TIF)

S1 Table. The table shows Z_score value of each meta–modules in the different disease classes.

https://doi.org/10.1371/journal.pone.0305503.s005

(TIF)

References

1. Zhou X, Menche J, Barabasi AL, Sharma A. Human symptoms–disease network. Nature communications 2014; 5(1):4212. pmid:24967666
- View Article
- PubMed/NCBI
- Google Scholar
2. Wu X, Jiang R, Zhang M, Li S. Network-based global inference of human disease genes. Molecular systems biology 2008; 4(1):189. pmid:18463613
- View Article
- PubMed/NCBI
- Google Scholar
3. Goh KI, Choi IG. Exploring the human diseasome: the human disease network. Briefings in functional genomics 2012; 11(6):533–542. pmid:23063808
- View Article
- PubMed/NCBI
- Google Scholar
4. Guo W, Zeng T, Huang T, Cai YD. Disease cluster detection and functional. characterization IEEE Access 2020; 8:141958–141966.
- View Article
- Google Scholar
5. McClellan J, King MC. Genetic heterogeneity in human disease. Cell 2010; 141(2):210–217. pmid:20403315
- View Article
- PubMed/NCBI
- Google Scholar
6. Johnson G, Todd J. Strategies in complex disease mapping. Current opinion in genetics and development 2000; 10(3):330–334. pmid:10826983
- View Article
- PubMed/NCBI
- Google Scholar
7. Bar-Joseph Z, Gerber G, Lee T, Rinaldi N, Yoo J, Robert F, et al. Computational discovery of gene modules and regulatory networks. Nature biotechnology 2003; 21(11):1337–1342. pmid:14555958
- View Article
- PubMed/NCBI
- Google Scholar
8. Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nature reviews genetics 2011; 12(1):56–68. pmid:21164525
- View Article
- PubMed/NCBI
- Google Scholar
9. Alberuni S, Ray S. Predicting Disease-Associated Genes Through Interaction and Ontology-Based Inference Technique. In International Conference on Computational Intelligence in Communications and Business Analytics 2023 Jan 27 (pp. 259–272). https://doi.org/10.1007/978-3-031-48879-5_20
10. Goh KI, Cusick M, Valle D, Childs B, Vidal M, Barabasi AL. The human disease network. Proceedings of the National Academy of Sciences 2007; 104(21):8685–8690. pmid:17502601
- View Article
- PubMed/NCBI
- Google Scholar
11. Amberger J, Bocchini C, Scott A, Hamosh A. McKusick’s online Mendelian inheritance in man (OMIM®). Nucleic acids research 2009; 37(suppl1):D793–D796. pmid:18842627
- View Article
- PubMed/NCBI
- Google Scholar
12. Bandyopadhyay S, Ray S, Mukhopadhyay A, Maulik U. A multiobjective approach for identifying protein complexes and studying their association in multiple disorders. Algorithms for Molecular Biology 2015; 10(1):1–15. pmid:26257820
- View Article
- PubMed/NCBI
- Google Scholar
13. Qiu YQ, Zhang S, Zhang XS, Chen L. Detecting disease associated modules and prioritizing active genes based on high throughput data. BMC bioinformatics 2010; 11:1–12. pmid:20070902
- View Article
- PubMed/NCBI
- Google Scholar
14. Wang T, Peng Q, Liu B, Liu Y, Wang Y. Disease module identification based on representation learning of complex networks integrated from GWAS, eQTL summaries, and human interactome. Frontiers in Bioengineering and Biotechnology 2020; 8:418. pmid:32435638
- View Article
- PubMed/NCBI
- Google Scholar
15. Bhattacharjee, D, Hossain, S, Sultana, R, Ray,S Topological inquisition into the ppi networks associated with human diseases through graphlet frequency distribution. In International Conference on Pattern Recognition and Machine Intelligence 2017 (pp. 431–437). https://doi.org/10.1007/978-3-319-69900-4_55
16. Liu W, Sun Z, Xie H. The analyses of human inherited disease and tissue-specific proteins in the interaction network. Journal of Biomedical Informatics 2016; 61:10–18. pmid:27012904
- View Article
- PubMed/NCBI
- Google Scholar
17. Schulte-Sasse R, Budach S, Hnisz D, Marsico A. Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms. Nature Machine Intelligence. 2021 Jun;3(6):513–26.
- View Article
- Google Scholar
18. Li HD, Deng C, Zhang XQ, Lin CX. A gene set-integrated approach for predicting disease-associated genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2022 Oct 14;20(6):3440–50.
- View Article
- Google Scholar
19. Yang K, Lu K, Wu Y, Yu J, Liu B, Zhao Y, et al. A network-based machine-learning framework to identify both functional modules and disease genes. Human Genetics. 2021 Jun;140:897–913. pmid:33409574
- View Article
- PubMed/NCBI
- Google Scholar
20. Peri S, Navarro J, Kristiansen T, Amanchy R, Surendranath V, Muthusamy B, et al. Human protein reference database as a discovery resource for proteomics. Nucleic acids research 2004; 32(suppl1):D497–D501. pmid:14681466
- View Article
- PubMed/NCBI
- Google Scholar
21. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research 2023; 51(D1):D638–D646. pmid:36370105
- View Article
- PubMed/NCBI
- Google Scholar
22. Schlicker A, Domingues F, Rahnenfuhrer J, Lengauer T. A new measure for functional similarity of gene products based on Gene Ontology. BMC bioinformatics 2006; 7(1):1–16. pmid:16776819
- View Article
- PubMed/NCBI
- Google Scholar
23. Ovaska K, Laakso M, Hautaniemi S. Fast gene ontology based clustering for microarray experiments. BioData mining. 2008 Dec;1:1–8. pmid:19025591
- View Article
- PubMed/NCBI
- Google Scholar
24. Lee D, Seung HS. Algorithms for non-negative matrix factorization. Advances in neural information processing systems. 2000;13.
- View Article
- Google Scholar
25. Li X, Wang Q, Zheng Y, Lv S, Ning S, Sun J, et al. Prioritizing human cancer microRNAs based on genes’ functional consistency between microRNA and cancer. Nucleic acids research 2011; 39(22):e153–e153. pmid:21976726
- View Article
- PubMed/NCBI
- Google Scholar
26. Ray, S, Bandyopadhyay, S, Mukhopadhyay, A, Maulik, U Incorporating fuzzy semantic similarity measure in detecting human protein complexes in PPI network: A multiobjective approach. In 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) 2013 (pp. 1–8). https://doi.org/10.1109/FUZZ-IEEE.2013.6622483
27. Mukhopadhyay A, Ray S, De M. Detecting protein complexes in a PPI network: a gene ontology based multi-objective evolutionary approach. Molecular BioSystems 2012; 8(11):3036–3048. pmid:22990765
- View Article
- PubMed/NCBI
- Google Scholar
28. Wang YX, Zhang YJ. Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on knowledge and data engineering. 2012 Mar 6;25(6):1336–53.
- View Article
- Google Scholar
29. Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization. BMC bioinformatics. 2010 Dec;11:1–9. pmid:20598126
- View Article
- PubMed/NCBI
- Google Scholar
30. de Carvalho AC, Freitas AA. A tutorial on multi-label classification techniques. Foundations of Computational Intelligence Volume 5: Function Approximation and Classification. 2009 Jul 11:177–95. https://doi.org/10.1007/978-3-642-01536-6_8
- View Article
- Google Scholar
31. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature 2005; 437(7062):1173–1178. pmid:16189514
- View Article
- PubMed/NCBI
- Google Scholar
32. Rolland T, Taşan M, Charloteaux B, Pevzner S, Zhong Q, Sahni N, et al. A proteome-scale map of the human interactome network. Cell 2014; 159(5):1212–1226. pmid:25416956
- View Article
- PubMed/NCBI
- Google Scholar
33. Luck K, Kim DK, Lambourne L, Spirohn K, Begg B, Bian W, et al. A reference map of the human binary protein interactome. Nature 2020; 580(7803):402–408. pmid:32296183
- View Article
- PubMed/NCBI
- Google Scholar
34. Peri S, Navarro J, Amanchy R, Kristiansen T, Jonnalagadda C, Surendranath V, et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome research 2003; 13(10):2363–2371. pmid:14525934
- View Article
- PubMed/NCBI
- Google Scholar
35. Aleksander S, Balhoff J, Carbon S, Cherry J, Drabkin H, Ebert D, et al. The Gene Ontology knowledgebase in 2023. Genetics 2023; 224(1):iyad031. pmid:36866529
- View Article
- PubMed/NCBI
- Google Scholar
36. Huang D, Sherman B, Lempicki R. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols 2009; 4(1):44–57. pmid:19131956
- View Article
- PubMed/NCBI
- Google Scholar
37. Montanes E, Senge R, Barranquero J, Quevedo J, Coz J, Hullermeier E. Dependent binary relevance models for multi-label classification. Pattern Recognition 2014; 47(3):1494–1508.
- View Article
- Google Scholar

[ref1] 1. Zhou X, Menche J, Barabasi AL, Sharma A. Human symptoms–disease network. Nature communications 2014; 5(1):4212. pmid:24967666
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Wu X, Jiang R, Zhang M, Li S. Network-based global inference of human disease genes. Molecular systems biology 2008; 4(1):189. pmid:18463613
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Goh KI, Choi IG. Exploring the human diseasome: the human disease network. Briefings in functional genomics 2012; 11(6):533–542. pmid:23063808
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Guo W, Zeng T, Huang T, Cai YD. Disease cluster detection and functional. characterization IEEE Access 2020; 8:141958–141966.
View Article
Google Scholar

[14] View Article

[15] Google Scholar

[ref5] 5. McClellan J, King MC. Genetic heterogeneity in human disease. Cell 2010; 141(2):210–217. pmid:20403315
View Article
PubMed/NCBI
Google Scholar

[17] View Article

[18] PubMed/NCBI

[19] Google Scholar

[ref6] 6. Johnson G, Todd J. Strategies in complex disease mapping. Current opinion in genetics and development 2000; 10(3):330–334. pmid:10826983
View Article
PubMed/NCBI
Google Scholar

[21] View Article

[22] PubMed/NCBI

[23] Google Scholar

[ref7] 7. Bar-Joseph Z, Gerber G, Lee T, Rinaldi N, Yoo J, Robert F, et al. Computational discovery of gene modules and regulatory networks. Nature biotechnology 2003; 21(11):1337–1342. pmid:14555958
View Article
PubMed/NCBI
Google Scholar

[25] View Article

[26] PubMed/NCBI

[27] Google Scholar

[ref8] 8. Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nature reviews genetics 2011; 12(1):56–68. pmid:21164525
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref9] 9. Alberuni S, Ray S. Predicting Disease-Associated Genes Through Interaction and Ontology-Based Inference Technique. In International Conference on Computational Intelligence in Communications and Business Analytics 2023 Jan 27 (pp. 259–272). https://doi.org/10.1007/978-3-031-48879-5_20

[ref10] 10. Goh KI, Cusick M, Valle D, Childs B, Vidal M, Barabasi AL. The human disease network. Proceedings of the National Academy of Sciences 2007; 104(21):8685–8690. pmid:17502601
View Article
PubMed/NCBI
Google Scholar

[34] View Article

[35] PubMed/NCBI

[36] Google Scholar

[ref11] 11. Amberger J, Bocchini C, Scott A, Hamosh A. McKusick’s online Mendelian inheritance in man (OMIM®). Nucleic acids research 2009; 37(suppl1):D793–D796. pmid:18842627
View Article
PubMed/NCBI
Google Scholar

[38] View Article

[39] PubMed/NCBI

[40] Google Scholar

[ref12] 12. Bandyopadhyay S, Ray S, Mukhopadhyay A, Maulik U. A multiobjective approach for identifying protein complexes and studying their association in multiple disorders. Algorithms for Molecular Biology 2015; 10(1):1–15. pmid:26257820
View Article
PubMed/NCBI
Google Scholar

[42] View Article

[43] PubMed/NCBI

[44] Google Scholar

[ref13] 13. Qiu YQ, Zhang S, Zhang XS, Chen L. Detecting disease associated modules and prioritizing active genes based on high throughput data. BMC bioinformatics 2010; 11:1–12. pmid:20070902
View Article
PubMed/NCBI
Google Scholar

[46] View Article

[47] PubMed/NCBI

[48] Google Scholar

[ref14] 14. Wang T, Peng Q, Liu B, Liu Y, Wang Y. Disease module identification based on representation learning of complex networks integrated from GWAS, eQTL summaries, and human interactome. Frontiers in Bioengineering and Biotechnology 2020; 8:418. pmid:32435638
View Article
PubMed/NCBI
Google Scholar

[50] View Article

[51] PubMed/NCBI

[52] Google Scholar

[ref15] 15. Bhattacharjee, D, Hossain, S, Sultana, R, Ray,S Topological inquisition into the ppi networks associated with human diseases through graphlet frequency distribution. In International Conference on Pattern Recognition and Machine Intelligence 2017 (pp. 431–437). https://doi.org/10.1007/978-3-319-69900-4_55

[ref16] 16. Liu W, Sun Z, Xie H. The analyses of human inherited disease and tissue-specific proteins in the interaction network. Journal of Biomedical Informatics 2016; 61:10–18. pmid:27012904
View Article
PubMed/NCBI
Google Scholar

[55] View Article

[56] PubMed/NCBI

[57] Google Scholar

[ref17] 17. Schulte-Sasse R, Budach S, Hnisz D, Marsico A. Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms. Nature Machine Intelligence. 2021 Jun;3(6):513–26.
View Article
Google Scholar

[59] View Article

[60] Google Scholar

[ref18] 18. Li HD, Deng C, Zhang XQ, Lin CX. A gene set-integrated approach for predicting disease-associated genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2022 Oct 14;20(6):3440–50.
View Article
Google Scholar

[62] View Article

[63] Google Scholar

[ref19] 19. Yang K, Lu K, Wu Y, Yu J, Liu B, Zhao Y, et al. A network-based machine-learning framework to identify both functional modules and disease genes. Human Genetics. 2021 Jun;140:897–913. pmid:33409574
View Article
PubMed/NCBI
Google Scholar

[65] View Article

[66] PubMed/NCBI

[67] Google Scholar

[ref20] 20. Peri S, Navarro J, Kristiansen T, Amanchy R, Surendranath V, Muthusamy B, et al. Human protein reference database as a discovery resource for proteomics. Nucleic acids research 2004; 32(suppl1):D497–D501. pmid:14681466
View Article
PubMed/NCBI
Google Scholar

[69] View Article

[70] PubMed/NCBI

[71] Google Scholar

[ref21] 21. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research 2023; 51(D1):D638–D646. pmid:36370105
View Article
PubMed/NCBI
Google Scholar

[73] View Article

[74] PubMed/NCBI

[75] Google Scholar

[ref22] 22. Schlicker A, Domingues F, Rahnenfuhrer J, Lengauer T. A new measure for functional similarity of gene products based on Gene Ontology. BMC bioinformatics 2006; 7(1):1–16. pmid:16776819
View Article
PubMed/NCBI
Google Scholar

[77] View Article

[78] PubMed/NCBI

[79] Google Scholar

[ref23] 23. Ovaska K, Laakso M, Hautaniemi S. Fast gene ontology based clustering for microarray experiments. BioData mining. 2008 Dec;1:1–8. pmid:19025591
View Article
PubMed/NCBI
Google Scholar

[81] View Article

[82] PubMed/NCBI

[83] Google Scholar

[ref24] 24. Lee D, Seung HS. Algorithms for non-negative matrix factorization. Advances in neural information processing systems. 2000;13.
View Article
Google Scholar

[85] View Article

[86] Google Scholar

[ref25] 25. Li X, Wang Q, Zheng Y, Lv S, Ning S, Sun J, et al. Prioritizing human cancer microRNAs based on genes’ functional consistency between microRNA and cancer. Nucleic acids research 2011; 39(22):e153–e153. pmid:21976726
View Article
PubMed/NCBI
Google Scholar

[88] View Article

[89] PubMed/NCBI

[90] Google Scholar

[ref26] 26. Ray, S, Bandyopadhyay, S, Mukhopadhyay, A, Maulik, U Incorporating fuzzy semantic similarity measure in detecting human protein complexes in PPI network: A multiobjective approach. In 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) 2013 (pp. 1–8). https://doi.org/10.1109/FUZZ-IEEE.2013.6622483

[ref27] 27. Mukhopadhyay A, Ray S, De M. Detecting protein complexes in a PPI network: a gene ontology based multi-objective evolutionary approach. Molecular BioSystems 2012; 8(11):3036–3048. pmid:22990765
View Article
PubMed/NCBI
Google Scholar

[93] View Article

[94] PubMed/NCBI

[95] Google Scholar

[ref28] 28. Wang YX, Zhang YJ. Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on knowledge and data engineering. 2012 Mar 6;25(6):1336–53.
View Article
Google Scholar

[97] View Article

[98] Google Scholar

[ref29] 29. Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization. BMC bioinformatics. 2010 Dec;11:1–9. pmid:20598126
View Article
PubMed/NCBI
Google Scholar

[100] View Article

[101] PubMed/NCBI

[102] Google Scholar

[ref30] 30. de Carvalho AC, Freitas AA. A tutorial on multi-label classification techniques. Foundations of Computational Intelligence Volume 5: Function Approximation and Classification. 2009 Jul 11:177–95. https://doi.org/10.1007/978-3-642-01536-6_8
View Article
Google Scholar

[104] View Article

[105] Google Scholar

[ref31] 31. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature 2005; 437(7062):1173–1178. pmid:16189514
View Article
PubMed/NCBI
Google Scholar

[107] View Article

[108] PubMed/NCBI

[109] Google Scholar

[ref32] 32. Rolland T, Taşan M, Charloteaux B, Pevzner S, Zhong Q, Sahni N, et al. A proteome-scale map of the human interactome network. Cell 2014; 159(5):1212–1226. pmid:25416956
View Article
PubMed/NCBI
Google Scholar

[111] View Article

[112] PubMed/NCBI

[113] Google Scholar

[ref33] 33. Luck K, Kim DK, Lambourne L, Spirohn K, Begg B, Bian W, et al. A reference map of the human binary protein interactome. Nature 2020; 580(7803):402–408. pmid:32296183
View Article
PubMed/NCBI
Google Scholar

[115] View Article

[116] PubMed/NCBI

[117] Google Scholar

[ref34] 34. Peri S, Navarro J, Amanchy R, Kristiansen T, Jonnalagadda C, Surendranath V, et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome research 2003; 13(10):2363–2371. pmid:14525934
View Article
PubMed/NCBI
Google Scholar

[119] View Article

[120] PubMed/NCBI

[121] Google Scholar

[ref35] 35. Aleksander S, Balhoff J, Carbon S, Cherry J, Drabkin H, Ebert D, et al. The Gene Ontology knowledgebase in 2023. Genetics 2023; 224(1):iyad031. pmid:36866529
View Article
PubMed/NCBI
Google Scholar

[123] View Article

[124] PubMed/NCBI

[125] Google Scholar

[ref36] 36. Huang D, Sherman B, Lempicki R. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols 2009; 4(1):44–57. pmid:19131956
View Article
PubMed/NCBI
Google Scholar

[127] View Article

[128] PubMed/NCBI

[129] Google Scholar

[ref37] 37. Montanes E, Senge R, Barranquero J, Quevedo J, Coz J, Hullermeier E. Dependent binary relevance models for multi-label classification. Pattern Recognition 2014; 47(3):1494–1508.
View Article
Google Scholar

[131] View Article

[132] Google Scholar

Figures

Abstract

Introduction

Method

Workflow

Formal details and background

Gene ontology-based semantic similarity

NMF based clustering

NMF.

Integration of clustering using NMF.

Multi-label classification

Results and discussions

Datasets used

Labeling of meta-modules with disease class

Prediction of gene-disease association

Meta-modules are enriched with GO terms and pathways

A short discussion of key findings and their theoretical implications

Conclusions

Supporting information

S1 Fig. The Human Disease Network.

S2 Fig. Finding the optimal number of clusters using the average silhouette method.

S3 Fig. The silhouette analysis determines the value of k, and seven meta-modules are identified by the clustering analysis.

S4 Fig. Z_score based labeling of meta–modules across different disease classes.

S1 Table. The table shows Z_score value of each meta–modules in the different disease classes.

References