Abstract
The paper introduces a novel approach for constructing a global model utilizing multilayer perceptron (MLP) neural networks and dispersed data sources. These dispersed data are independently gathered in various local tables, each potentially containing different objects and attributes, albeit with some shared elements (objects and attributes). Our approach involves the development of local models based on these local tables imputed with some artificial objects. Subsequently, local models are aggregated using weighted techniques. Finally, the global model is retrained using some global objects. In this study, the proposed method is compared with two existing approaches from the literature—homogeneous and heterogeneous multi-model classifiers. The analysis reveals that the proposed approach consistently outperforms these existing methods across multiple evaluation criteria, including classification accuracy, balanced accuracy, F1-score, and precision. The results demonstrate that the proposed method significantly outperforms traditional ensemble classifiers and homogeneous ensembles of MLPs. Specifically, the proposed approach achieves an average classification accuracy improvement of 15% and a balanced accuracy enhancement of 12% over the baseline methods mentioned above. Moreover, in practical applications such as healthcare and smart agriculture, the model showcases superior properties by providing a single model that is easier to use and interpret. These improvements underscore the model’s robustness and adaptability, making it a valuable tool for diverse real-world applications.
Citation: Przybyła-Kasperek M, Marfo KF (2024) A multi-layer perceptron neural network for varied conditional attributes in tabular dispersed data. PLoS ONE 19(12): e0311041. https://doi.org/10.1371/journal.pone.0311041
Editor: Kalapraveen Bagadi, Vellore Institute of Technology - Amaravati Campus: VIT-AP Campus, INDIA
Received: December 10, 2023; Accepted: September 9, 2024; Published: December 2, 2024
Copyright: © 2024 Przybyła-Kasperek, Marfo. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data used in the article is publicly available from the UCI repository. The specific links are as follows: Vehicle Silhouettes: https://archive.ics.uci.edu/dataset/149/statlog+vehicle+silhouettes Dry Bean: https://archive.ics.uci.edu/dataset/602/dry+bean+dataset Sensorless Drive Diagnosis: https://archive.ics.uci.edu/dataset/325/dataset+for+sensorless+drive+diagnosis Crowd Sourced: https://archive.ics.uci.edu/dataset/400/crowdsourced+mapping.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Machine learning (ML) for dispersed data addresses the challenge of analyzing and utilizing data that is scattered across different sources, formats, and locations. This is increasingly important in the era of big data, where data is often inconsistent, heterogeneous, and subject to privacy regulations. Data collected from various sources often lack a uniform structure, in that attributes and objects might differ significantly from one data set to another. In the healthcare sector, data on patients are stored across multiple hospitals and medical facilities, each with its own data management system. These data sets might differ in structure, terminology, and format. Additionally, data protection regulations (e.g., HIPAA in the U.S., GDPR in Europe) prevent the sharing of sensitive patient information between institutions without proper safeguards. Suppose multiple hospitals are working together to develop a predictive model for identifying patients who are at high risk of sepsis. Each hospital has its own data set which includes varying attributes such as patient vitals, lab results, and medication histories. The problem is how to collaboratively train a sepsis prediction model leveraging these diverse data sets. A very important aspect is the ability to make use of inconsistent data that is available in dispersed form. However, due to the arbitrariness of attributes and objects present in local data sets, as well as data protection laws that restrict the free flow of data, one has to be meticulous when dealing with dispersed data. ML for dispersed data is crucial in leveraging the full potential of big data across domains where data is fragmented and regulated. It enables organizations to collaboratively develop sophisticated models.
To clarify, certain assumptions have been made that define the area of the problem under consideration. To begin with, there is the assumption that data are available in tabular and dispersed form. Also, data are provided by independent entities that do not want to share data—storing data on a central server with other data from other sources. For this reason, there is a set of local decision tables whose objects and conditional attributes need not satisfy any constraints—they do not have to be equal or the same, but they may have some shared attributes as well as objects. In the situation considered in the paper, we do not guarantee full confidentiality. We assume that the local entities agree to disclose information about what attributes are stored in the table—the names of these attributes and certain characteristics of the values stored in the table, such as the mean, median, minimum and maximum of the attributes in the local tables.
Researchers have been seeking solutions associated with dispersed data in domains such as federated learning and distributed learning. Federated learning puts a greater emphasis on data protection. The general approach here is to distribute an initial model from a central server to all local spaces for the model to be trained locally. Trained parameter values from all local models are then sent to the central server where some aggregations are performed to produce a global model. The global model is then sent back to the local spaces for verification—local units can accept, modify or reject the global model. Such a process is performed iteratively until some acceptable convergence metric is achieved. In [1], a detailed description can be found. For such a global model to be constructed in federated learning, the assumption of an equal set of conditional attributes present in all local tables must be satisfied. Distributed learning, on the other hand, assumes that all data are available in a centralized form, for example in a single decision table (see [2]). The division into local sets is intentional and aims to improve the quality of the model’s classification or its ability to deal with huge data. Often, the process of creating local tables in distributed learning is focused on strengthening the local classifiers by sensitizing them to difficult cases. This approach assumes full access to all data and does not necessarily guarantee any data protection.
The model proposed in this paper is different from the two domains mentioned above. Namely, the proposed model does not impose any assumptions of homogeneity on the form of data present in local spaces, while guaranteeing a certain level of data protection. By sharing only data on attribute names and general characteristics of the values stored in the tables, individual data tuples and individual raw data are protected. Also, the proposed method does not employ an iterative process to reach a consensus but rather a non-iterative algorithm that leads to the construction of a global model.
The main contribution of the paper is to propose a method that generates a global neural network model based on dispersed data. To begin, local neural networks with the same structure are trained based on local tables, where local tables have a varied and unrestricted form—no constraints on the set of objects and the set of attributes. In order to generate local networks with the same structure, it is necessary to somehow modify the local tables. The goal is to generate local networks whose input layer considers the full set of conditional attributes present in all local tables. In each of the local tables, values for missing attributes are imputed by using certain characteristics determined based on other local tables containing the missing attributes. In this way, a set of extended local tables is prepared and used to train local MLP networks. This study uses MLP networks as this is the initial take on the proposed approach; thus, it is appropriate to start with the standard neural network suitable for classification. Other types of networks, such as radial basis function neural networks as well as autoencoders, are planned to be used in future work. In the next step, the local neural networks are aggregated. In this study, two approaches of aggregating networks are considered—average and sum of the weights from the local models. Finally, the aggregated model is re-trained with a sample of data that is shared and defined for the full set of conditional attributes. In this way the final global model is constructed.
In this paper, the proposed model is investigated in terms of many variants. The following features are tested:
- the method to substitute values of missing attributes in local tables (different number of artificial objects generated based on one original object are tested),
- the number of hidden layers in MLP networks (k-hidden layer networks are tested, k ∈ {1, 2}),
- the number of neurons in the hidden layer/s (different values are tested),
- the method of aggregating local neural networks (sum and average are tested).
The proposed method is compared with other methods from the literature to establish its quality. Two approaches are adopted as baseline methods. The first is the ensemble of classifiers proposed in [3], which comprises creating three base classifiers: k-nearest neighbors, decision tree and naive Bayes (KNN, DT, NB), based on each local table. The final decision is made by voting. The second approach is a homogeneous ensemble of classifiers and consists of generating MLP networks based on each local table separately and then generating the final decision by voting. It is shown in this paper that the proposed model produces much better results than both baseline models. These differences are also confirmed with statistical tests.
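As an illustration, the final decision of both baseline ensembles is a simple majority vote over the local classifiers’ predicted labels. The following minimal sketch is hypothetical (the function name and class labels are not from the paper):

```python
from collections import Counter

def majority_vote(predictions):
    # Each local classifier contributes one predicted class label;
    # the most frequent label wins (ties broken by first occurrence).
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical local classifiers voting on one test object
print(majority_vote(["high_risk", "low_risk", "high_risk"]))  # high_risk
```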
The paper is organized as follows. In the second section, an overview of the literature is included. The third section presents the newly proposed adaptive approach. Here, a formal definition of dispersed data and description of steps of the process of building a global model based on dispersed data is included. The fourth section gives the experimental protocol and description of the data used. The fifth describes obtained results. Comparative analysis are also carried out in this section. Finally, a summary is presented in the conclusion section.
2 Related work
Artificial intelligence (AI) has transformed various sectors by integrating human-like abilities such as learning, reasoning, and perception into software systems. This advancement has enabled computers to execute tasks traditionally performed by humans. Fueled by enhancements in computational capacity, the availability of extensive data sets and the creation of state-of-the-art AI algorithms, AI applications have become widespread. Noteworthy examples include finger vein recognition [4], diabetic retinopathy detection [5], RNA Engineering [6], cancer detection [7], biomathematical challenges [8], and smart agriculture [9].
Ensemble learning deals with distributed data, which is similar to the issues in this paper. It is a very popular technique in machine learning that is employed to boost the predictive performance of learning algorithms. The underlying rationale for using this approach is to tackle problems involving data sets that are too large to handle at once [10], or situations where only a very small data set is available, in which case data sampling is necessary to obtain reasonable results [11]. Another rationale for using this approach may be to cope with the issue of identifying the right model for the considered problem [12]. To expound, rather than risking selecting the wrong model, one can use a heterogeneous ensemble learning approach. This approach also works well for problems whose solution space is quite large and which thus face the risk of getting stuck in local minima/maxima [13]. Many different approaches involving the use of neural networks to address the above mentioned problems have been proposed. Such solutions are proposed in areas such as the business field [14], malware detection [15] and audio classification [16]. However, all these approaches assume free access to data and, as a necessary condition, that all data are stored in a centralized form rather than a dispersed one.
Federated learning is another approach within distributed machine learning [1]. Different from classifier ensembles, it puts the greatest emphasis on data segregation and protection [17]. Here, the assumption is that data are available in separate sets that must not be centralized. The idea is to build local models separately and generate a global model in a central space by iteratively aggregating the local models. Neural networks are well applicable here as it is relatively simple to aggregate these models while maintaining high quality [18, 19]. There are three types of federated learning: horizontal, vertical and hybrid federated learning. The latter approach is the closest to the approach proposed in this paper; however, unlike the proposed approach, hybrid federated learning requires that different parties share the data identity information, which is a threat to the privacy of local clients [20]. Unfortunately, for the considered data sets, it is impossible to apply this approach due to the hybrid nature of the partitioning—regarding both objects and attributes—and the inability to obtain identity information about objects between dispersed data sets. Many different models are proposed in federated learning, with various aggregation methods, network types and applications being considered in the literature [21–23].
Another approach to the problem of classification based on dispersed data is to build a separate model that aggregates prediction vectors generated by independent local models. Data privacy is also preserved here as only prediction vectors are consolidated. The form of the data can be completely arbitrary in this approach but here, a global model is not generated and the algorithm is non-iterative. Instead, it generates a separate model that only aggregates the prediction results obtained by local models. The local models can be of a completely different type than the aggregation model. In the literature, one can find papers that use neural networks, decision trees or other models as the aggregation model [24–28]. Statistical as well as dynamic approaches to this issue are also proposed which also consider conflicts or compatibility of local classifiers [29–31]. However, in the present study, the approach considered is different as the goal is to determine a global model based on dispersed data.
MLPs have been key in developing neural networks and machine learning. Although more complex models like Convolutional Neural Networks (CNNs) and Transformers have emerged, recent improvements have renewed the importance and usefulness of MLPs, particularly where simplicity and efficiency are needed. Techniques such as Adam and RMSProp [32] have enhanced MLP training by dynamically adjusting learning rates, leading to faster convergence and improved generalization. Incorporating residual connections within MLPs, akin to ResNet architectures [33], has mitigated the problem of gradient vanishing, enabling the training of deeper MLP models. MLPs traditionally require large amounts of labeled data to perform effectively. Techniques such as data augmentation and transfer learning are being adapted to address this limitation [34]. Some of the techniques mentioned above (e.g., the Adam optimizer [35]) are used in this paper for MLP. But, to the best of our knowledge, MLP networks have never been used in the way that is proposed in this paper—for dispersed data with different sets of attributes using augmentation of missing attribute values.
3 Basic concepts and proposed global model
In this section, we present preliminary designations as well as a detailed discussion on the proposed method for generating a global MLP network model based on dispersed data.
3.1 Dispersed data
A necessary assumption made is that data are available in a dispersed form—separate independent predefined data sets which are free of any constraints. In real applications, independent units collect data in tabular form. In tables, both sets of conditional attributes and sets of objects do not necessarily have to be disjoint as they may share common elements.
Also, there is an assumption that a set of decision tables is given. The tables are collected independently by separate units. A set of decision tables—local tables Di = (Ui, Ai, d), i ∈ {1, …, n}, from one discipline is available, where Ui is the universe, a set of objects; Ai is a set of conditional attributes; and d is a decision attribute. Decision tables are collected independently, so both sets of objects and sets of attributes can have any form. They can have common elements between tables, but not necessarily. The only condition that must be satisfied by all local tables is the collection of data from one discipline. Formally, this is satisfied by the assumption that the same decision attribute is present in all tables.
Since different sets of attributes appear in local tables, the construction of a MLP local model based on each of the tables separately would create a set of networks with completely different structures. This is because the input layer in each neural network would be different since the feature vectors are not the same across the local tables, thus, making it impossible to aggregate local MLP networks into a single global model.
The approach proposed in this paper is completely different from previous studies as it has not been proposed in the literature until now. The steps of the approach are listed below.
- Determine a uniform MLP network structure for a set of local tables—dispersed data;
- Train a MLP network based on each local table separately;
- Aggregate the MLP networks into a single model—a global MLP network;
- Post-train the global MLP network with a sample of global data.
Fig 1 shows the general steps of building the global MLP network model from dispersed data. In the first step, there is dispersed data—local tables with different sets of conditional attributes and different sets of objects. In order to build local neural networks with the same structure (the input layer requires the most attention here), the training data in each local space is imputed so as to have the same set of attributes. This step is carried out with the help of certain characteristics calculated from local tables. It is important to emphasize that the raw data is not shared at any model construction stage. In the next step, local MLP networks are trained, after which they are aggregated to construct a global network. The final step is to re-train the global network. In the study, this is done using a validation set.
All the steps are discussed in detail in the subsequent subsections.
3.2 Determine a uniform MLP network structure for a set of local tables—local models
Since the dispersed data need not satisfy any constraints, the key in determining the structure of the MLP network is the number of neurons in the input layer. The output layer poses no problem since all local tables share the same decision attribute. The number of hidden layers as well as the number of neurons in the hidden layers are optimized experimentally. Thus, the most important challenge is to determine a common input layer. In this first study on the approach, it is proposed to unify the input layer by using all conditional attributes from the local tables. So the input vector will have the dimension determined by the number of elements in the sum (union) of the conditional attribute sets present in the local tables

card{A}, where A = A1 ∪ A2 ∪ … ∪ An,

where card{X} is the number of elements in the set X. Such a sum is not a simple concatenation of attributes. We operate on sets, and we recognize attributes by their names. So the sum of the sets skips duplicates—when one attribute appears in several tables, it appears only once in the sum. It should also be noted that such a sum does not mean summing tuples from a table, but only determining the set of names of all attributes appearing in the local tables.
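The union above can be computed from the attribute names alone, without exchanging any raw data. Below is a minimal sketch with hypothetical attribute sets:

```python
# Conditional attribute sets A_1, A_2, A_3 of three hypothetical local tables;
# attributes are identified by name only.
local_attribute_sets = [
    {"a1", "a2", "a3"},   # A_1
    {"a2", "a4"},         # A_2
    {"a3", "a4", "a5"},   # A_3
]

# The "sum" of the sets: duplicates (a2, a3, a4) are counted once.
A = set().union(*local_attribute_sets)
input_dim = len(A)  # card{A} = number of neurons in the MLP input layer

print(sorted(A), input_dim)  # ['a1', 'a2', 'a3', 'a4', 'a5'] 5
```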
Here a problem arises because local tables contain objects for which values are known only on a certain subset of the set A. The question arises of how to train the local MLP networks, with the input layer defined as above, based on a local table with such objects. Fig 2 shows the overall configuration of the MLP network—the local model used for each local decision table. Each of the local tables includes a certain number of attributes (features), but not all of them. In order to make the network structure common for all local tables, the missing values for a given local table are completed. Of course, in each local table other missing values may occur. Completion of missing values is carried out by calculating values from the local tables in which the attribute occurs. Local models are neural networks trained specifically on artificially created objects, i.e. those that have completed values on attributes that are not present in the actual given local table. So only these artificial objects are used to train the neural network; the original objects are not used. The training process for these models involves a standard neural network built using the Keras library in Python, employing backward propagation over multiple epochs and steps within each epoch. In the next section, an explanation of how this problem is solved is given.
3.3 Training a MLP network based on each local table separately
In this section, the explanation of how to train a local MLP network based on a local table is given. Let us assume that a local table Dj = (Uj, Aj, d) is given, based on which a local MLP network is to be trained with an input layer containing card{A} neurons. For an object x ∈ Uj from the local table Dj, values for attributes from the set Aj are specified, which means that for each a ∈ Aj the value a(x) is given. Thus, in order to provide an input vector to the MLP network, the values on the other attributes from the set A\Aj must be determined. Let us assume that attribute b belongs to the set A\Aj and for this attribute one has to determine the value to be completed for the object x. In the proposed approach, this value is determined based on certain statistical measures: the minimum, maximum, median and average calculated for the values of attribute b occurring in other local tables in the dispersed data. In addition, the decision class of the object x is also taken into account. These measures were chosen as the most popular, frequently used in numerous calculations, and because they characterize both the central tendency and the entire range of variation in the value of a given attribute. This paper is the first study of the approach using artificial objects. In future work, other statistical measures will be analyzed; it is planned to use quartiles and the average value offset by the standard deviation.
More strictly, let us assume that the object x has a decision value v, v ∈ Vd, where Vd is the set of values of the decision attribute d. For each of the decision tables in which the attribute b is present, the minimum, maximum, median and average are calculated for the values of the attribute b based on the objects in the decision class v. That is, for each decision table Di for which b ∈ Ai the following values are computed: min_i^v(b), max_i^v(b), avg_i^v(b) and med_i^v(b). In this way, values designated separately for each local table containing the attribute b are obtained. To determine the final value, which is completed in the object and given to the input of the neural network, one of the statistical measures (minimum, maximum, mean or median) is applied to the local values determined in the previous step. Thus, one of the four measures is used for determining local values based on local tables and one of the four measures for determining the aggregate value. In all, there are 16 possible combinations, from which one is chosen at random as the value of b. Suppose that for calculating the local values the median is drawn, and for the aggregate value the minimum is drawn; then the value on attribute b is determined as follows: b(x) = min{med_i^v(b) : i ∈ {1, …, n}, b ∈ Ai}.
This method is repeated for each of the missing attributes of object x.
In the generalized version of the above method, instead of one object, k (k ≤ 16) objects are generated by selecting k distinct values from the 16 possible values as the value of b in each of the k objects generated from the original object x. Thus, based on object x, k new objects are generated with values defined on all conditional attributes in A. This approach is also tested and the results are presented in the experimental part of the paper. Algorithm 1 presents the pseudo-code of the generalized version (in the basic version, it is enough to put k = 1), which implements this part of the model.
Algorithm 1 Pseudo-code of the algorithm generating objects from one local table used for training the local MLP network
Input: One local decision table Dj = (Uj, Aj, d) for which we determine the training set for the MLP network; the measures min_i^v(b), max_i^v(b), avg_i^v(b) and med_i^v(b) computed for each decision value v ∈ Vd and attribute b ∈ Ai based on the values stored in the table Di, for each i ∈ {1, …, n}; the set A of conditional attributes from all local tables; the parameter value k that determines how many objects are generated based on one object from table Dj.
Output: A data set used to train the MLP neural network.
foreach x ∈ Uj do
 for m = 1 to k do
  create an object x′ by assigning on the set Aj the same values as the object x has
  foreach attribute b ∈ A\Aj do
   choose a pair (choice1, choice2) from the set {MIN, MAX, AVG, MED} × {MIN, MAX, AVG, MED}
   b(x′) := choice2{choice1_i^{d(x)}(b) : i ∈ {1, …, n}, b ∈ Ai}
  end foreach
 end for
end foreach
The computational complexity of the above method depends linearly on the number of objects in the local table Dj, the value of parameter k, the number of conditional attributes card{A} and the number of local tables n in the dispersed data. More precisely, the complexity resulting from the loop is O(card{Uj} ⋅ k ⋅ card{A\Aj} ⋅ n). In the worst case, one can assume that there is only one conditional attribute in the table Dj, that values have to be computed for all other attributes, and that the missing attributes are present in all other local tables except Dj. Then the complexity is O(card{Uj} ⋅ k ⋅ (card{A} − 1) ⋅ (n − 1)). The linear complexity of the algorithm means that it can be used even for large dispersed data.
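Algorithm 1 can be sketched in a few lines of Python. The sketch below is hypothetical in its data layout (tables as dicts holding object lists, attribute sets and the decision attribute name) and in the function name; for simplicity, the pair of measures is drawn independently for every missing attribute of every copy, rather than enforcing k distinct pairs as in the paper’s generalized version:

```python
import random
import statistics

MEASURES = {
    "MIN": min,
    "MAX": max,
    "AVG": lambda vs: sum(vs) / len(vs),
    "MED": statistics.median,
}

def generate_training_objects(tables, j, all_attrs, k, rng=random):
    """For each object x in table j, create k artificial objects; every
    missing attribute b in A\\A_j receives choice2 applied to the per-table
    choice1 values of b, restricted to x's decision class."""
    Dj = tables[j]
    d = Dj["decision"]
    out = []
    for x in Dj["objects"]:
        for _ in range(k):
            artificial = {a: x[a] for a in Dj["attrs"]}
            artificial[d] = x[d]
            for b in all_attrs - Dj["attrs"]:
                c1 = rng.choice(list(MEASURES))  # local measure
                c2 = rng.choice(list(MEASURES))  # aggregate measure
                local_vals = [
                    MEASURES[c1]([o[b] for o in t["objects"] if o[d] == x[d]])
                    for t in tables
                    if b in t["attrs"] and any(o[d] == x[d] for o in t["objects"])
                ]
                artificial[b] = MEASURES[c2](local_vals)
            out.append(artificial)
    return out

# Hypothetical dispersed data: two local tables sharing the decision "class"
tables = [
    {"objects": [{"a1": 1.0, "class": 0}, {"a1": 3.0, "class": 0}],
     "attrs": {"a1"}, "decision": "class"},
    {"objects": [{"a2": 5.0, "class": 0}, {"a2": 7.0, "class": 0}],
     "attrs": {"a2"}, "decision": "class"},
]
res = generate_training_objects(tables, 0, {"a1", "a2"}, k=2,
                                rng=random.Random(0))
print(len(res))  # 4 artificial objects (2 originals x k=2)
```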
The data prepared in the above way is used in the next step for training MLP neural networks. As mentioned earlier, the input layer is defined by the set of conditional attributes from all local tables. The number of neurons in the output layer is equal to the number of decision classes. Each of the neurons determines the probability with which the test object belongs to a given decision class. In the experimental part, one or two hidden layers are considered. The number of neurons in the hidden layer is determined in proportion to the number of neurons in the input layer. Different proportions are checked, from 0.25 to 5 times the number of neurons in the input layer. In the case of two hidden layers, all combinations of the number of neurons in the hidden layers are checked such that the first layer has the number of neurons from the set {0.25 × I, 0.5 × I, 0.75 × I, 1 × I, 1.5 × I, 1.75 × I, 2 × I, 2.5 × I, 2.75 × I, 3 × I, 3.5 × I, 3.75 × I, 4 × I, 4.5 × I, 4.75 × I, 5 × I}, and the second layer has the number of neurons from the set {1 × I, 2 × I, 3 × I, 4 × I, 5 × I}, where I is the number of neurons in the input layer. For the hidden layer, the ReLU (Rectified Linear Unit) activation function is used, as it is the most popular activation function and gives very good results [36]. For the output layer, the softmax activation function is used, which is recommended when one deals with a multi-class problem [37]. In this paper, data sets containing from four to nineteen decision classes are analyzed. The neural network is trained by using the back-propagation method with a gradient descent method with an adaptive step size. It is known that the softmax layer gives good results with the Adam optimizer [35]. The Adam optimizer, proposed in [38], is one of the most popular adaptive step size methods. According to [39], the categorical cross-entropy loss gives the best results with a softmax layer. That is why the Adam optimizer and the categorical cross-entropy loss function are used in the study.
The implementation of the MLP neural network from the Keras library in Python is used. The algorithm defines a neural network with one or two hidden layers with the rectified linear unit (ReLU) activation function, where the number of neurons in the first hidden layer depends on a parameter. The softmax activation function is used in the output layer. In the compilation, the categorical cross-entropy loss function, the Adam optimizer and accuracy as the evaluation metric are used. For the two-hidden-layer approach, a second hidden layer with the ReLU activation function and a parameter-dependent number of neurons is used. In the way described above, a set of local MLP networks is obtained. The number of networks is equal to the number of local decision tables. All networks have the same structure, and this is a very important property necessary for the next step.
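The computation performed by such a network can be illustrated independently of Keras. Below is a minimal NumPy sketch of the forward pass of the architecture described above (ReLU hidden layer, softmax output); the layer sizes and random weights are hypothetical, standing in for trained parameters:

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass: ReLU in each hidden layer, softmax in the output
    layer. `weights`/`biases` are lists with one entry per layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)           # ReLU activation
    z = h @ weights[-1] + biases[-1]
    z = z - z.max(axis=-1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)     # softmax class probabilities

# Hypothetical sizes: card{A} = 4 inputs, one hidden layer of 2*I = 8
# neurons, 3 decision classes
rng = np.random.default_rng(0)
I, H, C = 4, 8, 3
Ws = [rng.normal(size=(I, H)), rng.normal(size=(H, C))]
bs = [np.zeros(H), np.zeros(C)]

p = mlp_forward(rng.normal(size=(5, I)), Ws, bs)
print(p.shape)  # (5, 3): one probability vector per object, rows sum to 1
```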
3.4 Aggregation of MLP networks into a single model—a global MLP network
The result of the previous stage is a set of local MLP networks which are trained and all have the same structure. Aggregation of such networks into a single global MLP model is relatively simple. The global network has exactly the same structure as each of the local networks i.e. the same number of layers and the same number of neurons in each layer. However, during aggregation, each local model may have a different impact on the construction of the global model. This influence is proportional to the quality of each local model’s classification on the training set. The method used is inspired by the second weighting system used in the AdaBoost algorithm [40].
For each local model, a classification error is estimated based on its training set (artificial objects generated using Algorithm 1). Let us denote by ei the classification error determined for the i-th local model, i ∈ {1, …, n}. Since local models are built based on a piece of data, their accuracy can be very different. It may sometimes happen that their classification error is above 0.5. In order not to eliminate such local models from the aggregation stage, as they may contain important information on specific attributes that may have a positive impact on the global model, min-max normalization to the interval [0, 0.5] is applied to all errors ei, i ∈ {1, …, n}. Afterwards, the weight ωi for each local neural network i ∈ {1, …, n} is determined according to the formula:

ωi = (1/2) ln((1 − ei)/ei)     (1)
The weights of the global model are determined by one of two approaches: in the first approach, the weights for the global network are determined by the weighted average of the corresponding weights (assigned to edges connecting exactly the same neurons) present in local MLP networks with weights ωi, i ∈ {1, …, n}. The second approach is to determine the weight for the global network as the sum of the corresponding weights from the local networks with weights ωi, i ∈ {1, …, n}. The two approaches are studied separately in the experimental part of the paper.
Fig 3 illustrates the process of aggregating local models into a global model. Since all local models share the same structure, this aggregation is relatively straightforward. Each connection between neurons in the global model corresponds to the connections in the local models. The critical aspect of this process is the determination of weights, which are based on the classification performance of the local models on their respective sets of artificial objects (training sets). The weights assigned to each local model are crucial, as they influence the global model’s configuration. Local models that perform poorly in classification (possibly due to a higher number of missing attributes and thus a weaker connection with reality, as more of their values are artificial) are given a smaller weight in shaping the global model. However, they are not entirely excluded from the aggregation and still contribute to the overall classification performance of the global model. This ensures that the global model benefits from the specialized capabilities of each local model, enhancing its overall classification quality.
The global MLP network is implemented in Python. First, the network’s structure is defined—the number of layers and the number of neurons in each layer are the same as in the local networks. The weights are then not trained but assigned as the weighted average or weighted sum of the corresponding connection weights in the local networks, taking into account the local-model weights ωi.
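A minimal sketch of this aggregation step, assuming the per-layer weight arrays are obtained from the local networks (e.g. what Keras `model.get_weights()` returns) and that all local models share one architecture:

```python
import numpy as np

def aggregate_layers(local_params, omegas, method="avg"):
    """Combine per-layer weight arrays of local MLPs into the global MLP.

    local_params: list over models; each entry is a list of numpy arrays,
    one per layer. Corresponding arrays have identical shapes because
    all local networks share the same structure.
    omegas: the local-model weights computed from classification errors.
    """
    omegas = np.asarray(omegas, dtype=float)
    global_params = []
    for layer_arrays in zip(*local_params):      # same layer across models
        stacked = np.stack(layer_arrays)         # shape: (n_models, ...)
        w = omegas.reshape((-1,) + (1,) * (stacked.ndim - 1))
        if method == "avg":
            combined = (stacked * w).sum(axis=0) / omegas.sum()  # weighted average
        else:
            combined = (stacked * w).sum(axis=0)                 # weighted sum
        global_params.append(combined)
    return global_params
```

The resulting list of arrays can then be loaded into the global network (e.g. via `model.set_weights()` in Keras) before the re-training stage.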
3.5 Re-training the global MLP network with a sample of data
The retraining process with global objects enhances the proposed model’s accuracy, generalization, and robustness. It integrates the local models into a cohesive global model: the global MLP network is re-trained using the validation set, adjusting the weights and biases of the aggregated model to fine-tune it for better performance. Retraining ensures that the global model leverages the strengths of the local models while mitigating their individual weaknesses. The validation set, a subset of the training data, is smaller than the local models’ training sets but is crucial for capturing the model’s generality; it helps fine-tune the global model, prevent overfitting, and ensure good generalization to unseen data. Importantly, the objects in the validation set must contain a global description, i.e. include the attributes/characteristics present in all local tables. The integration of local models through retraining results in a significant boost in classification accuracy, as the model benefits from the collective knowledge of all local data sets.
The last stage is to re-train the global network. The training objects needed in this step must have values on the set of all conditional attributes. In the paper, this is implemented by using a validation set. Such a validation set is much smaller than the training sets for the local models and has less influence on the final form of the global neural network. However, without this last step, the obtained quality of classification is unsatisfactory and the model fails to capture sufficient generality. For the approach generating one artificial object, the size of the validation set is about 21% of the size of a local model’s training set. In the case of generating three artificial objects, it is about 7%. In future works, it is planned to test the active learning approach [41, 42] instead. In active learning, the assumption is that the model builds its own training data or modifies the original training data.
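The re-training step can be sketched as follows. This stand-in uses sklearn's `MLPClassifier` with `partial_fit` in place of the paper's Keras implementation; the aggregated weight and bias arrays are assumed to come from the previous aggregation stage, and the number of re-training epochs is an illustrative choice.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def retrain_global(aggregated_coefs, aggregated_biases, X_val, y_val, classes):
    """Load aggregated weights into a global MLP and fine-tune it
    on the validation set (objects described on all attributes)."""
    # Hidden layer sizes are inferred from the aggregated arrays.
    hidden = tuple(w.shape[1] for w in aggregated_coefs[:-1])
    clf = MLPClassifier(hidden_layer_sizes=hidden, random_state=0)
    clf.partial_fit(X_val, y_val, classes=classes)   # initializes the arrays
    # Overwrite the freshly initialized weights with the aggregated ones.
    clf.coefs_ = [w.copy() for w in aggregated_coefs]
    clf.intercepts_ = [b.copy() for b in aggregated_biases]
    for _ in range(50):                              # a few re-training epochs
        clf.partial_fit(X_val, y_val)
    return clf
```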
After the completion of this step, the final form of the global model is obtained and is evaluated by using an independent test data set.
It should be noted that the model avoids overfitting through a series of carefully planned steps. The final stage of training involves re-training the global MLP network with a validation set. This validation set is smaller than the training sets for local models, but crucial in capturing the generality of the model. For generating one artificial object, the validation set is about 21% of the local model’s training set size; for three artificial objects, it is about 7%. This step is essential to prevent overfitting and ensure the model can generalize well to new, unseen data. Also, during the selection of optimal parameters, we aimed for the best classification accuracy with the lowest possible model complexity, involving the fewest layers and neurons. This focus on simplicity helps reduce the risk of overfitting.
4 Experimental setup
In order to assess the efficiency of the suggested model, the methodology of the experiment is described in this section. The simulation platform, parameter assignments, and criteria for measuring performance are all elaborated upon. The scheme for describing the experimental methodology, widely used in the literature [43, 44], is followed below.
4.1 Simulation platform
All simulations are conducted using the open-source software Jupyter Notebook 6.5.4 and the Anaconda 2023.09-0 (Sep 29, 2023) installer with Python 3.11.5. The proposed model is implemented using the Keras library in Python. The simulations were run on a computer with an Intel(R) Xeon(R) W-2235 CPU @ 3.80GHz processor and 32.0 GB RAM. To avoid any bias in the analysis of the results, all experiments are conducted using the same compiler, on the same computing hardware, and with the same processing capabilities.
4.2 Data set
The experimental study uses data sets available in the UC Irvine Machine Learning Repository: Vehicle Silhouettes [45], Dry Bean [46], Sensorless Drive Diagnosis [47] and Crowd Sourced [48]. The characteristics of the data sets are given in Table 1.
Each data set is originally available in non-dispersed form—all data in a single decision table. The training sets are dispersed, with different degrees of dispersion considered. Each data set is converted into five dispersed versions containing 3, 5, 7, 9 and 11 local tables, respectively. During the construction of the local tables, only a subset of attributes is assigned to each local table. The number of attributes in a local table is significantly smaller than in the original table, with some attributes repeated among tables to allow for the possibility that local tables share common attributes. The full set of objects is stored in each local table, but without identifiers. More precisely, the number of local tables is first determined (e.g. a dispersed version with 5 local tables). Then the original set of conditional attributes is divided evenly among the local tables (so that each local table has roughly the same number of attributes). In addition, it is assumed that there are common attributes between selected local tables, e.g. two common attributes between tables one and two, one common attribute between tables two and three, and so on. With these initial assumptions made, attributes are randomly distributed among the local tables. Once the attribute sets of the local tables are established, the corresponding columns of the original table are copied into the local tables. In this way, all local tables contain the same set of objects.
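The attribute-dispersion procedure described above can be sketched as follows; the attribute names, the number of shared attributes per pair of consecutive tables, and the random seed are illustrative assumptions, not values from the paper.

```python
import random

def disperse_attributes(attributes, n_tables, n_shared=1, seed=0):
    """Split a list of attribute names into n_tables overlapping subsets.

    Illustrative sketch: attributes are shuffled and dealt out roughly
    evenly, then each pair of consecutive tables additionally shares
    n_shared attributes, mirroring the construction described above.
    """
    rng = random.Random(seed)
    attrs = list(attributes)
    rng.shuffle(attrs)
    # Even split (some tables receive one extra attribute).
    base, extra = divmod(len(attrs), n_tables)
    tables, start = [], 0
    for i in range(n_tables):
        size = base + (1 if i < extra else 0)
        tables.append(attrs[start:start + size])
        start += size
    # Make consecutive tables share n_shared common attributes.
    for i in range(n_tables - 1):
        shared = rng.sample(tables[i], min(n_shared, len(tables[i])))
        for a in shared:
            if a not in tables[i + 1]:
                tables[i + 1].append(a)
    return tables
```

Copying the corresponding columns of the original table for each attribute subset then yields local tables with identical object sets but different attribute sets.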
The Sensorless data set is balanced, with each decision class containing 5319 objects. The Vehicle, Dry Bean and Crowd Sourced data sets are imbalanced (Fig 4). These data are balanced, but it is worth emphasizing that this process is carried out after the dispersion (to keep the approach as consistent with the real situation as possible). The Synthetic Minority Over-sampling Technique (SMOTE) [49] is applied to each local decision table separately, using the implementation available in the WEKA software [50]. The data considered are multiclass, so in each decision table the SMOTE method is applied to every decision class except the most dominant one. As a result, all decision classes have the same number of objects after balancing. Finally, for each of the three imbalanced original data sets, 5 dispersed versions of imbalanced data and 5 dispersed versions of balanced data are obtained. Thus, a total of 35 dispersed data sets are considered in the experimental part.
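The per-table balancing can be illustrated with a simplified SMOTE-style interpolation. The paper uses the WEKA implementation of SMOTE; the stand-in below only sketches the core idea (synthetic objects placed on the segment between a minority-class object and its nearest same-class neighbour) and is not the exact algorithm.

```python
import numpy as np

def smote_balance(X, y, seed=0):
    """Balance a multiclass local table by SMOTE-style interpolation.

    Simplified sketch: every class is oversampled up to the size of the
    majority class with synthetic points interpolated between a random
    class member and its nearest neighbour within the same class.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_out, y_out = [X], [y]
    for cls, cnt in zip(classes, counts):
        pts = X[y == cls]
        for _ in range(target - cnt):
            i = int(rng.integers(len(pts)))
            d = np.linalg.norm(pts - pts[i], axis=1)
            d[i] = np.inf                     # exclude the point itself
            j = int(d.argmin())               # nearest same-class neighbour
            lam = rng.random()                # random position on the segment
            X_out.append((pts[i] + lam * (pts[j] - pts[i]))[None])
            y_out.append(np.array([cls]))
    return np.concatenate(X_out), np.concatenate(y_out)
```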
4.3 Parameter assignments
The proposed model comprises three phases: the structure phase, the training phase, and the testing phase. The structure phase involves determining the structure of the local and global MLP models. The input layer is strictly dependent on the data set—the number of neurons equals the number of attributes present in all local tables. The same is true of the output layer—the number of neurons equals the number of decision classes. The remaining network parameters are variable and determined experimentally, as are the method of determining the values of missing attributes, the number of artificial objects used, and the method of aggregating the local networks. We conducted a comprehensive grid search to explore various configurations of the hyperparameters: different combinations of the number of hidden layers, the number of neurons in each layer, methods for aggregating local neural networks, and strategies for handling missing attributes. By systematically varying these parameters, we identified the configuration that achieved the best performance. The optimal number of hidden layers and neurons in each layer is determined through the grid search; various configurations are tested, ranging from shallow networks with fewer layers to deeper networks with more layers and neurons, and the chosen configuration provides the best trade-off between model complexity and classification accuracy. Different methods for aggregating the local neural networks are also evaluated, namely averaging and summing the weights of the local models. This systematic and thorough approach to hyperparameter optimization ensures that the model is both robust and efficient. The experiments are carried out according to the following scheme:
- Different approaches to substituting values of missing attributes in local tables are studied—one or three artificial objects are generated based on one original object in a local table.
- Different approaches to aggregating local neural networks are studied—using the average or the sum of weights.
- Different numbers of hidden layers in the local and global networks are studied—one or two hidden layers.
- Different numbers of neurons in hidden layers are studied. The number is determined in proportion to the number of neurons in the input layer. The following values are tested: for the first hidden layer {0.25, 0.5, 0.75, 1, 1.5, 1.75, 2, 2.5, 2.75, 3, 3.5, 3.75, 4, 4.5, 4.75, 5} × the number of neurons in the input layer; for the second hidden layer {1, 2, 3, 4, 5} × the number of neurons in the input layer.
So in total, 384 different experiments are conducted for the proposed method—384 different settings of the analyzed approaches (2 ⋅ 2 ⋅ 16 + 2 ⋅ 2 ⋅ 16 ⋅ 5). In the tables in Section 5, we show both the results for each parameter setting and the optimal parameter values. The optimal parameters are chosen as those that provide the best classification accuracy with the lowest possible model complexity—the lowest number of layers and the lowest number of neurons used in the model.
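The count of 384 settings can be checked directly by enumerating the grid described above:

```python
from itertools import product

# The experimental grid, exactly as listed in the scheme above.
ao_options = [1, 3]                 # artificial objects generated
agg_options = ["avg", "sum"]        # aggregation of local networks
h1_factors = [0.25, 0.5, 0.75, 1, 1.5, 1.75, 2, 2.5,
              2.75, 3, 3.5, 3.75, 4, 4.5, 4.75, 5]
h2_factors = [1, 2, 3, 4, 5]

# One-hidden-layer settings: 2 * 2 * 16 = 64.
one_hidden = list(product(ao_options, agg_options, h1_factors))
# Two-hidden-layer settings: 2 * 2 * 16 * 5 = 320.
two_hidden = list(product(ao_options, agg_options, h1_factors, h2_factors))

total = len(one_hidden) + len(two_hidden)   # 384 settings in total
```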
4.4 Formulations of the performance metrics
The quality of classification is evaluated on the test set using the classification accuracy measure (acc), i.e. the fraction of objects in the test set that are classified correctly. As mentioned in the previous section, in the final step the aggregated model is re-trained. For this, a validation set is used, containing objects that have values on all conditional attributes present in the dispersed data—attributes occurring in all local tables. The validation set is obtained by randomly dividing the original test set, in a stratified manner, into two equal parts. First, one part is used as the validation set (for the re-training process) and the second part is used to assess the quality of classification. Then the roles are reversed, with the second part acting as the validation set. Finally, both results are averaged. Each experiment is repeated three times; in the following section, all reported results are the average of these three runs.
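This evaluation protocol can be sketched as follows; `model_factory` is a hypothetical stand-in for constructing the aggregated global model before its re-training step, and re-training is represented here by an ordinary `fit` call.

```python
from sklearn.model_selection import StratifiedKFold

def evaluate_with_swap(model_factory, X_test, y_test, seed=0):
    """Split the test set in half with stratification; each half serves
    once as the validation (re-training) set and once as the part on
    which accuracy is measured. The two accuracies are averaged."""
    skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    accs = []
    for val_idx, eval_idx in skf.split(X_test, y_test):
        model = model_factory()
        model.fit(X_test[val_idx], y_test[val_idx])   # re-train on one half
        accs.append(model.score(X_test[eval_idx], y_test[eval_idx]))
    return sum(accs) / len(accs)
```

In the paper this whole procedure is additionally repeated three times and the results averaged.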
4.5 Reproducibility of the proposed model
The structure phase aims to prepare the local MLP neural networks with a consistent architecture. This involves identifying common attributes among the local tables and addressing any missing attributes. The next step involves supplementing the local tables with additional objects, assigning values to the missing attributes. Following this, the training phase optimizes the network weights. During the testing phase, the model’s accuracy is validated on test objects with values defined for all attributes. To assess the effectiveness of the proposed model, a comparative analysis is performed against baseline training methods. Given the inherent randomness in the structure phase—where missing values are filled and new objects are created—and in the training phase—where initial network weights are set randomly—the experiments are repeated three times and the results are averaged. The simulations are configured as follows to facilitate reproducibility. Publicly available benchmark data are used, and the process of splitting into local tables (performed only once) is done as described in Section 4.2. The local tables are then augmented with additional values on missing attributes. One setting of all parameters is selected, the experiments are performed three times, and the results are averaged. This procedure is repeated for all other parameter settings using the same predefined local tables.
4.6 Baseline methods
In the literature, there are no models dedicated to dispersed data that generate a single model based on local tables with different (though partially common) attributes. Therefore, for comparison, an intermediate approach is used which, although it does not generate a global model and does not resolve differences between attributes, generates local models and performs global classification by voting. In the paper, two approaches for building the local models are used.
- The first approach is an ensemble of homogeneous classifiers, where the base classifiers are MLP networks. It should be noted, however, that each network generated for a local table has a different structure, as the input layer differs—no unification of the input layer is done by filling in the values of missing attributes. For a single local table, an MLP network is created whose input layer has neurons corresponding exactly to the attributes occurring in that local table. To maintain transparency and integrity, the same numbers of neurons in the hidden layer are tested as for the proposed method. The number of neurons in the output layer is the same in all local models, as it equals the number of decision classes. The final decision of the ensemble is made by soft voting.
- The second approach is the method proposed in [3]. This ensemble method creates three base classifiers—k−nearest neighbors, a decision tree and a Naive Bayes classifier (KNN, DT, NB)—based on each local table. The parameter k = 3 is used for KNN, and the Gini index is used as the splitting criterion when building the decision trees. Thus, three classifiers are defined for each local table. The final decision of the ensemble is also made by soft voting.
Both approaches are implemented in the Python programming language using implementations available in the sklearn library.
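The second baseline can be sketched with sklearn's `VotingClassifier`. `GaussianNB` is assumed here, since the paper does not specify which Naive Bayes variant is used.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def local_ensemble():
    """Base classifiers for a single local table, as in the second
    baseline: 3-NN, a Gini-criterion decision tree, and Naive Bayes,
    combined by soft voting (averaged class probabilities)."""
    return VotingClassifier(
        estimators=[
            ("knn", KNeighborsClassifier(n_neighbors=3)),
            ("dt", DecisionTreeClassifier(criterion="gini")),
            ("nb", GaussianNB()),
        ],
        voting="soft",
    )
```

One such ensemble is fitted per local table; their probability outputs are then combined by soft voting across tables to obtain the global decision.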
5 Results and comparisons
The results of the experiments are shown in the tables below. Comparisons of the experimental results are made in terms of:
- The quality of classification for different numbers of artificial objects created based on one original object in a local table.
- The quality of classification for the two approaches to aggregating local neural networks: average of weights and sum of weights.
- The quality of classification for different numbers of hidden layers.
- The quality of classification for different numbers of neurons in the hidden layers.
- The quality of classification of the proposed method versus two other approaches from the literature—homogeneous and heterogeneous ensembles of classifiers.
The average classification accuracy obtained from three runs of the algorithm is presented. Tables 2–5 show the results obtained for one hidden layer, different numbers of artificial objects (one or three generated), different aggregation methods (average or sum) and different numbers of neurons in the hidden layer. For simplicity, the following designations are adopted:
- 1HL, 2HL—for one or two hidden layers,
- 1AO, 3AO—for one or three generated artificial objects,
- AVG, SUM—for the aggregation method—average and sum.
Designation I is used for the number of neurons in the input layer.
Tables 6–21 show the results obtained for two hidden layers, also with different numbers of artificial objects (one or three generated), different aggregation methods (average or sum) and different numbers of neurons in the hidden layers. The columns show the number of neurons in the first hidden layer, while the rows indicate the number of neurons in the second hidden layer. The results obtained for different data sets are divided into separate tables due to their size. In all of these tables, the best result obtained for a given number of hidden layers, number of artificial objects and aggregation method is marked in bold. Comparisons of the experimental results with respect to different factors are made in separate sections.
Designation I is used for the number of neurons in the input layer.
5.1 Comparison of classification quality for different number of artificial objects created based on one original object in local table
Table 22 shows a comparison of the classification accuracy obtained for one and three generated artificial objects at various other settings (these are the best results, presented in bold in the previous tables). For each setting and data set, the better result is marked in bold. In ninety-three cases, better results are obtained with one artificial object, and in fifty-four cases with three artificial objects. Thus, in most cases, generating just one artificial object based on an original object is enough to obtain a better result. For the statistical test [51], two dependent groups are created (1AO and 3AO), each with one hundred and forty objects. The null hypothesis H0 states that there is no significant difference in terms of the number of artificial objects used in the model. The Wilcoxon test for dependent samples confirmed the statistical significance of the differences with p = 0.0003 (so we are justified in rejecting the null hypothesis), and the medians equal 0.9 and 0.893 for groups 1AO and 3AO respectively. This confirms that using one artificial object generates, on average, better results than using three artificial objects.
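The paired test used throughout this section can be sketched with scipy; the two accuracy lists hold the matched results of the compared settings over the same experiments.

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_settings(acc_a, acc_b, alpha=0.05):
    """Paired Wilcoxon signed-rank test over matched experiment results,
    as used above to compare e.g. the 1AO and 3AO groups."""
    stat, p = wilcoxon(acc_a, acc_b)
    return {
        "p": float(p),
        "reject_H0": bool(p < alpha),
        "median_a": float(np.median(acc_a)),
        "median_b": float(np.median(acc_b)),
    }
```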
In addition, Fig 5 presents the results for all the analyzed settings and data sets, grouped by the number of artificial objects generated (individual cases are not labeled on the x-axis for clarity). In the graph, one can observe that using one artificial object indeed generates better results in many cases.
5.2 Comparison of classification quality for different approaches: Average of weights and sum of weights to aggregating local neural networks
Table 23 shows a comparison of the classification accuracy obtained for the two methods of aggregating local networks: average and sum. For each setting and data set, the better result is marked in bold. In seventy-four cases, better results are obtained for the average method, and in sixty-nine cases for the sum method. Thus, in slightly more cases, the global network obtained by averaging the weights provides better results. A statistical test is performed to confirm the significance of the differences. The null hypothesis H0 states that there is no significant difference in terms of the method for aggregating local neural networks in the model. Two dependent groups are created (AVG and SUM), each with one hundred and forty objects. The Wilcoxon test for dependent samples confirmed the statistical significance of the differences in accuracy with p = 0.04, so we are justified in rejecting the null hypothesis. The medians equal 0.902 and 0.87 for groups AVG and SUM respectively. This confirms that the average approach generates better results overall. However, it should be noted that the results depend strongly on the data set: for the Vehicle and the Dry Bean data sets the sum method clearly provides better results, whereas for the Sensorless and the Crowd Sourced data sets the average method does.
In addition, Fig 6 presents the results for all the analyzed settings and data sets (individual cases are not labeled on the x-axis for clarity). In the graph, one can observe that the average method indeed generates better results in many cases.
5.3 Comparison of classification quality for different numbers of hidden layers in the local and global networks
Table 23 also provides a comparison of the results obtained for one and two hidden layers for each data set and analyzed setting. The better results are underlined. As can be seen, in twenty-seven cases the better results are obtained when using one hidden layer, while in one hundred and seventeen cases the better results are obtained when using two hidden layers. Thus, the use of two hidden layers generates better results in most cases. For the statistical test, the null hypothesis H0 states that there is no significant difference in terms of the number of hidden layers used in the model. The Wilcoxon test for dependent samples confirmed the statistical significance of the differences in accuracy with p = 0.0001. Also, Fig 7 compares the results for all the analyzed settings and data sets for one and two hidden layers. In the graph, it is evident that networks with two hidden layers generate better results in many cases.
5.4 Comparison of classification quality of the proposed method versus other approaches
Table 24 shows all the results obtained for the proposed approach, for the different settings and all analyzed data sets. Based on the previous analyses, it is concluded that in most cases the best results are obtained using one artificial object, the sum aggregation method, and two hidden layers. Based on the summarized results in Table 24, the best approach and result is then selected for each data set (marked in bold).
Table 25 shows the best obtained accuracy for the proposed approach and two known approaches from the literature: the homogeneous ensemble of MLP network classifiers and the ensemble of classifiers (KNN, DT, NB) with soft voting, described in detail in the previous section. The best result is shown in bold. The proposed method virtually always generates the best results. Statistical tests are performed to confirm the significance of the differences in the obtained acc results. First, the values of the classification accuracy in three dependent groups (proposed method, homogeneous ensemble of MLP, and ensemble of classifiers KNN, DT, NB) are analyzed. For accuracy, the Friedman statistic is 38.98 with df = 2, p = 0.000001, and we can again reject the null hypothesis. The average ranks are as follows: proposed approach 2.86; homogeneous ensemble MLP 1.61; ensemble of classifiers (KNN, DT, NB) 1.53. The critical value of the difference between the average ranks of two methods in the Nemenyi test is 0.96. We can claim that the classification accuracy of the proposed approach is significantly better than that of all other classifiers (Fig 8). The Wilcoxon-each-pair test confirmed the significant differences between the average accuracy values for all pairs involving the proposed method, with p−values lower than 0.00002. The post-hoc Dunn-Bonferroni test also confirmed this with p = 0.000001.
Groups of methods that are not significantly different (with the level of significance at 0.05) are connected.
Additionally, comparative box-plot charts for the values of the classification accuracy and different approaches are created (Fig 9). As can be seen, the proposed approach generates by far the best quality of classification as this is confirmed by the highest positioned box plot and median.
In addition to classification accuracy, other measures that are better suited to unbalanced data and give more reliable comparisons are used. Table 26 shows the balanced accuracy for the proposed approach, the homogeneous ensemble of MLP network classifiers and the ensemble of classifiers (KNN, DT, NB) with soft voting. Balanced accuracy is calculated as the average of the sensitivity (true positive rate) over the classes in a multiclass classification problem. The best result is shown in bold. Also for balanced accuracy, the proposed method yields the best results. Statistical tests are performed to confirm the significance of the differences in the obtained bacc results. First, the values of the balanced accuracy in three dependent groups (proposed method, homogeneous ensemble of MLP, and ensemble of classifiers KNN, DT, NB) are analyzed. The Friedman statistic is 38 with df = 2, p = 0.000001, and we can again reject the null hypothesis. The average ranks are as follows: proposed approach 2.8; homogeneous ensemble MLP 1.84; ensemble of classifiers (KNN, DT, NB) 1.36. The critical value of the difference between the average ranks of two methods in the Nemenyi test is 0.96. We can claim that the balanced accuracy of the proposed approach is significantly better than that of all other classifiers (Fig 10). The Wilcoxon-each-pair test confirmed the significant differences between the average balanced accuracy values of the proposed approach and each of the two approaches from the literature, with p−values lower than 0.004. The difference in average balanced accuracy between the homogeneous ensemble of MLP networks and the ensemble of classifiers (KNN, DT, NB) is not significant. These conclusions are also confirmed graphically in Fig 11. The post-hoc Dunn-Bonferroni test also confirmed this with p = 0.0002.
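As a small illustration, balanced accuracy as defined above (the average per-class recall) can be computed directly and checked against sklearn's implementation:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def balanced_acc(y_true, y_pred):
    """Balanced accuracy: the average of the per-class recalls
    (sensitivity / true positive rate of each decision class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

Unlike plain accuracy, this measure is not dominated by the majority class, which is why it is reported for the imbalanced data sets.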
Groups of methods that are not significantly different (with the level of significance at 0.05) are connected.
The F1−score values are compared next (Table 27). This measure provides a balance between precision and recall, evaluating the trade-off between making accurate positive predictions (precision) and capturing all positive instances (recall). The F1−score is a good choice when looking for a model that performs well in terms of both precision and recall. As can be seen, this time the advantage of the proposed model over those from the literature is even greater. The Friedman test confirmed a statistically significant difference in the results obtained for the considered approaches, χ2(35, 2) = 44.4, p = 0.000001. The average ranks are as follows: proposed approach 2.91; homogeneous ensemble MLP 1.63; ensemble of classifiers (KNN, DT, NB) 1.46. The critical value of the difference between the average ranks of two methods in the Nemenyi test is 0.96. We can claim that the F1−score of the proposed approach is significantly better than that of all other classifiers (Fig 12). The Wilcoxon-each-pair test confirmed the presence of significant differences in the average F1−score values between the two approaches from the literature and the proposed approach, with p-values lower than 0.00001. These findings are visually reinforced by the data presented in Fig 13. The post-hoc Dunn-Bonferroni test also confirmed this with p = 0.000001.
Groups of methods that are not significantly different (with the level of significance at 0.05) are connected.
In the last step, the precision values are compared. Precision quantifies the ability of a model to correctly identify positive instances while minimizing false positives. It is particularly important in scenarios where the cost of false positives is high, or when the positive predictions made by the model must be highly reliable. In Table 28 the results are compared, with the best score highlighted. Here, again, the proposed approach performs much better than the others. The Friedman test confirmed a statistically significant difference in the results obtained for the considered approaches, χ2(35, 2) = 32.8, p = 0.000001. As before, the Wilcoxon-each-pair test confirmed the presence of significant differences in the average precision values between the two approaches from the literature and the proposed approach, with p-values lower than 0.0001. The post-hoc Dunn-Bonferroni test confirmed that the differences between the proposed approach and the baseline methods are significant, with p = 0.000001. The average ranks are as follows: proposed approach 2.76; homogeneous ensemble MLP 1.8; ensemble of classifiers (KNN, DT, NB) 1.44. The critical value of the difference between the average ranks of two methods in the Nemenyi test is 0.96. We can claim that the precision of the proposed approach is significantly better than that of all other classifiers (Fig 14). The data presented in Fig 15 visually supports and reinforces these findings.
Groups of methods that are not significantly different (with the level of significance at 0.05) are connected.
Based on the preceding analysis, it is clearly evident that the proposed approach consistently delivers very good results. To provide a more in-depth comparative analysis and demonstrate why the proposed model is superior, let us delve into the specific examples illustrating the advantages of the proposed approach. Let us notice once again that the proposed model involves training local MLP networks based on extended local tables (where missing values are imputed), aggregating these local models (using average or sum of weights), and re-training the global model with a shared sample of data. So the result is a single model that is more interpretable and easier to use. On the other hand, ensemble of classifiers (homogeneous MLP or heterogeneous KNN, DT, NB) involves creating base classifiers for each local table. The final decision is made by voting, so we do not get one model—one interpretation. The comparative analysis revealed that the proposed model consistently outperforms the baseline methods across all evaluation criteria. But the advantage of the proposed model obtained due to increased complexity is also justified practically. In the healthcare sector, predicting high-risk patients for sepsis across multiple hospitals is crucial for timely intervention and treatment. Each hospital has its own data set with various patient attributes, some of which may be missing or incomplete. With a single model obtained from the proposed approach, it is enough to check all values on the attributes of the global model of the diagnosed patient—without having to refer to local hospitals and their databases. In smart agriculture, yield prediction is essential for effective farm management and planning. Multiple farms collect data on various attributes affecting crop yield, but these data sets can be incomplete or missing certain information. 
Also in this case, the proposed model outperforms the baseline methods in integrating diverse and incomplete data from multiple farms, improving predictive accuracy and aiding farmers in making more informed decisions. These case studies illustrate the practical benefits and reliability of the proposed model in real-world applications, such as healthcare and agriculture, where accurate predictions are crucial for improving outcomes.
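The accuracy-weighted aggregation of identically structured local networks mentioned above can be illustrated with a small numpy sketch. This is a simplified illustration in the spirit of the paper's sum and average variants, not the authors' exact procedure:

```python
import numpy as np

def aggregate_layers(local_weights, local_accuracies, mode="average"):
    """Merge same-shaped layer matrices of identically structured local
    MLPs, each network contributing proportionally to its local
    classification accuracy.  'average' normalizes the accuracy
    coefficients to sum to 1; 'sum' uses them unnormalized."""
    acc = np.array(local_accuracies, dtype=float)
    if mode == "average":
        acc = acc / acc.sum()
    n_layers = len(local_weights[0])
    return [sum(c * w[layer] for c, w in zip(acc, local_weights))
            for layer in range(n_layers)]

# Two toy local networks, each with a single 2x2 weight matrix
net_a = [np.array([[1.0, 0.0], [0.0, 1.0]])]
net_b = [np.array([[3.0, 2.0], [2.0, 3.0]])]

# Equal local accuracies -> the merged layer is the plain average
global_layer = aggregate_layers([net_a, net_b], [0.5, 0.5])[0]
```

Because all local networks share one structure, the merge is a simple per-layer linear combination; the re-training step then adjusts the merged weights on the shared global sample.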
Additionally, AUC-ROC charts were prepared to demonstrate that the proposed approach outperforms those known from the literature. Due to limited space, we do not present the graphs for all thirty-five dispersed data sets. Figs 16 and 17 show AUC-ROC graphs for the Crowd Sourced imbalanced and balanced data sets, for all versions of dispersion—with 3, 5, 7, 9, and 11 local tables (these are the data sets on which the proposed approach performed worst). Each row first displays the curve plot for the homogeneous ensemble of MLP network classifiers, followed by the ensemble of classifiers (KNN, DT, NB), and finally the proposed approach. Since the analyzed data sets are multi-class, the graphs show the ROC curve for each decision class versus the others, as well as the averaged ROC curve. It is clear that the proposed approach outperforms the other approaches. We can confidently say that for dispersed data, building a global neural network yields better results than using either heterogeneous or homogeneous ensembles of classifiers.
AUC-ROC graphs for the Crowd Sourced imbalanced data sets and all versions of dispersion: a) 3 local tables, b) 5 local tables, c) 7 local tables, d) 9 local tables, e) 11 local tables, and three different approaches: first row, homogeneous ensemble of MLP network classifiers; second row, ensemble of classifiers (KNN, DT, NB); third row, proposed approach.
AUC-ROC graphs for the Crowd Sourced balanced data sets and all versions of dispersion: a) 3 local tables, b) 5 local tables, c) 7 local tables, d) 9 local tables, e) 11 local tables, and three different approaches: first row, homogeneous ensemble of MLP network classifiers; second row, ensemble of classifiers (KNN, DT, NB); third row, proposed approach.
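One-vs-rest ROC curves of the kind plotted in Figs 16 and 17 can be produced with scikit-learn. The class scores below are invented, softmax-like outputs for a 3-class problem, used only to show the mechanics:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.preprocessing import label_binarize

# Illustrative true labels and per-class scores (rows sum to 1,
# as softmax outputs of an MLP would)
y_true = np.array([0, 1, 2, 2, 1, 0])
y_score = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
])

# One-vs-rest curves: each decision class against all the others
y_bin = label_binarize(y_true, classes=[0, 1, 2])
curves = {c: roc_curve(y_bin[:, c], y_score[:, c]) for c in range(3)}

# Macro-averaged AUC, as reported alongside the per-class curves
macro_auc = roc_auc_score(y_true, y_score, multi_class="ovr",
                          average="macro")
```

Each entry of `curves` holds the false-positive rates, true-positive rates, and thresholds needed to draw one per-class curve.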
6 Conclusions
This paper proposes a new method for generating a global MLP network based on dispersed data with different sets of attributes. The method generates local MLP neural networks with an identical structure from the local tables; to make this possible, artificial objects derived from the original objects are generated. In the next step, the networks are aggregated using weights proportional to the classification accuracy of the local models, with one of two proposed combination methods—sum or average. The paper shows that the proposed model produced better results than other methods known from the literature. In addition, it is verified that, on average, the best quality is achieved using only one artificial object and two hidden layers.
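A prerequisite for identically structured local networks is that every local table be extended to the full global attribute set. The sketch below, with the hypothetical helper `extend_local_table` and constant fill values, is only one plausible way to perform this extension, not necessarily the paper's generation scheme for artificial objects:

```python
import pandas as pd

def extend_local_table(local_df, global_attributes, fill_values):
    """Extend a local table to the global attribute set, imputing the
    attributes it never recorded.  fill_values maps attribute -> value
    (e.g. a global mean); constant imputation is an illustrative choice."""
    extended = local_df.reindex(columns=global_attributes)
    for col in global_attributes:
        if col not in local_df.columns:
            extended[col] = fill_values[col]
    return extended

# A local table that recorded only attributes a1 and a3
table = pd.DataFrame({"a1": [1.0, 2.0], "a3": [5.0, 7.0]})
full = extend_local_table(table, ["a1", "a2", "a3"], {"a2": 0.0})
```

After extension, every local table presents objects over the same attribute list, so the local MLPs can share one input layer and be aggregated layer by layer.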
While the proposed model demonstrates significant improvements over traditional ensemble classifiers and homogeneous ensembles of MLPs, it is important to acknowledge certain limitations that could affect its performance and applicability. The model's ability to handle missing attributes relies heavily on the quality of the imputation methods used; if the imputation process introduces biases or inaccuracies, the overall performance of the model can suffer. As the number of local tables increases, efficiently aggregating and re-training the global model may become a bottleneck. The model's performance is also sensitive to the choice of parameters, such as the number of hidden layers and neurons, the method of aggregating local networks, and the strategies for handling missing data. Identifying optimal settings requires extensive experimentation, which may not always be feasible; developing automated parameter tuning methods or adaptive algorithms could mitigate this limitation. A further limitation of the method is the need for a validation set, which must contain the combined characteristics of objects—descriptions from the perspective of all local tables.
In further work, it is planned to use conflict analysis and coalitions of local networks to generate the global model, as well as to develop a method for generating the artificial objects required in the global network's re-training stage. It is also planned to apply other neural network architectures in the proposed approach.
References
- 1. Li T., Sahu A., Talwalkar A., Smith V. Federated learning: Challenges, methods, and future directions. IEEE signal processing magazine. 2020;37(3):50–60.
- 2. Verbraeken J., Wolting M., Katzy J., Kloppenburg J., Verbelen T., Rellermeyer J. A survey on distributed machine learning. Acm computing surveys (csur). 2020;53(2):1–33.
- 3. Kurian R., Lakshmi K. An ensemble classifier for the prediction of heart disease. International Journal of Scientific Research in Computer Science. 2018;3(6):25–31.
- 4. Bilal A., Sun G., Mazhar S. Finger-vein recognition using a novel enhancement method with convolutional neural network. Journal of the Chinese Institute of Engineers. 2021;44(5):407–417.
- 5. Bilal A., Liu X., Shafiq M., Ahmed Z., Long H. NIMEQ-SACNet: A novel self-attention precision medicine model for vision-threatening diabetic retinopathy using image data. Computers in Biology and Medicine. 2024;171:108099. pmid:38364659
- 6. Feng X., Xiu Y., Long H., Wang Z., Bilal A., Yang L. Advancing single-cell RNA-seq data analysis through the fusion of multi-layer perceptron and graph neural network. Briefings in Bioinformatics. 2024;25(1):.
- 7. Mendoza J., Pedrini H. Detection and classification of lung nodules in chest X-ray images using deep convolutional neural networks. Computational Intelligence. 2020;36(2):370–401.
- 8. Bilal A., Sun G., Mazhar S., Junjie Z. Neuro-optimized numerical treatment of HIV infection model. International Journal of Biomathematics. 2021;14(05):2150033.
- 9. Bilal A., Liu X., Long H., Shafiq M., Waqar M. Increasing Crop Quality and Yield with a Machine Learning-Based Crop Monitoring System. Computers, Materials & Continua. 2023;76(2):.
- 10. Yu L., Li M. A case-based reasoning driven ensemble learning paradigm for financial distress prediction with missing data. Applied Soft Computing. 2023;137:110163.
- 11. Kang J., Ullah Z., Gwak J. MRI-based brain tumor classification using ensemble of deep features and machine learning classifiers. Sensors. 2021;21(6):2222. pmid:33810176
- 12. Sesmero M., Iglesias J., Magán E., Ledezma A., Sanchis A. Impact of the learners diversity and combination method on the generation of heterogeneous classifier ensembles. Applied Soft Computing. 2021;111:107689.
- 13.
Arora J., Agrawal U., Tiwari P., Gupta D., Khanna A. Ensemble feature selection method based on recently developed nature-inspired algorithms. In: International Conference on Innovative Computing and Communications: Proceedings of ICICC 2019, Volume 1. Springer; 2020. p. 457–470. 2020.
- 14. Yaiprasert C., Hidayanto A. AI-driven ensemble three machine learning to enhance digital marketing strategies in the food delivery business. Intelligent Systems with Applications. 2023;18:200235.
- 15. Bhat P., Behal S., Dutta K. A system call-based android malware detection approach with homogeneous & heterogeneous ensemble machine learning. Computers & Security. 2023;130:103277.
- 16.
Dinkel H., Wang Y., Yan Z., Zhang J., Wang Y. CED: Consistent ensemble distillation for audio tagging. In: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2024. p. 291–295. 2024.
- 17. Mothukuri V., Parizi R., Pouriyeh S., Huang Y., Dehghantanha A., Srivastava G. A survey on security and privacy of federated learning. Future Generation Computer Systems. 2021;115:619–640.
- 18.
Bazan J., Milan P., Bazan-Socha S., Wójcik K. Application of Federated Learning to Prediction of Patient Mortality in Vasculitis Disease. In: International Joint Conference on Rough Sets. Springer; 2023. p. 526–536. 2023.
- 19.
Li Z., Lin T., Shang X., Wu C. Revisiting weighted aggregation in federated learning with neural networks. In: International Conference on Machine Learning. PMLR; 2023. p. 19767–19788. 2023.
- 20. Zhu H., Zhang H., Jin Y. From federated learning to federated neural architecture search: a survey. Complex & Intelligent Systems. 2021;7:639–657.
- 21. Alazab M., Priya R. M. P., Maddikunta P., Gadekallu T., Pham Q. Federated Learning for Cybersecurity: Concepts, Challenges, and Future Directions. IEEE Trans. Ind. Informatics. 2022;18(5):3501–3509.
- 22.
Dyczkowski K., Pekala B., Szkoła J., Wilbik A. Federated learning with uncertainty on the example of a medical data. In: 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE; 2022. p. 1–8. 2022.
- 23. Singhal K., Sidahmed H., Garrett Z., Wu S., Rush J., Prakash S. Federated reconstruction: Partially local federated learning. Advances in Neural Information Processing Systems. 2021;34:11220–11232.
- 24. Guo R., Shen W. A model fusion method for online state of charge and state of power co-estimation of lithium-ion batteries in electric vehicles. IEEE Transactions on Vehicular Technology. 2022;71(11):11515–11525.
- 25. Marfo K., Przybyła-Kasperek M. Radial basis function network for aggregating predictions of k-nearest neighbors local models generated based on independent data sets. Procedia Computer Science. 2022;207:3234–3243.
- 26. Moshkov M. Common Decision Trees, Rules, and Tests (Reducts) for Dispersed Decision Tables. Procedia Computer Science. 2022;207:2503–2507.
- 27.
Przybyła-Kasperek M., Aning S. Bagging and single decision tree approaches to dispersed data. In: Computational Science–ICCS 2021: 21st International Conference, Krakow, Poland, June 16–18, 2021, Proceedings, Part III. Springer; 2021. p. 420–427. 2021.
- 28. Przybyła-Kasperek M., Marfo K. Neural network used for the fusion of predictions obtained by the K-nearest neighbors algorithm based on independent data sources. Entropy. 2021;23(12):1568. pmid:34945874
- 29. Czarnowski I. Weighted Ensemble with one-class Classification and Over-sampling and Instance selection (WECOI): An approach for learning from imbalanced data streams. Journal of Computational Science. 2022;61:101614.
- 30. Przybyła-Kasperek M. The power of agents in a dispersed system–The Shapley-Shubik power index. Journal of Parallel and Distributed Computing. 2021;157:105–124.
- 31. Przybyła-Kasperek M., Kusztal K. New Classification Method for Independent Data Sources Using Pawlak Conflict Model and Decision Trees. Entropy. 2022;24(11):1604. pmid:36359694
- 32. Elshamy R., Abu-Elnasr O., Elhoseny M., Elmougy S. Improving the efficiency of RMSProp optimizer by utilizing Nestrove in deep learning. Scientific Reports. 2023;13(1):8814. pmid:37258633
- 33. Stephen A., Punitha A., Chandrasekar A. Designing self attention-based ResNet architecture for rice leaf disease classification. Neural Computing and Applications. 2023;35(9):6737–6751.
- 34. Qureshi A., Roos T. Transfer learning with ensembles of deep neural networks for skin cancer detection in imbalanced data sets. Neural Processing Letters. 2023;55(4):4461–4479.
- 35.
Bishop C. Pattern recognition and machine learning, 5th Edition. Information science and statistics. Springer 2007.
- 36.
Glorot X., Bordes A., Bengio Y. Deep Sparse Rectifier Neural Networks. In: Gordon GJ, Dunson DB, Dudik M, editors. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, USA, April 11-13, 2011. vol. 15 of JMLR Proceedings. JMLR.org; p. 315–323. 2011.
- 37.
Li X., Li X., Pan D., Zhu D. On the learning property of logistic and softmax losses for deep neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34; p. 4739–4746. 2020.
- 38.
Kingma D., Ba J. Adam: A Method for Stochastic Optimization. In: Bengio Y, LeCun Y, editors. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings 2015.
- 39.
Mannor S., Peleg D., Rubinstein R. The cross entropy method for classification. In: Raedt LD, Wrobel S, editors. Machine Learning, Proceedings of the Twenty-Second International Conference (ICML 2005), Bonn, Germany, August 7-11, 2005. vol. 119 of ACM International Conference Proceeding Series. ACM; 2005. p. 561–568. 2005.
- 40. Schapire R. Explaining adaboost. Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik. 2013;:37–52.
- 41.
Hemachandra A., Dai Z., Singh J., Ng S., Low B. Training-free neural active learning with initialization-robustness guarantees. In: International Conference on Machine Learning. PMLR; 2023. p. 12931–12971 2023.
- 42.
Saran A., Yousefi S., Krishnamurthy A., Langford J., Ash J. Streaming active learning with deep neural networks. In: International Conference on Machine Learning. PMLR; 2023. p. 30005–30021 2023.
- 43. Zamri N., Azhar S., Mansor M., Alway A., Kasihmuddin M. Weighted Random k Satisfiability for k = 1,2 (r2SAT) in Discrete Hopfield Neural Network. Applied Soft Computing. 2022;126:109312.
- 44. Zamri N., Azhar S., Sidik S., Mansor M., Kasihmuddin M., Pakruddin S., et al. Multi-discrete genetic algorithm in hopfield neural network with weighted random k satisfiability. Neural Computing and Applications. 2022;34(21):19283–19311.
- 45.
Siebert J. Vehicle recognition using rule based methods. Turing Institute Research Memorandum. 1987;TIRM-87-0.18:.
- 46. Koklu M., Özkan I. Multiclass classification of dry beans using computer vision and machine learning techniques. Comput. Electron. Agric. 2020;174:105507.
- 47.
Bator M., Wissel C., Dicks A., Lohweg V. Feature Extraction for a Conditioning Monitoring System in a Bottling Process. In: 23rd IEEE International Conference on Emerging Technologies and Factory Automation, ETFA 2018, Torino, Italy, September 4-7, 2018. IEEE; 2018. p. 1201–1204. 2018.
- 48.
Johnson B. Crowdsourced Mapping. UCI Machine Learning Repository, 2016.
- 49. Chawla N., Bowyer K., Hall L., Kegelmeyer W. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002;16:321–357.
- 50.
Russell I., Markov Z. An introduction to the Weka data mining system. 2017.
- 51. Zamri N., Mansor M., Kasihmuddin M., Sidik S., Alway A., Romli N., Guo Y., et al. A modified reverse-based analysis logic mining model with Weighted Random 2 Satisfiability logic in Discrete Hopfield Neural Network and multi-objective training of Modified Niched Genetic Algorithm. Expert Systems with Applications. 2024;240:122307.