Abstract
In 2019, the novel coronavirus swept the world, exposing the monitoring and early warning problems of medical systems. Computer-aided diagnosis models based on deep learning have good universality and can alleviate these problems. However, traditional image processing methods may lead to high false positive rates, which is unacceptable in disease monitoring and early warning. This paper proposes a disease detection method with a low false positive rate based on COVID-19 lung images and establishes a two-stage optimization model. In the first stage, the model is trained using classical gradient descent and the relevant features are extracted; in the second stage, an objective function that minimizes the false positive rate is constructed to obtain a network model with high accuracy and a low false positive rate. The proposed method therefore has the potential to classify medical images effectively. The model was verified using a public COVID-19 radiography dataset and a public COVID-19 lung CT scan dataset. The results show significant improvements, with the false positive rate reduced to 11.3% and 7.5% and the area under the ROC curve increased to 92.8% and 97.01%, respectively.
Citation: Liu H, Zhao M, She C, Peng H, Liu M, Li B (2025) Classification of CT scan and X-ray dataset based on deep learning and particle swarm optimization. PLoS ONE 20(1): e0317450. https://doi.org/10.1371/journal.pone.0317450
Editor: Bushra Zafar, Government College University Faisalabad, PAKISTAN
Received: June 12, 2024; Accepted: December 25, 2024; Published: January 27, 2025
Copyright: © 2025 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The experimental data and the simulation results that support the findings of this study are available in GitHub with the identifier https://github.com/UCSD-AI4H/COVID-CT and Kaggle with the identifier https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database/data.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The COVID-19 pandemic has brought profound experiences and warnings to the world, significantly impacting our society, economy, and public health systems [1]. This pandemic has once again emphasized the importance of prevention and early response [2]. Early detection, rapid implementation of control measures, and strengthening of monitoring, early warning, and response mechanisms can greatly slow down the spread of the disease, reducing the number of cases and deaths.
Computed tomography (CT) is a frequently used medical imaging technique that is widely applied in clinical diagnosis and treatment. Meanwhile, X-ray imaging is one of the most common methods of lung medical imaging and can show abnormal shadows in the lungs. Since the outbreak of the COVID-19 pandemic, these two modalities have become important tools for the diagnosis and assessment of COVID-19. Such lung medical images offer both high resolution and high contrast, allowing physicians to diagnose diseases more accurately, assess injuries, or plan surgeries.
Improving the accuracy of lung medical image classification undoubtedly enhances the detection and early warning capabilities of related diseases, thereby effectively suppressing the spread of the diseases. Traditional medical image classification methods usually depend on hand-designed feature extraction and machine learning algorithms. These methods require manually selecting and extracting features associated with a specific disease, and then using a classifier to perform the classification. However, because of the complexity and high dimensionality of lung medical images, traditional methods are often limited by the expressive and generalization capabilities of the features [3], which makes it difficult to capture the rich information in the images.
In recent years, the rapid development of deep learning technology has brought breakthroughs in medical image classification. Deep learning uses multi-layer neural network models to achieve end-to-end image classification by learning and extracting features from large amounts of labeled data; its powerful nonlinear modeling capability and automatic learning ability allow it to extract useful features from complex images and obtain accurate classification results [4]. In the clinical management and diagnosis of COVID-19, CT and X-ray technologies have been widely used to assess lung imaging in patients with suspected or confirmed infections. These medical images provide intuitive and detailed information about the lung structure, helping doctors determine the characteristics and extent of lung lesions and assisting in clinical diagnosis and therapeutic decision-making [5].
Due to the contagious and clinical characteristics of COVID-19, hospitals and healthcare organizations face large amounts of lung medical image data that require rapid and accurate classification [6]. Accurately classifying COVID-19-infected medical images is crucial for early diagnosis, tracking disease progression, and guiding treatment. Therefore, deep learning-based lung image classification methods have become a research hotspot. Deep learning methods automatically learn feature representations from large amounts of image data by constructing deep neural networks with strong nonlinear modeling and generalization capabilities, and applying deep learning to COVID-19 lung image classification has become a promising solution [7]. In addition, medical image classification can help monitor outbreak transmission trends and assess epidemiological characteristics, providing necessary decision support for public health departments.
In recent years, there have been many studies on deep learning algorithms for CT images or chest X-rays. Current approaches primarily use CNN architectures such as ResNet, Inception, and U-Net to build models for image classification or segmentation. However, few existing studies aim to reduce the false positive rate: most focus on accuracy, AUC (Area Under the Curve), or F1-Score while overlooking the FPR. Experiments show that over-emphasizing the AUC during training can degrade the FPR, and purely data-driven methods tend to treat each metric as equally important without weighting. In the actual disease monitoring process, however, negative patients being misclassified as positive is more acceptable, whereas positive patients being misclassified as negative causes greater losses, and a high FPR increases the cost of unnecessary testing and thus reduces the efficiency of disease detection.
In this paper, a low FPR method for disease detection based on COVID-19 lung images is proposed. The main contributions are as follows.
- A new method for disease detection: The paper introduces a novel method for detecting diseases, specifically COVID-19, using lung images. This method focuses on achieving a low false positive rate (FPR).
- Integration of residual networks and particle swarm optimization: The proposed approach effectively combines residual networks and particle swarm optimization to minimize FPR while maintaining a satisfactory level of accuracy, measured by the Area Under the Curve (AUC) metrics.
- Optimization of FPR in Model Training: The research addresses the challenge that FPR, typically used to evaluate model performance, is not a variable that can be directly optimized using traditional gradient descent methods.
- Development of a two-stage optimization model: Stage 1 utilizes classical gradient descent to train the model and extract relevant features; Stage 2 constructs an objective function that minimizes FPR and employs an evolutionary algorithm to optimize the model parameters, resulting in a network model with a low FPR.
The remainder of the paper is organized as follows: the Methods section explains the implementation of the low-FPR neural network optimization model proposed in this paper, including the network structure and the optimization algorithm; the Results and Discussion section presents the results of the comparative experiments and simulations; and the Conclusion section gives the conclusions.
Related work
Artificial intelligence-based Computer-Aided Diagnosis (CAD) is currently one of the most prominent research areas, playing a crucial role in diagnosing COVID-19 [8]. Many existing studies employ mainstream publicly available model architectures for CAD in lung medical imaging. For example, Jaiswal et al. [5] utilized DenseNet201 to classify lung CT images of COVID-19 patients. By employing transfer learning techniques to train the deep learning network, they achieved an accuracy of 96.29%. Similarly, Yu et al. [9] optimized the widely used GoogLeNet model to enhance its information extraction capability, demonstrating excellent performance on their private dataset. Fares Bougourzi et al. [10] introduced the PDAtt-Unet method, which achieved a specificity of 99.30%, precision of 74.74%, sensitivity of 80.69%, and an F1-Score of 77.60%. Furthermore, Ismael et al. [11] tested several pre-trained models, such as ResNet18, ResNet50, ResNet101, VGG16, and VGG19, on COVID-19 X-ray data, further confirming the effectiveness of deep learning techniques in COVID-19 diagnostic tasks.
In addition to adapting mainstream models, some researchers have proposed targeted model designs for COVID-19 lung imaging. Shah et al. [12] developed a custom CTnet-10 model, which required less training and prediction time compared to other publicly available network models, and achieved higher accuracy by extracting deep features from CT images. Attallah et al. [13] proposed a diagnostic approach that fused features from multiple CNNs to diagnose COVID-19, attaining high accuracy. Reshi et al. [14], aiming to address the imbalance and low quality of COVID-19 X-ray image datasets, introduced a deep CNN architecture that classified chest X-ray images and diagnosed COVID-19 with an impressive accuracy of 99.5% on an independent dataset.
Some researchers have improved accuracy by extracting deep features from data or optimizing various model parameters. For instance, Wang et al. [15] introduced a particle swarm optimization (PSO)-guided self-tuning CNN method. Similarly, Eliwa et al. [16] employed optimization algorithms to fine-tune hyperparameters during model training. Both studies demonstrated that hyperparameter optimization not only reduced the time required for parameter adjustment but also enhanced model stability. Wang et al. [17] further combined feature extraction with parameter optimization. They first used wavelet Renyi entropy to extract deep features from images. These features were then fed into a neural network for learning, followed by the application of optimization algorithms to refine the network's weight parameters and the Renyi entropy order.
Despite significant advancements, there is still room for improvement in optimizing model weight parameters. Many existing methods primarily focus on conventional metrics, such as overall accuracy, sensitivity, or specificity, to enhance model performance. However, relatively little research has been dedicated to optimizing specific metrics, such as the false-positive rate (FPR). In the context of disease detection, false positives can lead to additional medical testing, resource wastage, and increased unnecessary costs. To address this issue, this paper proposes a low-FPR method for COVID-19 prediction based on lung CT images.
Methods
This study formulates disease detection as a binary classification problem and introduces a lightweight binary classification framework [18]. The proposed framework consists of two main components. The first component uses a deep neural network to extract discriminative features from three-channel two-dimensional image inputs. The second component leverages an evolutionary algorithm to optimize the model by freezing the parameters of the other layers and updating only the weights of the fully connected layer. Together, these two components form the proposed low-FPR deep learning model.
Proposed model
The proposed low-FPR network model is mainly composed of a feature extraction module and an optimization module; the structure is shown in Fig 1. The convolutional and residual layers first extract features from the image data to generate feature vectors. The fully connected layer then serves as the classification layer, outputting a probability distribution over the classes that is translated into predicted labels. During training, the network weight parameters are first updated using gradient descent. The convolutional and residual layers are then frozen, and the weights of the fully connected layer are fed into an evolutionary algorithm for further training. Once training and weight updating are complete, the classification results are output. The feature extraction network consists of a convolutional layer and a residual network layer, and the optimization module consists of an evolutionary algorithm and an FPR objective function, as shown in Fig 2. The convolutional layer comprises multiple 3 × 3 convolutional kernels, with the number of kernels gradually increasing with depth and a stride of 2 to reduce the size of the output feature map. No dropout or pooling layers are used here, so as to pass as much feature information as possible to the residual network layer for feature extraction.
The residual network layer consists of several convolutional kernels of different sizes (1×1, 3×3, and 5×5) to increase the adaptability of the network to different scales and to extract more global features; the 5×5 kernel is replaced by two 3×3 kernels to reduce computational complexity. Using convolutional kernels of different sizes allows richer features to be extracted: smaller kernels improve the efficiency of the network at capturing local features, while larger kernels improve its ability to capture global features. A 1×1 convolutional kernel is placed before the larger kernels to downscale the input features and reduce their computational burden. The outputs of the convolutional branches are then passed through a linear layer and combined into a one-dimensional feature vector. A residual connection with the input feature data is computed to reduce overfitting and alleviate the problems of vanishing and exploding gradients. Finally, after a pooling layer with a kernel size of 3 × 3, the category of each feature is predicted by a fully connected layer, as shown in Fig 2. A dropout layer is added before the fully connected layer to reduce overfitting.
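As a concrete illustration, a multi-scale residual unit of the kind described above can be sketched in PyTorch as follows. This is a minimal sketch rather than the authors' exact implementation: the channel counts, the 1×1 fusion layer, and the placement of batch normalization are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleResidualBlock(nn.Module):
    """Sketch of the multi-scale residual unit: parallel 1x1 / 3x3 / (5x5 as
    two 3x3) branches, each preceded by a 1x1 bottleneck, fused and added
    back to the input. Channel counts are illustrative assumptions."""
    def __init__(self, channels: int, bottleneck: int = 32):
        super().__init__()
        self.branch1 = nn.Sequential(                        # 1x1 branch
            nn.Conv2d(channels, bottleneck, kernel_size=1),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True))
        self.branch3 = nn.Sequential(                        # 3x3 branch behind a 1x1 bottleneck
            nn.Conv2d(channels, bottleneck, kernel_size=1),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(                        # 5x5 branch factored into two 3x3 convs
            nn.Conv2d(channels, bottleneck, kernel_size=1),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True))
        # combine the branch outputs and project back to the input width
        self.fuse = nn.Conv2d(3 * bottleneck, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)
        return torch.relu(self.fuse(out) + x)                # residual (skip) connection
```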
The feature extraction layer uses two convolution kernels of sizes 1 and 3. The residual layer uses a convolutional layer with a kernel size of 3, a stride of 1, and zero padding, and a convolutional layer with a kernel size of 1 and a stride of 1. The batch normalization function BatchNorm2d uses its default parameters, the batch size is 16, the initial learning rate is 0.001, and the number of training epochs is 100. The MultiStepLR function is used to dynamically adjust the learning rate, and the number of adaptation epochs is 30. The optimizer is AdamW with a weight decay coefficient of 1e-6. The details of the architecture and hyperparameter values are given in Table 1.
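The stage-one training setup summarized in Table 1 can be reproduced roughly as below. The stand-in `model`, the cross-entropy loss, and the MultiStepLR milestones are assumptions; only the optimizer, learning rate, weight decay, batch size, and epoch count are taken from the text.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR

# Stand-in network; the real model is the convolutional + residual + FC architecture above.
model = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, stride=2), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()                                       # assumed loss for the 2-class output
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-6)
scheduler = MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)  # milestones are assumptions

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:            # batches of 16 images, each 3 x 299 x 299
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# for epoch in range(100):                   # 100 training epochs, as reported
#     train_one_epoch(train_loader)          # train_loader built from the 7:1:2 split
#     scheduler.step()
```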
Training process of deep neural network
AUC, FPR, and TPR are three interrelated metrics; typically, excessive focus on one will change the other two. In this paper, after extracting features with the deep learning network, we optimize the weight parameters of the fully connected layer to reduce the FPR. However, excessive focus on the FPR may reduce the AUC or TPR, so the two interrelated metrics should be optimized simultaneously: the AUC should be improved while the FPR is reduced as much as possible, ensuring a high degree of accuracy. We therefore design a multi-objective fitness function with linear weighting as the objective function. In linear weighting, each evaluation metric is assigned a weight, and the weights usually sum to 1. The weights can be determined based on the importance of the task, the optimization objective, and the associated requirements; a higher weight indicates that the metric contributes more to the overall objective. The basic idea of the linear weighting method is to select the indicators to be evaluated, AUC and FPR, and assign them weights ω1 and ω2 in the range (0, 1), where the ideal value of the FPR is 0% and the ideal value of the AUC is 100%. Since neither indicator can reach its ideal value, the trade-off between the indicators must be evaluated reasonably, and trade-off points are introduced to ensure that both indicators change effectively. The formula is expressed as follows:
F = ω1·|FPR − P1| + ω2·|AUC − P2|    (1)
where ω1 and ω2 denote the weights of each indicator, P1 denotes the ideal point for FPR and P1 should be close to 0, P2 denotes the ideal point for AUC and P2 should be close to 1.
Squaring each metric allows more attention to be paid to the optimization of a metric when the difference between that metric and the trade-off point is too large. In addition, to reduce the overfitting phenomenon, L2 regularization is introduced as a penalty term. The formula is expressed as follows:
F = ω1·(FPR − P1)² + ω2·(AUC − P2)² + λ·‖W‖²    (2)
where λ denotes the regularization coefficient and W denotes the weights of the fully connected layer.
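A minimal sketch of this fitness computation is given below, assuming equal weights ω1 = ω2 = 0.5, trade-off points P1 = 0 and P2 = 1, and a small L2 coefficient; the constants actually used by the authors are not stated here, so these values are placeholders.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def fitness(y_true, y_score, y_pred, fc_weights,
            w1=0.5, w2=0.5, p1=0.0, p2=1.0, l2=1e-4):
    """Weighted fitness in the spirit of Eq (2): squared distances of the FPR
    and AUC from their trade-off points plus an L2 penalty on the optimized
    fully connected weights (all constants are illustrative assumptions)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    fpr = fp / (fp + tn)                       # Eq (5)
    auc = roc_auc_score(y_true, y_score)
    return w1 * (fpr - p1) ** 2 + w2 * (auc - p2) ** 2 + l2 * np.sum(fc_weights ** 2)
```

Lower values are better: the FPR term is minimized at P1 and the AUC term at P2, so the particle swarm described next can minimize this single scalar.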
Optimizing the FPR objective function
After training the network model in the first stage, we optimize the fully connected layer for the desired metrics; the algorithm used in this paper is particle swarm optimization.
Particle Swarm Optimization (PSO) is a heuristic optimization algorithm inspired by the behavior of groups of organisms such as flocks of birds or schools of fish. It finds the optimal solution to a problem by modeling the collaboration and information exchange between individuals in a group [19]. In the particle swarm algorithm, a candidate solution of the problem is represented as the position of a particle. Each particle has its own position and velocity and searches based on them, continuously updating both to find the optimal solution. Meanwhile, the population-based search mechanism of the particle swarm algorithm converges quickly, is robust, requires few initial parameter settings, is widely applicable, and its memory footprint depends only on the number of particles and the information stored per particle.
The particle swarm update is defined as:
v_id(t+1) = ω·v_id(t) + c1·r1·(p_id − x_id(t)) + c2·r2·(p_gd − x_id(t)),  x_id(t+1) = x_id(t) + v_id(t+1)    (3)
where v_id denotes the velocity vector of the ith particle, x_id denotes the position vector of the ith particle, p_id denotes the individual optimal solution, p_gd denotes the population optimal solution, ω denotes the inertia weight, c1 denotes the individual learning factor, c2 denotes the population learning factor, and r1 and r2 denote random numbers within [0, 1].
The number of particles is chosen to be 30, the maximum number of iterations is set to 50, and the initial learning factors c1 and c2 are set to 1.5. The application of Particle Swarm Optimization is as follows.
- Initialize particle swarm: The weight parameters of the fully connected layers are aligned and merged into an array, with each parameter serving as the initial position and velocity of the particles.
- Compute fitness: For each particle, the fitness value, i.e., the value of the objective function, is computed based on its position.
- Update individual optimal position: For each particle, update p_id, i.e., the optimal solution found so far by this particle, based on the current fitness value.
- Update optimal position of the population: Find the particle with the best fitness value in the population and use its position as the population optimal position p_gd.
- Update velocity and position: For each particle, update its velocity and position according to Eq (3), using p_id, p_gd, and the current velocity v_id.
- Judge termination conditions: Decide whether to terminate the algorithm based on predefined termination conditions (e.g., reaching the maximum number of iterations or meeting certain fitness requirements).
- Return result: Use the optimal solution p_gd found by the algorithm as its output, and then load this solution into the original model to verify the validity of the algorithm.
When solving practical problems, it is usually desirable to search globally first and locally later, so the inertia weight needs to be adapted as the iterations proceed. In this paper, we adopt the linear weighting method with the following formula:
ω(t) = ω_max − (ω_max − ω_min)·t/T_max    (4)
where t is the current iteration, T_max is the maximum number of iterations, and ω_max and ω_min are the initial and final inertia weights.
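The second-stage optimization can be sketched as the compact PSO loop below, using the reported settings (30 particles, 50 iterations, c1 = c2 = 1.5) and the linearly decreasing inertia weight of Eq (4); the ω range, the initialization scale, and the `evaluate` callback (which would load a candidate weight vector into the frozen network and return the fitness of Eq (2)) are assumptions.

```python
import numpy as np

def pso_minimize(evaluate, dim, n_particles=30, max_iter=50,
                 c1=1.5, c2=1.5, w_max=0.9, w_min=0.4, seed=0):
    """Particle swarm over the flattened fully connected weights.
    evaluate(x) -> fitness of Eq (2) for candidate weight vector x (lower is better)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=0.1, size=(n_particles, dim))        # positions (candidate FC weights)
    v = np.zeros_like(x)                                      # velocities
    pbest = x.copy()                                          # individual best positions p_id
    pbest_val = np.array([evaluate(p) for p in x])
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]          # population best p_gd

    for t in range(max_iter):
        w = w_max - (w_max - w_min) * t / max_iter            # Eq (4): linearly decreasing inertia
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq (3): velocity update
        x = x + v                                                    # Eq (3): position update
        vals = np.array([evaluate(p) for p in x])
        better = vals < pbest_val                             # update individual bests
        pbest[better], pbest_val[better] = x[better], vals[better]
        if vals.min() < gbest_val:                            # update population best
            gbest, gbest_val = x[vals.argmin()].copy(), vals.min()
    return gbest, gbest_val

# usage (hypothetical): best_w, best_fit = pso_minimize(fc_fitness, dim=num_fc_parameters)
```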
The flow of our algorithm
The flow steps of our algorithm are as follows.
- Step 1. Obtain image data from the original dataset.
- Step 2. Perform resizing, horizontal flipping, and vertical flipping on the images, followed by pixel normalization and data standardization to ensure adherence to a normal distribution.
- Step 3. Separate the data into training, validation, and test sets based on their respective categories.
- Step 4. Build the neural network model proposed in this paper.
- Step 5. Construct the training model based on the divided training data set, and evaluate the performance of the model using the validation data set at the end of each training epoch.
- Step 6. Freeze the parameters of the other layers and optimize the weights of the fully connected layer using the PSO optimization method proposed in the Optimizing the FPR Objective Function section.
- Step 7. Evaluate and test the trained and optimized model using the test set to verify the model’s performance.
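Steps 2 and 3 can be realized, for example, with standard torchvision transforms and a stratified split as sketched below; the dataset path, the use of ImageFolder, and the normalization statistics are assumptions, while the 299×299 resizing, the flips, the 7:1:2 ratio, stratification, and the fixed seed follow the text.

```python
import torch
from torchvision import datasets, transforms
from sklearn.model_selection import train_test_split

# Step 2: resize to 299x299 RGB, random flips, then normalize/standardize the pixels.
transform = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),                               # scales pixel values to [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],           # placeholder statistics
                         std=[0.5, 0.5, 0.5]),
])

# Step 3: stratified 7:1:2 train/validation/test split with a fixed random seed.
dataset = datasets.ImageFolder("data/covid_ct", transform=transform)   # hypothetical path
labels = [label for _, label in dataset.samples]
indices = list(range(len(dataset)))
train_idx, rest_idx = train_test_split(indices, test_size=0.3,
                                       stratify=labels, random_state=42)
val_idx, test_idx = train_test_split(rest_idx, test_size=2 / 3,
                                     stratify=[labels[i] for i in rest_idx],
                                     random_state=42)
train_loader = torch.utils.data.DataLoader(
    torch.utils.data.Subset(dataset, train_idx), batch_size=16, shuffle=True)
```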
Results and discussion
This section first introduces the two datasets used in this study, followed by an explanation of the relevant metrics required for the comparative experiments. Finally, the experimental results are presented and discussed.
Dataset selection and preparation
To verify the effectiveness of the proposed method, two datasets, the COVID-19 Radiography Dataset and the COVID-CT dataset, were used to construct the experiments.
Dataset 1: The COVID-19 Radiography Dataset comprises 18,479 CXR images across 15,000 patient cases [20, 21]. A subset of 925 of these cases was selected to construct the experiment, of which 427 are positive and 498 are negative.
Dataset 2: The publicly available COVID-CT dataset consists of 812 CT scan images in total, including 349 images from COVID-19-positive patients and 463 images from COVID-19-negative patients [22]. Fig 3 shows a selection of sample images from the COVID-19 datasets. Because the image quality of the acquired data varies, all images were transformed into RGB images of size (299, 299) to facilitate subsequent loading into the model for processing.
The two datasets, the open COVID-19 Radiography Dataset and the open COVID-19 lung CT scan dataset, utilized in the study are anonymized to ensure data security and individual privacy, encompassing sensitive information processing, encrypted storage and transfer, and compliance with privacy laws. Additionally, their long-standing public availability and usage in numerous papers underscore their transparency, validity, and representativeness, supporting the repeatability and verifiability of research findings. The data ratio is training set: validation set: test set = 7:1:2, and the data classification process uses stratified sampling and sets random seeds to ensure data balance. The details are shown in Table 2.
Evaluation metrics
Appropriate evaluation metrics are key to assessing the performance of a model. At present, most experimental methods choose traditional confusion-matrix metrics such as accuracy, precision, AUC, and F1-Score, which evaluate the overall performance of the model; in practical medical image processing tasks, much attention is paid to the AUC while the FPR is ignored. In disease monitoring and medical diagnosis, misclassifying a sick patient as healthy leads to greater losses than misclassifying a healthy person as sick and increases unnecessary testing and treatment costs. The AUC measures the classification ability of a model under different thresholds, of which the FPR is a component, and optimizing the area under the curve (AUC) is a common training goal; a higher AUC value indicates that the model has better predictive ability under different classification thresholds and can more accurately distinguish between positive and negative cases. However, focusing too much on improving the AUC can cause problems. A common scenario is that the model tends to predict samples as negative cases to reduce the number of false positives and thus lower the FPR. In this case, the model may compress its predictions of positive samples, causing some true positives to be incorrectly predicted as negatives. In other words, the model improves the prediction accuracy of negative samples and reduces the misclassification of negatives as positives, but at the same time it may misclassify some true positives as negatives, increasing the risk of false negatives.
Balancing the risks of false positives and false negatives is a critical consideration in model optimization. Selecting appropriate classification thresholds and employing suitable strategies can help address this issue. At the same time, combining the knowledge and experience of domain experts with an understanding of specific application scenarios makes it easier to balance and manage the model's predictive accuracy and risk. For example, in domains such as disease surveillance, focusing only on AUC improvement during training may not be the best strategy; different evaluation metrics need to be considered comprehensively, and appropriate optimization goals and evaluation metrics should be determined by weighing them according to actual needs and application scenarios. Since negative patients are labeled 1 and positive patients are labeled 0 when the dataset is divided, and to pay more attention to the misclassification rate of positive patients, the method in this paper selects the AUC and FPR metrics and optimizes the network model to reduce the FPR while maintaining a certain AUC value.
- AUC: AUC is one of the most commonly used metrics for evaluating the performance of a binary classification model [23]. The ROC curve (Receiver Operating Characteristic) is plotted with the False Positive Rate (FPR) on the horizontal axis and the True Positive Rate (TPR) (also known as recall or sensitivity) on the vertical axis across different thresholds. AUC denotes the area under the ROC curve and ranges from 0 to 1; the closer the AUC is to 1, the better the classification ability of the model under different thresholds.
- FPR: FPR is the percentage of samples in which the model incorrectly predicts a negative case as a positive case. In image processing tasks, especially in highly sensitive applications such as medical imaging, it is important to reduce the FPR. By focusing on the FPR, the model's false alarm rate on negative samples can be kept as low as possible, increasing the model's reliability and practical value. In this experiment, the disease-present cases are assigned label 0, so the FPR is defined in this paper as the proportion of patients with COVID who are incorrectly diagnosed as not having COVID. The FPR is defined as:
FPR = FP / (FP + TN)    (5)
- F1-Score: The F1-Score is a crucial metric for assessing the effectiveness of a binary classification model. It considers the model's proficiency in detecting both positive and negative examples. Detecting positive examples involves correctly identifying the number of genuinely positive samples, while detecting negative examples involves accurately recognizing the number of truly negative samples.
F1-Score = 2·TP / (2·TP + FP + FN)    (6)
where FP denotes false positive, TN denotes true negative, TP denotes true positive, and FN denotes false negative.
- PREC: Precision is the proportion of samples predicted to be positive that are actually positive. It measures the reliability of the model’s predictions.
PREC = TP / (TP + FP)    (7)
- SPEC: Specificity refers to the proportion of samples predicted to be negative by the model that are truly negative. It measures the ability of the model to distinguish between true negatives.
SPEC = TN / (TN + FP)    (8)
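For reference, the metrics of Eqs (5)–(8) follow directly from the confusion-matrix counts; the small helper below is a sketch of that computation (the example counts in the comment are made up).

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """FPR, F1-Score, precision, and specificity from raw confusion-matrix counts."""
    return {
        "FPR":  fp / (fp + tn),               # Eq (5)
        "F1":   2 * tp / (2 * tp + fp + fn),  # Eq (6)
        "PREC": tp / (tp + fp),               # Eq (7)
        "SPEC": tn / (tn + fp),               # Eq (8)
    }

# example (hypothetical counts): binary_metrics(tp=80, fp=9, tn=110, fn=12)
```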
Experimental result
The dataset is manually divided into training, test, and validation sets in the ratio of 7:2:1, and one hundred training rounds are performed, with the test set used to validate the model results after each round; finally, plots of the accuracy analysis results of the validation and test sets for each model are generated. The deep learning model proposed in this paper and the deep learning models used for comparison are implemented with the open-source PyTorch framework, running under Python 3.7.
The confusion matrix is shown in Fig 4. The confusion matrix is an essential tool in classification models, providing a visual representation of an algorithm’s performance. It consists of four elements: True Positives (TP), which indicate the correct identification of a disease; False Positives (FP), which indicate the incorrect identification of a non-existent disease; True Negatives (TN), which confirm the correct non-existence of a disease; and False Negatives (FN), where the model fails to identify an existing disease. These components allow us to calculate important metrics such as accuracy, precision, recall, and F1-Score. As shown in the matrix, the results indicate an excellent detection capability with an accuracy rate of over 85%.
The proposed network model is validated on the COVID-19 Radiography Dataset and the COVID-CT dataset, and then a Lightweight Multi-scale CNN (LIMS) [24], a hybrid deep learning model (HDLM) [25], RESNET-50 [26], INCEPTION [27], a five-layer deep convolutional neural network (5LCNN) [28], and EfficientNetV2 [29] are used for comparative experiments. RESNET-50, with its 50 layers, performs tasks such as image classification efficiently using skip connections. INCEPTION combines convolutions of different sizes for cost-effective parameter use, inspired by the retina. The 5LCNN introduces stochastic pooling, providing robustness against overfitting and computational efficiency. EfficientNetV2, part of the Google family of models, introduces scalable improvements that address speed and accuracy, optimizing performance across devices. LIMS has a dedicated network architecture, and the hybrid deep learning model is a fused network using VGG-19 and VGG-16. The experimental results are shown in Tables 3 and 4, and the ROC curves of the different methods on Dataset 1 and Dataset 2 are shown in Fig 5.
In the performance assessments conducted on two distinct datasets, the COVID-19 Radiography Dataset (Dataset 1) and the COVID-CT dataset (Dataset 2), the proposed method outperformed well-established methodologies such as ResNet-50, Inception, 5LCNN, EfficientNetV2, LIMS, and the hybrid deep learning model across the majority of evaluation metrics, including accuracy, area under the ROC curve, false positive rate, precision, and specificity. The only deviation was a slightly lower F1-Score than 5LCNN and LIMS on Dataset 1, a difference that is not statistically significant according to the p-value. Nonetheless, on Dataset 2 the proposed method excelled in all six evaluation metrics, outperforming the comparative approaches. These results validate the effectiveness of the proposed method in COVID-19 detection imaging tasks and its consistent performance across diverse datasets.
To verify the effectiveness of the different steps of the method, we constructed ablation experiments comparing the proposed method without the residual unit (Without RES), the proposed method without PSO (Without PSO), and the full proposed method. The analysis of the proposed model on Dataset 1 and Dataset 2 is given in Tables 5 and 6.
The ablation studies highlighted in Tables 5 and 6, using the COVID-19 Radiography Dataset and the COVID-CT dataset, reveal that our full method outperforms the other variations, enhancing the key performance metrics. Specifically, on the Radiography Dataset, omitting the residual modules and Particle Swarm Optimization led to declines in model accuracy (ACC), area under the ROC curve (AUC), F1-Score, precision (PREC), and specificity (SPEC), and worsened the false positive rate (FPR) by nearly 23%. Implementing our full method not only recovered but also improved these metrics. The same pattern of decline and subsequent improvement with the complete method was observed on the COVID-CT dataset, emphasizing the essential role of the included components in refining the model's effectiveness.
Limitations and discussion
The remarkable results achieved in Tables 3 and 4 can be attributed to three distinctive steps incorporated in the proposed method. First, multilayer convolution is used to efficiently extract image features. This technique captures an array of features at different depths, providing the model with enhanced representational capacity. Next, the architecture integrates residual modules that mitigate the gradient vanishing problem prevalent in conventional deep networks, allowing for more effective training and improved model performance. Finally, Particle Swarm Optimization (PSO) is used to fine-tune the false positive rate (FPR). PSO, an algorithm rooted in swarm intelligence, quickly and accurately refines the global optimum for the optimization problem at hand. Through this optimization process, the model skillfully minimizes the false positive rate, significantly reducing false alarms while maintaining a high detection rate, and increasing the practical applicability and value of the model.
The time consumption of these methods is shown in Table 3. The results show that the time used by our method is less than 0.50 seconds, which is close to the minimum and far less than that of most other methods.
The results in Tables 5 and 6 further show that, on the COVID-19 radiography and COVID-CT datasets, removing the residual modules (Without RES) or Particle Swarm Optimization (Without PSO) degrades performance metrics such as ACC, AUC, FPR, F1-Score, PREC, and SPEC, whereas the full implementation of the proposed method significantly improves the results, in particular reducing the FPR by almost 23%, highlighting its effectiveness in reducing false positives.
The results of evaluating the model on the two datasets (Dataset 1 and Dataset 2) show the values of the various performance metrics along with their confidence intervals at the 95% confidence level. For Dataset 1, the accuracy of the model is 0.8957 with a confidence interval of [0.8598, 0.9316], and there are corresponding values and confidence intervals for the AUC, FPR, F1-Score, precision, and specificity. For Dataset 2, the confidence interval for accuracy is [0.7813, 0.8795], the AUC is 0.9137 with a confidence interval of [0.8769, 0.9505], and there are corresponding values and confidence intervals for the FPR, F1-Score, precision, and specificity.
The impact of data preprocessing/augmentation techniques on model performance can vary depending on the specific model and task. It is important to use the same data preprocessing/augmentation techniques for training and testing data. Data preprocessing/augmentation techniques can be computationally expensive, so it is important to choose effective techniques. The performance of machine learning models can be improved by carefully considering the data preprocessing/augmentation techniques used. To compare the performance of the model itself, the proposed method uses only simple data processing methods. First, the image is flipped horizontally and vertically, then the pixel values of the image are normalized, and the data is standardized to follow a normal distribution. This process reduces the cost of data augmentation and the computational cost of the model, which is more conducive to the later use of the model.
Although the proposed method works extremely well in Data Set 1 and Data Set 2, it is not a substitute for clinical trial results. The results of the model can support clinical trials by providing physicians with a diagnostic reference based on patient symptoms, identifying high-risk individuals, and allowing early intervention to reduce disease incidence. It can also streamline patient care by quickly screening low-risk cases and reducing waiting times, but the model results can only be used as a reference in a limited number of situations and are not a substitute for professional medical judgment.
Conclusion
In this paper, we design a low FPR method for disease prediction based on COVID-19 lung images. The ACC and AUC on the validation set of the COVID-19 chest CT scan dataset are 83.04% and 91.37%, and the accuracy and AUC on the COVID-19 chest X-ray validation set are 89.57% and 97.01%, respectively, which are much higher than those of other methods, including some recent ones. The FPR of the low-FPR deep neural network model designed in this paper is 12.75% on the CT dataset and 7.52% on the chest X-ray dataset. This indicates that the proposed method outperforms the comparative models on the corresponding metrics and shows a clear improvement over the metrics before optimization.
In clinical practice, failure to detect positive cases is often considered unacceptable. The method proposed in this study improves the accuracy of positive case detection by optimizing the FPR, thereby increasing the model’s sensitivity to positive cases and improving the predictive accuracy and reliability, while the two-stage model optimization approach can further improve the model’s performance within defined constraints. In the future, we will collect a large number of medical imaging datasets for different pathologies and use these data to further validate and test the model. By testing the model under different pathological conditions, we will identify the model’s limitations and areas for improvement, and then further optimize and improve the existing model to increase its versatility in different diseases.
This study primarily addresses the binary classification problem between diseased and normal samples, without addressing the classification of different disease types and their pathological features. However, in actual clinical practice, raw image data often contain multiple disease categories and are typically affected by problems such as data imbalance and poor quality. Therefore, future research should focus on the following directions while continuing to optimize model performance and metrics: Designing loss functions suitable for multi-class tasks to address the problem of class imbalance; Applying data augmentation techniques to improve the model’s robustness to sample variation; Using transfer learning to learn generalizable features from large medical imaging datasets, thereby improving the model’s ability to generalize in new environments. In addition, given the difficulty of fully adapting deep learning models to the complex and variable nature of clinical applications, it is necessary to establish dynamic updating and continuous learning mechanisms.
In summary, in addition to optimizing model performance, it is also necessary to explore diversified solutions for data quality, category imbalance, model generalization, etc., and build an AI diagnostic system with dynamic updates and continuous learning to ensure its long-term applicability in complex medical environments.
References
- 1. Sharma B, Gupta K, Bhardwaj G, Kumar S. A Comprehensive Study on Impact, Analysis and Complications of COVID-19. In: 2022 3rd International Conference on Intelligent Engineering and Management (ICIEM). IEEE; 2022. p. 102–106.
- 2. Chen S, Liu M, Deng P, Deng J, Yuan Y, Cheng X, et al. Reinforcement learning based diagnosis and prediction for COVID-19 by optimizing a mixed cost function from CT images. IEEE Journal of Biomedical and Health Informatics. 2022;26(11):5344–5354. pmid:35951561
- 3. Mostafa G, Mahmoud H, Abd El-Hafeez T, ElAraby ME. Feature reduction for hepatocellular carcinoma prediction using machine learning algorithms. Journal of Big Data. 2024;11(1):88.
- 4. Lawton S, Viriri S, et al. Detection of COVID-19 from CT lung scans using transfer learning. Computational Intelligence and Neuroscience. 2021;2021:6621607.
- 5. Jaiswal A, Gianchandani N, Singh D, Kumar V, Kaur M. Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning. Journal of Biomolecular Structure and Dynamics. 2021;39(15):5682–5689. pmid:32619398
- 6. Ardakani AA, Kanafi AR, Acharya UR, Khadem N, Mohammadi A. Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks. Computers in Biology and Medicine. 2020;121:103795. pmid:32568676
- 7. Maurya P, Kushwaha A, Khare A, Prakash O. Balancing accuracy and efficiency: A lightweight deep learning model for COVID-19 detection. Engineering Applications of Artificial Intelligence. 2024;136:108999.
- 8. Chen Y, Yang XH, Wei Z, Heidari AA, Zheng N, Li Z, et al. Generative adversarial networks in medical image augmentation: A review. Computers in Biology and Medicine. 2022;144:105382. pmid:35276550
- 9. Yu X, Wang SH, Zhang X, Zhang YD. Detection of COVID-19 by GoogLeNet-COD. In: Intelligent Computing Theories and Application: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part I. Springer; 2020. p. 499–509.
- 10. Bougourzi F, Distante C, Dornaika F, Taleb-Ahmed A. PDAtt-Unet: Pyramid dual-decoder attention Unet for Covid-19 infection segmentation from CT-scans. Medical Image Analysis. 2023;86:102797. pmid:36966605
- 11. Ismael AM, Şengür A. Deep learning approaches for COVID-19 detection based on chest X-ray images. Expert Systems with Applications. 2021;164:114054. pmid:33013005
- 12. Shah V, Keniya R, Shridharani A, Punjabi M, Shah J, Mehendale N. Diagnosis of COVID-19 using CT scan images and deep learning techniques. Emergency Radiology. 2021;28:497–505. pmid:33523309
- 13. Attallah O, Ragab DA, Sharkas M. MULTI-DEEP: A novel CAD system for coronavirus (COVID-19) diagnosis from CT images using multiple convolution neural networks. PeerJ. 2020;8:e10086. pmid:33062453
- 14. Reshi AA, Rustam F, Mehmood A, Alhossan A, Alrabiah Z, Ahmad A, et al. An efficient CNN model for COVID-19 disease detection based on X-ray image classification. Complexity. 2021;2021:6621607.
- 15. Wang W, Pei Y, Wang SH, Gorriz JM, Zhang YD. PSTCNN: Explainable COVID-19 diagnosis using PSO-guided self-tuning CNN. Biocell: Official Journal of the Sociedades Latinoamericanas de Microscopia Electronica et al. 2023;47(2):373. pmid:36570878
- 16. Eliwa EHI, El Koshiry AM, Abd El-Hafeez T, Farghaly HM. Utilizing convolutional neural networks to classify monkeypox skin lesions. Scientific Reports. 2023;13(1):14495. pmid:37661211
- 17. Wang SH, Wu X, Zhang YD, Tang C, Zhang X. Diagnosis of COVID-19 by wavelet Renyi entropy and three-segment biogeography-based optimization. International Journal of Computational Intelligence Systems. 2020;13(1):1332–1344.
- 18. Wang J, Zhu Y, Gao Y, Cai Z, Sun Y, Peng F. Detecting Energy Theft in Different Regions Based on Convolutional and Joint Distribution Adaptation. IEEE Transactions on Instrumentation and Measurement. 2023;.
- 19. Marini F, Walczak B. Particle swarm optimization (PSO). A tutorial. Chemometrics and Intelligent Laboratory Systems. 2015;149:153–165.
- 20. Chowdhury ME, Rahman T, Khandakar A, Mazhar R, Kadir MA, Mahbub ZB, et al. Can AI help in screening viral and COVID-19 pneumonia? Ieee Access. 2020;8:132665–132676.
- 21. Rahman T, Khandakar A, Qiblawey Y, Tahir A, Kiranyaz S, Kashem SBA, et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Computers in biology and medicine. 2021;132:104319. pmid:33799220
- 22. Yang X, He X, Zhao J, Zhang Y, Zhang S, Xie P. COVID-CT-dataset: a CT scan dataset about COVID-19. arXiv preprint arXiv:2003.13865. 2020.
- 23. Carrington AM, Fieguth PW, Qazi H, Holzinger A, Chen HH, Mayr F, et al. A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. BMC medical informatics and decision making. 2020;20:1–12. pmid:31906931
- 24. Joshi AM, Nayak DR, Das D, Zhang Y. LiMS-Net: A lightweight multi-scale CNN for COVID-19 detection from chest CT scans. ACM Transactions on Management Information Systems. 2023;14(1):1–17.
- 25. Abdullah M, berhe Abrha F, Kedir B, Tagesse TT. A Hybrid Deep Learning CNN model for COVID-19 detection from chest X-rays. Heliyon. 2024;10(5). pmid:38468922
- 26. Xingyi Y, Xuehai H, Jinyu Z, Yichen Z, Shanghang Z, Pengtao X. COVID-CT-dataset: a CT image dataset about COVID-19. arXiv preprint arXiv:2003.13865. 2020;5.
- 27. Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 31; 2017.
- 28. Zhang YD, Satapathy SC, Liu S, Li GR. A five-layer deep convolutional neural network with stochastic pooling for chest CT-based COVID-19 diagnosis. Machine vision and applications. 2021;32:1–13. pmid:33169050
- 29. Ebenezer AS, Kanmani SD, Sivakumar M, Priya SJ. Effect of image transformation on EfficientNet model for COVID-19 CT image classification. Materials Today: Proceedings. 2022;51:2512–2519.