A combined monthly precipitation prediction method based on CEEMD and improved LSTM

Xinyun Jiang

doi:10.1371/journal.pone.0288211

Abstract

With the continuous decline in available water resources caused by population growth and rapid economic development, precipitation prediction plays an important role in the rational allocation of global water resources. To address the non-linearity and non-stationarity of monthly precipitation, a combined prediction method based on complementary ensemble empirical mode decomposition (CEEMD) and an improved long short-term memory (LSTM) neural network is proposed. First, CEEMD is used to decompose the monthly precipitation series into a set of relatively stationary sub-sequence components, which better reflect the local characteristics of the series and help reveal its nonlinear dynamics. Then, improved LSTM neural networks are used to predict each sub-sequence. The improvement consists of optimizing the LSTM hyper-parameters with the particle swarm optimization (PSO) algorithm, which avoids the arbitrariness of manual parameter selection. Finally, the predictions of all components are superimposed to obtain the final prediction. The method is validated on monthly precipitation data from 1961 to 2020 for Changde City, Hunan Province. The case study shows that, compared with other traditional prediction methods, the proposed method better reflects the trend of precipitation changes and achieves higher prediction accuracy.

Citation: Jiang X (2023) A combined monthly precipitation prediction method based on CEEMD and improved LSTM. PLoS ONE 18(7): e0288211. https://doi.org/10.1371/journal.pone.0288211

Editor: Lin Wang, Huazhong University of Science and Technology, CHINA

Received: May 4, 2023; Accepted: June 21, 2023; Published: July 13, 2023

Copyright: © 2023 Xinyun Jiang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data cannot be shared publicly at the request of the data owner. The data underlying the results presented in the study are available from the Changde Meteorological Bureau of Hunan Province through its data administrator at the following email address: lal1112023@163.com.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The World Meteorological Organization (WMO) published the report “State of Climate Services 2021: Water” in October 2021, stating that climate change, particularly the increasing frequency of extreme heat, will cause a global water crisis, and that governments and relevant international organizations still lack effective response measures [1]. Global heat, drought, and other types of extreme weather have reduced global precipitation; meanwhile, with rapid social development, freshwater use has increased more than 35-fold over the past 300 years, causing water crises in many parts of the world. In response, China’s Ministry of Water Resources issued its 2022 Water Resources Management Essentials, which calls for more refined water resource management. Precipitation forecasting is a key foundation for improved water resource management and an important tool for extreme weather warnings. This research proposes a combined CEEMD-PSO-LSTM monthly precipitation prediction model to increase the accuracy of monthly precipitation prediction and to provide a decision basis for energy warning and meteorological disaster prevention.

Researchers in China and abroad have long studied numerous techniques for predicting precipitation. The primary techniques fall into three groups: causal prediction methods, statistical prediction methods, and machine learning methods. Causal prediction methods are based on the study of climatic change and atmospheric circulation processes and on the construction of causal models [2–4]. However, because so many variables can affect atmospheric precipitation, choosing the right ones is difficult, which makes precipitation prediction with causal models challenging. Statistical prediction methods build models from historical precipitation data using established techniques such as Markov chains [5] and the autoregressive integrated moving average (ARIMA) model [6]. These methods often ignore the non-stationary and non-linear properties of precipitation data, which typically limits their prediction accuracy. Because mid- to long-term precipitation is complex and random, machine learning methods mostly build forecasting models with neural network algorithms such as LSTM [7–9]; however, a single neural network can suffer from slow convergence and reduced forecasting accuracy. In summary, single machine learning algorithms, statistical prediction models, and causal prediction methods all have limitations and do not provide accurate predictions of mid- to long-term precipitation. Scholars have therefore improved single neural network algorithms, for example by optimizing their hyper-parameters with optimization algorithms [10, 11], which avoids the arbitrariness of manual selection. For nonlinear data, decomposition methods [9, 12, 13] can split a sequence into multiple intrinsic mode functions (IMFs) to better capture the nonlinear components of the signal and improve prediction accuracy.

Empirical mode decomposition (EMD) [14], proposed by Huang N. E. et al. in 1998, has strong advantages for processing non-linear and non-stationary data. As EMD research deepened, it was found that its adaptive decomposition inevitably suffers from mode mixing. Huang et al. [15] proposed ensemble empirical mode decomposition (EEMD), which substantially reduces the influence of mode mixing on the results while preserving the physical uniqueness of the derived IMF components. In this study, the monthly precipitation data were decomposed with CEEMD, an improvement over EEMD that effectively reduces the noise in the reconstructed signal so that the residual white noise is negligible. For nonlinear and non-stationary data, the signal characteristics are usually complex and dynamic, which makes them difficult to process and extract with traditional linear analysis methods such as Fourier analysis and wavelet analysis. By contrast, the EMD-based CEEMD algorithm adaptively decomposes non-linear and non-stationary signals into multiple IMFs, thereby better capturing the non-stationary and non-linear components contained in the signal. LSTM neural networks are widely valued for their ability to capture spatio-temporal relationships [16]. In time series data, the current value is often influenced by values at several previous time steps. Traditional feedforward neural networks cannot model this dependence, which leads to prediction errors. The long- and short-term memory units of LSTM networks, by contrast, learn and retain information from previous time steps, so they handle long-term dependencies better and improve prediction performance.
LSTM networks were used to predict the decomposed IMF components. To avoid the arbitrariness of manually selecting LSTM hyperparameters, the particle swarm optimization (PSO) algorithm [17–20] was used to search for the optimal hyperparameters. Compared with many other optimization algorithms, PSO typically converges quickly toward the global optimum and has a relatively low computational cost. It is robust and reliable, depends little on initial parameters, and is less prone to becoming trapped in local optima. On this basis, a combined CEEMD-PSO-LSTM prediction model is established, which improves prediction accuracy over traditional precipitation prediction methods.

To test the effectiveness of the model, Changde City, known as an international wetland city and an international garden city, was selected as the study area. The model was applied to monthly precipitation prediction for Changde City and compared with PSO-LSTM [21], EMD-LSTM [22] and EEMD-LSTM [23]. The results show that the combined CEEMD-PSO-LSTM model gives better predictions and can provide more effective decision support for water resources development planning.

Methodology

CEEMD

Huang N. E. [14] proposed EMD in 1998 as part of the Hilbert-Huang transform (HHT), which has the advantage of processing non-stationary signals. The core of EMD is to use the local extrema of a signal to decompose it into several intrinsic mode functions (IMFs) and a monotonic residual. EEMD addresses mode mixing by adding normally distributed white noise to the original sequence and then decomposing the noise-added signal as a whole to obtain the corresponding IMF components. In CEEMD, pairs of white noise with opposite signs are added to the original signal, which effectively reduces the noise of the reconstructed signal so that the residual white noise is negligible. The CEEMD decomposition process is as follows:

1. Add L pairs of white noise with opposite signs to the original sequence X(t), giving a positive-noise sequence $X_1$ and a negative-noise sequence $X_2$, for a total of 2L sequences:
$$\begin{bmatrix} X_1(t) \\ X_2(t) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} X(t) \\ N(t) \end{bmatrix} \qquad (1)$$
where N(t) is the white noise sequence.
2. Decompose each resulting sequence separately by EMD into m IMF components; denote the components of the i-th pair as $IMF_{ij}^{+}$ and $IMF_{ij}^{-}$, where i = 1, …, L and j = 1, …, m.
3. Average $IMF_{ij}^{+}$ and $IMF_{ij}^{-}$ over all 2L decompositions to obtain the j-th component:
$$IMF_j = \frac{1}{2L} \sum_{i=1}^{L} \left( IMF_{ij}^{+} + IMF_{ij}^{-} \right) \qquad (2)$$
4. The original sequence is recovered as the sum of all IMF components and the residual:
$$X(t) = \sum_{j=1}^{m} IMF_j(t) + r(t) \qquad (3)$$
where r(t) is the residual trend term, i.e., the residual.
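The four steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's MATLAB implementation: the EMD here uses linear-interpolation envelopes and a fixed number of sifting passes (classic EMD uses cubic-spline envelopes and a sifting stop criterion), every run extracts the same number m of components so the averages in Eq (2) line up, and the names `emd`, `ceemd`, `n_pairs`, `noise_std` and `n_imfs` are hypothetical.

```python
import numpy as np

def _envelope_mean(h):
    """Mean of the upper and lower envelopes of h, or None if h has too few extrema."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(h))
    # Simplification: linear envelopes through the extrema (classic EMD uses cubic splines)
    upper = np.interp(t, maxima, h[maxima])
    lower = np.interp(t, minima, h[minima])
    return (upper + lower) / 2.0

def emd(x, n_imfs, n_sift=10):
    """Fixed-depth EMD: returns (imfs, residual) with imfs.sum(axis=0) + residual == x."""
    r = np.asarray(x, dtype=float).copy()
    imfs = np.zeros((n_imfs, r.size))
    for j in range(n_imfs):
        if _envelope_mean(r) is None:      # residual is (nearly) monotonic: stop early
            break
        h = r.copy()
        for _ in range(n_sift):            # fixed number of sifting passes
            mean_env = _envelope_mean(h)
            if mean_env is None:
                break
            h = h - mean_env
        imfs[j] = h
        r = r - h
    return imfs, r

def ceemd(x, n_pairs=50, noise_std=0.02, n_imfs=8, seed=0):
    """Paired +/- white noise, EMD of every noisy copy, then ensemble averaging."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    imf_sum = np.zeros((n_imfs, x.size))
    res_sum = np.zeros(x.size)
    for _ in range(n_pairs):
        noise = noise_std * x.std() * rng.standard_normal(x.size)
        for sign in (1.0, -1.0):           # Eq (1): X + N and X - N
            imfs, res = emd(x + sign * noise, n_imfs)
            imf_sum += imfs
            res_sum += res
    # Eq (2): average over the 2L decompositions; the averaged residual is r(t) in Eq (3)
    return imf_sum / (2 * n_pairs), res_sum / (2 * n_pairs)
```

Because each EMD run is complete (its components sum to its noisy input) and the paired noise cancels in the average, Eq (3) holds up to rounding. With the settings reported for the case study, a call might look like `ceemd(precip, n_pairs=50, noise_std=0.02, n_imfs=8)`, where `precip` is a hypothetical array; whether "50" counts noise pairs or total trials is not stated, so `n_pairs=50` is an assumption.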

PSO

The main idea of the PSO algorithm is to use “particles” to simulate the foraging behaviour of a “flock of birds”. When PSO solves an optimisation problem, each particle has its own position, velocity, and fitness value. Each particle’s position in the search space represents a candidate solution, and PSO performs the optimisation by moving every particle toward both the best solution it has found itself and the best solution found by the population.

Assume a D-dimensional search space containing a population of N particles. The position of the i-th particle is $X_i = (x_{i1}, x_{i2}, \dots, x_{iD})$ and its velocity is $V_i = (v_{i1}, v_{i2}, \dots, v_{iD})$; the best fitness value found so far by the individual is $f_p$ and by the population is $f_g$; the individual best position is $P_{i,\text{pbest}} = (p_{i1}, p_{i2}, \dots, p_{iD})$ and the population best position is $P_{\text{gbest}} = (p_{1,\text{gbest}}, p_{2,\text{gbest}}, \dots, p_{D,\text{gbest}})$. The velocity update equation is:

$$v_{id}^{k+1} = \omega v_{id}^{k} + c_1 r_1 \left(p_{id,\text{pbest}}^{k} - x_{id}^{k}\right) + c_2 r_2 \left(p_{d,\text{gbest}}^{k} - x_{id}^{k}\right) \qquad (4)$$

where i is the particle index; k is the iteration number; d is the dimension index; ω is the inertia weight; $c_1$ and $c_2$ are the individual and population learning factors, respectively; $r_1$ and $r_2$ are random numbers between 0 and 1; $v_{id}^{k}$ and $x_{id}^{k}$ are the velocity and position of particle i in dimension d at the k-th iteration; $p_{id,\text{pbest}}^{k}$ is the historical best position of particle i in dimension d at the k-th iteration; and $p_{d,\text{gbest}}^{k}$ is the historical best position of the population in dimension d at the k-th iteration. The first term on the right-hand side is the inertia term: the larger ω is, the stronger the particle’s exploratory momentum. The second term is the cognitive term, i.e., the distance and direction from the particle’s current position to its own historical best position. The third term is the social term, i.e., the distance and direction from the particle’s current position to the population’s historical best position. The position update equation is:

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1} \qquad (5)$$

If the number of iterations reaches the preset maximum, or the change in the best fitness value between two successive iterations falls below a preset minimum, the iteration stops and the optimal solution is output; otherwise, the particle velocities and positions are updated using Eqs (4) and (5).
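A minimal NumPy sketch of the update loop in Eqs (4) and (5) follows. It is illustrative only: the velocity limit (20% of each dimension's range), the clipping of positions to the search box, the default ω, c₁ and c₂ values, and the use of a fixed iteration budget as the only stopping rule are assumptions, not settings taken from the paper.

```python
import numpy as np

def pso(fitness, lower, upper, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` over the box [lower, upper] using the updates of Eqs (4)-(5)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    v_max = 0.2 * (upper - lower)                 # velocity limit (assumed: 20% of range)
    x = rng.uniform(lower, upper, (n_particles, dim))
    v = rng.uniform(-v_max, v_max, (n_particles, dim))
    pbest = x.copy()                              # individual best positions
    pbest_f = np.array([fitness(p) for p in x])   # individual best fitness (f_p)
    g = pbest[np.argmin(pbest_f)].copy()          # population best position
    g_f = pbest_f.min()                           # population best fitness (f_g)
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Eq (4): inertia + cognitive + social terms
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        v = np.clip(v, -v_max, v_max)
        # Eq (5): position update, kept inside the search box
        x = np.clip(x + v, lower, upper)
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        if pbest_f.min() < g_f:
            g_f = pbest_f.min()
            g = pbest[np.argmin(pbest_f)].copy()
    return g, g_f
```

On a simple test function such as the sphere, `pso(lambda p: float(np.sum(p ** 2)), [-5, -5], [5, 5])` moves the swarm toward the origin; in the precipitation model the fitness would instead be the LSTM prediction MSE of Eq (6).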

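The MSE of the LSTM network prediction is used as the particle fitness value:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \qquad (6)$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and n is the number of samples.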

LSTM

The LSTM network is a special kind of recurrent neural network. Its memory module retains the recurrent feedback mechanism while introducing gating units that control the accumulation of information, including a forget gate that selectively discards old information, thereby addressing the long-term dependency problem in sequence modelling.

The LSTM neural network adds input, output and forget gates to the RNN. The architecture of the LSTM block is shown in Figs 1 and 2; it consists of a memory cell, a forget gate, an input gate and an output gate. Each gate is controlled by a sigmoid function and takes values in the range (0, 1). The memory cell is the key component of the LSTM module; its state at time t, $c_t$, carries the long-term memory of the series. At time t, the inputs to the memory module are the input sequence $X_t$, the memory cell state $c_{t-1}$ at time t − 1, and the hidden state $h_{t-1}$ at time t − 1. The forget gate controls how much of $c_{t-1}$ from the previous moment is forgotten, i.e., the degree to which $c_{t-1}$ influences $c_t$. In the input gate, a linear combination of the previous hidden state $h_{t-1}$ and the input $X_t$ is passed through the sigmoid function; the candidate state $\tilde{c}_t$, scaled by the input gate, is the information retained after input-gate control, i.e., the extent to which $X_t$ affects $c_t$. In the output gate, a linear combination of $h_{t-1}$ and $X_t$ is likewise passed through the sigmoid function to determine which information the memory cell outputs, i.e., the degree to which $c_t$ influences $h_t$. The three gates are computed as follows:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right) \qquad (7)$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right) \qquad (8)$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right) \qquad (9)$$

where $f_t$, $i_t$ and $o_t$ are the outputs of the forget, input and output gates at time t; $W_f$, $W_i$ and $W_o$ are the weight matrices of the forget, input and output gates; $b_f$, $b_i$ and $b_o$ are the corresponding bias terms; and σ is the sigmoid activation function.


Fig 1. Architecture of LSTM block.

https://doi.org/10.1371/journal.pone.0288211.g001


Fig 2. Components of LSTM network.

https://doi.org/10.1371/journal.pone.0288211.g002

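The candidate cell state $\tilde{c}_t$, i.e., the new information that the input gate may admit into the memory cell, is calculated as follows:

$$\tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, X_t] + b_c\right) \qquad (10)$$

The memory cell state $c_t$ is obtained by multiplying $c_{t-1}$ by the forget gate output and adding the candidate information retained by the input gate; the hidden state $h_t$ at time t then follows from the output gate:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (11)$$
$$h_t = o_t \odot \tanh(c_t) \qquad (12)$$

In Eq (10), $W_c$ and $b_c$ denote the weight matrix and bias term of the candidate cell state; tanh denotes the hyperbolic tangent activation function, and ⊙ denotes element-wise multiplication.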
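To make the data flow of Eqs (7) to (12) concrete, a single LSTM time step can be written in a few lines of NumPy. This is a sketch for illustration: storing the weights in dicts keyed by gate name (`"f"`, `"i"`, `"o"`, `"c"`) is a hypothetical layout, not how MATLAB's deep learning toolbox stores LSTM parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs (7)-(12).

    W[g] has shape (hidden, hidden + input) and b[g] has shape (hidden,)
    for g in "f", "i", "o", "c" (illustrative layout).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, X_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # Eq (7)  forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # Eq (8)  input gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # Eq (9)  output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # Eq (10) candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # Eq (11) new cell state
    h_t = o_t * np.tanh(c_t)                 # Eq (12) new hidden state
    return h_t, c_t
```

Running a 12-month input window amounts to calling `lstm_step` twelve times, feeding each returned `(h_t, c_t)` into the next call; a fully connected layer would then map the final hidden state to the one-month-ahead prediction.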

Monthly precipitation prediction model based on CEEMD-PSO-LSTM

Monthly precipitation data are inherently variable, non-linear and non-stationary, so traditional prediction techniques predict them poorly. Given the benefits of empirical mode decomposition in making a series more stationary and the strong performance of long short-term memory networks in time series prediction, this paper selects the CEEMD decomposition model and the LSTM prediction model, uses the PSO algorithm to optimise the LSTM hyperparameters, and proposes a combined CEEMD-PSO-LSTM prediction model. The combined model consists of three key stages: data decomposition, PSO optimization, and LSTM network prediction. Fig 3 illustrates the specific steps.

CEEMD decomposition phase. CEEMD is used to decompose the monthly precipitation data into m IMF components {IMF₁, IMF₂, ⋯, IMF_m} and the residual term RES.
PSO optimization phase. A PSO-LSTM model is built for each IMF component separately, with the hyperparameters of the LSTM network optimized by PSO. The specific steps are as follows:
1. Initialize the particle swarm parameters: the population size, number of layers, number of iterations, individual and population learning factors, bounds on particle positions and velocities, and the inertia weight.
2. Randomly initialise the particle velocities and positions within the bounds. Each particle encodes the vector (K, lr, H₁, H₂), where K is the number of training iterations of the LSTM, lr is the learning rate, H₁ is the number of neurons in the first hidden layer, and H₂ is the number of neurons in the second hidden layer.
3. Determine the fitness function of the PSO algorithm. An LSTM model is constructed with each particle’s parameters, and the mean square error between the true and predicted values (Eq (6)) is used as the fitness function of the particle population, as shown in Fig 3.
4. Calculate the positions and fitness values of the particles at each iteration. The particle positions and velocities are adjusted iteratively according to Eqs (4) and (5) until the termination criterion is met and the fitness function is minimized, which determines the optimal positions and thus the optimal LSTM parameters.
LSTM prediction phase. LSTM networks configured with the optimal parameters found by PSO are used to predict each IMF component obtained from the decomposition; the predicted values of all IMFs and of the residual term RES are then summed to obtain the final prediction.
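The superposition in the prediction phase can be sketched as a rolling one-step-ahead loop over the components. In this illustration `forecast_fn` is a hypothetical stand-in for the per-component PSO-tuned LSTM (any function mapping a 1-D history to the next value), and `rolling_component_forecast` is an illustrative name.

```python
import numpy as np

def rolling_component_forecast(components, n_test, forecast_fn):
    """One-step-ahead forecasts of each component over the last n_test steps, then summed.

    components: array of shape (n_components, T), e.g. the m IMFs plus RES.
    forecast_fn: placeholder forecaster; maps a 1-D history to the next value.
    """
    n_comp, T = components.shape
    preds = np.zeros((n_comp, n_test))
    for j in range(n_comp):
        for k in range(n_test):
            t = T - n_test + k                                # index being predicted
            preds[j, k] = forecast_fn(components[j, :t])      # only data before t is used
    return preds.sum(axis=0), preds                           # superimposed forecast, per-component forecasts
```

With a naive persistence forecaster (`lambda hist: hist[-1]`), the combined forecast simply equals the previous value of the reconstructed series, which is a convenient sanity check before plugging in trained models.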


Fig 3. Monthly precipitation data prediction process of CEEMD-PSO-LSTM combined model.

https://doi.org/10.1371/journal.pone.0288211.g003

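In this paper, RMSE, MAE and MAPE are selected as evaluation indexes, defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{r=1}^{N}\left(x_r - \hat{x}_r\right)^2} \qquad (13)$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{r=1}^{N}\left|x_r - \hat{x}_r\right| \qquad (14)$$
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{r=1}^{N}\left|\frac{x_r - \hat{x}_r}{x_r}\right| \qquad (15)$$

where $x_r$ is the true value, $\hat{x}_r$ is the predicted value, and N is the number of samples.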
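The three error indexes translate directly into NumPy. This sketch is illustrative; raising an error when an observed value is zero is a design choice added here, since MAPE is undefined for a month with zero observed precipitation.

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))          # Eq (13)

def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))                  # Eq (14)

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))  # Eq (15), in %
```

For example, with true values [1, 2, 3, 4] and predictions [1, 2, 3, 5], the only error is 1 in the last month, so RMSE is sqrt(1/4) = 0.5, MAE is 0.25 and MAPE is 25%/4 = 6.25%.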

Experimental results

The experiments were run on an Intel(R) Core(TM) i7-10510U 2.30 GHz processor with an NVIDIA GeForce MX250 graphics card. The models were implemented in MATLAB R2022a.

Overview of the study area

Changde lies in the south of mainland China, in the northwest of Hunan Province, and is famous as the “land of fish and rice” south of the Yangtze River. It is located in the Dongting Lake system in the middle reaches of the Yangtze River, on the lower reaches of the Yuan River and the middle and lower reaches of the river, and at the northeastern end of the Wuling and Xuefeng Mountains. Changde spans 174.6 kilometres from east to west and 187.2 kilometres from north to south, with a total area of 18,200 square kilometres. It has a subtropical humid monsoon climate, with an average annual temperature of 16.7°C and annual precipitation of 1200–1900 mm. Water resources in Changde are relatively abundant, averaging 15.337 billion cubic meters per year, or 2,556 cubic meters per capita. They come mainly from precipitation, which is unevenly distributed in space and time: more than 70% of annual precipitation and runoff occurs in the wet season (April to October). Changde is among the first cities in China designated as international wetland cities and international garden cities, so medium- and long-term precipitation forecasting in the area is important.

Research data sources and pre-processing

Monthly precipitation data from January 1961 to December 2020 at seven representative meteorological stations in Changde, Hunan Province, were selected; the distribution of the stations is shown in Fig 4. Given the limited space of the article, the validity and accuracy of the combined prediction model are verified mainly with the measured data from Changde station (station number 57662).


Fig 4. Distribution of meteorological stations in Changde.

https://doi.org/10.1371/journal.pone.0288211.g004

The data were obtained from the Changde Meteorological Bureau and are regarded as accurate and reliable; part of the data is shown in Table 1. The series contains 720 samples (60 years × 12 months).


Table 1. Partial monthly precipitation in Changde from 1961 to 2020.

https://doi.org/10.1371/journal.pone.0288211.t001

Data for August to October 1976 were missing, so cubic Hermite interpolation [24] was applied to that year’s data to maintain data continuity and reduce data loss. The results are shown in Table 2.
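The gap filling can be sketched with a shape-preserving piecewise cubic Hermite interpolant (PCHIP) written in plain NumPy. This is an assumption about the exact variant: the paper states only "cubic Hermite interpolation", and the Fritsch-Carlson slope rule below resembles MATLAB's `pchip` in the interior but uses simple one-sided slopes at the end points. The function name `pchip_fill` is hypothetical.

```python
import numpy as np

def pchip_fill(series):
    """Fill NaN gaps in a 1-D series with shape-preserving cubic Hermite interpolation."""
    y = np.asarray(series, dtype=float)
    idx = np.arange(len(y))
    known = ~np.isnan(y)
    xk, yk = idx[known].astype(float), y[known]
    h = np.diff(xk)                      # interval widths between known points
    delta = np.diff(yk) / h              # secant slopes
    d = np.empty_like(yk)                # node derivatives
    d[0], d[-1] = delta[0], delta[-1]    # simple one-sided end slopes (assumption)
    for k in range(1, len(yk) - 1):
        if delta[k - 1] * delta[k] <= 0:        # local extremum or flat: zero slope
            d[k] = 0.0
        else:                                   # weighted harmonic mean (Fritsch-Carlson)
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
    # Evaluate the cubic Hermite polynomial on the interval containing each missing point
    out = y.copy()
    xq = idx[~known].astype(float)
    k = np.clip(np.searchsorted(xk, xq, side="right") - 1, 0, len(xk) - 2)
    t = (xq - xk[k]) / h[k]
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    out[~known] = (h00 * yk[k] + h10 * h[k] * d[k]
                   + h01 * yk[k + 1] + h11 * h[k] * d[k + 1])
    return out
```

For a year stored as 12 monthly values with NaN in positions 7 to 9 (August to October), `pchip_fill(year)` returns the same year with those three months interpolated and the observed months unchanged. The shape-preserving slopes avoid overshoot between observed months, which helps keep interpolated precipitation from dipping below the surrounding values.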


Table 2. Results of data preprocessing.

https://doi.org/10.1371/journal.pone.0288211.t002

CEEMD decomposition of monthly precipitation time series

The monthly precipitation time series is clearly non-linear and non-stationary, so CEEMD was used to decompose it. White noise with an amplitude of 0.02 times the standard deviation of the original signal was added, and the number of ensemble trials was set to 50. The original series was decomposed into eight IMF components [25], as shown in Fig 5; each IMF may reflect the influence of different factors on precipitation at a different time scale. Unlike the EMD IMFs, the CEEMD IMFs do not show the mode mixing that often occurs in EMD, and each IMF has a distinctly different characteristic time scale. Compared with the EEMD IMFs, the CEEMD IMFs have smaller auxiliary-noise residuals and a higher signal-to-noise ratio. This allows the information in the original series to be reflected more accurately, and because fewer ensemble trials are needed, the decomposition takes less time.
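The reduction in residual noise can be checked with a short worked example that needs no EMD at all. It relies only on EMD being complete (for each trial, the IMFs plus the residual sum exactly to the noisy input), so the ensemble-averaged reconstruction equals the average of the noisy inputs. The sketch is hypothetical and speaks only to reconstruction noise, not to the noise content of individual IMFs; the sine test signal and the name `residual_noise` are illustrative.

```python
import numpy as np

def residual_noise(x, n_trials=50, noise_ratio=0.02, paired=True, seed=0):
    """Noise left in the ensemble-averaged reconstruction (sum of averaged IMFs + residual).

    paired=False mimics EEMD (independent noise per trial);
    paired=True mimics CEEMD (each noise N is also added as -N).
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noisy = []
    for _ in range(n_trials):
        n = noise_ratio * x.std() * rng.standard_normal(x.size)
        noisy.append(x + n)
        if paired:                      # CEEMD: add the sign-flipped twin X - N
            noisy.append(x - n)
    return np.mean(noisy, axis=0) - x
```

With EEMD the leftover noise shrinks only like 1/sqrt(number of trials): for 50 trials and a noise amplitude of 0.02 standard deviations it is about 0.02/sqrt(50) ≈ 0.0028 of the signal's standard deviation. With paired noise it cancels to rounding error, which is consistent with CEEMD needing fewer trials for the same reconstruction accuracy.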


Fig 5. CEEMD decomposition results of monthly precipitation.

https://doi.org/10.1371/journal.pone.0288211.g005

PSO-LSTM network prediction

LSTM prediction was performed separately on each of the eight IMF sequences and the residual RES obtained from the CEEMD decomposition. Before prediction, each sequence was normalised separately using:

$$X_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (16)$$

where $x_i$ is the original data; $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the original data, respectively; and $X_i$ is the normalised data.

The input timestep of the LSTM network is 12: 12 consecutive months of data form the input variable and the following month’s value is the output variable. The data set is divided as shown in Table 3, and the network is trained with mini-batches of 16 samples (batch size = 16).
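The preprocessing described above (Eq (16), 12-month input windows, and mini-batches of 16) can be sketched as follows. The paper does not state whether the minimum and maximum come from the whole series or only the training portion; this sketch fits them on a training slice to avoid using test information, which is an assumption. All function names are hypothetical.

```python
import numpy as np

def minmax_fit(train):
    """Minimum and maximum used in Eq (16), taken from the training portion (assumption)."""
    return float(np.min(train)), float(np.max(train))

def minmax_scale(x, x_min, x_max):
    return (np.asarray(x, float) - x_min) / (x_max - x_min)   # Eq (16)

def minmax_inverse(z, x_min, x_max):
    return np.asarray(z, float) * (x_max - x_min) + x_min     # denormalisation

def make_windows(series, timestep=12):
    """Inputs: `timestep` consecutive values; target: the value that follows."""
    s = np.asarray(series, float)
    X = np.stack([s[i:i + timestep] for i in range(len(s) - timestep)])
    y = s[timestep:]
    return X, y

def iterate_minibatches(X, y, batch_size=16, rng=None):
    """Yield (X_batch, y_batch) pairs; shuffles the sample order when rng is given."""
    idx = np.arange(len(X))
    if rng is not None:
        rng.shuffle(idx)
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]
```

For a 720-month series, `make_windows` yields 708 input/target pairs (the first 12 months have no complete history), and iterating over them in batches of 16 gives 45 batches, the last one holding the remaining 4 samples. In practice the windows would first be split according to Table 3.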


Table 3. Dataset partitioning and partial parameter settings.

https://doi.org/10.1371/journal.pone.0288211.t003

The LSTM network uses a 2 + 1 stacked structure (two LSTM layers followed by one fully connected layer). To prevent overfitting, dropout [26] with a rate of 0.2 is applied to each layer. Dropout randomly discards a fraction of the neurons during training: the outputs of the discarded neurons are set to zero, so they take part in neither the forward nor the backward pass of that training step, which reduces overfitting. Fig 6 is a schematic diagram of dropout. Training proceeds in epochs (one epoch is one complete pass over the training set), the loss function is the root mean square error, and the gradient optimization algorithm is Adam [27].
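Dropout with a rate of 0.2 can be sketched as the common "inverted dropout" formulation: during training, each activation is zeroed with probability 0.2 and the survivors are scaled by 1/(1 − 0.2), so no rescaling is needed at prediction time. This NumPy function is an illustration of the technique, not the MATLAB layer used in the paper.

```python
import numpy as np

def dropout(a, rate=0.2, training=True, rng=None):
    """Inverted dropout: zero each activation with probability `rate` during training
    and rescale the survivors by 1/(1 - rate) so the expected activation is unchanged."""
    a = np.asarray(a, dtype=float)
    if not training or rate == 0.0:
        return a                                   # inference: pass activations through
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(a.shape) >= rate             # True for retained units
    return np.where(keep, a / (1.0 - rate), 0.0)
```

Because a fresh mask is drawn at every training step, each mini-batch effectively trains a different thinned sub-network, which is the source of the regularising effect described above.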

Download:

Fig 6. Schematic of Dropout.

https://doi.org/10.1371/journal.pone.0288211.g006

After the data set is divided, the training data are fed into the LSTM, and a loss value is computed by forward propagation. Based on this loss, gradients are computed by backpropagation through time (BPTT), and the Adam optimizer adjusts the LSTM weights. By comparing fitness values, the PSO algorithm searches for the optimal numbers of units in the two hidden layers, H₁ and H₂, the number of training iterations K, and the learning rate lr. Some PSO parameter settings are shown in Table 4, where M is the maximum number of iterations. H₁ and H₂ are searched in [1, 200], K in [10, 100], and lr in [0.001, 0.01]. As the number of training iterations increases, the accuracy of the LSTM predictions improves. The hyper-parameter optimization results are given in Table 5. After training, the test data are input into the LSTM, the denormalized predictions are compared with the actual values, and the error indicators are used to evaluate the prediction performance of the LSTM.
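Connecting PSO to the LSTM requires mapping each continuous particle position to valid hyperparameters. The sketch below uses the search ranges given above and the particle order (K, lr, H₁, H₂); clipping to the bounds and rounding K, H₁ and H₂ to integers are assumptions, since the paper does not say how continuous positions become integer unit counts. `train_and_validate` is a hypothetical callable that trains an LSTM with the given hyperparameters and returns true and predicted values.

```python
import numpy as np

# Search bounds from the text, in the particle order (K, lr, H1, H2)
LOWER = np.array([10, 0.001, 1, 1], dtype=float)
UPPER = np.array([100, 0.01, 200, 200], dtype=float)

def decode_particle(position):
    """Map a continuous PSO position to valid LSTM hyperparameters."""
    p = np.clip(np.asarray(position, float), LOWER, UPPER)
    return {
        "K": int(np.rint(p[0])),    # training iterations (integer)
        "lr": float(p[1]),          # learning rate (continuous)
        "H1": int(np.rint(p[2])),   # units in the first LSTM layer
        "H2": int(np.rint(p[3])),   # units in the second LSTM layer
    }

def make_fitness(train_and_validate):
    """Wrap a (hypothetical) training routine so PSO minimises the MSE of Eq (6)."""
    def fitness(position):
        hp = decode_particle(position)
        y_true, y_pred = train_and_validate(**hp)
        return float(np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2))
    return fitness
```

For example, the position (55.4, 0.02, 150.6, 3.2) decodes to K = 55, lr = 0.01 (clipped to the upper bound), H₁ = 151 and H₂ = 3. The resulting `fitness` function can be handed to any PSO routine that minimises over the same four-dimensional box.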


Table 4. Some parameters of PSO.

https://doi.org/10.1371/journal.pone.0288211.t004


Table 5. Hyper-parameter optimization results.

https://doi.org/10.1371/journal.pone.0288211.t005

Simulation comparison results analysis

The sub-series predictions from the LSTM models were superimposed to reconstruct the monthly precipitation predictions. The superiority of the CEEMD-PSO-LSTM prediction model is verified in three steps, using RMSE, MAE and MAPE as evaluation indexes. First, three commonly used models, the BP neural network [28], SVM [29] and ANN [30], are compared with the LSTM model to demonstrate the advantages of LSTM for time series modeling. Second, the EMD-LSTM, EEMD-LSTM and CEEMD-LSTM [31, 32] combined prediction models are compared to demonstrate the benefit of using CEEMD to process non-stationary data. Third, the PSO-LSTM, CEEMD-LSTM and CEEMD-PSO-LSTM [33] models are compared to show that the CEEMD-PSO-LSTM combined model has the best prediction performance.

The prediction results of the above models on the test set are shown in Figs 7 and 8, and the error metrics are listed in Table 6. Compared with BP and ANN, the SVM and LSTM models have lower prediction errors, and the LSTM model has the highest prediction accuracy, with RMSE, MAE and MAPE of 77.92, 59.17 and 43.81%, respectively. Compared with ANN, the LSTM model reduces RMSE, MAE and MAPE by 27.9%, 26.76% and 21.85%, respectively. Fig 8 shows that the LSTM model remains accurate at points where monthly precipitation changes sharply, indicating that it captures the temporal correlation in the data more effectively. The RMSE and MAE of CEEMD-LSTM are lower than those of EMD-LSTM and EEMD-LSTM. As Fig 8 shows, although all three models follow the general trend of the actual values, the CEEMD-LSTM predictions agree closely with the actual values at turning points, which reflects the advantage of CEEMD for processing non-stationary data. Finally, compared with the PSO-LSTM and CEEMD-LSTM models, the CEEMD-PSO-LSTM model reduces RMSE by 25.26% and 6.43% and MAE by 21.14% and 8.37%, respectively. In Fig 8, at sample points with large fluctuations, the CEEMD-PSO-LSTM method gives better predictions, indicating better predictive performance. Figs 8 and 9 show that the CEEMD-PSO-LSTM method improves overall forecasting accuracy while keeping the deviation of most predicted points from the actual data within a small range. This improvement can be attributed to the model first decomposing the monthly precipitation data into several sub-series with clearer regularity and then forecasting each of them separately.


Fig 7. Prediction results of the comparison models.

https://doi.org/10.1371/journal.pone.0288211.g007


Fig 8. Prediction results of the comparison models.

https://doi.org/10.1371/journal.pone.0288211.g008


Fig 9. Bar chart of the prediction results.

https://doi.org/10.1371/journal.pone.0288211.g009


Table 6. Prediction error metrics of the comparison models.

https://doi.org/10.1371/journal.pone.0288211.t006

Conclusion

Drawing on current research hotspots in deep learning and focusing on the prediction accuracy of monthly precipitation, this paper establishes a CEEMD-PSO-LSTM prediction model. First, the CEEMD algorithm was used to decompose the non-linear, non-stationary monthly precipitation series into sub-sequences. Then, PSO was used to optimize the hyper-parameters of the LSTM network. Finally, the LSTM model was used to predict each sub-sequence, and the predictions were combined to obtain the final monthly precipitation prediction. The following conclusions were drawn:

Decomposing the monthly precipitation series with CEEMD reduces the interference between information at different time scales. Compared with the EMD-LSTM and EEMD-LSTM methods, CEEMD-LSTM reduces RMSE by 13.78% and 3.75%, MAE by 6.21% and 0.9%, and MAPE by 13.83% and 6.95%, respectively. This shows that CEEMD effectively improves prediction accuracy.
Optimizing the LSTM hyper-parameters with PSO avoids the arbitrariness of manual selection. Compared with many other optimization algorithms, PSO tends to converge quickly toward the global optimum and is robust. LSTM networks are well suited to time series data: the design of their memory units lets them learn and retain information from previous time steps, handle long-term dependencies better, and improve prediction performance.
The combined CEEMD-PSO-LSTM model effectively improves the accuracy of monthly precipitation prediction. It is suited to non-stationary, non-linear time series data and could potentially be extended to fields such as electricity, traffic flow and text recognition.
In the next step, other influencing factors [34–37] can be introduced, such as pressure, temperature, etc., to further improve the reliability and prediction accuracy of the model.

References

1. Chen Erlie. The global freshwater crisis is getting worse. Ecological Economy. 2022; 38(10): 5–8. https://chn.oversea.cnki.net/kcms/detail/detail.aspx?FileName=STJJ202210023&DbName=DKFX2022
2. Hong Mei, Zhang Ren, Feng Mang, et al. A new dynamical forecasting model of Western Pacific subtropical high ridge line index based on the improved self-memorization principle and forecast experiments. Chinese Journal of Geophysics. 2016; 59(07): 2362–2376.
3. Chen R, Zhang W, Wang X. Machine learning in tropical cyclone forecast modeling: A review. Atmosphere. 2020; 11(7): 676.
4. Singh V, Konduru R T, Srivastava A K, et al. Predicting the rapid intensification and dynamics of pre-monsoon extremely severe cyclonic storm ‘Fani’ (2019) over the Bay of Bengal in a 12-km global model. Atmospheric Research. 2021; 247: 105222.
- View Article
- Google Scholar
5. Long Y, Tang R, Wang H, et al. Monthly precipitation modeling using Bayesian non-homogeneous hidden Markov chain. Hydrology Research. 2019; 50(2): 562–576.
- View Article
- Google Scholar
6. Lai Y, Dzombak D A. Use of the autoregressive integrated moving average (ARIMA) model to forecast near-term regional temperature and precipitation. Weather and Forecasting. 2020; 35(3): 959–976.
- View Article
- Google Scholar
7. Torcasio R C, Federico S, Comellas Prat A, et al. Impact of Lightning Data Assimilation on the Short-Term Precipitation Forecast over the Central Mediterranean Sea. Remote Sensing. 2021; 13(4): 682.
- View Article
- Google Scholar
8. Yan J., DiMeo P., Sun L. and Du X. Machine learning in tropical cyclone forecast modeling: A review. [3]Atmosphere. 2020; 11(7): 676.
- View Article
- Google Scholar
9. Lv S X, Wang L. Multivariate wind speed forecasting based on multi-objective feature selection approach and hybrid deep learning model. Energy. 2023; 263: 126100.
- View Article
- Google Scholar
10. Wu B, Wang L, Tao R, et al. Interpretable tourism volume forecasting with multivariate time series under the impact of COVID-19. Neural Computing and Applications. 2023; 35(7): 5437–5463. pmid:36373134
- View Article
- PubMed/NCBI
- Google Scholar
11. Wu B, Wang L, Zeng Y R. Interpretable tourism demand forecasting with temporal fusion transformers amid COVID-19. Applied Intelligence. 2022: 1–22.
- View Article
- Google Scholar
12. Wu B, Wang L, Zeng Y R. Interpretable wind speed prediction with multivariate time series and temporal fusion transformers. Energy. 2022; 252: 123990.
- View Article
- Google Scholar
13. Peng L, Wang L, Xia D, et al. Effective energy consumption forecasting using empirical wavelet transform and long short-term memory. Energy. 2022; 238: 121756.
- View Article
- Google Scholar
14. Huang N E, Shen Z, Long S R, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings Mathematical Physical & Engineering Sciences. 1998; 454(1971): 903–995.
- View Article
- Google Scholar
15. Wu Z, Huang N E. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Advances in adaptive data analysis. 2009; 1(01): 1–41.
- View Article
- Google Scholar
16. Yeh J R, Shieh J S, Huang N E. Complementary ensemble empirical mode decomposition: A novel noise enhanced data analysis method. Advances in adaptive data analysis. 2010; 2(02): 135–156.
- View Article
- Google Scholar
17. Yu Y, Si X, Hu C, et al. A review of recurrent neural networks: LSTM cells and network architectures. Neural computation. 2019; 31(7): 1235–1270. pmid:31113301
- View Article
- PubMed/NCBI
- Google Scholar
18. Sengupta S, Basak S, Peters R A. Particle Swarm Optimization: A survey of historical and recent developments with hybridization perspectives. Machine Learning and Knowledge Extraction. 2018; 1(1): 157–191.
- View Article
- Google Scholar
19. Bansal J C. Particle swarm optimization//Evolutionary and swarm intelligence algorithms. Springer, Cham. 2019: 11–23.
20. Schutte J. F., Reinbolt J. A., Fregly B. J., Haftka R. T., & George A. D. Parallel global optimization with the particle swarm algorithm. International journal for numerical methods in engineering. 61(13): 2296–2315. pmid:17891226
- View Article
- PubMed/NCBI
- Google Scholar
21. Chopard B, Tomassini M. Particle swarm optimization[M]//An Introduction to Metaheuristics for Optimization. Springer, Cham. 2018: 97–102.
22. Ren X, Liu S, Yu X, et al. A method for state-of-charge estimation of lithium-ion batteries based on PSO-LSTM. Energy. 2021; 234: 121236.
- View Article
- Google Scholar
23. Li T, Wang B, Zhang L, et al. Short-term load forecasting using optimized LSTM networks based on EMD//2018 10th International Conference on Communications, Circuits and Systems (ICCCAS). IEEE. 2018: 84–88.
24. Lorentz R A. Multivariate Hermite interpolation by algebraic polynomials: A survey. Journal of computational and applied mathematics. 2000; 122(1-2): 167–201.
- View Article
- Google Scholar
25. Guo Y, Guo J, Sun B, et al. A new decomposition ensemble model for stock price forecasting based on system clustering and particle swarm optimization. Applied Soft Computing. 2022: 109726.
- View Article
- Google Scholar
26. Liu R.W., He W.T., Wang L.L., et al. CEEMD-LSTM-based diagnosis method for off-design working conditions of centrifugal pump. Applied Soft Computing. 2022: 109726.
- View Article
- Google Scholar
27. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research. 2014; 15(1): 1929–1958.
- View Article
- Google Scholar
28. Kingma D P, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
29. Zhang Y, Cui N, Feng Y, et al. Comparison of BP, PSO-BP and statistical models for predicting daily global solar radiation in arid Northwest China. Computers and Electronics in Agriculture. 2019; 164: 104905.
- View Article
- Google Scholar
30. Jakkula V. Tutorial on support vector machine (svm). School of EECS, Washington State University. 2006; 37(2.5): 3.
- View Article
- Google Scholar
31. Ghritlahre H K, Prasad R K. Application of ANN technique to predict the performance of solar collector systems-A review. Renewable and Sustainable Energy Reviews. 2018; 84: 75–88.
- View Article
- Google Scholar
32. Zhang X, Wu X, He S, et al. Precipitation forecast based on CEEMD–LSTM coupled model. Water Supply. 2021; 21(8): 4641–4657.
- View Article
- Google Scholar
33. Rezaei H, Faaljou H, Mansourfar G. Stock price prediction using deep learning and frequency decomposition. Expert Systems with Applications. 2021; 169: 114332.
- View Article
- Google Scholar
34. Pei Y, Zhenglin L, Qinghui Z, et al. Load forecasting of refrigerated display cabinet based on CEEMD–IPSO–LSTM combined model. Open Physics. 2021; 19(1): 360–374.
- View Article
- Google Scholar
35. Xu F.; Yuan H.; Lin L.; Chen W. Convective-scale ensemble forecasts of the heavy precipitation of Typhoon Lekima (2019) in Zhejiang Provinc. Atmospheric Research. 2017: 106543.
- View Article
- Google Scholar
36. Zhang P, Jia Y, Zhang L, et al. A deep belief network based precipitation forecast approach using multiple environmental factors. Intelligent Data Analysis. 2018; 22(4): 843–866.
- View Article
- Google Scholar
37. Zhang C J, Zeng J, Wang H Y, et al. Correction model for rainfall forecasts using the LSTM with multiple meteorological factors. Meteorological Applications. 2020; 27(1): e1852.
- View Article
- Google Scholar