Abstract
State of charge (SOC) prediction is of great value for assessing the status of electric vehicles. The Informer model outperforms other models in SOC prediction, but a gap remains in practical application. Therefore, a new optimized Informer model based on the health assessment algorithm is proposed to predict SOC. First, a health assessment is carried out on the historical running data of the electric vehicle to obtain the health matrix. Then, the health matrix is used to improve the Encoder and Decoder (En-De) modules, raising the prediction accuracy and speed of the Informer model. Subsequently, the health matrix is used to optimize the prediction logic, reducing the influence of truncation error and further improving SOC prediction accuracy. Finally, SOC prediction is performed on four different datasets using the Informer model before and after optimization. The results indicate that optimizing the En-De module of Informer improves prediction accuracy by approximately 15% and increases prediction speed by about 100%. Furthermore, optimizing the prediction logic to reduce truncation error enhances Informer’s prediction accuracy by a further 20% or so.
Citation: Xie X, Huang F, Long Y, Peng Y, Zhou W (2025) An optimized informer model design for electric vehicle SOC prediction. PLoS ONE 20(3): e0314255. https://doi.org/10.1371/journal.pone.0314255
Editor: Dhasarathan Chandramohan, Thapar Institute of Engineering and Technology: Thapar Institute of Engineering and Technology (Deemed to be University), INDIA
Received: October 7, 2024; Accepted: November 7, 2024; Published: March 11, 2025
Copyright: © 2025 Xie et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: "All dataset files are available from the Data Review URL: https://aistudio.baidu.com/datasetdetail/304079"
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
State of Charge (SOC) prediction is of great significance for assessing the status of electric vehicles. Because SOC influences the power output, energy management and operation of electric vehicles [1], SOC prediction has become a hot topic in electric vehicle research.
At present, SOC prediction for electric vehicles can be roughly divided into three research directions: prediction from battery working conditions; prediction based on changes in the running state of the electric vehicle; and prediction that combines the historical data of the same type of vehicle with the first two approaches.
Silva et al. [2] analyzed the relationship between lithium battery operation and gas emissions, predicting the remaining SOC percentage from the concentrations of CO2, NOx, and other gases in the circulating air inside electric vehicles, but the method requires high-accuracy air data. If, following Alarrouqi RA et al. [3], low-accuracy air data is used for battery state analysis and SOC prediction, the prediction accuracy of the model is reduced by truncation error. Tekin M et al. [4] estimate the operating parameters of the electric vehicle and predict SOC from the physical structure change of the battery interlayer under different charging voltages. Because this method assumes that transmission delays are fixed, the predictive results exhibit noise when communication blocks occur and reduced transmission speeds add latency to data transmission. Kang H C et al. [5] start from the change of battery parameters and train the SOC prediction model by exploiting the thermal expansion and cold contraction of the battery, but the method requires the battery temperature to be raised to 400°C and then slowly decreased, so the initial data is difficult to collect in practice, which makes prediction difficult. Mustaffa Z et al. [6] use the temperature difference of the battery surface layer to obtain a surface temperature-SOC prediction model from a thermodynamic point of view. However, it is difficult to collect the surface temperature data of a single battery because electric vehicles adopt an integrated battery cluster as a power supply.
Jiang N et al. [7] obtain a prediction model relating charging voltage, charging current, total voltage and ambient temperature to the remaining percentage of SOC under different operating conditions and environmental temperatures. The model exhibits significant prediction errors near the transition between charging and discharging states. Yao Z et al. [8] divide the whole operation interval into several sub-intervals according to changes in vehicle speed, such as acceleration, deceleration and uniform speed, and adopt a different SOC prediction strategy in each sub-interval to improve the prediction accuracy. When all sub-interval strategies are integrated to predict SOC for a new complete operational interval, the prediction accuracy is degraded by noise induced by marginal effects and by multi-value predictions, resulting in suboptimal forecasting performance. Sarmokadam S et al. [9] analyze the influence of the AC module on SOC and give an SOC prediction model based on the characteristic data of the AC module under different operation conditions. Because the reaction of the mechanical part of the electric vehicle to the AC module is not taken into account, the SOC prediction error of the model is large in operating ranges characterized by alternating acceleration and deceleration. Gong et al. [10] analyzed the relationship between electric motor shaft rotation under different operating conditions, vehicle driving speed, and energy consumption of the energy storage module, and proposed a torque-speed-SOC predictive model based on their findings. The model has fewer prediction steps and lower computational complexity, but its prediction accuracy is poor when the running state changes frequently and the motor torque changes sharply.
Shao et al. [11] utilized historical data of the same vehicle model at different speeds to derive predictive models for the rate of change of SOC under various operating conditions, predicting SOC indirectly. Although the probability of multi-value prediction is reduced, the influence of SOC characteristic curve deviation is ignored, which increases the system error. Wang R et al. [12] carry out classified training on the historical data of the same vehicle type on different roads to obtain a speed-SOC prediction model. However, this method assumes that the SOC characteristic curve remains fixed and does not change with varying operational states, leading to poor prediction accuracy. Zhu C et al. [13] classified historical data by environmental temperature for training and obtained predictive models for SOC depletion rates at different temperatures; based on the model and the SOC margin, the total SOC at the next moment is predicted. On this basis, Li Q et al. [14] estimate the change in SOC by a weighted average of the rate of change, taking into account the aging probability of the mechanical parts. Both methods refer to the standard SOC characteristic curve to train predictive models without considering the system errors caused by SOC characteristic curve shifts, resulting in poor prediction accuracy. Lim H et al. [15] train the SOC prediction model on historical data of uniform-speed driving at different altitudes and predict the SOC directly; Guo N et al. [16] train a traction power prediction model on historical data of climbing driving under different gravity constants and predict the SOC indirectly through the relevant mechanics calculations. Because they consider only the impact of gravity on driving speed and not its effect on mechanical components, both methods struggle to handle system errors caused by SOC characteristic curve shifts. As a result, the predictive models obtained exhibit poor robustness against disturbances.
The existing optimization algorithms have addressed some deficiencies in neural network models but have not significantly enhanced their SOC prediction performance. Selvaraj V et al. [17] combined Bayesian decision mode with Support Vector Regression (SVR), Gaussian Process Regression (GPR) and Linear Regression to mitigate the SOC multi-value prediction problem at state boundaries, but this increased the computational complexity, and the prediction error of the model was not significantly reduced. Kim D M et al. [18] proposed an improved Deep Neural Network (DNN) for SOC prediction of Fuel Cell Electric Vehicles (FCEVs) based on the changing characteristics of their operating state. The method has fewer prediction steps and higher prediction accuracy, but the prediction calculation is complex, a single prediction takes a long time, and the hardware requirement is high, so it is difficult to apply in practice. Singla P et al. [19] introduce AND, OR and NOT logic judgments into the original Perturb & Observe (P&O) model to replace the conventional threshold judgment for SOC prediction. Although the multi-value prediction frequency is reduced, the truncation error is not processed in time, and the resulting P&O model exhibits prolonged fixed-value predictions when predicting SOC in steady-state operating ranges. Altun Y E et al. [20] apply multi-mode prediction to the existing Artificial Neural Network (ANN). The prediction accuracy of the ANN is temporarily improved; however, when the SOC data has 1-2 fewer significant digits after the decimal point than the other characteristic data, the prediction accuracy of the ANN is significantly reduced.
As a new prediction model, the Informer model has attracted the attention of researchers because of its excellent performance. However, the current Informer model struggles to meet the high accuracy and speed requirements of SOC prediction for electric vehicles. The health assessment algorithm can be used to improve the model structure, optimize the prediction process, and thereby improve the prediction performance of Informer.
Based on this, a new optimized Informer model for electric vehicle SOC prediction is proposed. Compared to existing SOC prediction models and other Informer models, it features the following characteristics:
- Structural optimization: Health assessment is used to improve the En-De module and optimize the model structure, enhancing the predictive performance of Informer without increasing data costs.
- Logical optimization: Adding fundamental state variables allows SOC to be predicted indirectly through equivalent conversion, reducing the adverse effects of truncation errors. This further enhances the predictive performance of Informer while retaining the benefits of structural optimization.
The remainder of this paper is organized as follows: Section 2 covers health assessment and its optimization analysis, Section 3 analyzes the Informer model and the En-De optimization scheme, Section 4 examines truncation error and proposes the prediction logic optimization, Section 5 presents model testing and result analysis, and the final section presents the conclusions.
2 Health assessment
Health assessment is a weight optimization algorithm proposed by Beatrice Greaves, Lars Landberg et al. in 2011 [21]. It transforms the correlation weights between features into a health matrix based on the statistical probability distribution characteristics of model outputs.
The health assessment process is shown in Fig 1:
- (1). Data Acquisition: The operational data of total voltage, total current, ambient temperature, vehicle speed, motor temperature, and similar parameters is acquired through the data acquisition module [22];
- (2). Data Cleaning: Null values, abnormal values and noise fluctuations are eliminated from the operation data to improve data reliability [23];
- (3). Feature Screening: Feature data is screened out from the cleaned operation data according to the strength of its correlation with SOC [24];
- (4). Matrix Computation: Using the feature data, matrix calculations are conducted after the matrix elements and the dimension division scheme have been determined. This includes computing the weight of each feature’s impact on SOC under different operating conditions, as well as assessing the dimensions of interactions among different features [25];
The brief calculation formula for Matrix Computation is as follows:
where i is the feature number from 1 to N, μ is the mean vector, and Σ is the covariance matrix.
- (5). Matrix Generation: The results of the matrix computation are arranged sequentially to obtain the weight matrix $Q$. Each element of $Q$, representing a different operating state, specifies a weight factor for the feature [26]. Factor analysis is then applied to the running data to generate the correlation matrix $R$. Finally, $Q$ is multiplied by $R$ to obtain the health matrix $A$, namely:
$$A = Q \cdot R, \quad A \in \mathbb{R}^{N \times N} \tag{2}$$
where $N$ is the dimension of the matrix, usually 4-12. When $N$ is greater than 12, dimension reduction of operation data is required.
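The final multiplication in step (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3×3 matrices `Q` and `R` below are made-up values (not taken from the paper's data), the helper `matmul` is hypothetical, and 3 is used instead of the usual 4-12 dimensions purely to keep the example readable.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Hypothetical weight matrix Q: one weight factor per feature and operating state.
Q = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
]
# Hypothetical correlation matrix R: symmetric with a unit diagonal.
R = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

A = matmul(Q, R)  # health matrix: Q multiplied by R
for row in A:
    print([round(x, 3) for x in row])
```

In practice `Q` would come from the matrix computation step and `R` from factor analysis of the running data; both are simply hard-coded here.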
The optimization analysis of health assessment is as follows.
Let the first-layer input of the prediction model be $z$, and let its coefficient matrix be $D$, whose principal diagonal elements are 1 and whose remaining elements lie in (0,1). Sort $D$ so that the order of its row and column labels is consistent.
During the prediction process, the coefficient matrix $D_i$ of the i-th layer input $z_i$ can be viewed as the i-th power of $D$:
$$D_i = D^i \tag{3}$$
The health assessment optimization can be regarded as multiplying the input $z$ by the health matrix $A$ to obtain $z'$; the coefficient matrix of $z'$ is then $D'$, which is given by Equation (4).
In Equation (4), $A$ reduces the weakly correlated weights of $D$; that is, while maintaining the dominance and symmetry of the principal diagonal, the weakly correlated elements of $D'$ are adjusted to lie in (0,1). After optimization, every $D$ in Equation (3) is replaced by $D'$, and the weight of weak correlations decreases each time the forward calculation of the model is performed, which reduces the influence of weakly correlated features and improves the prediction performance of the model.
The effects of input z and coefficient matrix D are different due to different prediction processes, and the prediction performance of the model is improved differently after optimization.
The health assessment algorithm, when applied to the SOC prediction of electric vehicles, has two main drawbacks:
- High Data Costs: High dimensionality of operation data required by the algorithm leads to large workload of data collection, data cleaning and feature screening, and increases data costs [27];
- High Complexity: In order to select the main features, a high-order selection module is introduced, which increases the algorithm complexity [28];
In order to solve the above problems, reduce the algorithm complexity and reduce the data costs, the following measures can be taken:
Firstly, the principal component analysis method is used to simplify the high-order selection module and reduce the complexity of the algorithm.
Then, the data costs are reduced by dimensionality reduction and decoupling of the running data.
3 Informer prediction model
3.1 Model principle
Informer is a self-attention prediction model proposed by Zhou Haoyi et al. in 2021 [29]. Different from other prediction models, Informer adopts the structure of encoder-decoder separation. In the prediction process, there is no other connection between encoder and decoder except the transmission of coding output, and there is no feedback mechanism in all prediction steps.
The principle diagram of Informer model is shown in Fig 2.
In Informer, there are three key structures:
- (1). Self-Attention Distilling Operation: The encoder is composed of several encoding layers. Each encoding layer uses sparse sampling to reduce the sequence length, the last layer outputs the strongly correlated features, and the encoded output is then obtained through the feature connection layer.
- (2). Generative Style Decoder: The decoding operation of the Decoder is based on masked multi-head attention, and a decoding output of a specified length can be obtained with a single forward calculation, which reduces the probability of multi-value prediction while preventing the diffusion of cumulative error.
- (3). ProbSparse Self-Attention Mechanism: Based on the observation that only a small number of dot-product pairs contribute the major attention, Informer raises the attention paid to strongly correlated features and improves the self-attention mechanism by increasing their weight in the prediction process.
The existing Informer model has the following shortcomings:
- Frequent Calculation: In order to improve the attention of strong correlation features, the Encoder module needs to perform two dot product calculations, which reduces the prediction speed of Informer [30–31];
- Decoding Defect: The decoding input consists of a 0 placeholder and the encoded output. The weight of the placeholder is too high, which affects the decoding reliability of the Decoder module and reduces the prediction accuracy of Informer [32–33];
- High Complexity: Sparse sampling query of self-attention mechanism increases model complexity and reduces the lightweight advantage of Informer [34–35].
In order to make up for the above shortcomings and improve the prediction performance of Informer, it is planned to obtain the health matrix through health assessment, and optimize the model structure of Informer from two aspects of encoder and decoder.
3.2 Encoder optimization
The Encoder has multiple encoding layers, each with a self-attention sublayer and a convolutional sublayer. Let the input of the i-th encoding layer be $X_i$ and the output be $X_{i+1}$. In each encoding layer, $X_i$ is propagated forward through multi-head self-attention, and convolutional calculations are then used to reduce the sequence length to obtain $X_{i+1}$, i.e.:
$$X_{i+1} = \mathrm{Conv}(\mathrm{Attention}(X_i)) \tag{5}$$
In Equation (5), Conv represents the convolutional calculation and Attention represents probabilistic sparse self-attention. After the Attention operation, the output $p$ of the self-attention sublayer is obtained, that is:
$$p = \mathrm{softmax}\left(\frac{\bar{Q} K^{\top}}{\sqrt{d}}\right) V \tag{6}$$
In Equation (6), softmax is the normalization function, $\bar{Q}$ is the probability sparse matrix obtained after sampling $Q$, $d$ is the variance, $K$, $Q$, $V$ are the adjustment matrices of the sampling key values, and $p$ is the output of the self-attention sublayer.
In $X_i$, the weights of all features are equal. The weights of the strongly correlated features in $X_i$ are increased through the health matrix $A$, which yields the weighted input $X_i'$ of Equation (7).
In Equation (5), $X_i'$ is used instead of $X_i$, which accelerates the generation of the sparse sampling and of $\bar{Q}$, and improves the encoding speed of the Encoder and the prediction speed of Informer.
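The optimization analysis of the Encoder is as follows.
In Equation (6), the attention of the i-th query element of $Q$ is obtained by the following operation:
$$\mathrm{Attention}(q_i, K, V) = \sum_{j} \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)} v_j \tag{8}$$
where $q_i$, $k_i$, $v_i$ represent the i-th row of $Q$, $K$, $V$, respectively, $k(q_i, k_j)$ represents the asymmetric exponential kernel $\exp(q_i k_j^{\top} / \sqrt{d})$, and $d$ represents the variance.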
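The per-query attention described above, a kernel-weighted average of the value rows, can be sketched in plain Python. This is an illustrative sketch only: the function name `attend` and the toy `keys` and `values` are hypothetical, and the scaling constant `d` is assumed to be the query/key dimension, as in the original Informer formulation.

```python
import math

def attend(q, keys, values):
    """Kernel-weighted average of the value rows for a single query q."""
    d = len(q)  # assumption: d is taken as the query/key dimension
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    peak = max(scores)  # shift by the max so exp() cannot overflow
    kernel = [math.exp(s - peak) for s in scores]
    total = sum(kernel)
    probs = [w / total for w in kernel]  # normalized kernel weights
    width = len(values[0])
    output = [sum(p * v[c] for p, v in zip(probs, values)) for c in range(width)]
    return probs, output

# Hypothetical toy data: three keys/values of dimension 2.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
probs, output = attend([4.0, 0.0], keys, values)
print([round(p, 3) for p in probs], [round(x, 3) for x in output])
```

A query aligned with the first key puts most of its weight on the first value row, whereas an all-zero query weights every value row equally.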
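In Equation (6) and Equation (8), the Top-u query with the sparse metric $M(q_i, K)$ is used, i.e.:
$$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{q_i k_j^{\top} / \sqrt{d}} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}} \tag{9}$$
where $L_K$ denotes the sequence length of matrix $K$.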
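A Top-u selection driven by a sparsity metric can be sketched as follows. The sketch assumes the log-sum-exp minus arithmetic-mean form of $M(q_i, K)$ from the original Informer formulation and evaluates it exactly over all keys, whereas Informer itself samples keys and uses an approximation. The function names and toy data are hypothetical.

```python
import math

def sparsity_metric(q, keys, d):
    """M(q, K) = ln(sum_j exp(q.k_j / sqrt(d))) - mean_j(q.k_j / sqrt(d))."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    peak = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum_exp - sum(scores) / len(scores)

def top_u_queries(queries, keys, u):
    """Return the (sorted) indices of the u queries with the largest metric."""
    d = len(queries[0])  # assumption: d is the query/key dimension
    ranked = sorted(
        range(len(queries)),
        key=lambda i: sparsity_metric(queries[i], keys, d),
        reverse=True,
    )
    return sorted(ranked[:u])

# Hypothetical toy data: 4 queries and 4 keys of dimension 2.
queries = [[0.0, 0.0], [3.0, 0.0], [0.1, 0.1], [0.0, 4.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
active = top_u_queries(queries, keys, u=2)
print(active)
```

Queries whose attention is close to uniform (such as the all-zero query) score lowest, so only the queries with a peaked attention distribution are kept.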
In sparse self-attention, the lengths of the query and the key are equal, i.e., $L_Q = L_K = L$, and the computational complexity is O(L · ln L). Since all features of $X_i$ have equal weights, Top-u queries need to perform two dot product calculations in order to increase the weights of strongly correlated features, which slows the encoding speed of the Encoder.
By replacing $X_i$ with the health-weighted input $X_i'$ as the input of each encoding layer, the weight of strongly correlated features is increased and the Top-u query only needs to perform one dot product calculation. This reduces the computational complexity of sparse self-attention below O(L · ln L), which improves the encoding speed and accelerates the prediction speed of the Informer model.
3.3 Decoder optimization
Let the Decoder be composed of $L$ stacked decoding sublayers. In the decoding process, the initial input $x_1$ and the feature data $y$ undergo $L$ decoding operations to obtain the decoding output $x_{L+1}$, that is:
$$x_{i+1} = f_i(x_i, y) \tag{10}$$
where $f_i$ represents the decoding computation of the i-th decoding sublayer and $i = 1, 2, \dots, L$. $x_1$ is the input of the first decoding sublayer, $x_i$ is the input of the i-th decoding sublayer, and $y$ represents the characteristic data at the prediction time.
In $x_i$, all features are weighted equally. The weight of the strongly correlated features in $x_i$ can be increased through the health assessment, namely:
$$x_{i+1} = h \cdot f_i(x_i, y) \tag{11}$$
where $h$ is the health factor, obtained by a compression transformation of the health matrix. In each decoding sublayer, the decoding output $f_i(x_i, y)$ obtained from $x_i$ is multiplied by the health factor $h$ and used as the input of the next decoding sublayer. This raises the weight of strongly correlated features and increases the decoding accuracy of the Decoder and the prediction accuracy of the Informer model.
The optimization analysis of the Decoder is as follows.
The input of the first decoding sublayer can be expressed as the sequence $x_1$, i.e.:
$$x_1 = \mathrm{Concat}(X_{token}, X_0) \in \mathbb{R}^{(L_{token} + L_y) \times d_{model}} \tag{12}$$
where $X_{token}$ is the encoded output in Fig 2, $X_0$ is the sequence placeholder with a scalar of 0 (corresponding to the all-0 part of the decoder input in Fig 2), $L_{token}$ represents the sequence length of the encoded output, $L_y$ represents the sequence length of the placeholder, $d_{model}$ represents the feature dimension, and Concat represents the vector connection. By setting the 0 placeholder and the $-\infty$ masked dot product, masked multi-head probabilistic sparse self-attention can be applied to the decoding process of the Decoder.
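The construction of the decoder input, known rows followed by an all-zero placeholder for the positions to be predicted, can be sketched as below. The names `build_decoder_input`, `x_token` and `pred_len` are hypothetical, and the numbers are made up for illustration.

```python
def build_decoder_input(x_token, pred_len):
    """Concatenate the known token rows with pred_len all-zero placeholder rows."""
    d_model = len(x_token[0])
    placeholder = [[0.0] * d_model for _ in range(pred_len)]
    return [list(row) for row in x_token] + placeholder

# Hypothetical sizes: 2 known rows, 3 rows to predict, feature dimension 3.
x_token = [[0.2, 0.5, 0.1], [0.3, 0.4, 0.2]]
x_de = build_decoder_input(x_token, pred_len=3)
print(len(x_de), len(x_de[0]))
```

The resulting sequence has as many rows as the known part plus the placeholder, which is why the placeholder rows carry as much positional weight as the encoded rows unless they are re-weighted.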
Although the masked dot product prevents the autoregressive phenomenon in which each feature in the decoder input attends to its own next moment, the weights of the encoded output $X_{token}$ and the placeholder $X_0$ are equal, and the cumulative error caused by $X_0$ spreads every time the forward computation of a decoding sublayer is performed. This diffusion of accumulated error reduces the decoding accuracy of the Decoder.
The optimization scheme of Equation (11) is equivalent to increasing the weight of $X_{token}$ and decreasing the weight of $X_0$. From Equation (3) and Equation (4), the cumulative error caused by $X_0$ is reduced each time forward decoding is carried out, which improves the decoding reliability of the Decoder and thus the prediction accuracy of Informer.
3.4 En-De optimization
Based on Section 3.1, Informer adopts a structure with separated Encoder and Decoder and has no feedback mechanism. According to Section 3.2 and Section 3.3, optimizing the Encoder module improves the prediction speed of Informer and optimizing the Decoder module improves its prediction accuracy, and the two optimization effects do not conflict. Combined En-De optimization can therefore improve the prediction speed and accuracy of Informer simultaneously.
4 Prediction logic optimization
4.1 Truncation error
In References [3], [19] and [20], truncation errors reduce prediction accuracy, and this problem occurs frequently in certain cases. In fact, all prediction models have truncation errors. The reason is that, owing to the accuracy limitation of the acquisition module, SOC is in most cases recorded with only 0 or 1 significant digits after the decimal point, far fewer than the 3-4 digits after the decimal point of the other characteristics. During prediction, the model standardizes data precision by aligning the significant figures of predicted values with the feature in the training data that has the highest number of significant figures. The larger the difference in significant digits between the predicted value and the true value, the greater the SOC truncation error.
Assume that the predicted value of SOC at a certain time is 28.5273. When the true value is recorded as 28.5, the truncation error is {(28.5273-28.5)/28.5273} × 100% = 0.096%; when the true value is recorded as 28, the truncation error is {(28.5273-28)/28.5273} × 100% = 1.848%, about 19 times larger. From the analysis in Sections 2 and 3.3, the truncation error accumulates with the forward calculation. When it accumulates to a certain extent, the prediction accuracy of the model decreases significantly. In addition to the limitation of acquisition accuracy, state changes of the power supply module lead to the virtual electricity phenomenon, in which the actual value of SOC is lower than the measured value [36–37], and long-term low-speed operation of electric vehicles, with minimal state changes, leads to fixed SOC measurements [38–39]; both also result in truncation error.
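The worked example above can be reproduced with a few lines of Python. The function name `truncation_error_pct` is hypothetical; the numbers are the ones used in the text.

```python
def truncation_error_pct(predicted, recorded):
    """Relative gap between a prediction and the recorded (rounded) value, in percent."""
    return abs(predicted - recorded) / predicted * 100.0

predicted = 28.5273
one_decimal = truncation_error_pct(predicted, 28.5)  # recorded with 1 decimal place
no_decimal = truncation_error_pct(predicted, 28.0)   # recorded with 0 decimal places
ratio = no_decimal / one_decimal

# prints: 0.096%  1.848%  ratio 19.3x
print(f"{one_decimal:.3f}%  {no_decimal:.3f}%  ratio {ratio:.1f}x")
```

Dropping a single decimal place from the recorded SOC therefore raises the relative error by more than an order of magnitude.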
To reduce the negative influence of truncation error, the prediction logic of Informer is optimized as follows: a basic health matrix is obtained through health assessment, basic state variables are added, and, through equivalent conversion, indirect SOC prediction replaces direct SOC prediction.
4.2 Prediction logic optimization scheme
Informer’s prediction logic optimization scheme is based on the following four assumptions:
- Each motor temperature corresponds to a unique SOC characteristic curve. Within the normal operating range, this curve is a smooth and continuous single-value curve. Moreover, the characteristic curves for the same vehicle model at the same temperature are completely consistent [40–41];
- All characteristics are smooth and continuous in the normal operation interval and are affected by SOC, motor temperature and other characteristics. Furthermore, when any one feature changes independently while all other characteristics remain unchanged, the rate of change of SOC changes correspondingly [42–43];
- In the normal operation range, the SOC characteristic curves of the same vehicle type at different temperatures have the same trend and differ only by horizontal and vertical translation, which results in a numerical difference [44–45];
- The influence of a single feature on SOC is calculated by an approximate simplification in terms of the following quantities: $k$ is the characteristic influence multiplier; $m$ is the influence coefficient of the motor temperature, which is determined by the mechanical structure of the electric vehicle; $n$ is the influence coefficient of other features, which is determined by the physical structure of the energy storage module; $\Delta x$ is the amount of change of the selected feature; and $l$ is a constant. Within the normal operating interval, $m$, $n$, and $l$ have a corresponding fixed value for each feature, and this value does not change due to the deviation of the SOC characteristic curve [46–47].
The prediction logic optimization scheme based on the above assumptions is as follows:
- Set a certain motor temperature and the corresponding characteristic parameters of the electric vehicle as the basic state by referring to the equipment manual or rated parameters;
- Use the structurally optimized Informer model with historical data for prediction and obtain the baseline health matrix of the system. From the elements of this baseline health matrix, calculate the basic rate of change $v_0$ of SOC, as well as the values m, n, and l corresponding to each feature in assumption 4;
- According to the temperature of the motor, divide the complete normal operating range into several sub-intervals. For the boundary temperatures of the entire range and the intermediate temperatures at the intersections of the sub-intervals, make predictions using the dataset from step 2 and the optimized Informer model to obtain the characteristic parameters corresponding to each motor temperature when the SOC change rate is $v_0$. These characteristic parameters, along with the SOC basic change rate and the corresponding motor temperatures, are labeled as sub-basic states;
- When predicting, first obtain a predicted value of the motor temperature through the Informer model, find the corresponding basic state or sub-basic state according to the predicted temperature, and then carry out equivalent conversion to obtain the corresponding SOC change rate at the predicted time;
If there are $p$ features, the influence rates of the features are calculated to be $k_1$, $k_2$, ···, $k_p$. Relative to the basic rate $v_0$, the multiple of the prediction rate $v$ is $k$, which is obtained from $k_1$, ···, $k_p$ by Equation (13).
At prediction time, the corresponding equation for the SOC prediction rate is:
$$v = k \cdot v_0 \tag{14}$$
where $v_0$ is the basic rate of change of the SOC, which is obtained in step 2.
- Combining the historical value of SOC with the sampling interval, the predicted value of SOC is calculated from the change rate $v$.
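The last two steps, scaling the basic rate and stepping the SOC forward, can be sketched as follows. This is a sketch under assumptions: the overall multiplier is taken as a given input rather than derived from the per-feature influence rates, the update is a single forward step over one sampling interval, and the clamp to the 0-100% range is added here for safety and is not part of the described scheme. All names and numbers are hypothetical.

```python
def predict_soc(soc_last, base_rate, multiplier, dt):
    """Indirect SOC prediction: scale the base rate, then step forward by dt."""
    rate = multiplier * base_rate          # predicted SOC change rate
    soc_next = soc_last + rate * dt        # one accumulation step over the interval
    return min(100.0, max(0.0, soc_next))  # assumption: keep SOC inside [0, 100] percent

# Hypothetical numbers: base rate -0.02 %/s (discharging), multiplier 1.5, 10 s sampling.
soc = predict_soc(soc_last=28.5, base_rate=-0.02, multiplier=1.5, dt=10.0)
print(round(soc, 4))
```

Because the prediction is built from a change rate rather than from the coarsely recorded SOC value itself, the rounding of the recorded SOC enters only through the starting value `soc_last`.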
The prediction logic optimization diagram is shown in Fig 3. The basic health matrix is independent of the Informer model and serves specifically as the variable matrix of the SOC basic states. To avoid misunderstanding, Fig 3 therefore labels the SOC basic change rates and the change-rate parameters directly, and the basic health matrix underlying both is not labeled separately.
5 Model test
5.1 Evaluation indicator
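R2 (Coefficient of Determination), MSE (Mean Squared Error) and MAE (Mean Absolute Error) are adopted as the three evaluation indicators, and the calculation of each indicator is shown in Equations (15)–(17):
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{15}$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{16}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{17}$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the true average value, and $n$ is the number of samples. R2 indicates the degree of fit, and the closer the value is to 1, the better the fitting effect of the model. MSE and MAE represent the degree of dispersion, and the closer their values are to 0, the smaller the prediction fluctuation of the model.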
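The three indicators can be computed with the standard textbook definitions, as in the following sketch. The function name `r2_mse_mae` and the sample values are hypothetical; the code assumes the true values are not all identical, since R2 is undefined in that case.

```python
def r2_mse_mae(y_true, y_pred):
    """Return (R2, MSE, MAE) for two equally long sequences of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    mse = ss_res / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    r2 = 1.0 - ss_res / ss_tot
    return r2, mse, mae

# Hypothetical SOC readings (percent) and predictions.
y_true = [30.0, 29.0, 28.0, 27.0]
y_pred = [30.5, 29.0, 27.5, 27.0]
r2, mse, mae = r2_mse_mae(y_true, y_pred)
print(round(r2, 3), round(mse, 3), round(mae, 3))
```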
Improvement in prediction accuracy is judged by the increase in R2, measured as the share of the remaining gap to R2 = 1 that is closed. When MSE and MAE are unchanged or decreased, an increase in R2 from 0.8 to 0.85 counts as a 25% improvement in prediction accuracy, and an increase from 0.9 to 0.95 counts as a 50% improvement. Improvement in prediction speed is assessed by the reduction in the time required for prediction. When the accuracy indicators (MSE, MAE and R2) are unchanged or improved, a single prediction that takes 30 s before optimization and 20 s after optimization corresponds to a 50% increase in prediction speed; one that takes 30 s before and 10 s after corresponds to a 200% increase.
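The percentages quoted above are consistent with two simple formulas, sketched below. Both formulas are inferred from the worked examples in the text rather than stated explicitly there, and the function names are hypothetical.

```python
def accuracy_gain_pct(r2_before, r2_after):
    """Share of the remaining gap to R2 = 1 that the optimization closes, in percent."""
    return (r2_after - r2_before) / (1.0 - r2_before) * 100.0

def speed_gain_pct(t_before, t_after):
    """Relative increase in predictions per unit time, in percent."""
    return (t_before / t_after - 1.0) * 100.0

print(round(accuracy_gain_pct(0.80, 0.85), 1))  # R2 0.80 -> 0.85
print(round(accuracy_gain_pct(0.90, 0.95), 1))  # R2 0.90 -> 0.95
print(round(speed_gain_pct(30.0, 20.0), 1))     # 30 s -> 20 s
print(round(speed_gain_pct(30.0, 10.0), 1))     # 30 s -> 10 s
```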
If the optimization is effective, MSE and MAE decrease while R2 increases. However, because models differ in their calculation methods, the degree of prediction fluctuation differs [48], and a reduced fit at specific points in exchange for a better overall fit increases the cost of accident treatment [49]. Therefore, an optimization is also considered effective when two indicators improve by more than 10% and the remaining one deteriorates by no more than 2%.
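The acceptance rule can be written as a small predicate. This is one possible reading of the rule, not a definitive one: it assumes each indicator has already been converted to a signed improvement percentage (positive meaning better), and it treats the case where all three indicators improve as effective regardless of size. The function name is hypothetical.

```python
def is_effective(improvements_pct):
    """improvements_pct: three signed changes in percent, positive meaning 'better'."""
    if all(g > 0 for g in improvements_pct):
        return True  # every indicator improved
    ranked = sorted(improvements_pct, reverse=True)
    # Relaxed rule: the two best gains exceed 10% and the worst loses at most 2%.
    return ranked[1] > 10.0 and ranked[2] >= -2.0

print(is_effective([12.0, 15.0, -1.5]))  # two gains above 10%, small loss
print(is_effective([12.0, 15.0, -3.0]))  # loss larger than 2%
```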
5.2 Simulation environment
The SOC prediction test was performed using 4 different published data sets, each of which is outlined below:
- Data set of Charging of Chinese Urban Electric Vehicle: This data set is sourced from the public data set provided by the 4th Jiangsu Big Data Development and Application Competition, New Energy Track of “SEED”, in 2023. It includes four characteristics (current, voltage, SOC and energy) recorded at electric vehicle charging piles in Beijing, Shanghai and Shenzhen. The Beijing data set (No. 1) and the Shanghai data set (No. 2) are selected for simulation test;
- Data set of Battery Module Charging and Discharging of 20 New Energy Vehicles: This data set is sourced from the public data set provided in Reference [50], including the charging and discharging data of 20 electric vehicles. The data of each electric vehicle includes 9 characteristics, among them voltage, current, temperature, energy and SOC. The running data of vehicle No. 2 and vehicle No. 17 are selected for simulation test;
- Data set of Power Battery Health of New Energy Vehicles: This data set is sourced from the public data set provided by the 2nd session of Pazhou Algorithm Competition in Guangzhou, including the charging and discharging data of two electric vehicles. The operation data of each electric vehicle includes such characteristics as speed, voltage, current and SOC. The operation data of the two electric vehicles were simulated and tested separately;
- Data set of Health Degree of New Energy Vehicles in Guangdong-Hong Kong-Macao Bay Area: This data set is sourced from the public data set provided in the 2022 New Energy Smart Vehicle Big Data Innovation Competition in Guangdong-Hong Kong-Macao Bay Area, including the operation data of five new energy vehicles of each type. The operation conditions of the data set are complex, including not only charging and discharging but also high- and low-speed operation, sudden braking, continuous acceleration or deceleration, turning and other operation conditions. The operational data of each electric vehicle includes multiple features such as SOC, speed, voltage, current, and temperature. This study selects 12 features, including SOC, for simulation testing.
In the structural optimization phase, four models were selected for simulation testing: the model before optimization, the model with Encoder optimization, the model with Decoder optimization and the model with En-De optimization. In the logic optimization stage, the prediction logic of each of these four models is optimized in turn, and the resulting eight models (four before and four after logic optimization) are each used for simulation testing.
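The test matrix described above is the product of four structural variants and two logic settings. A minimal sketch that enumerates it, with hypothetical labels and field names chosen only for illustration:

```python
from itertools import product
from typing import Dict, List, Union

# Structural variants tested in the structural optimization phase.
STRUCTURES = ("unoptimized", "Encoder", "Decoder", "En-De")
# Whether the prediction logic has been optimized.
LOGIC_OPTIMIZED = (False, True)


def model_variants() -> List[Dict[str, Union[str, bool]]]:
    """List every (structure, logic) combination used in the tests."""
    return [
        {"structure": structure, "logic_optimized": logic}
        for structure, logic in product(STRUCTURES, LOGIC_OPTIMIZED)
    ]


if __name__ == "__main__":
    for variant in model_variants():
        print(variant)
```

The four entries with `logic_optimized` set to `False` correspond to the structural optimization phase; all eight are used in the logic optimization stage.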
From datasets 1 to 4, several subsets of data are partitioned for simulation testing. Each subset contains approximately 20,000 operational records, with the first 80% of the data serving as the training set and the remaining 20% as the test set. All data in datasets 1 to 4 are sourced from real-time data collected by the remote monitoring module of electric vehicles.
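A split of this kind keeps the records in their original order, so the test set always lies after the training set in time. A minimal sketch, assuming each subset is an ordered sequence of records; the function name `chronological_split` is hypothetical:

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(records: Sequence[T],
                        train_fraction: float = 0.8
                        ) -> Tuple[Sequence[T], Sequence[T]]:
    """Split ordered records into a leading training part and a trailing
    test part, without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


if __name__ == "__main__":
    # Stand-in for one subset of roughly 20,000 operational records.
    subset = list(range(20_000))
    train, test = chronological_split(subset)
    print(len(train), len(test))
```

Shuffling is deliberately avoided here: for time-series forecasting, a random split would let the model see records from the period it is later tested on.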
In the prediction effect charts, V1 denotes the unoptimized model, V2 the model with Encoder optimization, V3 the model with Decoder optimization, V4 the model with En-De optimization, and "true" the true value. The horizontal axis shows the sequence number of the data; the vertical axis shows the standardized predicted values, which can take any real value. Each prediction effect chart is accompanied by a corresponding table that records the relevant prediction results.
The simulation test platform is an Intel Core i7-7700HQ CPU with 8 GB DDR4 RAM and a 1 TB WDC HDD, running Windows 10.
To reduce the impact of dimensional differences, all predictions were normalized as follows:
$$y' = \frac{y - \bar{y}}{\sqrt{\sigma^{2}}}$$
where $y$ represents the unnormalized prediction; $y'$ represents the normalized forecast; $\sigma^{2}$ represents the variance of the forecast and $\bar{y}$ represents the average value of all forecasts.
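A minimal sketch of this normalization, assuming it is the standard z-score (subtract the average of all forecasts, then divide by the square root of their variance). The function name `standardize` and the use of the population variance are assumptions made for illustration.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def standardize(predictions: Sequence[float]) -> List[float]:
    """Z-score normalization of a series of forecasts."""
    mean = fmean(predictions)
    # Population variance of the forecasts (assumed; the sample variance
    # would differ only by a constant scale factor).
    variance = pvariance(predictions, mu=mean)
    if variance == 0:
        raise ValueError("cannot standardize a constant series")
    std = variance ** 0.5
    return [(p - mean) / std for p in predictions]


if __name__ == "__main__":
    raw = [62.0, 64.5, 61.0, 70.0, 66.5]  # illustrative values only
    print(standardize(raw))
```

After this transformation the series has zero mean and unit variance, which is why the standardized values on the vertical axis of the charts are not confined to a fixed range.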
See Table 1 for Informer Model Parameters.
5.3 Model optimization verification
The prediction results using dataset 1 are shown in Figs 4 and 5.
The prediction results using dataset 2 are depicted in Figs 6 and 7.
The prediction results using dataset 3 are shown in Figs 8 and 9.
The prediction results using dataset 4 are depicted in Figs 10 and 11.
The prediction results of each simulation test are presented in Tables 2–9.
As can be seen from Figs 4 to 11, the prediction fit of the Informer model improves after the En-De optimization. Tables 2–9 show that after the En-De optimization the R2 indicator improved by about 15%, both MSE and MAE decreased, and the prediction time was reduced by about 50%. From the discussion in Section 4.2, it follows that after the En-De optimization the prediction accuracy of the Informer model improved by about 15%; and because halving the prediction time doubles the number of predictions per unit time, the prediction speed increased by about 100%. Optimizing the En-De module on the basis of health assessment therefore enhances the prediction performance of the Informer model.
5.4 Logic optimization verification
Comparing Figs 10 and 11 with Figs 4–7, and Tables 8 and 9 with Tables 2–7, it is easy to see that when the operating condition of the electric vehicle changes from a single charge/discharge state to a complex operating state with multi-state switching, the prediction performance of the Informer degrades noticeably. According to Section 3.1, this degradation is caused by the truncation error of SOC. To reduce the adverse effect of truncation error, the prediction logic of the Informer must be optimized.
A prediction logic optimization scheme is given in Section 3.2. To verify whether the proposed scheme reduces the influence of truncation error and improves the SOC prediction performance of the Informer, the operation data of No. 3 Type A vehicle and No. 2 Type B vehicle in data set 4, for each month from July to October, are selected for simulation testing.
The monthly simulation test of each vehicle includes not only the SOC prediction effect charts before and after logic optimization, but also the prediction effect charts of motor temperature after logic optimization and the corresponding prediction result table. Prediction 1 represents the prediction effect charts before logic optimization, and Prediction 2 represents the prediction effect charts after logic optimization.
In order to avoid the over-fitting phenomenon caused by repeated training on the same data, for the same model before and after the logic optimization, the data of the same vehicle in the same month and in different time periods are selected for simulation verification.
Taking Fig 3 as an example, parameter prediction is a pre-computation step and is not performed during formal predictions. SOC calculation, on the other hand, is a simple arithmetic operation that does not involve the Informer model and takes less than 0.1% of the time required for temperature prediction. Therefore, after the logic optimization, the time required for motor temperature prediction is counted as part of the SOC prediction time. Accordingly, only the SOC prediction result table includes a prediction time column; the motor temperature prediction table omits it.
The prediction results using the July data from No. 3, Type A Vehicle are shown in Figs 12–14.
The prediction results for No. 3, Type A Vehicle for the month of July are presented in Tables 10–12.
The prediction results using the August data from No. 3, Type A Vehicle are shown in Figs 15–17.
The prediction results for No. 3, Type A Vehicle for the month of August are presented in Tables 13–15.
The prediction results using the September data from No. 3, Type A Vehicle are shown in Figs 18–20.
The prediction results for No. 3, Type A Vehicle for the month of September are presented in Tables 16–18.
The prediction results using the October data from No. 3, Type A Vehicle are shown in Figs 21–23.
The prediction results for No. 3, Type A Vehicle for the month of October are presented in Tables 19–21.
The prediction results using the July data from No. 2, Type B Vehicle are shown in Figs 24–26.
The prediction results for No. 2, Type B Vehicle for the month of July are presented in Tables 22–24.
The prediction results using the August data from No. 2, Type B Vehicle are shown in Figs 27–29.
The prediction results for No. 2, Type B Vehicle for the month of August are presented in Tables 25–27.
The prediction results using the September data from No. 2, Type B Vehicle are shown in Figs 30–32.
The prediction results for No. 2, Type B Vehicle for the month of September are presented in Tables 28–30.
The prediction results using the October data from No. 2, Type B Vehicle are shown in Figs 33–35.
The prediction results for No. 2, Type B Vehicle for the month of October are presented in Tables 31–33.
From Figs 13, 16, 19, 22, 25, 28, 31, and 34, it can be seen that the prediction logic optimization does not conflict with the En-De optimization. Tables 11, 14, 17, 20, 23, 26, 29, and 32 show that after the prediction logic is optimized, the En-De optimization effect mentioned in Section 4.3, namely a 15% improvement in prediction accuracy and a 100% improvement in prediction speed, is retained. Furthermore, as indicated by Figs 12–35 and Tables 10–33, the R2 metric improved by around 20% after the prediction logic was optimized. As discussed in Section 4.2, this means that optimizing the prediction logic improved the Informer’s prediction accuracy by around 20%, and that this gain coexists with the accuracy gain from the En-De optimization mentioned in Section 4.3. Optimizing the prediction logic on the basis of health assessment can therefore mitigate the effect of truncation error and enhance the predictive performance of the Informer model.
6 Conclusion
Through health assessment, this study derived a health matrix. The health matrix is used to improve the Encoder and Decoder, optimizing the model structure and thereby enhancing both the prediction accuracy and the prediction speed of the Informer. On this basis, the study predicts SOC indirectly: it conducts health assessments, optimizes the prediction logic, introduces basic state variables, and employs equivalent conversion to mitigate the effect of truncation error, which further enhances the Informer’s prediction accuracy. In simulation tests, the En-De optimized Informer model achieved an increase of approximately 15% in prediction accuracy and 100% in prediction speed. After the prediction logic was optimized, the Informer model built on the En-De optimization reduced the impact of truncation error, further enhancing prediction accuracy by around 20%. Compared to the unoptimized Informer model, the Informer optimized on the basis of health assessment incurs lower prediction costs and delivers superior performance in SOC prediction for electric vehicles.
References
- 1. Hong J, Liang F, Zhang H, et al. Data-driven multi-dimension driving safety evaluation for real-world electric vehicles[J]. IEEE Trans Vehicular Tech. 2024.
- 2. Silva G, Dutra TA, Nunes-Pereira J, et al. Coupled and decoupled structural batteries: a comparative analysis[J]. J Power Sources. 2024;604:234392.
- 3. Alarrouqi RA, Bayhan S, Al-Fagih L. An assessment of the energy performance of battery-electric buses in hot environments[J]. Sustain Energy Grids Networks. 2024;38:101352.
- 4. Tekin M, Karamangil MI. Development of dual polarization battery model with high accuracy for a lithium-ion battery cell under dynamic driving cycle conditions[J]. Heliyon. 2024;10(7):e28454. pmid:38571645
- 5. Kang HC. Experiment on extinguishing thermal runaway in a scaled-down model of an electric vehicle battery[J]. Int J Automot Technol. 2024:1–10.
- 6. Mustaffa Z, Al-Qadami EHH, Topa A, et al. Numerical assessment of the side impacts on lithium-ion battery module integrated with honeycomb reinforcement[J]. Eng Fail Anal. 2024;161:108290.
- 7. Jiang N, Zhang J, Jiang W. Driving behavior-guided battery health monitoring for electric vehicles using extreme learning machine[J]. Appl Energy. 2024;364:123122.
- 8. Yao Z, Shao R, Zhan S, et al. Energy management strategy for fuel cell hybrid electric vehicles using Pontryagin’s minimum principle and dynamic SOC planning[J]. Energy Source Part A Recovery Util Environ Eff. 2024;46(1):5112–32.
- 9. Sarmokadam S, Mathew R. Novel architectures for power management in AC ring main system connected to electric vehicle charging station[J]. Energy Rep. 2024;11:3876–88.
- 10. Gong C, Xu J, Lin Y. Plug‐in hybrid electric vehicle energy management with clutch engagement control via continuous‐discrete reinforcement learning[J]. Energy Technol. 2024:2301512.
- 11. Shao Y, Zheng Y, Zhang J, et al. A cloud capacity estimation method for electric vehicle lithium-ion battery independent of cloud SOC[J]. J Energy Storage. 2024;85:110998.
- 12. Wang R, Shi X, Su Y, et al. A predictive energy management strategy for plug-in hybrid electric vehicles using real-time traffic based reference SOC planning[J]. Proc Inst Mech Eng Part D J Automob Eng. 2024:09544070241239996.
- 13. Zhu C, Wang S, Yu C. An improved Cauchy robust correction-Sage Husa extended Kalman filtering algorithm for high-precision SOC estimation of lithium-ion batteries in new energy vehicles[J]. J Energy Storage. 2024;88:111552.
- 14. Li Q, Zhong J, Du J, et al. Probabilistic neural network-based flexible estimation of lithium-ion battery capacity considering multidimensional charging habits[J]. Energy. 2024:130881.
- 15. Lim H. Online ecological energy management for plug-in HEVs using optimal SOC prediction and stochastic optimization[J]. IEEE Trans Intell Transp Syst. 2024.
- 16. Guo N, Zhang W, Li J, et al. Predictive energy management of fuel cell plug-in hybrid electric vehicles: a co-state boundaries-oriented PMP optimization approach[J]. Appl Energy. 2024;362:122882.
- 17. Selvaraj V, Vairavasundaram I. A Bayesian optimized machine learning approach for accurate state of charge estimation of lithium ion batteries used for electric vehicle application[J]. J Energy Storage. 2024;86:111321.
- 18. Kim D, Kwon K, Cha K. Deep neural network-based modeling and optimization methodology of fuel cell electric vehicles considering power sources and electric motors[J]. J Power Sources. 2024;603:234401.
- 19. Singla P, Boora S, Singhal P. Design and simulation of 4 kW solar power-based hybrid EV charging station[J]. Sci Rep. 2024;14(1):7336.
- 20. Altun Y, Kutlar OA. Energy management systems’ modeling and optimization in hybrid electric vehicles[J]. Energies. 2024;17(7):1696.
- 21. Greaves B, Collins J, Parkes J, Landberg L. Increasing certainty-combination methods for reliable probabilistic wind production forecasts[J]. Europe’s Premier Wind Energy Event-EWEA; 2011. p. 3–10.
- 22. Takyi-Aninakwa P, Wang S, Liu G. Enhanced extended-input LSTM with an adaptive singular value decomposition UKF for LIB SOC estimation using full-cycle current rate and temperature data[J]. Appl Energy. 2024;363:123056.
- 23. Manivannan R. Research on IOT-based hybrid electrical vehicles energy management systems using machine learning-based algorithm[J]. Sustain Comput: Inform Syst. 2024;41:100943.
- 24. Yasin A, Dhaouadi R, Mukhopadhyay S. A novel supercapacitor model parameters identification method using metaheuristic gradient-based optimization algorithms[J]. Energies. 2024;17(6):1500.
- 25. Sun B, Zhang Q, Mao H, et al. Validation of a statistical-dynamic framework for predicting energy consumption: a study on vehicle energy conservation equation[J]. Energy Convers Manag. 2024;307:118330.
- 26. Li X, Li W, Deng D, et al. Reliability evaluation of electric vehicle sharing considering charging load transfer in a distribution network containing microgrids[J]. IEEE Access. 2024.
- 27. Chen Z, Wu S, Shen S, et al. Co-optimization of velocity planning and energy management for autonomous plug-in hybrid electric vehicles in urban driving scenarios[J]. Energy. 2023;263:126060.
- 28. Mekkaoui DE, Midoun MA, Shen Y. LA-RCNN: Luong attention-recurrent-convolutional neural network for EV charging load prediction[J]. Appl Intell. 2024;54(5):4352–69.
- 29. Zhou H, Zhang S, Peng J. Informer: beyond efficient transformer for long sequence time-series forecasting[C]. Proc AAAI Conf Artif Intell. 2021;35(12):11106–15.
- 30. Hu Y, Fan Z, Liu W, et al. An integrated navigation algorithm assisted by CNN-informer during short-time GNSS outages[J]. Meas Sci Technol. 2024;35(9):096309.
- 31. Xu W, Li D, Dai W. Informer short-term PV power prediction based on sparrow search algorithm optimised variational mode decomposition[J]. Energies. 2024;17(12):2984.
- 32. Qiu R, Dai W, Wang G, et al. Evaluation of different deep learning methods for meteorological element forecasting[J]. IEEE Access. 2024.
- 33. Wang Z, Li W, Wang S, et al. Enhancing load forecasting for large industrial users through feature preference and error correction[J]. IEEE Access. 2024.
- 34. Bommidi BS, Teeparthi K, Dulla Mallesham VK. ICEEMDAN-Informer-GWO: a hybrid model for accurate wind speed prediction[J]. Environ Sci Pollut Res Int. 2024;31(23):34056–81. pmid:38696015
- 35. Wang Y, Lou Y, Lin Y, et al. ROP prediction method based on PCA–informer modeling[J]. ACS Omega. 2024.
- 36. Liao L, Yang D, Li X, et al. Fault diagnosis of lithium-ion batteries based on wavelet packet decomposition and manhattan average distance[J]. Int J Green Energy. 2024:1–15.
- 37. Lee K, Sakamoto J. Li stripping behavior of anode‐free solid‐state batteries under intermittent‐current discharge conditions[J]. Adv Energy Mater. 2024;14(17):2303571.
- 38. Zhang C, Luo L, Yang Z, et al. Flexible method for estimating the state of health of lithium-ion batteries using partial charging segments[J]. Energy. 2024;295:131009.
- 39. Zhou J, Wang S, Cao W, et al. State of health prediction of lithium-ion batteries based on SSA optimized hybrid neural network model[J]. Electrochimica Acta. 2024;487:144146.
- 40. Ma Z, Luan YX, Zhang FQ, et al. A data-driven energy management strategy for plug-in hybrid electric buses considering vehicle mass uncertainty[J]. J Energy Storage. 2024;77:109963.
- 41. Şen M, Özcan M, Eker YR. Fuzzy logic-based energy management system for regenerative braking of electric vehicles with hybrid energy storage system[J]. Appl Sci. 2024;14(7):3077.
- 42. Shuai Q, Wang Y, Jiang Z, et al. Reinforcement learning-based energy management for fuel cell electrical vehicles considering fuel cell degradation[J]. Energies. 2024;17(7):1586.
- 43. Larijani MR, Kia SH, Zolghadri M, et al. Linear parameter-varying model predictive control for intelligent energy management in battery/supercapacitor electric vehicles[J]. IEEE Access. 2024.
- 44. He L, Tong B, Zhang Y, et al. Temperature rise characteristics research of integrated electric drive system for vehicles[J]. JEET. 2024:1–13.
- 45. Tekin M, Karamangil Mİ. Comparative analysis of equivalent circuit battery models for electric vehicle battery management systems[J]. J Energy Storage. 2024;86:111327.
- 46. Abideen FZ, Khalid HA, Khan MS, et al. Direct model predictive control of fuel cell and ultra-capacitor based hybrid electric vehicle[J]. IEEE Access. 2024.
- 47. Nie J, Chen C, Wei C. Adaptive fuzzy energy management strategy for range‐extended electric vehicles integrated with deep learning[J]. Energy Sci Eng. 2024;12(5):2164–79.
- 48. Sayed MA, Ghafouri M, Atallah R, et al. Grid chaos: an uncertainty-conscious robust dynamic EV load-altering attack strategy on power grid stability[J]. Appl Energy. 2024;363:122972.
- 49. Wang J, Zhang Z, Guo D. Torque vectoring and multi-mode driving of electric vehicles with a novel dual-motor coupling electric drive system[J]. Automot Innov. 2024;7(2):236–47.
- 50. Deng Z, Xu L, Liu H. Prognostics of battery capacity based on charging data and data-driven methods for on-road vehicles[J]. Appl Energy. 2023;339:120954.