
Reinforcement learning algorithm for improving speed response of a five-phase permanent magnet synchronous motor based model predictive control

  • Ahmed M. Hassan,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing

    ahmed.hassanin@feng.bu.edu.eg

    Affiliations Department of Electrical Power and Machines Engineering, Faculty of Engineering, Benha University, Shoubra, Cairo, Egypt, Department of Electrical Power and Machines Engineering, Higher Institute of Engineering (HIE), El-Shorouk Academy, El-Shorouk City, Egypt

  • Jafar Ababneh,

    Roles Funding acquisition, Investigation, Visualization

    Affiliation Cyber Security Department, Faculty of Information Technology, Zarqa University, Zarqa, Jordan

  • Hani Attar,

    Roles Formal analysis, Funding acquisition, Visualization

    Affiliations Faculty of Engineering, Zarqa University, Zarqa, Jordan, College of Engineering, University of Business and Technology, Jeddah, Saudi Arabia

  • Tamer Shamseldin,

    Roles Funding acquisition, Investigation, Visualization

    Affiliation Technical Research Center, Cairo, Egypt

  • Ahmed Abdelbaset,

    Roles Investigation, Resources, Visualization

    Affiliation Department of Electrical Power and Machines Engineering, Higher Institute of Engineering (HIE), El-Shorouk Academy, El-Shorouk City, Egypt

  • Mohamed Eladly Metwally

    Roles Formal analysis, Methodology, Software, Visualization, Writing – review & editing

    Affiliation Department of Electrical Power and Machines Engineering, Higher Institute of Engineering (HIE), El-Shorouk Academy, El-Shorouk City, Egypt

Abstract

Enhancing the performance of 5ph-IPMSM control plays a crucial role in advancing various innovative applications such as electric vehicles. This paper proposes a new reinforcement learning (RL) control algorithm based on the twin-delayed deep deterministic policy gradient (TD3) algorithm to tune two cascaded PI controllers in a five-phase interior permanent magnet synchronous motor (5ph-IPMSM) drive system based on model predictive control (MPC). The main purpose of the control methodology is to optimize the 5ph-IPMSM speed response in both the constant torque and constant power regions. The speed responses obtained using the RL control algorithm are compared with those obtained using four of the most recent metaheuristic optimization techniques (MHOT): Transit Search (TS), Honey Badger Algorithm (HBA), Dwarf Mongoose (DM), and Dandelion Optimizer (DO). The speed responses are compared in terms of settling time, rise time, maximum time, and maximum overshoot percentage. It is found that the suggested RL-based TD3 gives the minimum settling time and relatively low values for the rise time, maximum time, and overshoot percentage, which makes the RL speed responses superior to those obtained from the four MHOT. The drive system speed responses are obtained in the constant torque and constant power regions using the MATLAB Simulink package.

1. Introduction

IPMSMs are highly suitable for high-performance drive systems due to their various advantages, including exceptional performance, high efficiency, compact size, low noise, and reliability [1–3]. IPMSMs are widely used in various applications, with robots and electric vehicles being among the most common [2, 3]. Due to their inherent advantages, multiphase motor drives are gaining prominence as a viable alternative to conventional three-phase motor drives. These advantages encompass minimized torque pulsations, heightened power density, and enhanced fault-tolerance capability [4, 5]. MPC is an increasingly prominent control method for drive systems, demonstrating superior performance and optimization [6]. Enhancing the performance of 5ph-IPMSM control plays a crucial role in advancing various innovative applications. In drive systems, the tuning of the PI controllers affects the speed response of the motor. The tuning is usually achieved by metaheuristic techniques. Recently, RL has instead been used to tune the PI controller in a DC drive system [7], because it is an excellent multi-objective optimization technique with strong search capabilities and a rapid convergence rate. To the best knowledge of the authors, no existing literature has investigated the optimization of the speed response of a five-phase IPMSM based on MPC using an RL-based TD3 algorithm across a wide speed range. Several control methods have been presented in the literature; these publications are explored as follows:

In [8], a harmonic elimination method utilizing vector control was introduced for the five-phase IPMSM. This method, known as harmonic elimination SVM, operates on the principles of space vector theory. In [9], a study was conducted on VC of a 5ph-PMSM, and an SVM algorithm was proposed to achieve high-performance VC in the drive system. Similarly, [10] introduced a VC strategy for a five-phase VSI utilizing SVM to power a 5ph-PMSM. This approach optimized the performance of the 5ph-PMSM drive system. Furthermore, in [11], a mathematical model of the five-phase PMSM and corresponding control algorithms were presented. The modeling was achieved using coordinate transformation, and both the two-VC-algorithm and four-VC-algorithm SVPWMs were investigated. Additionally, a modified four-VC algorithm was introduced to adjust the 3rd harmonic current, thereby improving the motor’s torque performance. In [12], a speed-sensorless DTC method was proposed for the five-phase IPMSM. This method relies on measuring the per-phase currents and the DC bus voltage. Additionally, [13] introduced a model reference adaptive control, incorporating a neural network, for a five-phase IPMSM. Combined with a hysteresis current controller, this controller enables motor speed control across a wide range of speeds. Furthermore, [14] presented a sensorless control methodology based on the 3rd harmonic space for a five-phase PMSM, eliminating the need for motor parameters throughout the entire speed range. In [15], a super-twisting SMC was proposed to control a five-phase PMSM. This control technique exhibits superior response and performance compared to vector control in various conditions. DTMPC of a 5ph-PMSM drive system was presented in [16]. This control strategy effectively reduces the torque ripples associated with conventional DTC, leading to reduced high-order harmonics and system losses. A PCC strategy based on a finite-control-set model was introduced for a 5-ph PMSM in [17]. This approach reduces the THD to 9.47% and eliminates third-harmonic currents. In [18], an MPDTC method was proposed based on the QEM method and an HVE method. The THD was reduced to 11.54% by employing this technique while disregarding third-harmonic currents. [19] introduced a DTMPC technique for a five-phase PMSM. This approach aims to optimize torque development, improve speed regulation, reduce torque ripples, reduce higher-order current harmonics, decrease drive system losses, and increase the power train’s efficiency. In [20], an MPTC technique based on double virtual voltage vectors utilizing geometric principles was presented for controlling a five-phase PMSM. This method achieves reduced processing time without the need for weighting factors. Additionally, [21] proposed an MPTC technique with additional weighting factors, reducing current harmonics and torque ripple. The THD was decreased to 7.11% using this method while disregarding third-harmonic currents. [22] introduced a model-free PCC methodology based on an ultra-local model and the motor outputs for five-phase PMSM drives. This strategy mitigates the impact of motor parameter variations on current predictions. Moreover, [23] presented an MPCC methodology for a five-phase PMSM, utilizing pre-selection of the voltage vector. The purpose of this methodology is to reduce computation time compared to conventional MPC by selecting the optimal voltage vector based on deviations in the stator current and changes in the position of the stator current vector.

In [24], an ANN-based MPC strategy was suggested to control the speed of a 3-phase IPMSM. This approach combined a back-propagation (BP) network predictive algorithm with MPC. The optimal selection of the gains of a PI controller in a VC three-phase PMSM drive system was addressed in [25]. Optimization algorithms called RGA and BBO were employed for this purpose. It was observed that the BBO algorithm exhibited superior transient as well as steady-state performance in the PMSM drive system. In [26], a three-phase surface-mounted PMSM was controlled using an adaptive ANN internal model utilizing the PSO algorithm. Various optimization techniques were compared in [27] to obtain the optimal selection of PI controller gains for a three-phase PMSM drive system. A neural network-based MPC technique to regulate the speed of a 3-phase IPMSM was presented in [28]. Similarly, in [29], a control technique utilizing a BP ANN algorithm was given for a 3-phase IPMSM. Sensorless control of a surface-mounted PMSM was achieved in [30] using an adaptive speed observer and a PID controller. In [31], a control technique for a three-phase IPMSM utilizing an ANN-based MPC was introduced. [32, 33] presented ANN-based MPC techniques for a three-phase IPMSM to overcome the effects of parameter mismatches. A dual-vector-based Particle Swarm Optimization MPC technique for a three-phase IPMSM was introduced in [34].

In [35], the reinforcement Q-learning algorithm was presented to tune fuzzy PD and PI controllers for SISO and TITO systems. In [36], deep RL (DRL) was used to improve the tuning process of a classical PID controller; the RL algorithm used was DDPG. The concept of RL-based FOC of a three-phase induction motor was presented in [37]. In [38], the design of an adaptive RL PID controller under the actor-critic structure, based on an RBF network, was presented for nonlinear systems. Online training of an RL agent used to control a real motor drive system was presented in [39]; the adopted drive system was composed of a three-phase PMSM fed by a VSI. In [40], a DRL method for speed control of a three-phase PMSM servo drive system was presented. The presented DRL control improved the system performance, especially in the case of load variations. An adaptive PID controller using the asynchronous advantage actor-critic algorithm was proposed in [41]. In [42], speed control of a PMSM was achieved by applying PI-based PSO and DDPG algorithms. In [43], a combined approach that leverages DRL and MPC was introduced to enhance the efficiency of electric vehicles. In [44], an open-source toolbox called GEM was developed for training RL agents to control electric motors. Improved control performance for a three-phase PMSM was proposed in [45]. This improvement was achieved using four optimization techniques (PSO, SA, GA, and GWO) together with RL. An RL control algorithm to control a three-phase PMSM based on TD3 was suggested in [46]. In [47], an adaptive PID controller was designed for controlling the speed of a DC motor using an RL-based TD3 algorithm. In [48], a comparison was made between the conventional PID methodology and a TD3 RL algorithm for controlling a three-phase PMSM based on the vector control (VC) strategy. In [7], researchers employed the TD3 method to learn a PI controller for optimal dynamics in a simulated environment for controlling the speed of a DC motor. Energy management strategies for hybrid electric vehicles (HEVs) were dealt with in [49], where a hierarchical architecture that combines RL algorithms was proposed; the DDPG algorithm demonstrated superior performance in the energy management of HEVs. In [50], an intelligent energy management system for a conventional autonomous vehicle using RL was proposed, and a novel exploration strategy called self-adaptive Q-learning was introduced.

In this study, we suggest a novel RL control algorithm utilizing the TD3 approach. The algorithm is designed to fine-tune two cascaded PI controllers in a five-phase voltage source inverter (VSI) / 5ph-IPMSM drive system based on MPC. The primary goal of this control methodology is the optimization of the speed response of the 5ph-IPMSM under various operating conditions. We compare the speed responses obtained using the RL control algorithm with those achieved using four recent MHOT: TS, HBA, DM, and DO. The speed responses are compared in terms of settling time, rise time, maximum time, and maximum overshoot percentage.

The following points summarize the main contributions of this paper:

  • A new RL control algorithm utilizing the TD3 algorithm for tuning the two cascaded PI controllers in the drive system under consideration is proposed.
  • The tuning is also achieved using four of the most recent optimization techniques: TS, HBA, DM, and DO.
  • A comparative study is conducted between the suggested RL control methodology and the MHOT adopted in this research for the drive system under consideration.
  • A MATLAB Simulink model of the drive system under consideration is built to obtain the simulation results and verify the validity of the proposed control methodology.

The subsequent sections of this study are arranged as follows. In Section 2, the modeling of the drive system is introduced. An explanation of the MPC is given in Section 3. MHOT are explained briefly in Section 4. In Section 5, the suggested RL utilizing the TD3 algorithm is explained. In Section 6, the proposed control methodology of the drive system is explored. In Section 7, simulation results are given. In Section 8, key conclusions drawn from our study are summarized.

2. Modeling of the drive system

2.1 Modeling of the five-phase VSI

The five-phase VSI is described by the per-phase voltages (va to ve), which depend on the inverter switching functions [19]: (1)

In Eq (1), Vdc represents the DC voltage supplied to the inverter. The switching functions, denoted as Sa to Se, correspond to the various states of the inverter. Specifically, a switching function equals one when the upper semiconductor switch in a leg is conducting, and it equals zero when the lower switch in that leg is conducting.
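Since the displayed equation is not reproduced here, a standard form of Eq (1) for a star-connected five-phase machine, consistent with [19], is:

$$v_k = \frac{V_{dc}}{5}\Big(4S_k - \sum_{j \neq k} S_j\Big), \qquad k, j \in \{a, b, c, d, e\}$$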

In a five-phase VSI, there exist 32 possible switching states. The MPC determines the switching state that minimizes the cost function, and based on this state, gate pulses are generated for the ten switches.

2.2 Model of the 5ph-IPMSM

The 5ph-IPMSM is described using the DQ model in the synchronous frame of reference. The ABCDE-to-DQ voltage transformations are provided in references [19, 51]: (2) where α = 2π/5, θ is the rotor position angle, vd1, vq1 are the fundamental DQ components of the stator voltage, and vd3, vq3 are the third-harmonic DQ components of the stator voltage.

The transformation from the DQ frame of reference to the ABCDE frame of reference can be represented as follows [19]: (3) where id1, iq1 are the fundamental DQ components of the stator currents and id3, iq3 are the third-harmonic DQ components of the stator currents. The DQ voltage equations for the 5ph-IPMSM, after removing the zero-sequence component, can be expressed as follows [19, 51]: (4) where Rs is the stator resistance, ω is the motor speed in electrical rad/sec, λd1, λq1 are the fundamental DQ stator flux linkages, and λd3, λq3 are the 3rd harmonic DQ stator flux linkages. The DQ stator flux linkages take the following form: (5) where Ld1, Lq1 are the fundamental direct and quadrature self-inductances, Lm13 is the mutual inductance, and λ1m and λ3m are the fundamental and 3rd harmonic components of the rotor PM flux linkages, respectively.
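A standard form of Eqs (4) and (5) consistent with [19, 51] (a reconstruction, with $L_{d3}$, $L_{q3}$ denoting the third-harmonic self-inductances) is:

$$v_{d1} = R_s i_{d1} + \frac{d\lambda_{d1}}{dt} - \omega\lambda_{q1}, \qquad v_{q1} = R_s i_{q1} + \frac{d\lambda_{q1}}{dt} + \omega\lambda_{d1}$$
$$v_{d3} = R_s i_{d3} + \frac{d\lambda_{d3}}{dt} - 3\omega\lambda_{q3}, \qquad v_{q3} = R_s i_{q3} + \frac{d\lambda_{q3}}{dt} + 3\omega\lambda_{d3}$$
$$\lambda_{d1} = L_{d1} i_{d1} + L_{m13} i_{d3} + \lambda_{1m}, \qquad \lambda_{q1} = L_{q1} i_{q1} + L_{m13} i_{q3}$$
$$\lambda_{d3} = L_{d3} i_{d3} + L_{m13} i_{d1} + \lambda_{3m}, \qquad \lambda_{q3} = L_{q3} i_{q3} + L_{m13} i_{q1}$$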

The 5ph-IPMSM differential equation, Eq (4), can be rearranged into the following form: (6) where D is the operator d/dt.

The motor torque equation can be represented as [19, 51]: (7)

Substitution of Eq (5) into Eq (7) gives [19]: (8) where p is the total number of poles. When considering motor speed variations during the transient time interval, Eq (6) becomes nonlinear. As a result, the 5ph-IPMSM currents need to be solved numerically. To achieve this, we employ the 5ph-IPMSM mechanical equation, which is expressed as: (9)

In the given equation, the motor speed is denoted by ωm in mechanical radians per second, J denotes the inertia, and Tl(ωm) is defined as follows: (10)

where TL and Tfw are the load torque and the friction and windage torque, respectively.
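Written out from the definitions above, Eqs (9) and (10) correspond to the familiar mechanical balance:

$$J\frac{d\omega_m}{dt} = T_e - T_l(\omega_m), \qquad T_l(\omega_m) = T_L + T_{fw}$$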

2.3 Five-phase IPMSM maximum torque per ampere operating mode model

To maximize efficiency, the 5ph-IPMSM operates at maximum torque per ampere (MTPA), particularly for speeds up to the motor’s rated speed. This section derives the reference fundamental and 3rd harmonic DQ current components.

When Lm13 is disregarded, the torque equation for the fundamental component can be formulated using Eq (8) as follows: (11)

The reference fundamental direct current component is derived by differentiating the fundamental torque equation, Eq (11), with respect to the fundamental direct current and setting the result to zero, i.e., ∂Te1/∂id1 = 0. Consequently, the reference fundamental direct current component is determined as shown in the following equation: (12)
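One common closed form of the resulting MTPA relation, Eq (12), derived with the stator current magnitude held constant (a reconstruction, not the verbatim published equation), is:

$$i_{d1r} = \frac{\lambda_{1m}}{2\left(L_{q1} - L_{d1}\right)} - \sqrt{\frac{\lambda_{1m}^{2}}{4\left(L_{q1} - L_{d1}\right)^{2}} + i_{q1r}^{2}}$$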

The reference fundamental quadrature current component can be derived from Eq (11) as follows: (13)

The reference 3rd harmonic direct current component can be obtained from [52]: (14)

Also, the reference 3rd harmonic quadrature current component can be obtained from [52]: (15)

2.4 Five-phase IPMSM field weakening operating mode model

To extend the operating speed range beyond the rated speed, the 5ph-IPMSM will be operated in field weakening mode. The equations for the reference fundamental and 3rd harmonic DQ currents components are derived as follows.

The fundamental direct and quadrature steady-state voltages can be derived from Eq (4). By neglecting the stator resistance voltage drops, we obtain the following equations: (16)

By substituting λd1 and λq1 from Eq (5) into Eq (16) and ignoring Lm13, we obtain: (17)

The reference fundamental direct and quadrature components must satisfy the following equation to ensure that the fundamental voltage of the 5ph-IPMSM does not exceed its maximum value, Vm1: (18)

By substituting Eq (17) into Eq (18) and solving for the fundamental direct current component, we obtain: (19)
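Under these assumptions, Eqs (17)–(19) take the standard field-weakening form (a reconstruction consistent with the derivation described):

$$v_{d1} = -\omega L_{q1} i_{q1}, \qquad v_{q1} = \omega\left(L_{d1} i_{d1} + \lambda_{1m}\right)$$

so that the voltage limit $v_{d1}^{2} + v_{q1}^{2} = V_{m1}^{2}$ yields

$$i_{d1r} = \frac{-\lambda_{1m} + \sqrt{\left(V_{m1}/\omega\right)^{2} - \left(L_{q1} i_{q1r}\right)^{2}}}{L_{d1}}$$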

In this mode of operation, equations analogous to Eqs. (13), (14), and (15), expressed in terms of the fundamental direct current component provided in Eq (19), are utilized.

3. Model predictive control technique

The MPC consists of two primary components: the model of the plant and the optimizer. The core concept of MPC is to choose the optimal sequence of inputs for the plant utilizing predictions of its future behavior. These predictions are achieved using the model of the plant, which employs previous states to predict future states. At each discrete interval, the optimizer leverages the predicted states and the desired trajectory to solve the optimization problem over the prediction horizon, thereby identifying the optimal set of inputs for future operations.

To successfully implement MPC, it is crucial to discretize the motor model. Thus, Eq (6) is converted into its discrete form using the Forward Euler approximation method. Consequently, the discrete model for the 5-phase IPMSM can be represented as follows: (20)

In this context, Ts denotes the sampling interval of the discretized system.
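Applying the Forward Euler rule $di/dt \approx [i(k{+}1) - i(k)]/T_s$ and neglecting $L_{m13}$, the fundamental-component part of Eq (20) can be sketched as (a reconstruction, with analogous expressions for the third-harmonic currents):

$$i_{d1}(k{+}1) = i_{d1}(k) + \frac{T_s}{L_{d1}}\big[v_{d1}(k) - R_s i_{d1}(k) + \omega L_{q1} i_{q1}(k)\big]$$
$$i_{q1}(k{+}1) = i_{q1}(k) + \frac{T_s}{L_{q1}}\big[v_{q1}(k) - R_s i_{q1}(k) - \omega\big(L_{d1} i_{d1}(k) + \lambda_{1m}\big)\big]$$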

The primary objective of the MPC is to reduce the torque error. Given that the electromagnetic torque equation for a 5ph-IPMSM includes contributions from both the d-q axes currents, controlling these currents is essential to decrease the torque error. Thus, the problem can be simplified by minimizing the current error instead of the torque error. Therefore, the basic cost function (CF) aimed at reducing the current error can be formulated as follows: (21)

where id1r, iq1r, id3r, iq3r represent the reference direct and quadrature fundamental and third-harmonic currents. These currents are determined using Eqs (12), (13), (14), and (15) for motor operation in the constant torque region, or Eqs (19), (13), (14), and (15) for motor operation in the constant power region. The optimal inverter switching functions are chosen based on the minimum cost function.
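To make the selection step concrete, the following MATLAB sketch enumerates the 32 switching states and evaluates the cost of Eq (21) for each; the helper functions abcde2dq (Eq (2)) and predictCurrents (Eq (20)) are hypothetical names, and the parameter variables are assumed to be defined elsewhere:

```matlab
% One FCS-MPC step: try all 32 switching states of the five-phase VSI and
% keep the state that minimizes the current-error cost function, Eq (21).
S = dec2bin(0:31) - '0';                      % all 32 switching vectors [Sa..Se]
Jbest = inf; kbest = 1;
for k = 1:32
    vph = Vdc/5 * (5*S(k,:) - sum(S(k,:)));   % per-phase voltages, Eq (1)
    vdq = abcde2dq(vph, theta);               % ABCDE-to-DQ transform, Eq (2)
    ip  = predictCurrents(imeas, vdq, w, Ts); % one-step prediction, Eq (20)
    J = (id1r - ip(1))^2 + (iq1r - ip(2))^2 ...
      + (id3r - ip(3))^2 + (iq3r - ip(4))^2;  % cost function, Eq (21)
    if J < Jbest, Jbest = J; kbest = k; end
end
gates = S(kbest,:);                           % gating state for the ten switches
```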

4. Metaheuristic optimization techniques

4.1 PI controller

The PI (proportional-integral) controller is commonly utilized in various industrial applications because of its simplicity, ease of implementation, and robust performance. Achieving optimal performance with a PI controller involves fine-tuning two key parameters: the proportional gain (kp) and the integral gain (ki). To optimize the performance of PI controllers, researchers have employed various optimization algorithms. In this study, four of the most recent optimization techniques are used for obtaining the optimum PI gains: Transit Search (TS) [53], Honey Badger Algorithm (HBA) [54], Dwarf Mongoose (DM) [55], and Dandelion Optimizer (DO) [56]. These techniques are compared with the RL utilizing the TD3 algorithm. The goal is to minimize the error associated with the speed control of the 5ph-IPMSM, to ensure the best possible performance.

The optimization process involves reducing the error e(t), which is generated by comparing the reference speed with the actual speed, as well as the reference torque with the actual torque, using four standard performance indicators: IAE, ISE, ITAE, and ITSE. This optimization is achieved by minimizing the objective function given in the following equation, Eq (22).

(22)

The optimization objective involves minimizing the steady-state time response (tss) and the error function e(t).
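The four error criteria follow their standard definitions over the evaluation window:

$$\mathrm{IAE}=\int_{0}^{t_{ss}}\lvert e(t)\rvert\,dt,\qquad \mathrm{ISE}=\int_{0}^{t_{ss}}e^{2}(t)\,dt,\qquad \mathrm{ITAE}=\int_{0}^{t_{ss}}t\,\lvert e(t)\rvert\,dt,\qquad \mathrm{ITSE}=\int_{0}^{t_{ss}}t\,e^{2}(t)\,dt$$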

4.2 Objective

The optimization problem involves the simultaneous pursuit of two distinct objectives, allowing for a comprehensive optimization strategy tailored to address the specific aims of each target. The optimization focuses on minimizing the speed and torque errors of the individual controllers by fine-tuning their respective gain parameters.

There are two PI controllers: one for the regulation of speed and another for controlling the torque. Each controller has two adjustable gains, the proportional and integral gains (kp and ki). The optimization must determine the optimal values for these four gain parameters, which are constrained within the range of [0, 300] for each gain. By optimizing these controller gains, the goal is to achieve high precision in both speed and torque regulation, thereby enhancing the overall system performance. The optimization strategy must navigate this multi-objective landscape to identify the set of gain values that best satisfies the conflicting targets of minimizing both speed and torque errors simultaneously.

(23)
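Stated compactly, the tuning problem of Eq (23) can be read as the constrained minimization (a reconstruction from the bounds stated above):

$$\min_{k_{p1},\,k_{i1},\,k_{p2},\,k_{i2}} J\big(e_{\omega}(t),\, e_{T}(t)\big) \quad \text{subject to} \quad 0 \le k_{p1}, k_{i1}, k_{p2}, k_{i2} \le 300$$

where $e_{\omega}$ and $e_{T}$ denote the speed and torque errors, respectively.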

5. Reinforcement learning based TD3 approach

The TD3 approach is an advancement over the DDPG algorithm. TD3 addresses the function approximation error that can occur in DDPG. The TD3 algorithm significantly enhances both the learning speed and performance of DDPG across various challenging continuous control tasks. The algorithm of TD3 outperforms many state-of-the-art methods. Due to the simplicity of TD3 modifications, they can be easily integrated into any other actor-critic algorithm [57, 58]. In addition to this, the TD3 algorithm learns two Q-value functions and uses the minimum estimate during policy updates.

It does this by using twin critic networks (CN), delayed target network updates, and added exploration noise, which together help to stabilize the training process and improve the efficiency of the learned policy. The goal of RL is to obtain the optimum policy πφ that maximizes the expected reward, which is achieved by tuning the parameter φ. This is typically done by updating the parameter using the gradient ∇φJ(φ). TD3 employs an actor-critic structure, where the policy (actor) is updated according to the DPG algorithm. When the state space is large, the Q-value function Q(s, a) is approximated using a function approximator Qθ(s, a) with tuning parameter θ. To maintain a stable learning goal y across many updates, a frozen target network Qθ′(s, a) is used [57].

(24)

Actions are selected by the algorithm from a target actor network (AN) πφ′(s) that is separate from the main AN. The weights of the target network are periodically updated toward the weights of the current network using a soft update rule [57]. This helps maintain a stable learning objective during the training process.

In actor-critic (A-C) methods, the current and target networks may be very similar, leading to inaccurate value prediction. In order to address this, the algorithm uses a set of two actors and critics [57]. The actors are optimized according to their respective critics, but this can cause overestimation. To mitigate this, a clipped double Q-learning approach is used, which selects the minimum of the two critic estimates as the target update [58]. The AN and CN are then updated according to this formulation.

(25)

The algorithm uses a reward value r and a discount factor γ, which determines the influence of future reward values on current decisions. The value of the discount factor γ ranges from 0 to 1 [57, 59]. To address the issue of deterministic policies overfitting sharp peaks in the value estimate, the algorithm adds a small amount of random noise to the target policy action, which is then clipped to keep the target within a limited range [57].

(26)
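For reference, the standard TD3 forms of Eqs (24)–(26), as given in [57, 58], are the single-critic target, the clipped double-Q target, and the smoothed target action:

$$y = r + \gamma\, Q_{\theta'}\big(s', \pi_{\phi'}(s')\big)$$
$$y = r + \gamma \min_{i=1,2} Q_{\theta_i'}\big(s', \tilde{a}\big)$$
$$\tilde{a} = \pi_{\phi'}(s') + \epsilon, \qquad \epsilon \sim \operatorname{clip}\big(\mathcal{N}(0, \sigma), -c, c\big)$$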

The TD3 approach is presented as a flowchart in Fig 1, as proposed by [57, 60]. Table 1 lists some of the key parameters used in the TD3 algorithm.

The block diagram depicted in Fig 2 illustrates the RL agent utilizing the TD3 algorithm for process control. In this context, the goal of obtaining the optimum policy for the RL-TD3 agent may be interpreted as determining the appropriate command signals for the 5ph-IPMSM in order to enhance a given effectiveness metric.

The simulation environment of Matlab Simulink is utilized to implement the TD3 algorithm. The objective is to train an optimum two-stage PI controller for regulating the speed and torque of a 5ph-IPMSM in the control system setup. The process of learning generates the optimum tuning parameters for the two PI controllers, which can effectively address the regulation challenges simulated in the system environment.

This setup can be viewed as analogous to the control of an industrial process, where the observed inputs are the speed error and its integral, as well as the torque, as shown in Fig 3, while the outputs represent the reference signals, and the enhancement objective is framed as a reward function.

Fig 3. Matlab Simulink block diagram for the TD3 approach.

https://doi.org/10.1371/journal.pone.0316326.g003
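A minimal sketch of this training setup, using the MATLAB Reinforcement Learning Toolbox, is given below; the model name ipmsm_drive, the agent block path, the signal dimensions, and the stopping rule are illustrative assumptions rather than the authors' actual files:

```matlab
% Hedged sketch: create a Simulink environment and a default TD3 agent.
obsInfo = rlNumericSpec([3 1]);    % observations: speed error, its integral, torque
actInfo = rlNumericSpec([2 1]);    % actions: the two PI reference (control) signals
env = rlSimulinkEnv('ipmsm_drive', 'ipmsm_drive/RL Agent', obsInfo, actInfo);
agent = rlTD3Agent(obsInfo, actInfo);           % default twin-critic TD3 agent
trainOpts = rlTrainingOptions('MaxEpisodes', 500, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', -10);                  % illustrative stopping rule
trainingStats = train(agent, env, trainOpts);   % run the learning process
```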

The reward function is defined by the subsequent expression: (27)

The weight coefficients wc1, wc2, and wc3 are used to balance the reduction of the reference error against the magnitude of the control signal. The error signals e1(t), e2(t) represent the observed states from the environment of the control system, and the control signal u(t) corresponds to the action of the actor.
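A plausible quadratic form of the reward in Eq (27), consistent with these roles (an assumed reconstruction, not the verbatim published expression), is:

$$r(t) = -\left(w_{c1}\,e_1^{2}(t) + w_{c2}\,e_2^{2}(t) + w_{c3}\,u^{2}(t)\right)$$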

The optimal control reward equation can be represented by: (28)

Fig 4 shows the Matlab Simulink block diagram of the reward equation, Eq (28). In this figure, the weights wc1, wc2, and wc3 are taken to be 0.9, 0.9, and 0.1, respectively [7].

Fig 4. Matlab Simulink block diagram of the reward equation.

https://doi.org/10.1371/journal.pone.0316326.g004

The AN and CN, with their corresponding target networks, were set up in the MATLAB environment to realize RL agents using the TD3 approach. The CNs take the action (a) and observed state (s) as inputs, and serve as an approximator of the quality value Q(s, a). The arrangement of the CNs utilized in the drive system under consideration is given in Fig 5.

Fig 5. The architecture of CNs utilized in the drive system under consideration.

https://doi.org/10.1371/journal.pone.0316326.g005

The CNs consist of three layers:

  1. fully connected (fc) input layers for the action and state inputs,
  2. a fc common path layer,
  3. a fc output layer with one neuron representing Q(s, a).

The input and common path layers use ReLU activation. The actor networks take the observed state (s) as input and output the action (a). The TD3 algorithm trains the AN and CN to maximize the reward, which is the optimal control goal [61].

The gains of the PI controllers ki1, kp1, ki2, and kp2 can be found using the Matlab functions actor = getActor(agent) and parameters = getLearnableParameters(actor).
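As a usage sketch of the two toolbox functions named above (assuming agent is the trained TD3 agent from the learning step):

```matlab
actor  = getActor(agent);                 % extract the trained actor network
params = getLearnableParameters(actor);   % cell array holding ki1, kp1, ki2, kp2
```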

6. Proposed control methodology

Fig 6 illustrates the proposed control system, which comprises several blocks. The rotor position is measured at a specific load torque and reference speed of the 5ph-IPMSM to determine the actual motor speed, as shown in Fig 6. The speed error is processed by a primary PI controller to generate the reference torque. The motor currents of the 5ph-IPMSM are measured and transformed into the DQ frame of reference using the “abcde to d1q1d3q3 Transformation” block in Fig 6. This block represents Eq (2). The resulting fundamental and 3rd harmonic direct and quadrature currents are used to calculate the electromagnetic torque via the “Torque Calculation” block, which corresponds to Eq (8).

The gains of the primary and secondary PI controllers (kp1, ki1, kp2, and ki2) are determined using one of the four metaheuristic optimization techniques (TS, HBA, DM, and DO) or the presented RL-based TD3 algorithm. The torque error is processed by the secondary PI controller, “PI Controller-02,” to obtain the corrected reference torque, Tref. The motor speed, ωm, and Tref are used to derive the reference DQ currents from the “MTPA & FW” block in Fig 6. This block represents Eqs (12), (13), (14), and (15) for MTPA operation when the motor speed is not greater than the rated speed, and Eqs (19), (13), (14), and (15) for FW operation when the motor speed exceeds the rated speed.

The reference fundamental and 3rd harmonic DQ currents, along with the rotor position angle and DC voltage, are used in the MPC, Eq (20), to generate the inverter gating signals that minimize the cost function given in Eq (21), thereby minimizing torque ripples and thus the ripples in the motor speed.

7. Results

Several sets of results are obtained to verify the suggested control methodology for the 5ph-IPMSM. The 5ph-IPMSM parameters are given in Table 2 [35].

The following set of results is obtained when the motor is driving a constant load torque equal to the motor rated torque, 63.662 Nm, and the reference speed is 1600 rpm. In this operating condition the motor is operated at maximum torque per ampere. Figs 7–10 show the speed responses of the 5ph-IPMSM when the PI controller gains are obtained using the metaheuristic optimization techniques (TS, HBA, DM, and DO) and the presented RL-based TD3 algorithm, with the four types of error criteria (IAE, ISE, ITAE, and ITSE) used in the optimization techniques, respectively. From these figures it can be noticed that the speed response of the 5ph-IPMSM when the presented RL-based TD3 algorithm is used to obtain the PI controller gains has the fastest time response and relatively low overshoot.

Fig 7. The motor speed responses when the IAE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g007

Fig 8. The motor speed responses when the ISE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g008

Fig 9. The motor speed responses when the ITAE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g009

Fig 10. The motor speed responses when the ITSE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g010

Table 3 summarizes the values of the PI controller gains, settling time, rise time, maximum time, and overshoot percentage for the different speed responses shown in Figs 7–10, obtained using the presented RL-based TD3 algorithm and the different optimization techniques with the different error criteria. From this table, it can be noticed that the suggested RL-based TD3 results in the minimum settling time and relatively low values for the rise time, maximum time, and overshoot percentage. This proves the effectiveness of the suggested control methodology in improving the 5ph-IPMSM speed response in the constant torque region.

Table 3. Summary of the comparison between the RL-based TD3 and the metaheuristic optimization techniques at a speed of 1600 rpm.

https://doi.org/10.1371/journal.pone.0316326.t003

To show the effectiveness of the presented control methodology in the case of a sudden change in the reference speed, the above results are repeated when the desired speed is suddenly varied from 1600 rpm to 1200 rpm. Figs 11–14 show the speed responses for the sudden change in the motor reference speed using the metaheuristic optimization techniques and the presented RL-based TD3 algorithm for the four types of error criteria. From these figures it can be noticed that the speed response of the 5ph-IPMSM when the presented RL-based TD3 algorithm is used also has the fastest time response and relatively low overshoot.

Fig 11. The motor speed responses when the IAE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g011

Fig 12. The motor speed responses when the ISE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g012

Fig 13. The motor speed responses when the ITAE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g013

Fig 14. The motor speed responses when the ITSE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g014

Another set of results is obtained when the motor is driving a constant load torque of 35.8 Nm and the reference speed is 3200 rpm. In this operating condition the motor is operated in the field weakening mode of operation, i.e., the constant power region. Figs 15–18 show the speed responses of the 5ph-IPMSM when the PI controller gains are obtained using the metaheuristic optimization techniques and the presented RL-based TD3 approach, with the four types of error criteria used in the optimization techniques. From these figures it can be noticed that the speed response of the 5ph-IPMSM when the presented RL-based TD3 algorithm is used to obtain the PI controller gains has the fastest time response and relatively low overshoot.

Fig 15. The motor speed responses when the IAE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g015

Fig 16. The motor speed responses when the ISE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g016

Fig 17. The motor speed responses when the ITAE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g017

Fig 18. The motor speed responses when the ITSE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g018

Table 4 summarizes the values of the PI controller gains, settling time, rise time, maximum time, and overshoot percentage for the different speed responses shown in Figs 15–18, obtained using the presented RL-based TD3 algorithm and the different optimization techniques with the different error criteria when the motor is operated in the constant power region. It can be noticed from Table 4 that the proposed RL-based TD3 gives the minimum values for the settling time and maximum time and relatively low values for the rise time and overshoot percentage. This again proves the effectiveness of the suggested control methodology in improving the 5ph-IPMSM speed response in the constant power region.

Table 4. Summary of the comparison between the RL-based TD3 and the metaheuristic optimization techniques at a speed of 3200 rpm.

https://doi.org/10.1371/journal.pone.0316326.t004

The above results are repeated when the desired speed is suddenly varied from 3200 rpm at 35 Nm load torque to 4000 rpm at 28.6 Nm load torque, within the constant power region. Figs 19–22 show the speed responses for the sudden change in the motor reference speed with the corresponding load torque, utilizing the metaheuristic optimization techniques and the presented RL-based TD3 algorithm for the four types of error criteria. From these figures it can be noticed that the speed response of the 5ph-IPMSM when the presented RL-based TD3 algorithm is used is also superior.

Fig 19. The motor speed responses when the IAE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g019

Fig 20. The motor speed responses when the ISE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g020

Fig 21. The motor speed responses when the ITAE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g021

Fig 22. The motor speed responses when the ITSE error criterion is used.

https://doi.org/10.1371/journal.pone.0316326.g022

8. Conclusion

In this study, a new RL control algorithm based on the TD3 algorithm is suggested to obtain the gains of two cascaded PI controllers in a 5ph-IPMSM drive system. The purpose of this algorithm is to optimize the 5ph-IPMSM speed response in both the constant torque region and the constant power region. The PI controller gains are also obtained using four recent metaheuristic optimization techniques to be compared with the proposed algorithm: Transit Search, Honey Badger Algorithm, Dwarf Mongoose, and Dandelion Optimizer. The MATLAB Simulink package is utilized to obtain simulation results that validate the proposed algorithm. It can be concluded from the results that the suggested RL control algorithm based on the TD3 algorithm yields an improved motor speed response compared with the metaheuristic optimization techniques, with the fastest response and relatively lower overshoot, whether the 5ph-IPMSM is operated in the constant torque region or the constant power region. However, the MHOT are easier to implement and computationally simpler than the RL. As future work, the experimental implementation of the proposed methodology will be pursued. In addition, the proposed control methodology of the drive system can be investigated for utilization in electric vehicles, robotics, and renewable energy systems.

References

  1. Khanh PQ, Anh HPH. Hybrid Optimal Fuzzy Jaya Technique for Advanced PMSM Driving Control. Electrical Engineering. 2023; 105, 3629–3646. https://doi.org/10.1007/s00202-023-01911-6.
  2. Mercorelli P. Control of Permanent Magnet Synchronous Motors for Track Applications. Electronics. 2023; 12 (15), 1–20. https://doi.org/10.3390/electronics12153285.
  3. Boumegouas MKB, Ilten E, Kouzi K, Demirtas M, M’hamed B. Application of a Novel Synergetic Observer for PMSM in Electrical Vehicle. Electrical Engineering; 2024. https://doi.org/10.1007/s00202-024-02297-9.
  4. Long L, Sun T, Liang J. Five Phase Permanent Magnet Synchronous Motor Decoupled Model with Dual Frame Frequency Adaptive Flux Observer. Energy Reports. 2020; 6, 1403–1408. https://doi.org/10.1016/j.egyr.2020.11.010.
  5. Tian B, Lu R, Hu J. Single Line/Phase Open Fault-Tolerant Decoupling Control of a Five-Phase Permanent Magnet Synchronous Motor under Different Stator Connections. Energies. 2022; 15 (9), 1–18. https://doi.org/10.3390/en15093366.
  6. Dharmasena S, Choi S. Model Predictive Control of Five-Phase Permanent Magnet Assisted Synchronous Reluctance Motor. 2019 IEEE Applied Power Electronics Conference and Exposition (APEC), Anaheim, CA, USA. 2019; 1885–1890.
  7. Tufenkci S, Alagoz BB, Kavuran G, Yeroglu C, Herencsar N, Mahata S. A Theoretical Demonstration for Reinforcement Learning of PI Control Dynamics for Optimal Speed Control of DC Motors by using Twin Delay Deep Deterministic Policy Gradient Algorithm. Expert Systems With Applications. 2023; 213, 1–16. https://doi.org/10.1016/j.eswa.2022.119192.
  8. Yu F, Zhang X, Wang S. Five-phase Permanent Magnet Synchronous Motor Vector Control Based on Harmonic Eliminating Space Vector Modulation. 2005 International Conference on Electrical Machines and Systems, Nanjing, China. 2005; 1, 392–396.
  9. Hosseyni A, Trabelsi R, Mimouni MF, Iqbal A. Vector Controlled Five-Phase Permanent Magnet Synchronous Motor Drive. 2014 IEEE 23rd International Symposium on Industrial Electronics (ISIE), Istanbul, Turkey. 2014; 2122–2127.
  10. Kamel T, Abdelkader D, Said B. Vector Control of Five-Phase Permanent Magnet Synchronous Motor Drive. 2015 4th International Conference on Electrical Engineering (ICEE), Boumerdes, Algeria. 2015; 1–4.
  11. Zong ZL, Wang K, Zhang JY. Control Strategy of Five-Phase PMSM Utilizing Third Harmonic Current to Improve Output Torque. 2017 Chinese Automation Congress (CAC), Jinan, China. 2017; 6112–6117.
  12. Parsa L, Toliyat HA. Sensorless Direct Torque Control of Five-phase Interior Permanent Magnet Motor Drives. IEEE Transactions on Industry Applications. 2007; 43 (4).
  13. Guo L, Parsa L. Model Reference Adaptive Control of Five-phase IPM Motors Based on Neural Network. 2011 IEEE International Electric Machines & Drives Conference (IEMDC), Niagara Falls, ON, Canada. 2011; 563–568.
  14. Li J, Du B, Zhao T, Cheng Y, Cui S. Sensorless Control of Five-Phase Permanent-Magnet Synchronous Motor Based on Third-Harmonic Space. IEEE Transactions on Industrial Electronics. 2022; 69 (8), 7685–7695.
  15. Mehedi F, Nezli L, Mahmoudi MO, Taleb R. Robust Speed Control of Five-Phase Permanent Magnet Synchronous Motor using Super-Twisting Sliding Mode Control. J. Ren. Energies. 2017; 20 (4), 649–657.
  16. Cao B, Grainger BM, Wang X, Zou Y, Reed GR, Mao ZH. Direct Torque Model Predictive Control of a Poly-Phase Permanent Magnet Synchronous Motor with Current Harmonic Suppression and Loss Reduction. 2018 IEEE Applied Power Electronics Conference and Exposition (APEC), San Antonio, TX, USA. 2018; 2460–2464.
  17. Han X, Wang H, Zhang Z, Fan Y, Wang W. An Improved Finite Control Set Model Predictive Current Control Strategy for Five-Phase PMSMs. 2019 4th Asia Conference on Power and Electrical Engineering (ACPEE 2019), IOP Conf. Series: Materials Science and Engineering. 2019; 486, 1–6.
  18. Li G, Hu J, Li Y, Zhu J. An Improved Model Predictive Direct Torque Control Strategy for Reducing Harmonic Currents and Torque Ripples of Five-Phase Permanent Magnet Synchronous Motors. IEEE Transactions on Industrial Electronics. 2019; 66 (8), 5820–5829.
  19. Cao B, Grainger BM, Wang X, Zou Y, Reed G, Mao ZH. Direct Torque Model Predictive Control of a Five-Phase Permanent Magnet Synchronous Motor. IEEE Transactions on Power Electronics. 2021; 36 (2), 2346–2360.
  20. Zhao W, Wang H, Tao T, Xu D. Model Predictive Torque Control of Five-Phase PMSM by Using Double Virtual Voltage Vectors Based on Geometric Principle. IEEE Transactions on Transportation Electrification. 2021; 7 (4), 2635–2644.
  21. Bahar ST, Omar RG. Torque Ripple Alleviation of a Five-Phase Permanent Magnet Synchronous Motor using Predictive Torque Control Method. International Journal of Power Electronics and Drive Systems (IJPEDS). 2022; 13 (4), 2207–2215.
  22. Huang W, Huang Y, Xu D. Model-Free Predictive Current Control of Five-Phase PMSM Drives. Electronics. 2023; 12 (23), 1–16. https://doi.org/10.3390/electronics12234848.
  23. Rajanikanth P, Parvathy ML, Thippiripati VK. Enhanced Model Predictive Current Control-Based Five-Phase PMSM Drive. IEEE Journal of Emerging and Selected Topics in Power Electronics. 2024; 12 (1), 838–848.
  24. Guo B, Xia C, Han JF. NN-Based Model Predictive Direct Speed Control of PMSM Drive Systems. 2014 International Conference on Machine Learning and Cybernetics, Lanzhou. 2014; 163–168.
  25. Kannan R, Gayathriy N, Natarajan M, Sankarkumary RS, Iyerz LV, Karz NC. Selection of PI Controller Tuning Parameters for Speed Control of PMSM using Biogeography based Optimization Algorithm. 2016 IEEE International Conference on Power Electronics, Drives and Energy Systems (PEDES), Trivandrum, India. 2016; 1–6.
  26. Frijet Z, Zribi A, Chtourou M. Adaptive Neural Network Internal Model Control for PMSM Speed Regulation. Journal of Electrical Systems. 2018; 14 (2), 118–126.
  27. Santos JLT, Mejia OA, Sanchez EP, Cortez RS. Parameter Tuning of PI Control for Speed Regulation of a PMSM Using Bio-Inspired Algorithms. Algorithms. 2019; 12 (3), 1–21. https://doi.org/10.3390/a12030054.
  28. Venkatesan S, Kamaraj P, Priya MV. Speed Control of Permanent Magnet Synchronous Motor using Neural Network Model Predictive Control. Journal of Energy Systems. 2020; 4 (2), 71–82.
  29. Lin Y, Hu H, Chang Y, Wei H. Permanent Magnet Synchronous Motor Vector Control Based on BP Neural Network. Journal of Physics: Conference Series, The 5th International Conference on Mechanical, Electric, and Industrial Engineering (MEIE 2022), Online. 2022; 2369, 1–7.
  30. Liu H, Zhang H, Zhang H, Chen G. Model Reference Adaptive Speed Observer Control of Permanent Magnet Synchronous Motor Based on Single Neuron PID. Journal of Physics: Conference Series, 2022 5th International Conference on Advanced Algorithms and Control Engineering (ICAACE 2022), Sanya, China. 2022; 2258.
  31. Mao H, Tang X, Tang H. Speed Control of PMSM Based on Neural Network Model Predictive Control. Transactions of the Institute of Measurement and Control. 2022; 44 (14), 2781–2794. https://doi.org/10.1177/014233122210862.
  32. Nguyen TT, Tran HN, Nguyen TH, Jeon JW. Recurrent Neural Network-Based Robust Adaptive Model Predictive Speed Control for PMSM With Parameter Mismatch. IEEE Transactions on Industrial Electronics. 2023; 70 (6), 6219–6228.
  33. Li H, Liu Z, Shao J. A Model Predictive Current Control Based on Adeline Neural Network for PMSM. Journal of Electrical Engineering & Technology. 2023; 18, 953–960. https://doi.org/10.1007/s42835-022-01324-8.
  34. Wang B, Zhu J, Li Z, Li Y. An Optimization Algorithm Used in PMSM Model Predictive Control. Institute of Electronics, Information and Communication Engineers. 2024; 21 (2), 1–6. https://doi.org/10.1587/elex.20.20230444.
  35. Boubertakh H, Tadjine M, Glorennec PY, Labiod S. Tuning Fuzzy PD and PI Controllers using Reinforcement Learning. ISA Transactions. 2010; 49 (4), 543–551. pmid:20605021.
  36. Qin Y, Zhang W, Shi J, Liu J. Improve PID Controller Through Reinforcement Learning. 2018 IEEE CSAA Guidance, Navigation and Control Conference (CGNCC), Xiamen, China. 2018; 1–6.
  37. Kushwaha A, Gopal M. Reinforcement Learning-Based Controller for Field-Oriented Control of Induction Machine. In: Bansal J, Das K, Nagar A, Deep K, Ojha A (eds) Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, Springer, Singapore. 2019; 817. https://doi.org/10.1007/978-981-13-1595-4_58.
  38. Guan Z, Yamamoto T. Design of a Reinforcement Learning PID Controller. 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK. 2020; 1–6.
  39. Book G, Traue A, Balakrishna P, Brosch A, Schenke M, et al. Transferring Online Reinforcement Learning for Electric Motor Control From Simulation to Real-World Experiments. IEEE Open Journal of Power Electronics. 2021; 2, 187–201.
  40. Song Z, Yang J, Mei X, Tao T, Xu M. Deep Reinforcement Learning for Permanent Magnet Synchronous Motor Speed Control Systems. Neural Computing and Applications. 2021; 33, 5409–5418. https://doi.org/10.1007/s00521-020-05352-1.
  41. Sun Q, Du C, Duan Y, Ren H, Li H. Design and Application of Adaptive PID Controller Based on Asynchronous Advantage Actor–Critic Learning Method. Wireless Networks. 2021; 27, 3537–3547. https://doi.org/10.1007/s11276-019-02225-x.
  42. Wang CS, Guo CW, Tsay DM, Perng JW. PMSM Speed Control Based on Particle Swarm Optimization and Deep Deterministic Policy Gradient under Load Disturbance. Machines. 2021; 9 (12), 1–19. https://doi.org/10.3390/machines9120343.
  43. Yeom K. Model Predictive Control and Deep Reinforcement Learning Based Energy Efficient Eco-driving for Battery Electric Vehicles. Energy Reports. 2022; 8 (12), 34–42. https://doi.org/10.1016/j.egyr.2022.10.040.
  44. Traue A, Book G, Kirchgassner W, Wallscheid O. Toward a Reinforcement Learning Environment Toolbox for Intelligent Electric Motor Control. IEEE Transactions on Neural Networks and Learning Systems. 2022; 33 (3), 919–928. pmid:33112755.
  45. Nicola M, Nicola CI. Improvement of Linear and Nonlinear Control for PMSM Using Computational Intelligence and Reinforcement Learning. Mathematics. 2022; 10 (24), 1–34. https://doi.org/10.3390/math10244667.
  46. Yin F, Yuan X, Ma Z, Xu X. Vector Control of PMSM Using TD3 Reinforcement Learning Algorithm. Algorithms. 2023; 16 (9), 1–20. https://doi.org/10.3390/a16090404.
  47. Sanjines UA, Maisincho Jivaja A, Asanza V, Lorente Leyva LL, Peluffo Ordóñez DH. Adaptive PI Controller Based on a Reinforcement Learning Algorithm for Speed Control of a DC Motor. Biomimetics. 2023; 8 (5), 1–26. https://doi.org/10.3390/biomimetics8050434.
  48. Najem A, Moutabir A, Rafik M, Ouchatti A. Comparative Study of PMSM Control Using Reinforcement Learning and PID Control. 2023 3rd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), Mohammedia, Morocco. 2023; 1–5. https://doi.org/10.1109/IRASET57153.2023.10153024.
  49. Mei P, Karimi HR, Xie H, Chen F, Huang C, Yang S. A Deep Reinforcement Learning Approach to Energy Management Control with Connected Information for Hybrid Electric Vehicles. Engineering Applications of Artificial Intelligence. 2023; 123, 1–15. https://doi.org/10.1016/j.engappai.2023.106239.
  50. Fayyazi M, Abdoos M, Phan D, Golafrouz M, Jalili M, et al. Real-Time Self-Adaptive Q-Learning Controller for Energy Management of Conventional Autonomous Vehicles. Expert Systems With Applications. 2023; 222, 1–14. https://doi.org/10.1016/j.eswa.2023.119770.
  51. Parsa L, Toliyat HA. Five-Phase Permanent-Magnet Motor Drives. IEEE Transactions on Industry Applications. 2005; 41 (1), 30–37.
  52. Parsa L, Kim N, Toliyat HA. Field Weakening Operation of High Torque Density Five-Phase Permanent Magnet Motor Drives. IEEE International Conference on Electric Machines and Drives, San Antonio, TX, USA. 2005; 1507–1512.
  53. Mirrashid M, Naderpour H. Transit Search: An Optimization Algorithm Based on Exoplanet Exploration. Results in Control and Optimization. 2022; 7, 1–37. https://doi.org/10.1016/j.rico.2022.100127.
  54. Hashim FA, Houssein EH, Hussain K, Mabrouk MS, Al-Atabany W. Honey Badger Algorithm: New Metaheuristic Algorithm for Solving Optimization Problems. Mathematics and Computers in Simulation. 2022; 192, 84–110. https://doi.org/10.1016/j.matcom.2021.08.013.
  55. Agushaka JO, Ezugwu AE, Abualigah L. Dwarf Mongoose Optimization Algorithm. Computer Methods in Applied Mechanics and Engineering. 2022; 391, 1–38. https://doi.org/10.1016/j.cma.2022.114570.
  56. Zhao S, Zhang T, Ma S, Chen M. Dandelion Optimizer: A Nature-Inspired Metaheuristic Algorithm for Engineering Applications. Engineering Applications of Artificial Intelligence. 2022; 114, 1–20. https://doi.org/10.1016/j.engappai.2022.105075.
  57. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, et al. Continuous Control with Deep Reinforcement Learning. 4th International Conference on Learning Representations (ICLR). 2016; 1–14. https://doi.org/10.48550/arXiv.1509.02971.
  58. Fujimoto S, van Hoof H, Meger D. Addressing Function Approximation Error in Actor-Critic Methods. Proceedings of the 35th International Conference on Machine Learning, PMLR. 2018; 80, 1587–1596. https://doi.org/10.48550/arXiv.1802.09477.
  59. Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M. Deterministic Policy Gradient Algorithms. 31st International Conference on Machine Learning (ICML). 2014; 1, 605–619. https://dl.acm.org/doi/10.5555/3044805.3044850.
  60. Dankwa S, Zheng W. Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent. ICVISP 2019: Proceedings of the 3rd International Conference on Vision, Image and Signal Processing. 2019; 66, 1–5. https://dl.acm.org/doi/abs/10.1145/3387168.3387199.
  61. Sucar LE, Morales EF, Hoey J. Decision Theory Models for Applications in Artificial Intelligence: Concepts and Solutions. IGI Global, 1st edition; 2011. https://doi.org/10.4018/978-1-60960-165-2.