
Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids

Prometheus Redaktion

Abstract

Large-scale renewable energy integration introduces random power fluctuations into microgrids, increasing the difficulty of frequency regulation. To improve regulation stability and training efficiency, this article proposes sample selection generative adversarial networks (SSGANs) based on sample selection networks (SSNs), conditional generative adversarial networks (CGANs), and the actor–critic framework. First, the SSNs are trained to evaluate sample information values and prioritize informative samples for model training. Second, the CGANs learn the conditional mapping between microgrid operating states and control actions, and the pretrained generator is transferred into the actor–critic framework as the actor. Third, the actor–critic framework further optimizes the control policy online to generate real-time frequency regulation commands. The proposed method is tested on a standard two-area system and further validated on a complex four-area system. Case studies show that SSGANs achieve faster convergence and better frequency regulation performance than typical control algorithms.

1. Introduction

Large-scale renewable energy, such as wind and solar power, has been integrated into microgrids to reduce environmental pollution [1]. However, its random fluctuations increase the difficulty of frequency regulation. Meanwhile, massive and complex microgrid operational data make it difficult for intelligent control methods to efficiently select useful samples [2]. Therefore, this study aims to design an intelligent frequency regulation strategy for informative sample selection and accurate control action generation.

Automatic generation control (AGC) is widely used to maintain power balance and suppress frequency deviations in microgrids [3]. Generally, AGC includes two main processes [4]: first, generation commands are obtained using control strategies [4]; second, the total generation command is then dispatched to individual generation units [5].
However, conventional AGC strategies usually depend on fixed control structures, making it difficult to adapt to renewable energy fluctuations and changing operating conditions [6]. Therefore, reinforcement-learning-based smart generation control (SGC) has been introduced to improve the adaptability of microgrid frequency regulation [7]. For instance, Q-learning adjusts control policies through continuous interaction with uncertain operating environments, which helps enhance microgrid stability and adaptability [8]. Nevertheless, conventional reinforcement learning suffers from the curse of dimensionality as microgrid complexity increases [9].

To address the above deficiencies, this paper proposes sample selection generative adversarial networks (SSGANs) based on conditional generative adversarial networks (CGANs) [25], sample selection networks (SSNs), and the actor–critic framework [26]. Specifically, CGANs are introduced to learn the conditional mapping between microgrid operating states and control actions, thereby improving state-dependent action prediction [27]. Meanwhile, SSNs are constructed based on bidirectional long short-term memory (BiLSTM) [28] to capture temporal correlations in electricity data and select samples with high information values, which improves training efficiency. Then, the pretrained generator of CGANs is integrated into the actor–critic framework as the actor for online policy optimization.

The key contributions of this article are as follows.
(1) The SSGANs introduce SSNs to evaluate sample information values and prioritize informative samples, thereby improving training efficiency.
(2) The SSGANs use CGANs to learn the state-conditioned mapping between microgrid operating states and control actions, thereby improving action generation quality.
(3) The SSGANs integrate the pretrained generator into the actor–critic framework as the actor, enabling online policy optimization for intelligent frequency regulation of microgrids.
2.1. Sample Selection Networks

As more renewable energy sources are connected to microgrids, power data become more complex [29]. Samples with high reward values provide more useful information for training, while low-value samples dominate the dataset and increase training cost. Therefore, BiLSTM-based sample selection networks (SSNs) are developed to identify high-information-value samples from the training dataset and improve training efficiency. Figure 1 shows the structure of the SSNs.

Similarly to prioritized experience replay [30], the SSNs perform reward-based sample prioritization to select important samples from the training sample set $S = \{(x_i, y_i)\}$, which includes microgrid samples $x_i$ and their corresponding labels $y_i$. Meanwhile, a validation set is defined as $E = \{(x_i, y_i)\}$, where $y_i$ denotes the known label of validation sample $x_i$. The forecast validation reward is defined as

$$R(E, S, s_k) = \sum_{(x_i, y_i) \in E} \log p(y_i \mid x_i, s_k, S_e) \tag{1}$$

where $x_i$ denotes the input sample, $y_i$ denotes the known validation label, $s_k$ denotes the control state of the SSNs after querying $k$ labels, and $S_e$ denotes the current labeled training subset. During SSN training, the label of the selected input sample is queried, and the SSNs update their state from $s_{k-1}$ to $s_k$. The prediction of the SSNs is determined by the selected sample and its queried label, which are related to $S_e$ and $y_i$, respectively. The ideal training objective of the SSNs is

$$\max_{\pi} \; \mathbb{E}_{(S, E) \sim F} \, \mathbb{E}_{\pi(S, T)} \left[ \sum_{i=1}^{T} R(E, S, s_i) \right] \tag{2}$$

where $T$ denotes the maximum number of queried labels; $(S, E) \sim F$ indicates that the training set and validation set are sampled from distribution $F$; and $\pi(S, T)$ denotes the sequential sample selection policy over $T$ steps.

For an unlabeled candidate sample $x_j$, the true label is unavailable before annotation. Therefore, the current SSNs first predict a pseudo-label $\hat{y}_j$ for $x_j$.
The pair $(x_j, \hat{y}_j)$ is then virtually added to the current labeled subset $S_e$ to estimate its potential contribution. Accordingly, the information value of the unlabeled candidate is defined as the predicted increase in the validation reward after this hypothetical update, i.e.,

$$I_j = R\left(E, S_e \cup \{(x_j, \hat{y}_j)\}, s_{k+1}\right) - R\left(E, S_e, s_k\right) \tag{3}$$

where $I_j$ denotes the information value of candidate sample $x_j$, and $s_{k+1}$ denotes the updated SSN state after virtually including $(x_j, \hat{y}_j)$. For a candidate batch with $N$ samples, the mean information value is calculated as

$$\bar{I} = \frac{1}{N} \sum_{j=1}^{N} I_j \tag{4}$$

where $N$ denotes the number of candidate samples in the batch. The batch mean $\bar{I}$ is used as the adaptive threshold for sample classification. If $I_j > \bar{I}$, candidate sample $x_j$ is regarded as a high-information-value sample; otherwise, it is regarded as a low-information-value sample. In this way, the importance of each unlabeled candidate can be estimated before its true label is queried. Samples with larger $I_j$ are prioritized for subsequent training, thereby improving training efficiency.

2.2. Conditional Generative Adversarial Networks

The actor–critic framework obtains actions through policy learning. However, in complex microgrid environments, policy exploration may become inefficient and costly, making it difficult to find effective control actions [31]. To address this issue, CGANs are introduced to provide state-conditioned action prediction so that action generation is guided by the current operating state rather than relying only on exploratory policy updates.

GANs consist of a generator $G$ and a discriminator $D$, which are trained in an adversarial manner. The generator converts a latent noise vector $z$ sampled from a prior distribution into synthetic data, while the discriminator estimates whether an input sample comes from the real dataset or from $G$.
During training, $D$ improves its ability to distinguish real and generated samples, whereas $G$ learns to produce samples that are difficult to identify as fake. This adversarial learning process can be described by the following minimax optimization problem:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}} \left[ \log D(x) \right] + \mathbb{E}_{z \sim p_{\text{noise}}} \left[ \log \left( 1 - D(G(z)) \right) \right] \tag{5}$$

The CGANs, as an extension of GANs, introduce conditional information $d$ into both the generator and discriminator. In this study, the CGANs introduce the current microgrid state $s_t$ as the conditional input. Unlike a standard GAN that learns the marginal distribution of control actions, the CGANs learn the conditional action distribution under a given operating state. The value function of the CGANs is

$$\min_{G} \max_{D} V(D, G \mid s_t) = \mathbb{E}_{a_{t+1} \sim p_{\text{data}}(a \mid s_t)} \left[ \log D(a_{t+1}, s_t) \right] + \mathbb{E}_{z \sim p_z} \left[ \log \left( 1 - D(G(z, s_t), s_t) \right) \right] \tag{6}$$

where $s_t$ denotes the current microgrid state, $a_{t+1}$ denotes the real historical control action, $z$ denotes the noise vector, $G(z, s_t)$ denotes the generated control action, and $D(a_{t+1}, s_t)$ denotes the probability that the state–action pair comes from the real dataset. By conditioning both the generator and discriminator on $s_t$, the CGANs can learn the state-dependent mapping from operating states to control actions. Therefore, compared with a standard GAN that only generates actions from noise, the CGANs can generate control actions more consistent with the current operating condition, thereby improving action forecasting accuracy. The structure of the CGANs is shown in Figure 2.

2.3. Sample Selection Generative Adversarial Networks

The SSGANs consist of three main components: SSNs, CGANs, and the actor–critic framework. The SSNs evaluate the information value of candidate samples and divide them into high-information-value samples and low-information-value samples. The selected high-information-value samples are used for CGAN pretraining.
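The split into high- and low-information-value samples can be sketched as follows. This is a minimal illustration that assumes the information values $I_j$ of Equation (3) have already been estimated by some scoring model and applies the batch-mean threshold of Equation (4); the function name `split_by_information_value` and the numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def split_by_information_value(samples, info_values):
    """Split a candidate batch using the batch-mean information value.

    Samples whose information value is strictly above the batch mean are
    returned in pool_1 (high information value); the rest go to pool_2.
    """
    if len(samples) != len(info_values):
        raise ValueError("samples and info_values must have the same length")
    if not samples:
        return [], []
    threshold = fmean(info_values)  # adaptive threshold: mean over the batch
    pool_1 = [s for s, v in zip(samples, info_values) if v > threshold]
    pool_2 = [s for s, v in zip(samples, info_values) if v <= threshold]
    return pool_1, pool_2


# Hypothetical candidate batch; the strings stand in for microgrid samples.
samples = ["s0", "s1", "s2", "s3"]
info_values = [0.10, 0.40, 0.05, 0.25]  # illustrative values only
pool_1, pool_2 = split_by_information_value(samples, info_values)
print(pool_1, pool_2)
```

Because the threshold is recomputed for every batch, the share of samples kept for pretraining adapts to the batch rather than being fixed in advance.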
The CGANs learn the state-conditioned mapping from microgrid operating states to control actions through adversarial learning, where the generator predicts control actions and the discriminator distinguishes real and generated state–action samples. Finally, the pretrained generator is transferred to initialize the actor network, and the actor–critic framework further updates the control policy through value evaluation. The structure of the SSGANs is shown in Figure 3.

For sample selection, this paper draws on the idea of reward-based prioritization [32]. The information value of each candidate sample is predicted by the SSNs. Samples with information values higher than the batch mean are regarded as high-information-value samples and stored in experience pool 1; otherwise, they are regarded as low-information-value samples and stored in experience pool 2. In this way, SSGANs can prioritize informative samples for CGAN pretraining and reduce the influence of low-information-value samples on policy learning.

After the high-information-value sample set $c = \{c_0, \ldots, c_t\}$ is obtained by the SSNs, the pretrained generator is transferred to initialize the actor network. In the online stage, $G$ denotes the actor network and $C$ denotes the critic network. The discriminator used in the offline CGAN pretraining stage is not involved in the online control process. Then, the actor $G(c_t \mid \theta^{G})$ predicts the next action $a_{t+1}$ according to the current condition $c_t$, while the critic $C(c_t, a_t \mid \theta^{C})$ estimates the corresponding action–value function. Therefore, the online control process follows a standard actor–critic framework. The critic target value is calculated as

$$y_i = r_i + \lambda \, C'\!\left(c_{i+1}, G'(c_{i+1} \mid \theta^{G'}) \mid \theta^{C'}\right) \tag{7}$$

where $\lambda$ is the discount factor.
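As a rough illustration of Equation (7), the sketch below computes the target value for a small mini-batch. The scalar functions `target_actor` and `target_critic` are toy placeholders for the target networks $G'$ and $C'$, chosen only so the arithmetic is easy to follow; they are assumptions, not the networks used in the paper.

```python
def td_targets(rewards, next_conditions, target_actor, target_critic, discount):
    """Compute y_i = r_i + discount * C'(c_{i+1}, G'(c_{i+1})) per sample."""
    targets = []
    for r, c_next in zip(rewards, next_conditions):
        a_next = target_actor(c_next)            # action from target actor G'
        q_next = target_critic(c_next, a_next)   # value from target critic C'
        targets.append(r + discount * q_next)
    return targets


# Toy placeholders (assumptions for illustration only).
def target_actor(c):
    return -0.5 * c  # act against the observed deviation


def target_critic(c, a):
    return -(c * c) - 0.1 * (a * a)  # penalize deviation and control effort


rewards = [-1.0, -0.25]
next_conditions = [2.0, 1.0]
y = td_targets(rewards, next_conditions, target_actor, target_critic, discount=0.99)
print(y)
```

The discount of 0.99 matches the value reported in the parameter settings; every other number here is illustrative.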
The critic network is trained by minimizing the mean squared Bellman error

$$L(\theta^{C}) = \frac{1}{I} \sum_{i=1}^{I} \left( y_i - C(c_i, a_i \mid \theta^{C}) \right)^2 \tag{8}$$

The actor network is optimized using the deterministic policy gradient

$$\nabla_{\theta^{G}} J \approx \frac{1}{I} \sum_{i=1}^{I} \nabla_{a} C(c, a \mid \theta^{C}) \Big|_{c = c_i,\, a = G(c_i)} \cdot \nabla_{\theta^{G}} G(c \mid \theta^{G}) \Big|_{c = c_i} \tag{9}$$

where $i$ is the mini-batch sample index and $I$ is the mini-batch size. Finally, the target networks are updated through soft updating

$$\theta^{C'} = \gamma \theta^{C} + (1 - \gamma) \theta^{C'}, \qquad \theta^{G'} = \gamma \theta^{G} + (1 - \gamma) \theta^{G'} \tag{10}$$

where $\gamma$ is the soft update coefficient.

2.4. Sample Selection Generative Adversarial Networks for SGC

The proposed SSGANs are applied for SGC to reduce the frequency deviation $\Delta f$ and area control error (ACE) of microgrids. To clarify the decision-making formulation, the SGC problem is formulated as a Markov decision process $(S, A, P, r)$. At time $t$, the state $s_t \in S$ includes the measurable operating variables of the microgrid, such as frequency deviation, ACE, load disturbance, and renewable power fluctuation. The action $a_t \in A$ denotes the continuous generation control command generated by the actor. The environment is the microgrid frequency response model under load and renewable energy disturbances. After receiving $a_t$, the environment returns the next state $s_{t+1}$ and the immediate reward $r_t$.

In the pretraining stage, historical operating data are collected as input samples. The SSNs are first trained to evaluate sample information values and select high-information-value samples for CGAN pretraining. The CGANs then learn the state-conditioned mapping between microgrid operating states and control actions, and the pretrained generator is transferred to initialize the actor network. After pretraining, the parameters of the SSNs are fixed and are not jointly updated with the actor–critic networks. Although offline training is more computationally stable, it cannot exploit the online learning capability of the actor–critic framework.
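The soft update of Equation (10), which keeps the target networks slowly tracking the online networks, can be sketched as below. Parameters are represented as plain lists of floats for readability; the helper name `soft_update` and the example values are hypothetical.

```python
def soft_update(target_params, online_params, gamma):
    """Return gamma * online + (1 - gamma) * target for each parameter."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [
        gamma * w + (1.0 - gamma) * w_target
        for w, w_target in zip(online_params, target_params)
    ]


# Illustrative parameters only; 0.005 is the soft update coefficient
# reported in the parameter settings.
target = [0.0, 1.0]
online = [1.0, 1.0]
target = soft_update(target, online, gamma=0.005)
print(target)
```

With a small coefficient, each step moves the target parameters only a fraction of the way toward the online parameters, which is what keeps the critic target of Equation (7) from changing abruptly.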
To improve adaptability to real-time complex disturbances, an online training stage is further introduced. The pretrained generator is transferred to initialize the actor, which is then updated under the actor–critic framework. Meanwhile, the trained SSNs with fixed parameters are used to select informative samples from the real-time experience replay buffer to support policy learning and optimization.

After training, the SSGANs are deployed for online frequency control. At each control step, the current microgrid state is directly fed into the trained actor network, and the corresponding control command is generated in real time. Therefore, the CGANs learn the state-conditioned action distribution $p(a_{t+1} \mid s_t)$, while the actor–critic framework further optimizes the generated actions through critic evaluation. The proposed method uses sample selection to improve training efficiency and uses adversarial pretraining to enhance action prediction, thereby supporting real-time frequency regulation of microgrids (Figure 4 and Algorithm 1).

Algorithm 1. Pseudo-code of SSGANs for SGC
1: Initialize parameters
2: for each training sample do
3:     Estimate the validation reward using Equation (1)
4:     Update the SSN state
5:     Calculate the sample information value using Equation (3)
6:     If the information value exceeds the batch mean, save the sample to experience pool 1; otherwise, save it to experience pool 2
7: end for
8: Pre-train the CGANs on experience pool 1 using Equation (6)
9: Transfer the pre-trained generator to initialize the online actor network
10: for t = 1, …, T do
11:     Generate action a_t
12:     Store samples (s_t, a_t, r_t, s_{t+1})
13:     Calculate the target value using Equation (7)
14:     Update the critic and actor using Equations (8) and (9)
15:     Soft-update the target networks using Equation (10)
16: end for
17: Deploy the trained actor network
18: Input the current microgrid state into the trained actor
19: Generate the control command for real-time frequency regulation

3.1. Evaluation and Reward Function

To guide the controller toward stable frequency regulation, the reward function is constructed using $\Delta f$ and ACE. Weighted squared terms are used to penalize frequency fluctuation and tie-line power imbalance. The reward function used by the critic network depends only on the frequency deviation and ACE and is designed as

$$r = -\sum_{k=1}^{K} \left[ \eta_k (\Delta f_k)^2 + (1 - \eta_k) \, \mathrm{ACE}_k^2 / 1000 \right] \tag{14}$$

where $\eta_k$ and $1 - \eta_k$ are the weight coefficients of $\Delta f_k$ and $\mathrm{ACE}_k$ in the $k$-th area, respectively. $\eta_k$ is set to 0.5 for all areas to balance $\Delta f_k$ and $\mathrm{ACE}_k$. In addition, the precision, loss, and root mean square error (RMSE) are applied to evaluate the training process of the SSNs and CGANs:

$$\text{Precision} = \frac{N_{\text{correct}}}{N_{\text{total}}} \tag{15}$$

$$\text{Loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right] \tag{16}$$

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (z_i - \hat{z}_i)^2} \tag{17}$$

where $N_{\text{correct}}$ is the number of correctly classified samples; $N_{\text{total}}$ is the total number of samples; $y_i$ is the true label; $\hat{y}_i$ is the predicted probability; $z_i$ is the real value; and $\hat{z}_i$ is the predicted value.

3.2. Parameter Setting

Table 1 lists the parameters of the SSGANs. These hyperparameters were selected based on empirical tuning and manual adjustment to balance prediction accuracy, training stability, and computational efficiency. For the SSNs, the BiLSTM consists of two LSTM layers with dimensions of 4-90-90-2. The hidden size of 90 is selected to capture temporal correlations in microgrid operating data without excessive computational cost. The dropout factor is set to 0.5 to avoid overfitting, and the learning rate is set to 0.01 to accelerate sample-selection training. For the CGANs, the generator and discriminator are DNNs with dimensions of 4-90-90-4 and 4-90-90-2, respectively. The generator output dimension corresponds to continuous control actions, and the discriminator output dimension corresponds to real/fake sample discrimination.
Tanh is used in the generator output layer to bound the generated control actions, while Sigmoid is used in the discriminator output layer. ReLU is used in the hidden layers, and batch normalization [36] is introduced to reduce sensitivity to poor initialization and improve adversarial training stability. For the actor–critic framework, the pretrained generator is transferred as the actor, and the critic is a DNN with dimensions of 4-90-90-1 to estimate the action value. The learning rate of both the actor and critic is set to 0.005 to maintain balanced updates. Adam is used as the optimizer, the discount factor is set to 0.99, the soft update coefficient is set to 0.005, the replay buffer size is set to $1 \times 10^{6}$, the batch size is set to 128, and the total number of training steps is set to 10,000. These settings are kept consistent with the comparison algorithms to ensure a fair evaluation.

4. Case Studies

All algorithms are evaluated in the IEEE two-area system (Case I) [37] and the China Southern Power Grid (Case II) [38]. Case I serves as the benchmark condition for testing the SSGANs; Case II is used to validate their performance.

4.2. Case II

Case II is a four-area interconnected power system, as shown in Figure 13. Compared with Case I, Case II has a larger system scale, stronger inter-area coupling, and more complex frequency regulation requirements. Therefore, it is used to further evaluate the reliability and adaptability of SSGANs in complex multi-area scenarios. The system parameters of Case II are listed in Table 6.

The above results indicate that (i) the SSGANs can stably control more generator units and new energy units; (ii) the SSGANs are highly adaptive and robust in complicated multi-area systems.

4.3. Discussion

To further evaluate the effectiveness of the proposed SSGANs, this section presents additional analyses: ablation studies, dynamic response performance, communication delay analysis, statistical significance analysis, and a discussion of limitations. All experiments in this section are conducted in Area 1 of Case II.

4.3.1. Ablation Studies

The complete SSGANs achieve the best performance across all evaluation indices (Table 8). Removing the SSNs increases Δf by 24.4% and ACE by 9.9%, which demonstrates that the sample selection mechanism effectively prioritizes high-information-value samples and improves training efficiency. Removing CGAN pretraining leads to a larger degradation, with Δf increasing by 31.7% and ACE by 19.1%, indicating that the pretrained generator provides a better initialization for the actor network than random initialization. The standard actor–critic shows the worst performance across all indices, confirming that both components contribute meaningfully to the overall performance.

Regarding computational cost, SSGANs require the longest training time (2.86 h) due to the additional SSN training and CGAN pretraining stages. However, the inference time of all methods remains comparable (approximately 0.5 s), which satisfies the real-time requirement of microgrid frequency regulation.

4.3.2. Dynamic Response Performance

To further evaluate the dynamic response performance, settling time and frequency overshoot are introduced as supplementary indices, where settling time reflects the recovery speed after a disturbance, and frequency overshoot reflects the maximum frequency deviation. As shown in Table 9, SSGANs achieve the best performance among all compared algorithms, with the shortest settling time of 486 s and the smallest frequency overshoot of 0.084 Hz. Compared with DDPG and SAC, the settling time of SSGANs is lower by 49.64% and 31.36%, respectively, and the frequency overshoot is lower by 42.47% and 25.66%, respectively.
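The percentages quoted above follow from the Table 9 entries as relative reductions, (baseline − SSGANs) / baseline. The short check below reproduces them; the helper name `relative_reduction` is introduced here only for illustration.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# Values copied from Table 9.
settling_time_s = {"DDPG": 965, "SAC": 708, "SSGANs": 486}
overshoot_hz = {"DDPG": 0.146, "SAC": 0.113, "SSGANs": 0.084}

reductions = {}
for name in ("DDPG", "SAC"):
    reductions[name] = (
        round(relative_reduction(settling_time_s[name], settling_time_s["SSGANs"]), 2),
        round(relative_reduction(overshoot_hz[name], overshoot_hz["SSGANs"]), 2),
    )
    print(name, reductions[name])
```

The same calculation applied to Tables 8 and 10 reproduces the ablation and communication delay percentages reported in this section.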
These results indicate that SSGANs can suppress the maximum frequency deviation more effectively and restore the system frequency to the steady-state range more rapidly after a disturbance, demonstrating better dynamic regulation capability than standard deep reinforcement learning (DRL) methods.

4.3.3. Communication Delay Analysis

In practical interconnected microgrids, communication delay affects the transmission of measured states and control commands, thereby weakening the timeliness of frequency regulation. As shown in Table 10, the control performance of all algorithms degrades as the delay increases from 10 ms to 30 ms. For SSGANs, the average frequency deviation and average area control error increase by 40.82% and 34.57%, respectively. However, SSGANs still maintain the lowest values under each delay condition. Under the 30 ms delay, compared with DDPG and SAC, SSGANs reduce the average frequency deviation by 38.39% and 27.37%, respectively, indicating that SSGANs are less affected by communication delay.

Frequency regulation mileage [8] is introduced as an auxiliary economic indicator, which is calculated by accumulating the absolute variations in adjacent control commands during the regulation process. A smaller regulation mileage indicates smoother control actions, lower regulation burden, and lower potential execution cost. As shown in Table 10, SSGANs achieve the lowest regulation mileage under different delay conditions. Under the 30 ms delay, the regulation mileage of SSGANs is 40.59% lower than that of DDPG and 26.99% lower than that of SAC, showing better regulation economy under delayed communication.

4.3.4. Statistical Significance Analysis

To further verify whether the performance gains of SSGANs are statistically significant rather than caused by random initialization, paired significance analysis was conducted on the two main control indices (i.e., Δf and ACE).
Specifically, all compared methods were independently run 20 times under different random seeds, and the results obtained under the same seed were paired for comparison. For each method, the performance difference relative to SSGANs was evaluated by paired t-tests and Wilcoxon signed-rank tests. In addition, 95% bootstrap confidence intervals of the mean differences were calculated to further quantify the uncertainty of the observed improvement. The statistical results are summarized in Table 11.

As shown in Table 11, the mean differences in all compared methods relative to SSGANs are positive on both Δf and ACE, which indicates that SSGANs consistently achieve lower frequency deviation and lower area control error. Meanwhile, the p-values of both the paired t-test and the Wilcoxon signed-rank test are below 0.05 for all comparisons, and the corresponding 95% bootstrap confidence intervals do not include zero. These results demonstrate that the superiority of SSGANs over DDPG, TD3, SAC, PPO, and actor–critic GAN is statistically significant on the two main control indices. In particular, although actor–critic GAN already benefits from adversarial pretraining, SSGANs still achieve significant improvements, which further confirms that the SSN-based sample selection mechanism provides additional performance gains beyond adversarial actor initialization alone.

4.3.5. Discussion of Limitations

Although the proposed SSGANs demonstrate superior performance in the case studies, several limitations should be acknowledged.
(1) The training time of SSGANs is longer than that of standard deep RL methods due to the additional SSN training and CGAN pretraining stages. In applications where rapid deployment is required, this additional training overhead may become a constraint.
(2) The current validation is based on simulation models, and experimental verification on physical microgrid platforms is required to further confirm the practical applicability.
(3) The pretraining data are generated by a conventional PID controller, which may limit the diversity and quality of the initial training samples. Exploring more diverse data sources for pretraining could further improve the performance of SSGANs.
(4) The current work mainly focuses on simulation-based control performance, while the theoretical stability analysis of the system is not considered.

5. Conclusions

This paper proposes SSGANs for intelligent frequency regulation of microgrids by combining SSNs, CGANs, and the actor–critic framework. The main conclusions are as follows.
(1) SSNs can evaluate sample information values and select informative samples, thereby improving sample utilization and training efficiency.
(2) CGANs can learn the state-conditioned mapping between operating states and control actions, which improves action generation quality and reduces inefficient exploration.
(3) By transferring the pretrained CGAN generator into the actor–critic framework, SSGANs achieve online policy optimization. Case studies show that SSGANs obtain smaller frequency deviation, lower ACE, and better dynamic response performance than the compared algorithms.

In future work, the business model and the operation cost of microgrids can be analyzed jointly with SSGANs to obtain an improved control scheme. In addition, the network architectures of the SSGANs could be modified to improve the control effect, and more advanced GAN training stabilization techniques, such as Wasserstein loss and gradient penalty, could be explored. More detailed economic cost modeling, including operation cost, device degradation, and actuator wear, can also be considered. Finally, the theoretical stability of the controlled system remains to be analyzed.
Funding

This work was financially supported by the Science and Technology Project of State Grid Sichuan Electric Power Company (Project Name: Research on Key Technologies for Dynamic Assessment and Enhancement of Frequency Support Strength of Hydro-Wind-Solar Coupled System; Project No. 52199723003H).

The following abbreviations are used in this manuscript:
ACE: Area control error
AGC: Automatic generation control
BiLSTM: Bidirectional long short-term memory
BN: Batch normalization
CGANs: Conditional generative adversarial networks
DDPG: Deep deterministic policy gradient
DNNs: Deep neural networks
GANs: Generative adversarial networks
IAE: Integral absolute error
ISE: Integral squared error
ITAE: Integral time multiple absolute error
PPO: Proximal policy optimization
RMSE: Root mean square error
SAC: Soft actor–critic
SGC: Smart generation control
SSGANs: Sample selection generative adversarial networks
SSNs: Sample selection networks
TD3: Twin delayed deep deterministic policy gradient

Figure 1. Structure of SSNs.
Figure 2. Structure of CGANs.
Figure 3. Structure of SSGANs.
Figure 4. Framework of SSGANs for SGC.
Figure 5. IEEE two-area system.
Figure 6. Training curves of SSNs and CGANs: (a) SSNs; (b) CGANs.
Figure 7. Performance results of SSNs and CGANs: (a) SSNs; (b) CGANs.
Figure 8. Triangular load with Gaussian noise.
Figure 9. Result of online training: (a) curves of Δf; (b) box diagram of Δf.
Figure 10. Disturbance curve: (a) resident load; (b) renewable energy.
Figure 11. Dynamic responses of different algorithms in Area A: (a) Δf; (b) ACE.
Figure 12. Online operation results in Case I: (a) Area 1; (b) Area 2.
Figure 13. China’s southern power grid.
Figure 14. Dynamic responses of different algorithms in Area 1: (a) Δf; (b) ACE.
Figure 15. Online operation results in Case II: (a) Area 1; (b) Area 2; (c) Area 3; (d) Area 4.

Table 1. Parameters of SSGANs.

| Mode | Layer | Hidden Unit | Activation Function | Batch Normalization Size |
|---|---|---|---|---|
| Generator–Actor | 1 | 4 | ReLU | 64 |
| Generator–Actor | 2 | 90 | ReLU | 64 |
| Generator–Actor | 3 | 90 | ReLU | 128 |
| Generator–Actor | 4 | 4 | Tanh | - |
| Discriminator | 1 | 4 | ReLU | 32 |
| Discriminator | 2 | 90 | ReLU | 32 |
| Discriminator | 3 | 90 | ReLU | 64 |
| Discriminator | 4 | 2 | Sigmoid | - |
| Critic | 1 | 4 | ReLU | 32 |
| Critic | 2 | 90 | ReLU | 32 |
| Critic | 3 | 90 | ReLU | 64 |
| Critic | 4 | 1 | Sigmoid | - |
| BiLSTM | 1 | 4 | ReLU | 64 |
| BiLSTM | 2 | 90 | ReLU | 64 |
| BiLSTM | 3 | 90 | ReLU | 128 |
| BiLSTM | 4 | 2 | Sigmoid | - |

Table 2. Main parameters of the comparison algorithms.

| Parameter | DDPG | TD3 | SAC | PPO | Actor–Critic GAN |
|---|---|---|---|---|---|
| Network type | MLP | MLP | MLP | MLP | CGANs-Actor + Critic |
| Hidden layers | 2 | 2 | 2 | 2 | 2 |
| Hidden units | [128, 128] | [128, 128] | [128, 128] | [128, 128] | [128, 128] |
| Activation function | ReLU | ReLU | ReLU | ReLU | ReLU |
| Optimizer | Adam | Adam | Adam | Adam | Adam |
| Learning rate | 1×10^−4/1×10^−3 | 1×10^−4/1×10^−3 | 3×10^−4/3×10^−4 | 3×10^−4/1×10^−3 | 1×10^−4/1×10^−3 |
| Discount factor | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| Batch size | 128 | 128 | 128 | 128 | 128 |
| Training steps | 10,000 | 10,000 | 10,000 | 10,000 | 10,000 |

Table 3. Specific parameter settings of the comparison algorithms.

| Algorithm | Specific Settings |
|---|---|
| DDPG | Replay buffer size = 1×10^6; soft update coefficient = 0.005; Gaussian exploration noise linearly decayed from 0.20 to 0.05 |
| TD3 | Replay buffer size = 1×10^6; soft update coefficient = 0.005; policy delay = 2; target policy smoothing noise = 0.20 |
| SAC | Replay buffer size = 1×10^6; soft update coefficient = 0.005; entropy coefficient automatically tuned |
| PPO | Rollout length = 2048; clip ratio = 0.20; GAE parameter = 0.95; update epochs per batch = 10 |
| Actor–critic GAN | Adversarial pretraining; discriminator hidden units = [128, 128]; replay buffer size = 1×10^6 |

Table 4. Parameters of Case I.

| Symbol | Parameter | Value |
|---|---|---|
| TgA, TgB | Governor time constant | 0.08 s |
| TtA, TtB | Turbine time constant | 0.3 s |
| TpA, TpB | Frequency response time constant | 20 s |
| BA, BB | Primary frequency bias coefficient | 4166 Hz/p.u. |
| KA, KB | Frequency response coefficient | 0.00012 Hz/p.u. |
| RA, RB | Secondary frequency deviation coefficient | 0.0047 |
| TAB | Time constant of the tie-line | 3.42 s |

Table 5. Evaluation indices of different algorithms in Case I.

| Area | Algorithm | Mean Δf (Hz) | Mean ACE (MW) | ISE | IAE | ITAE (×10^7) |
|---|---|---|---|---|---|---|
| Area 1 | DDPG | 0.0106 | 84.7183 | 14.3926 | 884.6251 | 7.4368 |
| Area 1 | PPO | 0.0094 | 76.2854 | 12.8567 | 803.4927 | 6.7815 |
| Area 1 | TD3 | 0.0078 | 63.9142 | 10.3275 | 674.3186 | 5.6247 |
| Area 1 | SAC | 0.0076 | 62.4879 | 10.0843 | 661.7352 | 5.4819 |
| Area 1 | Actor–critic GAN | 0.0056 | 51.2685 | 6.7429 | 503.6148 | 4.2176 |
| Area 1 | SSGANs | 0.0031 | 43.8267 | 4.0185 | 356.2943 | 3.0268 |
| Area 2 | DDPG | 0.0103 | 82.9641 | 13.8754 | 852.7365 | 7.1642 |
| Area 2 | PPO | 0.0091 | 74.6385 | 12.3948 | 781.4693 | 6.5147 |
| Area 2 | TD3 | 0.0075 | 61.8257 | 9.8756 | 642.5871 | 5.3195 |
| Area 2 | SAC | 0.0074 | 60.7462 | 9.6423 | 631.4268 | 5.2064 |
| Area 2 | Actor–critic GAN | 0.0053 | 48.9254 | 6.2851 | 472.8639 | 3.9257 |
| Area 2 | SSGANs | 0.0029 | 41.5376 | 3.7824 | 341.7592 | 2.8463 |

Table 6. Parameters of Case II.

| Area | Tg (s) | Tt (s) | Tp (s) | B (Hz/p.u.) | K (Hz/p.u.) | R |
|---|---|---|---|---|---|---|
| Area 1 | 0.08 | 0.3 | 20 | 4166 | 0.00012 | 0.0047 |
| Area 2 | 0.08 | 0.3 | 20 | 3850 | 0.00012 | 0.0050 |
| Area 3 | 0.08 | 0.3 | 20 | 3500 | 0.00012 | 0.0052 |
| Area 4 | 0.08 | 0.3 | 20 | 3700 | 0.00012 | 0.0048 |

Table 7. Evaluation indices of different algorithms in Case II.

| Area | Algorithm | Mean Δf (Hz) | Mean ACE (MW) | ISE | IAE | ITAE (×10^7) |
|---|---|---|---|---|---|---|
| Area 1 | DDPG | 0.0069 | 141.3285 | 7.9352 | 574.2168 | 4.9826 |
| Area 1 | PPO | 0.0066 | 136.5942 | 7.3627 | 552.4873 | 4.7765 |
| Area 1 | TD3 | 0.0062 | 129.4736 | 6.8241 | 524.9586 | 4.5487 |
| Area 1 | SAC | 0.0061 | 127.8365 | 6.6758 | 517.3924 | 4.4823 |
| Area 1 | Actor–critic GAN | 0.0055 | 120.6847 | 5.9826 | 487.2645 | 4.1568 |
| Area 1 | SSGANs | 0.0041 | 109.5738 | 5.3154 | 437.8256 | 3.8427 |
| Area 2 | DDPG | 0.0071 | 145.8264 | 8.2865 | 591.7438 | 5.1254 |
| Area 2 | PPO | 0.0068 | 140.9573 | 7.6942 | 568.3157 | 4.9086 |
| Area 2 | TD3 | 0.0064 | 133.6285 | 7.1056 | 540.8624 | 4.6635 |
| Area 2 | SAC | 0.0063 | 132.0746 | 6.9584 | 533.7426 | 4.5982 |
| Area 2 | Actor–critic GAN | 0.0057 | 124.8639 | 6.2418 | 496.5284 | 4.2786 |
| Area 2 | SSGANs | 0.0042 | 113.4927 | 5.5863 | 501.2468 | 3.9724 |
| Area 3 | DDPG | 0.0072 | 148.3657 | 8.5243 | 604.3852 | 5.2847 |
| Area 3 | PPO | 0.0069 | 143.2185 | 7.9286 | 581.7463 | 5.0625 |
| Area 3 | TD3 | 0.0065 | 136.4728 | 7.3462 | 554.3286 | 4.8264 |
| Area 3 | SAC | 0.0064 | 134.8651 | 7.1845 | 547.1369 | 4.7583 |
| Area 3 | Actor–critic GAN | 0.0058 | 127.5264 | 6.4867 | 513.7942 | 4.4128 |
| Area 3 | SSGANs | 0.0043 | 116.3846 | 5.7825 | 462.8165 | 4.4657 |
| Area 4 | DDPG | 0.0070 | 143.7926 | 8.1568 | 584.9275 | 5.0642 |
| Area 4 | PPO | 0.0067 | 138.5264 | 7.5483 | 560.3841 | 4.8427 |
| Area 4 | TD3 | 0.0063 | 131.8472 | 6.9765 | 532.9476 | 4.6128 |
| Area 4 | SAC | 0.0062 | 130.2857 | 6.8247 | 525.6183 | 4.5461 |
| Area 4 | Actor–critic GAN | 0.0056 | 122.9465 | 6.1564 | 492.3857 | 4.2184 |
| Area 4 | SSGANs | 0.0041 | 111.8365 | 6.2489 | 444.7268 | 3.9146 |

Table 8. Evaluation indices of ablation studies.

| Algorithm | Mean Δf (Hz) | Mean ACE (MW) | ISE | IAE | ITAE (×10^7) | Training Time (h) | Computing Time (s) |
|---|---|---|---|---|---|---|---|
| SSGANs | 0.0041 | 109.5738 | 5.3154 | 437.8256 | 3.8427 | 2.86 | 0.52 |
| SSGANs without SSNs | 0.0051 | 120.4486 | 5.7134 | 467.1862 | 4.2806 | 2.53 | 0.51 |
| SSGANs without CGANs | 0.0054 | 130.4773 | 6.1776 | 492.4875 | 4.5163 | 1.74 | 0.53 |
| Actor–Critic | 0.0059 | 141.3985 | 6.6608 | 526.7964 | 4.7757 | 1.35 | 0.5 |

Table 9. Quantitative comparison of dynamic response performance.

| Algorithm | Settling Time (s) | Frequency Overshoot (Hz) |
|---|---|---|
| DDPG | 965 | 0.146 |
| PPO | 842 | 0.132 |
| TD3 | 735 | 0.119 |
| SAC | 708 | 0.113 |
| Actor–critic GAN | 624 | 0.101 |
| SSGANs | 486 | 0.084 |

Table 10. Communication delay analysis of different algorithms.

| Delay | Algorithm | Mean Δf (Hz) | Mean ACE (MW) | ISE | IAE | ITAE (×10^7) | Regulation Mileage (MW) |
|---|---|---|---|---|---|---|---|
| 10 ms | DDPG | 0.0081 | 163.8724 | 9.1046 | 655.4382 | 5.7341 | 2117.86 |
| 10 ms | PPO | 0.0077 | 156.4826 | 8.3983 | 628.9164 | 5.4718 | 1964.27 |
| 10 ms | TD3 | 0.0072 | 146.3958 | 7.7395 | 592.8731 | 5.1784 | 1786.39 |
| 10 ms | SAC | 0.0071 | 144.6287 | 7.5624 | 584.9365 | 5.0867 | 1735.42 |
| 10 ms | Actor–critic GAN | 0.0064 | 135.9276 | 6.8137 | 551.4628 | 4.7825 | 1548.73 |
| 10 ms | SSGANs | 0.0049 | 123.6845 | 5.9276 | 476.3184 | 4.2865 | 1328.54 |
| 20 ms | DDPG | 0.0094 | 188.5367 | 10.5864 | 756.8291 | 6.6237 | 2468.15 |
| 20 ms | PPO | 0.0089 | 179.4825 | 9.7246 | 718.5372 | 6.2984 | 2296.43 |
| 20 ms | TD3 | 0.0082 | 165.7934 | 8.9185 | 666.4829 | 5.8925 | 2075.86 |
| 20 ms | SAC | 0.0081 | 163.2846 | 8.7348 | 657.2396 | 5.7862 | 2014.38 |
| 20 ms | Actor–critic GAN | 0.0073 | 153.8462 | 7.8463 | 613.9584 | 5.4176 | 1796.27 |
| 20 ms | SSGANs | 0.0058 | 141.9273 | 6.7824 | 538.6427 | 4.9738 | 1517.69 |
| 30 ms | DDPG | 0.0112 | 221.6845 | 12.6482 | 897.4163 | 7.8654 | 2942.68 |
| 30 ms | PPO | 0.0105 | 209.7538 | 11.5237 | 846.5284 | 7.4326 | 2734.15 |
| 30 ms | TD3 | 0.0097 | 191.4827 | 10.4526 | 778.3495 | 6.9148 | 2468.72 |
| 30 ms | SAC | 0.0095 | 188.9364 | 10.2185 | 765.2841 | 6.7845 | 2394.56 |
| 30 ms | Actor–critic GAN | 0.0086 | 176.5482 | 9.1487 | 701.6283 | 6.2857 | 2142.83 |
| 30 ms | SSGANs | 0.0069 | 166.4386 | 7.9465 | 627.8153 | 5.8462 | 1748.35 |

Table 11. Paired significance analysis of SSGANs on the two main control indices.
Algorithm Metric Mean Difference Paired t-Test p-Value Wilcoxon p-Value 95% Bootstrap CI DDPG Δ f ପ୍ତ (Hz) 0.0013 <0.001 <0.001 [0.0010, 0.0016] DDPG A C E ପ୍ତ (MW) 22.7099 <0.001 <0.001 [18.72, 26.91] TD3 Δ f ପ୍ତ (Hz) 0.0009 0.001 0.001 [0.0006, 0.0011] TD3 A C E ପ୍ତ (MW) 15.0188 0.001 0.001 [11.03, 18.76] SAC Δ f ପ୍ତ (Hz) 0.0007 0.002 0.002 [0.0004, 0.0009] SAC A C E ପ୍ତ (MW) 9.7874 0.002 0.002 [6.21, 12.94] PPO Δ f ପ୍ତ (Hz) 0.0011 <0.001 <0.001 [0.0008, 0.0014] PPO A C E ପ୍ତ (MW) 18.1640 <0.001 0.001 [13.47, 21.82] actor–critic GAN Δ f ପ୍ତ (Hz) 0.0004 0.018 0.015 [0.0001, 0.0006] actor–critic GAN A C E ପ୍ତ (MW) 4.5995 0.014 0.012 [1.72, 7.33] Ye, X.; Ouyang, X.; Chen, B.; Wang, X.; Zhu, T.; Yang, K.; Chen, R. Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids. Processes 2026, 14, 1872. https://doi.org/10.3390/pr14121872 Ye X, Ouyang X, Chen B, Wang X, Zhu T, Yang K, Chen R. Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids. Processes. 2026; 14(12):1872. https://doi.org/10.3390/pr14121872 Ye, Xi, Xuetong Ouyang, Baorui Chen, Xi Wang, Tong Zhu, Kai Yang, and Runzhi Chen. 2026. "Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids" Processes 14, no. 12: 1872. https://doi.org/10.3390/pr14121872 Ye, X., Ouyang, X., Chen, B., Wang, X., Zhu, T., Yang, K., & Chen, R. (2026). Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids. Processes, 14(12), 1872. https://doi.org/10.3390/pr14121872

