Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks

Jian-Dong Yao; Wen-Bin Hao; Zhi-Gao Meng; Bo Xie; Jian-Hua Chen; Jia-Qi Wei

doi:10.1016/j.jnlest.2024.100290

Journal of Electronic Science and Technology, Volume 23, Issue 1, 100290 (2025)

Author Affiliations

State Grid Sichuan Electric Power Company Chengdu Power Supply Company, Chengdu, 610041, China

Figures & Tables (14)

Fig. 1. Interaction dynamics between the distribution system operator (DSO) and virtual power plants (VPPs) within the multi-agent Markov decision process (MAMDP) framework.

Fig. 2. Average cumulative reward for all agents across training episodes: MARL learning curve (above) and zoomed-in view of the final 5000 episodes (below).

Fig. 3. DSO’s pricing and net demand over a week.

Fig. 4. Temporal dynamics of key state variables in the VPP network over a representative week.

Fig. 5. Computational time and solution quality as the number of VPPs increases from 10 to 200.

Fig. 6. System’s performance over a 30-day period following a permanent 15% reduction in average renewable generation capacity.

Table 1. MARL and system parameters.

Parameter category	Parameter name	Value	Description
MARL hyperparameter	Actor learning rate	1×10⁻⁴	Learning rate for actor network updates
	Critic learning rate	5×10⁻⁴	Learning rate for critic network updates
	Discount factor (γ)	0.99	Discount factor for future rewards
	Exploration rate (ε)	0.1	Initial exploration rate for epsilon-greedy policy
	Replay buffer size	1×10⁶	Capacity of experience replay buffer
	Batch size	256	Number of samples per training iteration
Network architecture	Actor network	[64, 32]	Hidden layer sizes for the actor network
	Critic network	[128, 64]	Hidden layer sizes for the critic network
System parameters	Number of VPPs	10	Total number of VPPs in the network
	Simulation time steps	8760	Number of hourly time steps (1 year)
	Battery capacity	1000 kWh	Energy storage capacity per VPP
	Renewable generation limit	500 kW	Maximum renewable generation capacity per VPP
	Grid frequency limits	[49.8, 50.2] Hz	Allowable range for grid frequency
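Table 1 lists the hyperparameters but not the code that consumes them. As an illustration only, the Python sketch below gathers the values into a configuration object and shows two components they typically parameterize: a fixed-capacity replay buffer sampled in mini-batches, and an ε-greedy selection rule. All class, field, and function names are hypothetical, and the discrete-action ε-greedy rule is an assumption made for illustration; the source does not describe its action space or buffer implementation. With γ = 0.99, the common rule-of-thumb effective horizon 1/(1 - γ) is about 100 hourly steps, roughly four days.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MARLConfig:
    """Hypothetical container for the values listed in Table 1."""
    actor_lr: float = 1e-4
    critic_lr: float = 5e-4
    gamma: float = 0.99            # discount factor
    epsilon: float = 0.1           # initial exploration rate
    buffer_size: int = 1_000_000   # replay buffer capacity
    batch_size: int = 256          # samples per training iteration
    actor_hidden: tuple = (64, 32)
    critic_hidden: tuple = (128, 64)
    num_vpps: int = 10
    time_steps: int = 8760         # hourly steps in one year


def effective_horizon(gamma: float) -> float:
    """Rule of thumb: rewards beyond ~1/(1 - gamma) steps are heavily discounted."""
    return 1.0 / (1.0 - gamma)


class ReplayBuffer:
    """Fixed-capacity ring buffer; once full, the oldest transition is overwritten."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0
        self.rng = random.Random(seed)

    def push(self, transition) -> None:
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition  # overwrite the oldest entry
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size: int) -> list:
        # Uniform mini-batch, sampled without replacement
        return self.rng.sample(self.storage, batch_size)

    def __len__(self) -> int:
        return len(self.storage)


def epsilon_greedy(q_values, epsilon: float, rng: random.Random) -> int:
    """Random action index with probability epsilon, otherwise the argmax index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


cfg = MARLConfig()
print(round(effective_horizon(cfg.gamma)))  # → 100
```

A ring buffer keeps push and eviction O(1) at the 10⁶-transition capacity, which is why it is used here instead of deleting from the front of a list.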

Table 2. Economic efficiency comparison.
Model	Reduction in costs (%)	Increase in VPP profits (%)
MARL (ours)	18.73	22.46
Stackelberg game	12.58	15.29
MPC	14.92	17.81
SARL	16.05	19.37
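Because Table 2 reports all models on the same percentage scale, the margin of MARL over each baseline is a plain subtraction in percentage points. The short sketch below transcribes the table and computes those gaps; the dictionary layout and function name are illustrative, not from the source. The smallest margins are over SARL: 2.68 points in cost reduction and 3.09 points in VPP profit increase.

```python
# Transcribed from Table 2: (reduction in costs %, increase in VPP profits %)
results = {
    "MARL": (18.73, 22.46),
    "Stackelberg game": (12.58, 15.29),
    "MPC": (14.92, 17.81),
    "SARL": (16.05, 19.37),
}


def margins_over_baselines(results: dict, ours: str = "MARL") -> dict:
    """Percentage-point gap between `ours` and each baseline, per metric."""
    cost_ours, profit_ours = results[ours]
    return {
        name: (round(cost_ours - cost, 2), round(profit_ours - profit, 2))
        for name, (cost, profit) in results.items()
        if name != ours
    }


for name, (d_cost, d_profit) in margins_over_baselines(results).items():
    print(f"vs {name}: +{d_cost} pp cost reduction, +{d_profit} pp profit increase")
```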

Table 3. Computational performance comparison.
Model	Convergence time (h)	Scalability (max VPPs)
MARL (ours)	8.64	127
Stackelberg game	2.31	43
MPC	5.17	76
SARL	11.89	92
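Table 3 describes a trade-off rather than a uniform win: MARL converges more slowly than the Stackelberg and MPC baselines but faster than SARL, while supporting the most VPPs. Expressing MARL's figures as ratios to each baseline makes this explicit; for example, against SARL it needs about 0.73 times the convergence time and handles about 1.38 times as many VPPs. The sketch below uses values transcribed from the table; the names are hypothetical.

```python
# Transcribed from Table 3: (convergence time in h, max supported VPPs)
table3 = {
    "MARL": (8.64, 127),
    "Stackelberg game": (2.31, 43),
    "MPC": (5.17, 76),
    "SARL": (11.89, 92),
}


def relative_to(ours: str, table: dict) -> dict:
    """Ratios ours/baseline for convergence time and scalability."""
    t_ours, s_ours = table[ours]
    return {
        name: (round(t_ours / t, 2), round(s_ours / s, 2))
        for name, (t, s) in table.items()
        if name != ours
    }


for name, (time_ratio, scale_ratio) in relative_to("MARL", table3).items():
    print(f"vs {name}: {time_ratio}x convergence time, {scale_ratio}x max VPPs")
```

A time ratio above 1 means MARL trains longer than that baseline; a scalability ratio above 1 means it supports more VPPs.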

Table 4. Adaptability score (0–100).
Model	Scenario changes	Unexpected events	Overall score
MARL (ours)	89.27	83.15	86.21
Stackelberg game	62.43	58.79	60.61
MPC	75.68	71.92	73.80
SARL	81.36	76.54	78.95
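The source does not state how the overall adaptability score is aggregated, but every row of Table 4 equals the arithmetic mean of the two sub-scores; for MARL, (89.27 + 83.15) / 2 = 86.21. The check below confirms this for all four models. Treating the overall score as an equal-weight average is an inference from the numbers, not a stated definition, and the function name is hypothetical.

```python
# Transcribed from Table 4: (scenario changes, unexpected events, overall score)
table4 = {
    "MARL (ours)": (89.27, 83.15, 86.21),
    "Stackelberg game": (62.43, 58.79, 60.61),
    "MPC": (75.68, 71.92, 73.80),
    "SARL": (81.36, 76.54, 78.95),
}


def mean_mismatches(rows: dict, tol: float = 0.005) -> list:
    """Models whose overall score differs from the mean of the two sub-scores by more than tol."""
    mismatches = []
    for model, (scenario, unexpected, overall) in rows.items():
        if abs((scenario + unexpected) / 2 - overall) > tol:
            mismatches.append(model)
    return mismatches


print(mean_mismatches(table4))  # → []
```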

Table 5. Sensitivity analysis results (percentage change in system performance).
Parameter (variation from base)	–50%	–25%	Base	+25%	+50%
Number of VPPs	–8.73	–3.42	0	2.91	4.68
Renewable energy penetration	–12.56	–5.87	0	7.23	11.95
Price volatility	5.32	2.14	0	–2.89	–6.71
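Reading Table 5 as successive 25% steps exposes the shape of each response. For the number of VPPs, each step adds less than the one before (5.31, 3.42, 2.91, then 1.77 points), which indicates diminishing returns, whereas for price volatility each further increase costs more (a loss of 2.89 and then 3.82 points). The sketch below computes these step increments from the transcribed values; the names are illustrative.

```python
# Transcribed from Table 5: performance change (%) at parameter variations of
# -50%, -25%, base, +25%, +50%
table5 = {
    "Number of VPPs": [-8.73, -3.42, 0.0, 2.91, 4.68],
    "Renewable energy penetration": [-12.56, -5.87, 0.0, 7.23, 11.95],
    "Price volatility": [5.32, 2.14, 0.0, -2.89, -6.71],
}


def step_increments(values: list) -> list:
    """Change in performance across each successive 25% step of the parameter."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


for name, values in table5.items():
    print(f"{name}: {step_increments(values)}")
```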

Table 6. VPP resource utilization (percentage of capacity).
Condition	Battery	Flexible load	Renewable curtailment
High demand	78.4%	89.2%	2.3%
Low demand	34.6%	12.7%	15.8%
High renewable	82.1%	8.9%	7.5%
Low renewable	45.3%	67.8%	0.1%

Table 7. System performance under unexpected events (percentage deviation from normal operations).

Event type	Metric	MARL	Stackelberg	MPC	SARL
Renewable drop (30%)	Cost increase	8.37%	14.62%	11.28%	10.05%
	Stability index	–3.21%	–7.89%	–5.43%	–4.76%
	Recovery time (h)	2.34	4.81	3.67	3.12
Demand spike (25%)	Cost increase	6.93%	12.37%	9.84%	8.51%
	Stability index	–2.78%	–6.42%	–4.95%	–3.89%
	Recovery time (h)	1.87	3.95	2.83	2.41
Price forecast error (20%)	Cost increase	4.52%	9.76%	7.31%	6.18%
	Stability index	–1.43%	–4.28%	–3.12%	–2.35%
	Recovery time (h)	1.26	2.73	2.05	1.68
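Table 7 reports absolute recovery times; normalizing them against the fastest baseline for each event gives one comparable figure per event. In the transcribed values, SARL is the fastest baseline for all three events, and MARL's recovery time is 25.0%, 22.4%, and 25.0% shorter, respectively. The sketch is illustrative; the data structure and function name are not from the source.

```python
# Recovery times (h) transcribed from Table 7
recovery = {
    "Renewable drop (30%)": {"MARL": 2.34, "Stackelberg": 4.81, "MPC": 3.67, "SARL": 3.12},
    "Demand spike (25%)": {"MARL": 1.87, "Stackelberg": 3.95, "MPC": 2.83, "SARL": 2.41},
    "Price forecast error (20%)": {"MARL": 1.26, "Stackelberg": 2.73, "MPC": 2.05, "SARL": 1.68},
}


def reduction_vs_best_baseline(times: dict, ours: str = "MARL"):
    """Fractional recovery-time reduction of `ours` relative to the fastest baseline."""
    best = min((name for name in times if name != ours), key=times.get)
    return best, 1.0 - times[ours] / times[best]


for event, times in recovery.items():
    best, gain = reduction_vs_best_baseline(times)
    print(f"{event}: {gain:.1%} shorter recovery than {best}")
```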

Table 8. System response to renewable generation forecast errors.
Forecast error	MARL cost increase	MARL stability index	MARL recovery time (h)
5%	1.28%	–0.54%	0.37
10%	2.76%	–1.19%	0.83
15%	4.65%	–2.03%	1.42
20%	6.93%	–3.11%	2.18
25%	9.87%	–4.46%	3.09
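Table 8 suggests that the cost penalty grows faster than the forecast error itself: the cost increase per percentage point of error rises from 0.256 at 5% to about 0.395 at 25%. A log-log least-squares fit summarizes this as a single exponent, where a value above 1 indicates superlinear growth over the tested range. Both the cost and recovery-time exponents come out above 1 (roughly 1.3). This is a descriptive fit on five points, not a model from the source, and the variable and function names are hypothetical.

```python
import math

# Transcribed from Table 8: forecast error (%) and MARL responses
forecast_error = [5, 10, 15, 20, 25]
cost_increase = [1.28, 2.76, 4.65, 6.93, 9.87]   # %
recovery_time = [0.37, 0.83, 1.42, 2.18, 3.09]   # h


def loglog_slope(x: list, y: list) -> float:
    """Least-squares exponent b in the power-law fit y ≈ a * x**b."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


cost_exponent = loglog_slope(forecast_error, cost_increase)
recovery_exponent = loglog_slope(forecast_error, recovery_time)
print(f"cost exponent ≈ {cost_exponent:.2f}, recovery exponent ≈ {recovery_exponent:.2f}")
```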

Citation

Jian-Dong Yao, Wen-Bin Hao, Zhi-Gao Meng, Bo Xie, Jian-Hua Chen, Jia-Qi Wei. Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks[J]. Journal of Electronic Science and Technology, 2025, 23(1): 100290
