Journal of Electronic Science and Technology, Volume 23, Issue 1, 100290 (2025)

Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks

Jian-Dong Yao, Wen-Bin Hao, Zhi-Gao Meng*, Bo Xie, Jian-Hua Chen and Jia-Qi Wei
Author Affiliations
  • State Grid Sichuan Electric Power Company Chengdu Power Supply Company, Chengdu, 610041, China
    Figures & Tables(14)
    Interaction dynamics between the DSO and VPPs within the MAMDP framework.
    Average cumulative reward for all agents across training episodes: MARL learning curve (above) and zoomed-in view of final 5000 episodes (below).
    DSO’s pricing and net demand over a week.
    Temporal dynamics of key state variables in the VPP network over a representative week.
    Computational time and solution quality as the number of VPPs increases from 10 to 200.
    System’s performance over a 30-day period following a permanent 15% reduction in average renewable generation capacity.
    • Table 1. MARL and system parameters.

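      | Parameter category   | Parameter name             | Value           | Description                                         |
      |----------------------|----------------------------|-----------------|-----------------------------------------------------|
      | MARL hyperparameter  | Actor learning rate        | 1×10⁻⁴          | Learning rate for actor network updates             |
      | MARL hyperparameter  | Critic learning rate       | 5×10⁻⁴          | Learning rate for critic network updates            |
      | MARL hyperparameter  | Discount factor (γ)        | 0.99            | Discount factor for future rewards                  |
      | MARL hyperparameter  | Exploration rate (ε)       | 0.1             | Initial exploration rate for epsilon-greedy policy  |
      | MARL hyperparameter  | Replay buffer size         | 1×10⁶           | Capacity of experience replay buffer                |
      | MARL hyperparameter  | Batch size                 | 256             | Number of samples per training iteration            |
      | Network architecture | Actor network              | [64, 32]        | Hidden layer sizes for the actor network            |
      | Network architecture | Critic network             | [128, 64]       | Hidden layer sizes for the critic network           |
      | System parameters    | Number of VPPs             | 10              | Total number of VPPs in the network                 |
      | System parameters    | Simulation time steps      | 8760            | Number of hourly time steps (1 year)                |
      | System parameters    | Battery capacity           | 1000 kWh        | Energy storage capacity per VPP                     |
      | System parameters    | Renewable generation limit | 500 kW          | Maximum renewable generation capacity per VPP       |
      | System parameters    | Grid frequency limits      | [49.8, 50.2] Hz | Allowable range for grid frequency                  |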
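
The Table 1 settings can be gathered into a small configuration object for experimentation. The sketch below is only an illustration: the class names, field names, and the `frequency_ok` helper are hypothetical and do not come from the article, and the learning rates and buffer size assume the table values read as 1×10⁻⁴, 5×10⁻⁴, and 1×10⁶.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MarlConfig:
    """MARL hyperparameters and network sizes as listed in Table 1 (names are hypothetical)."""
    actor_lr: float = 1e-4                     # actor learning rate
    critic_lr: float = 5e-4                    # critic learning rate
    gamma: float = 0.99                        # discount factor for future rewards
    epsilon: float = 0.1                       # initial epsilon-greedy exploration rate
    replay_buffer_size: int = 1_000_000        # experience replay capacity
    batch_size: int = 256                      # samples per training iteration
    actor_hidden: Tuple[int, ...] = (64, 32)   # actor hidden layer sizes
    critic_hidden: Tuple[int, ...] = (128, 64) # critic hidden layer sizes


@dataclass(frozen=True)
class SystemConfig:
    """System parameters as listed in Table 1 (names are hypothetical)."""
    num_vpps: int = 10                         # VPPs in the network
    time_steps: int = 8760                     # hourly steps in one year
    battery_capacity_kwh: float = 1000.0       # storage capacity per VPP
    renewable_limit_kw: float = 500.0          # max renewable generation per VPP
    freq_limits_hz: Tuple[float, float] = (49.8, 50.2)  # allowable grid frequency


def frequency_ok(cfg: SystemConfig, f_hz: float) -> bool:
    """Return True if a frequency reading lies inside the allowable band."""
    low, high = cfg.freq_limits_hz
    return low <= f_hz <= high


marl_cfg = MarlConfig()
sys_cfg = SystemConfig()
```

Freezing the dataclasses keeps the values read-only, which makes it harder to drift from the published settings by accident during a run.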
    • Table 2. Economic efficiency comparison.

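      | Model            | Reduction in costs (%) | Increase in VPP profits (%) |
      |------------------|------------------------|-----------------------------|
      | MARL (ours)      | 18.73                  | 22.46                       |
      | Stackelberg game | 12.58                  | 15.29                       |
      | MPC              | 14.92                  | 17.81                       |
      | SARL             | 16.05                  | 19.37                       |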
    • Table 3. Computational performance comparison.

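      | Model            | Convergence time (h) | Scalability (max VPPs) |
      |------------------|----------------------|------------------------|
      | MARL (ours)      | 8.64                 | 127                    |
      | Stackelberg game | 2.31                 | 43                     |
      | MPC              | 5.17                 | 76                     |
      | SARL             | 11.89                | 92                     |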
    • Table 4. Adaptability score (0–100).

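      | Model            | Scenario changes | Unexpected events | Overall score |
      |------------------|------------------|-------------------|---------------|
      | MARL (ours)      | 89.27            | 83.15             | 86.21         |
      | Stackelberg game | 62.43            | 58.79             | 60.61         |
      | MPC              | 75.68            | 71.92             | 73.80         |
      | SARL             | 81.36            | 76.54             | 78.95         |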
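
This listing does not say how the overall score in Table 4 is computed. The reported numbers are consistent with a simple mean of the two sub-scores, which the short check below confirms. The mean is an inference from the values, not a formula stated by the authors, and the variable and function names are illustrative.

```python
# Table 4 rows: (scenario changes, unexpected events, reported overall score)
table4 = {
    "MARL (ours)": (89.27, 83.15, 86.21),
    "Stackelberg game": (62.43, 58.79, 60.61),
    "MPC": (75.68, 71.92, 73.80),
    "SARL": (81.36, 76.54, 78.95),
}


def mean_score(scenario: float, unexpected: float) -> float:
    """Assumed aggregation: unweighted mean of the two sub-scores."""
    return (scenario + unexpected) / 2


# Largest gap between the assumed mean and the reported overall score.
max_gap = max(
    abs(mean_score(scenario, unexpected) - reported)
    for scenario, unexpected, reported in table4.values()
)

# Ranking of models by reported overall score, best first.
ranking = sorted(table4, key=lambda model: table4[model][2], reverse=True)
```

Each reported overall score matches the mean to within rounding at two decimals, so the column adds no information beyond the two sub-scores if the assumption holds.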
    • Table 5. Sensitivity analysis results (percentage change in system performance).

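      | Parameter                    | –50%   | –25%  | Base | +25%  | +50%  |
      |------------------------------|--------|-------|------|-------|-------|
      | Number of VPPs               | –8.73  | –3.42 | 0    | 2.91  | 4.68  |
      | Renewable energy penetration | –12.56 | –5.87 | 0    | 7.23  | 11.95 |
      | Price volatility             | 5.32   | 2.14  | 0    | –2.89 | –6.71 |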
    • Table 6. VPP resource utilization (percentage of capacity).

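      | Condition      | Battery | Flexible load | Renewable curtailment |
      |----------------|---------|---------------|-----------------------|
      | High demand    | 78.4%   | 89.2%         | 2.3%                  |
      | Low demand     | 34.6%   | 12.7%         | 15.8%                 |
      | High renewable | 82.1%   | 8.9%          | 7.5%                  |
      | Low renewable  | 45.3%   | 67.8%         | 0.1%                  |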
    • Table 7. System performance under unexpected events (percentage deviation from normal operations).

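      | Event type                 | Metric            | MARL   | Stackelberg | MPC    | SARL   |
      |----------------------------|-------------------|--------|-------------|--------|--------|
      | Renewable drop (30%)       | Cost increase     | 8.37%  | 14.62%      | 11.28% | 10.05% |
      | Renewable drop (30%)       | Stability index   | –3.21% | –7.89%      | –5.43% | –4.76% |
      | Renewable drop (30%)       | Recovery time (h) | 2.34   | 4.81        | 3.67   | 3.12   |
      | Demand spike (25%)         | Cost increase     | 6.93%  | 12.37%      | 9.84%  | 8.51%  |
      | Demand spike (25%)         | Stability index   | –2.78% | –6.42%      | –4.95% | –3.89% |
      | Demand spike (25%)         | Recovery time (h) | 1.87   | 3.95        | 2.83   | 2.41   |
      | Price forecast error (20%) | Cost increase     | 4.52%  | 9.76%       | 7.31%  | 6.18%  |
      | Price forecast error (20%) | Stability index   | –1.43% | –4.28%      | –3.12% | –2.35% |
      | Price forecast error (20%) | Recovery time (h) | 1.26   | 2.73        | 2.05   | 1.68   |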
    • Table 8. System response to renewable generation forecast errors.

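      | Forecast error | MARL cost increase | MARL stability index | MARL recovery time (h) |
      |----------------|--------------------|----------------------|------------------------|
      | 5%             | 1.28%              | –0.54%               | 0.37                   |
      | 10%            | 2.76%              | –1.19%               | 0.83                   |
      | 15%            | 4.65%              | –2.03%               | 1.42                   |
      | 20%            | 6.93%              | –3.11%               | 2.18                   |
      | 25%            | 9.87%              | –4.46%               | 3.09                   |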
    Citation

    Jian-Dong Yao, Wen-Bin Hao, Zhi-Gao Meng, Bo Xie, Jian-Hua Chen, Jia-Qi Wei. Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks[J]. Journal of Electronic Science and Technology, 2025, 23(1): 100290.

    Paper Information

    Received: Aug. 22, 2024

    Accepted: Oct. 31, 2024

    Published Online: Apr. 7, 2025

    Corresponding author email: Zhi-Gao Meng (mengzhigao718@163.com)

    DOI: 10.1016/j.jnlest.2024.100290
