Autonomous decision-making for spacecraft close approaches in the Earth-Moon environment

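Table 1. Training pseudocode

Environment setup: set the simulation step size, target state, thrust limit, motion range, training data length, …
for $k = 1 : N_{step}$
    (1) Initialize the parameters of the policy network and the value network;
    (2) Generate an action: feed the spacecraft mass, relative position, and velocity into the new network, which outputs the spacecraft thrust;
    (3) Update the state: apply the thrust to the environment to obtain the reward $r$ and the next state $s'$;
    (4) Collect data: gather the interaction experience $(s_{1}, a_{1}, r_{1}, \dots, s_{T}, a_{T}, r_{T})$ and store it in the data buffer;
    if the data length $n = buffer\_size$:
        (5) Use the intrinsic-reward exploration mechanism proposed in Section 3.2.2 to compute new reward values, which serve as the data for the subsequent discounted-reward calculation;
        (6) Feed the stored set of states $s$ into the $Critic$ network to obtain the state-value function $V(s)$ for all stored states, and combine it with the discounted reward $R_{t}$ to compute the advantage estimate $A_{t}$;
        (7) Compute the $c\_loss$ function of the $Critic$ network, then backpropagate to update the $Critic$ network parameters;
        (8) Feed the stored set of actions $a$ into the two policy networks to obtain a normal distribution from each, and from these compute the importance sampling ratio $r(\theta_{k})$;
        (9) Compute the $loss$ function of the $Actor_{old}$ network according to Eq. (24), and backpropagate to update the $Actor_{new}$ network parameters;
        (10) Copy the parameters of the $Actor_{new}$ network to the $Actor_{old}$ network;
    else:
        (11) Return to step (2) and continue collecting data;
end.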
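Step (6) of the pseudocode can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the advantage is the discounted return minus the critic value, $A_{t} = R_{t} - V(s_{t})$, which is what GAE reduces to when $λ = 1$ and the value after the last stored step is zero. The function names and the three-step buffer are hypothetical.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float,
                       bootstrap: float = 0.0) -> List[float]:
    """Return R_t = r_t + gamma * R_{t+1}, computed backwards over the buffer."""
    returns = [0.0] * len(rewards)
    running = bootstrap  # assumed value after the last stored step
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Advantage estimate A_t = R_t - V(s_t) (assumed form, see lead-in)."""
    if len(returns) != len(values):
        raise ValueError("returns and values must have the same length")
    return [r - v for r, v in zip(returns, values)]


# Hypothetical three-step buffer; rewards and critic outputs are made up.
rewards = [1.0, 0.0, 2.0]
values = [1.5, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
adv = advantages(returns, values)
```

In a training run the rewards would be the values produced in step (5) and `values` would come from the $Critic$ network; a discount factor of 0.5 is used here only to keep the arithmetic easy to follow.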

Table 2. Earth-Moon system parameters
Parameter	Value
Mass parameter	0.012 150 585 6
System mass	6.045 8×10²⁴ kg
Earth-Moon distance	3.844×10⁸ m
System period	375 200 s
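The quantities in the table above are related to one another, and a short calculation makes the relation explicit. The sketch below assumes the usual three-body convention in which the mass parameter is the Moon's share of the total mass, and it takes the gravitational constant `G` from CODATA; neither assumption is stated in the table, and the variable names are illustrative.

```python
import math

MU = 0.0121505856            # mass parameter (from the table)
SYSTEM_MASS_KG = 6.0458e24   # Earth + Moon mass (from the table)
DISTANCE_M = 3.844e8         # Earth-Moon distance (from the table)
G = 6.67430e-11              # assumed: CODATA value, m^3 kg^-1 s^-2

# Assumed convention: mu = m_moon / (m_earth + m_moon).
moon_mass_kg = MU * SYSTEM_MASS_KG
earth_mass_kg = (1.0 - MU) * SYSTEM_MASS_KG

# Characteristic time of the normalized system: sqrt(L^3 / (G * M)).
time_unit_s = math.sqrt(DISTANCE_M ** 3 / (G * SYSTEM_MASS_KG))

# One full revolution takes 2*pi characteristic times.
orbital_period_s = 2.0 * math.pi * time_unit_s

print(f"Moon mass: {moon_mass_kg:.4e} kg")
print(f"Time unit: {time_unit_s:.0f} s")
print(f"Orbital period: {orbital_period_s / 86400.0:.2f} days")
```

Under these assumptions the characteristic time comes out within about 0.1% of the 375 200 s listed in the table, which suggests the tabulated value is the time unit of the normalized system, with a full revolution taking roughly 2π times longer.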

Table 3. Improved neural network structure for PPO algorithm
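Layer	Number of neurons (Actor/Critic)	Activation function	Type
Input layer	16/16	Linear	
Hidden layer 1	256/256	Tanh	LSTM
Hidden layer 2	256/256	Tanh	LSTM
Hidden layer 3	64/64	Tanh	MLP
Hidden layer 4	64/64	Tanh	MLP
Output layer	3/1	Linear	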
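To get a feel for the size of the networks in the table above, the layer widths can be turned into a rough parameter count. The sketch below is only one possible reading of the table: it treats the 16-neuron input layer as the input dimension, chains the layers in the listed order, and counts one bias vector per LSTM gate (some libraries store two). The `Layer` class and `build` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    kind: str      # "lstm" or "dense"
    in_dim: int
    out_dim: int

    def num_params(self) -> int:
        if self.kind == "lstm":
            # Four gates, each with input weights, recurrent weights, one bias.
            return 4 * (self.in_dim * self.out_dim
                        + self.out_dim * self.out_dim
                        + self.out_dim)
        if self.kind == "dense":
            return self.in_dim * self.out_dim + self.out_dim
        raise ValueError(f"unknown layer kind: {self.kind}")


def build(output_dim: int) -> List[Layer]:
    """Chain the widths from the table: 16 -> 256 -> 256 -> 64 -> 64 -> output."""
    sizes = [16, 256, 256, 64, 64, output_dim]
    kinds = ["lstm", "lstm", "dense", "dense", "dense"]
    return [Layer(k, i, o) for k, i, o in zip(kinds, sizes[:-1], sizes[1:])]


actor_params = sum(layer.num_params() for layer in build(output_dim=3))
critic_params = sum(layer.num_params() for layer in build(output_dim=1))
print(f"actor: {actor_params} parameters, critic: {critic_params} parameters")
```

Under this reading the two LSTM layers account for almost all of the parameters, and the actor and critic differ only in the final layer (3 outputs versus 1).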

Table 4. Parameters of the improved PPO algorithm

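Parameter	Value (Actor)	Value (Critic)
Discount factor $\gamma$	0.99	-
GAE hyperparameter $\lambda$	1	-
Clip function parameter $\varepsilon$	0.1	-
Learning rate $\alpha$	0.000 05	0.000 05
Cross-entropy coefficient	0.000 03	-
Gradient clipping parameter	0.1	0.1
Batch size	64	64
Training epochs	10	10
Importance sampling ratio threshold	1.5	-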
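The clip parameter in the table above bounds how far a single update can move the policy. As an illustration, the sketch below uses the standard PPO clipped surrogate for one scalar action; it is not taken from the paper, whose own loss (Eq. (24)) is not reproduced in this section. The function names are hypothetical, and the importance sampling ratio threshold is left out because its role is not described here.

```python
import math

EPSILON = 0.1  # Clip function parameter from the table above


def normal_log_prob(action: float, mean: float, std: float) -> float:
    """Log-density of a scalar action under a normal distribution N(mean, std^2)."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def importance_ratio(action: float, new_mean: float, new_std: float,
                     old_mean: float, old_std: float) -> float:
    """Ratio pi_new(a|s) / pi_old(a|s), computed through log-probabilities."""
    return math.exp(normal_log_prob(action, new_mean, new_std)
                    - normal_log_prob(action, old_mean, old_std))


def clipped_surrogate(ratio: float, advantage: float,
                      epsilon: float = EPSILON) -> float:
    """Standard PPO objective for one sample: min(r * A, clip(r) * A)."""
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped * advantage)


# Hypothetical sample: the new policy has shifted its mean toward the action.
ratio = importance_ratio(action=0.5, new_mean=0.4, new_std=1.0,
                         old_mean=0.0, old_std=1.0)
objective = clipped_surrogate(ratio, advantage=2.0)
```

With $ε = 0.1$ the ratio contributes to the objective only within [0.9, 1.1] when that makes the objective smaller, so a large positive advantage cannot pull the new policy arbitrarily far from the old one in a single batch.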

Table 5. Comparison of 50 m approach task test

Algorithm	Success rate/%	Final position $\rho_{f}$/m	Final velocity $\dot{\rho}_{f}$/(m·s⁻¹)	Fuel consumption $\Delta V$/(m·s⁻¹)	Flight time $T$/s
PPO	100	0.415 $\pm$ 0.011	0.093 $\pm$ 0.004	11.218 $\pm$ 1.071	29.537 $\pm$ 0.215
Improved PPO	100	0.543 $\pm$ 0.061	0.086 $\pm$ 0.003	6.735 $\pm$ 0.122	40.158 $\pm$ 1.092

Table 6. Comparison of 200 m approach task test

Algorithm	Success rate/%	Final position $\rho_{f}$/m	Final velocity $\dot{\rho}_{f}$/(m·s⁻¹)	Fuel consumption $\Delta V$/(m·s⁻¹)	Flight time $T$/s
PPO	100	0.694 $\pm$ 0.014	0.095 $\pm$ 0.001	19.133 $\pm$ 1.213	60.331 $\pm$ 1.472
Improved PPO	100	0.612 $\pm$ 0.090	0.090 $\pm$ 0.005	11.234 $\pm$ 0.521	88.733 $\pm$ 3.557
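The trade-off visible in the 50 m and 200 m comparison tables can be quantified from the tabulated means. The sketch below is simple arithmetic on those means and ignores the reported standard deviations; the helper name and the dictionary layout are illustrative.

```python
def relative_change(baseline: float, improved: float) -> float:
    """Signed relative change of `improved` with respect to `baseline`."""
    return (improved - baseline) / baseline


# Mean values copied from the 50 m and 200 m tables:
# (fuel consumption in m/s, flight time in s).
MEANS = {
    "50 m": {"PPO": (11.218, 29.537), "Improved PPO": (6.735, 40.158)},
    "200 m": {"PPO": (19.133, 60.331), "Improved PPO": (11.234, 88.733)},
}

summary = {}
for task, rows in MEANS.items():
    base_dv, base_time = rows["PPO"]
    new_dv, new_time = rows["Improved PPO"]
    summary[task] = {
        "delta_v_change": relative_change(base_dv, new_dv),
        "flight_time_change": relative_change(base_time, new_time),
    }

for task, changes in summary.items():
    print(f"{task}: delta-v {changes['delta_v_change']:+.1%}, "
          f"flight time {changes['flight_time_change']:+.1%}")
```

On these means the improved PPO uses roughly 40% less fuel in the 50 m task and roughly 41% less in the 200 m task, while its flight times are roughly 36% and 47% longer.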

Table 7. Results of approach task test with interference

Algorithm	Task	Success rate/%	Final position $\rho_{f}$/m	Final velocity $\dot{\rho}_{f}$/(m·s⁻¹)	Fuel consumption $\Delta V$/(m·s⁻¹)	Flight time $T$/s
PPO	50 m	100	0.997 $\pm$ 0.002	0.008 $\pm$ 0.006	15.464 $\pm$ 0.544	47.167 $\pm$ 0.955
PPO	200 m	99	0.683 $\pm$ 0.026	0.097 $\pm$ 0.001	19.952 $\pm$ 1.032	63.173 $\pm$ 1.071
Improved PPO	50 m	100	0.896 $\pm$ 0.050	0.097 $\pm$ 0.001	7.064 $\pm$ 0.230	76.738 $\pm$ 1.942
Improved PPO	200 m	100	0.723 $\pm$ 0.097	0.082 $\pm$ 0.007	10.994 $\pm$ 0.301	93.992 $\pm$ 1.422
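As a worked example, the relative fuel saving of the improved PPO under interference follows directly from the mean $ΔV$ values in the table above (50 m task first, then 200 m). This is plain arithmetic on the tabulated means and does not account for the reported standard deviations.

```latex
\[
\frac{15.464 - 7.064}{15.464} \approx 54.3\%,
\qquad
\frac{19.952 - 10.994}{19.952} \approx 44.9\%
\]
```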

Cheng HUANG, Zhicong QIU, Jiazhong XU. Autonomous decision-making for spacecraft close approaches in the Earth-Moon environment[J]. Optics and Precision Engineering, 2025, 33(6): 979
