Photonics Research, Volume 9, Issue 8, 1493 (2021)

Low-latency deep-reinforcement learning algorithm for ultrafast fiber lasers

Qiuquan Yan1, Qinghui Deng2, Jun Zhang1, Ying Zhu2, Ke Yin3, Teng Li2,4, Dan Wu1,5, and Tian Jiang2,4,*
Author Affiliations
  • 1College of Computer, National University of Defense Technology, Changsha 410073, China
  • 2College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha 410073, China
  • 3National Innovation Institute of Defense Technology, Academy of Military Sciences PLA China, Beijing 100071, China
  • 4Beijing Institute for Advanced Study, National University of Defense Technology, Beijing 100020, China
  • 5Hefei Interdisciplinary Center, National University of Defense Technology, Hefei 230037, China
    Figures & Tables (10)
    Structure of the low-latency deep-reinforcement learning algorithm based on DDPG strategy in the laser environment.
    Flow chart for stable mode-locked state monitoring.
    Experimental setup of UFL based on SA. WDM, 980/1550 nm wavelength division multiplexer; EDF, erbium-doped fiber; EPC, electrical polarization controller; FPGA, field-programmable gate array; SA, saturable absorber; OC, optical coupler; ISO, isolator; PD, photodetector.
    Characterization of the output when the laser is in the FML state. (a) Time-domain pulse output within 0.2 μs of laboratory time; the pulse interval is 25.34 ns. (b) Frequency-domain signal characterization over a 4 GHz bandwidth; the frep is 39.459 MHz. (c) Pulse autocorrelation and its fit with a sech² function. (d) Spectrum of the laser output.
    Comparison of the time-domain and frequency-domain signals output by the laser in different polarization states. From left to right: FML state and Q-switched mode-locked state. In each column, the top row shows the time-domain signal within 40 μs of laboratory time, and the bottom row shows the frequency-domain signal over the 4 GHz bandwidth.
    Recovery of the mode-locked state by the algorithm after the laser loses mode locking due to motor vibration. (a) Convergence curve of the reward value over the last 100 rounds of training iterations of the stable mode-locking calculation model. (b) The frep and power of the laser output while vibration is applied to the laser and the recovery algorithm is started. (c) Recovery-time statistics of 1500 vibration tests. (d) Output frep and power of the system over 10 min while vibrating for 1.5 s per minute with the mode-locked recovery algorithm running continuously.
    Changes in the reward value of the designed algorithm model in the system at different temperatures. (a)–(f) are the results of the last 100 rounds of training recorded from 15°C to 40°C.
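The repetition rate and pulse interval quoted in the FML-state caption above are two views of the same quantity, frep = 1/T. A quick consistency check, using only the numbers from the caption (the function name below is illustrative, not from the paper):

```python
# Cross-check of the FML-state figure caption: a pulse-to-pulse interval
# of T = 25.34 ns should correspond to a repetition rate frep = 1/T.
# 1/25.34 ns ≈ 39.46 MHz, consistent with the reported 39.459 MHz once
# the two-decimal rounding of the quoted interval is accounted for.

def repetition_rate_mhz(pulse_interval_ns: float) -> float:
    """Convert a pulse-to-pulse interval in ns to a repetition rate in MHz."""
    return 1e3 / pulse_interval_ns  # 1/ns = GHz; *1e3 gives MHz

rate = repetition_rate_mhz(25.34)  # ≈ 39.46 MHz
```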
    • Table 1. Hyperparameters in the Training Process

      Hyperparameter   Value   Hyperparameter   Value
      actor_lr         10⁻⁶    critic_lr        10⁻⁶
      γ                0.99    τ                0.02
      buffer_size      100     batch_size       8
    • Table 2. Comparison of Different Algorithms in Mode-Locking Time

      Algorithm Name            Time
      GA [39]                   30 min mode-locking time
      EA [42]                   30 min mode-locking time
      GA [24]                   30 s recovery time
      HLA [41]                  0.22 s fastest recovery time; 3.1 s average recovery time
      GA (on this system)       377 s average recovery time
      PSO (on this system)      216 s average recovery time
      DELAY (on this system)    0.472 s fastest recovery time; 1.948 s average recovery time
    • Table 3. Average Mode-Locked State Recovery Time at Different Temperatures

      Temperature (°C)   15      20      25      30      35      40
      Time (s)           1.616   1.869   2.163   1.951   2.489   1.645
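The hyperparameters in Table 1 map directly onto standard DDPG bookkeeping. The sketch below is illustrative only and does not reproduce the paper's implementation; all class and function names are hypothetical. It shows where γ, τ, buffer_size, and batch_size would enter the TD target, the soft target-network update, and the replay buffer:

```python
import random
from collections import deque

# Illustrative DDPG bookkeeping with the Table 1 hyperparameters
# (γ = 0.99, τ = 0.02, buffer_size = 100, batch_size = 8). The small
# buffer reflects the slow, low-sample laser environment.
GAMMA = 0.99       # discount factor for future EPC-adjustment rewards
TAU = 0.02         # soft-update rate for the target networks
BUFFER_SIZE = 100  # replay buffer capacity
BATCH_SIZE = 8     # transitions sampled per update

class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state) tuples."""
    def __init__(self, capacity=BUFFER_SIZE):
        self.memory = deque(maxlen=capacity)  # old transitions drop out

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size=BATCH_SIZE):
        return random.sample(list(self.memory), min(batch_size, len(self.memory)))

def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]

def td_target(reward, next_q, gamma=GAMMA, done=False):
    """One-step TD target used to train the critic."""
    return reward + (0.0 if done else gamma * next_q)
```

With τ = 0.02 the target networks track the online networks slowly, which stabilizes training at the cost of responsiveness; the small batch and buffer sizes keep per-step latency low, in keeping with the algorithm's low-latency goal.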

    Qiuquan Yan, Qinghui Deng, Jun Zhang, Ying Zhu, Ke Yin, Teng Li, Dan Wu, Tian Jiang. Low-latency deep-reinforcement learning algorithm for ultrafast fiber lasers[J]. Photonics Research, 2021, 9(8): 1493

    Paper Information

    Category: Lasers and Laser Optics

    Received: Apr. 19, 2021

    Accepted: Jun. 6, 2021

    Published Online: Jul. 22, 2021

    The Author Email: Tian Jiang (tjiang@nudt.edu.cn)

    DOI: 10.1364/PRJ.428117
