Advanced Photonics Nexus, Volume 3, Issue 4, 046003 (2024)

Decision-making and control with diffractive optical networks

Jumin Qiu1, Shuyuan Xiao2,3, Lujun Huang4,*, Andrey Miroshnichenko5, Dejian Zhang1, Tingting Liu2,3,*, and Tianbao Yu1,*
Author Affiliations
  • 1Nanchang University, School of Physics and Materials Science, Nanchang, China
  • 2Nanchang University, School of Information Engineering, Nanchang, China
  • 3Nanchang University, Institute for Advanced Study, Nanchang, China
  • 4East China Normal University, School of Physics and Electronic Science, Shanghai, China
  • 5University of New South Wales Canberra, School of Physics and Electronic Science, Canberra, Australia
    Figures & Tables (5)
    Fig. 1. DON for decision-making and control. (a)–(c) The proposed network plays the video game Super Mario Bros. in a human-like manner. In the network architecture, an input layer captures continuous, high-dimensional game snapshots (seeing), a series of diffractive layers chooses a particular action through a learned control policy for each situation faced (making a decision), and an output layer maps the intensity distribution into preset action regions to generate the control signals in the game (controlling). (d) Training framework of the policy and network. In deep reinforcement learning, an agent interacts with a simulated environment to find a near-optimal control policy represented by a CNN, which is then employed as the ground truth to update the DON through an error backpropagation algorithm. (e) The experimental setup of the DON for decision-making and control. (f) The building block of the DON.
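    To make the training framework of panel (d) concrete, the sketch below distills a CNN teacher policy, as found by deep reinforcement learning, into a stack of learnable phase-only diffractive layers updated by error backpropagation. It is a minimal PyTorch illustration: the grid size, layer count, action regions, and free-space propagation parameters are assumptions for this sketch, not the values used in the paper.

```python
import math
import torch
import torch.nn as nn

N = 64          # assumed pixels per diffractive layer
NUM_LAYERS = 3  # assumed number of diffractive layers
NUM_ACTIONS = 4

def angular_spectrum(field, dist=0.05, wavelength=632.8e-9, pitch=8e-6):
    """Free-space propagation between layers (angular spectrum method)."""
    fx = torch.fft.fftfreq(N, d=pitch)
    FX, FY = torch.meshgrid(fx, fx, indexing="ij")
    arg = torch.clamp(1.0 / wavelength**2 - FX**2 - FY**2, min=0.0)
    H = torch.exp(2j * math.pi * dist * torch.sqrt(arg))
    return torch.fft.ifft2(torch.fft.fft2(field) * H)

class DON(nn.Module):
    def __init__(self):
        super().__init__()
        # one trainable phase mask per diffractive layer
        self.phases = nn.ParameterList(
            [nn.Parameter(torch.zeros(N, N)) for _ in range(NUM_LAYERS)])
        # fixed masks that integrate detector intensity over preset action regions
        masks = torch.zeros(NUM_ACTIONS, N, N)
        for a in range(NUM_ACTIONS):  # assumed 2x2 grid of square regions
            r, c = divmod(a, 2)
            masks[a, 8 + 32 * r:24 + 32 * r, 8 + 32 * c:24 + 32 * c] = 1.0
        self.register_buffer("action_masks", masks)

    def forward(self, img):                  # img: (B, N, N) in [0, 1]
        field = img.to(torch.cfloat)         # amplitude-encoded game snapshot
        for phase in self.phases:
            field = angular_spectrum(field) * torch.exp(1j * phase)
        intensity = angular_spectrum(field).abs() ** 2
        # integrated intensity per action region -> one score per action
        return torch.einsum("bij,aij->ba", intensity, self.action_masks)

don = DON()
opt = torch.optim.Adam(don.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Distillation step: the action chosen by the DRL-trained CNN teacher
# (not shown) is the ground-truth label for each batch of game frames.
# for frames in dataloader:
#     labels = teacher(frames).argmax(dim=-1)
#     loss = loss_fn(don(frames), labels)
#     opt.zero_grad(); loss.backward(); opt.step()
```

    Only the phase masks are trainable; propagation between layers is fixed physics, so a differentiable propagator is all that is needed to backpropagate the error from the detector plane to every layer.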
    Fig. 2. Playing tic-tac-toe. (a) Schematic illustration of the DON, composed of an input layer, hidden layers of three cascaded diffractive blocks, and an output layer for playing tic-tac-toe. (b) and (c) The sequential control of the DON in performing gameplay tasks for X and O. (d) The accuracy of playing tic-tac-toe. A collection of 87 games is used for predicting X, yielding 81 wins and 6 draws. In the remaining 583 games, O obtains 454 wins, 74 draws, and 21 losses. When a previous move has already occupied the predicted position at a turn, the case is counted as a playing error; this occurs 34 times. (e) Dependence of the prediction accuracy on the number of hidden layers.
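    The readout rule behind the playing error counted in panel (d) can be stated in a few lines: the output plane is read as nine intensities, one per board cell, the brightest cell is the predicted move, and a prediction landing on an occupied cell is logged as an error. The board encoding below (a length-9 list with 0 for empty) is an assumption for illustration.

```python
# Minimal sketch of the tic-tac-toe readout, assuming nine output regions
# (one per cell) and a board encoded as a length-9 list with 0 = empty.

def select_move(intensities, board):
    move = max(range(9), key=lambda i: intensities[i])  # brightest region
    occupied = board[move] != 0   # occupied cell -> counted as a playing error
    return move, occupied

# Example: the brightest region (index 4, the center) is still empty.
move, error = select_move([0.1, 0.2, 0.1, 0.3, 0.9, 0.2, 0.1, 0.1, 0.2],
                          [1, 0, 0, 0, 0, 0, 0, 2, 0])
assert (move, error) == (4, False)
```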
    Fig. 3. Playing Super Mario Bros. (a) The layout of the designed network for playing Super Mario Bros. (b) and (c) Snapshots of Mario’s jumping and crouching actions, obtained by comparing the output intensities of the actions. The output intensity of the jump is maximal at the 201st frame, so the predicted action is jump and Mario is controlled accordingly, as shown in panel (b). A similar sequence of prediction and control for the crouch action is shown in panel (c). (d) The inverse prediction result. Because the crouch predicted at the current state is crucial for updating Mario’s action, we use the maximized output intensity of the crouch as input, ignoring the simultaneous outputs of the other actions (Video 1, MP4, 19.8 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s1]).
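    The per-frame decision in panels (b) and (c) amounts to an argmax over the action-region intensities at each game frame. The loop below sketches that control cycle; read_action_intensities and press are hypothetical stand-ins for the optical readout and the emulator interface, and the action set is assumed.

```python
# Hypothetical per-frame control loop: compare the DON output intensities of
# the preset action regions and send the maximum-intensity action to the game.
ACTIONS = ["right", "jump", "crouch", "noop"]  # assumed action set

def control_step(frame, read_action_intensities, press):
    intensities = read_action_intensities(frame)      # one value per action
    action = ACTIONS[max(range(len(ACTIONS)), key=intensities.__getitem__)]
    press(action)  # e.g., "jump" has the maximum intensity at the 201st frame
    return action
```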
    Fig. 4. Playing Car Racing. (a) The layout of the designed network for playing Car Racing. (b) The control of the steering direction and angle of the car according to the difference between the output intensities at the current state, normalized between −1 and 1. (c)–(f) Snapshots of controlling the car steering. When the car faces a left-turn track in panel (c), the output intensity on the left remains greater than that on the right, allowing continuous control in updating the rotation angle of the left-turn action. A similar control process is performed for the right-turn track in panel (e). In addition, the robustness of the network to disturbances is validated by introducing (d) Gaussian blur and (f) Gaussian noise to the game images (Video 2, MP4, 8.36 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s2]; Video 3, MP4, 6.78 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s3]; Video 4, MP4, 16.8 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s4]).
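    Panel (b) defines a continuous control signal rather than a discrete action: the steering command is the difference between the left and right output intensities, normalized to lie between −1 and 1. The sketch below assumes normalization by the total intensity and a sign convention where negative values steer left; both are illustrative choices.

```python
# Sketch of the continuous steering rule, assuming normalization by the total
# intensity and the convention that negative commands steer left.

def steering_command(left_intensity, right_intensity):
    total = left_intensity + right_intensity
    if total == 0:
        return 0.0
    return (right_intensity - left_intensity) / total  # already in [-1, 1]

# Example: a brighter left region keeps the car turning left, as in panel (c).
assert steering_command(0.75, 0.25) == -0.5
```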
    Fig. 5. Experimental demonstration of the DON for tic-tac-toe. (a) A photograph of the experimental system, in which the unlabeled devices are lenses, a spatial filter removes the unwanted multiple-order energy peaks, and a filter is mounted on the camera. (b) The output of the first layer of the sample in Fig. 2(a); the red arrows represent the polarization direction of the incident light. (c) and (d) The sequential control of the DON in playing the same two games as in Figs. 2(b) and 2(c), respectively. The experimental results are normalized to the simulation results. Sim., simulation result; Exp., experimental result.

    Citation

    Jumin Qiu, Shuyuan Xiao, Lujun Huang, Andrey Miroshnichenko, Dejian Zhang, Tingting Liu, Tianbao Yu, "Decision-making and control with diffractive optical networks," Adv. Photon. Nexus 3, 046003 (2024)

    Paper Information

    Category: Research Articles

    Received: Feb. 7, 2024

    Accepted: May 13, 2024

    Published Online: May 31, 2024

    Author Emails: Lujun Huang (ljhuang@phy.ecnu.edu.cn), Tingting Liu (ttliu@ncu.edu.cn), Tianbao Yu (yutianbao@ncu.edu.cn)

    DOI: 10.1117/1.APN.3.4.046003

    CSTR: 32397.14.1.APN.3.4.046003
