Advanced Photonics Nexus, Volume 3, Issue 4, 046003 (2024)

Decision-making and control with diffractive optical networks

Jumin Qiu1, Shuyuan Xiao2,3, Lujun Huang4,*, Andrey Miroshnichenko5, Dejian Zhang1, Tingting Liu2,3,*, and Tianbao Yu1,*
Author Affiliations
  • 1Nanchang University, School of Physics and Materials Science, Nanchang, China
  • 2Nanchang University, School of Information Engineering, Nanchang, China
  • 3Nanchang University, Institute for Advanced Study, Nanchang, China
  • 4East China Normal University, School of Physics and Electronic Science, Shanghai, China
  • 5University of New South Wales Canberra, School of Physics and Electronic Science, Canberra, Australia
    Figures & Tables (5)
    Fig. 1. DON for decision-making and control. (a)–(c) The proposed network plays the video game Super Mario Bros. in a human-like manner. In the network architecture, an input layer captures continuous, high-dimensional game snapshots (seeing), a series of diffractive layers chooses a particular action through a learned control policy for each situation faced (making a decision), and an output layer maps the intensity distribution into preset action regions to generate the control signals in the game (controlling). (d) Training framework of the policy and network. In deep reinforcement learning, an agent interacts with a simulated environment to find a near-optimal control policy represented by a CNN, which is then employed as the ground truth to update the DON through an error backpropagation algorithm. (e) The experimental setup of the DON for decision-making and control. (f) The building block of the DON.
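    To make the training framework of panel (d) concrete, the sketch below distills a CNN teacher policy, as found by deep reinforcement learning, into a stack of learnable phase-only diffractive layers updated by error backpropagation. It is a minimal PyTorch illustration: the grid size, layer count, action regions, and free-space propagation parameters are assumptions for this sketch, not the values used in the paper.

```python
import math
import torch
import torch.nn as nn

N = 64          # assumed pixels per diffractive layer
NUM_LAYERS = 3  # assumed number of diffractive layers
NUM_ACTIONS = 4

def angular_spectrum(field, dist=0.05, wavelength=632.8e-9, pitch=8e-6):
    """Free-space propagation between layers (angular spectrum method)."""
    fx = torch.fft.fftfreq(N, d=pitch)
    FX, FY = torch.meshgrid(fx, fx, indexing="ij")
    arg = torch.clamp(1.0 / wavelength**2 - FX**2 - FY**2, min=0.0)
    H = torch.exp(2j * math.pi * dist * torch.sqrt(arg))
    return torch.fft.ifft2(torch.fft.fft2(field) * H)

class DON(nn.Module):
    def __init__(self):
        super().__init__()
        # one trainable phase mask per diffractive layer
        self.phases = nn.ParameterList(
            [nn.Parameter(torch.zeros(N, N)) for _ in range(NUM_LAYERS)])
        # fixed masks that integrate detector intensity over preset action regions
        masks = torch.zeros(NUM_ACTIONS, N, N)
        for a in range(NUM_ACTIONS):  # assumed 2x2 grid of square regions
            r, c = divmod(a, 2)
            masks[a, 8 + 32 * r:24 + 32 * r, 8 + 32 * c:24 + 32 * c] = 1.0
        self.register_buffer("action_masks", masks)

    def forward(self, img):                  # img: (B, N, N) in [0, 1]
        field = img.to(torch.cfloat)         # amplitude-encoded game snapshot
        for phase in self.phases:
            field = angular_spectrum(field) * torch.exp(1j * phase)
        intensity = angular_spectrum(field).abs() ** 2
        # integrated intensity per action region -> one score per action
        return torch.einsum("bij,aij->ba", intensity, self.action_masks)

don = DON()
opt = torch.optim.Adam(don.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Distillation step: the action chosen by the DRL-trained CNN teacher
# (not shown) is the ground-truth label for each batch of game frames.
# for frames in dataloader:
#     labels = teacher(frames).argmax(dim=-1)
#     loss = loss_fn(don(frames), labels)
#     opt.zero_grad(); loss.backward(); opt.step()
```

    Only the phase masks are trainable; propagation between layers is fixed physics, so a differentiable propagator is all that is needed to backpropagate the error from the detector plane to every layer.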
    Fig. 2. Playing tic-tac-toe. (a) Schematic illustration of the DON, composed of an input layer, hidden layers of three cascaded diffractive blocks, and an output layer for playing tic-tac-toe. (b) and (c) The sequential control of the DON in performing gameplay tasks for X and O. (d) The accuracy of playing tic-tac-toe. A collection of 87 games is used for predicting X, yielding 81 wins and 6 draws. In the remaining 583 games, O obtains 454 wins, 74 draws, and 21 losses. When a previous move has already occupied the predicted position at a turn, the case is counted as a playing error; this occurs 34 times. (e) Dependence of the prediction accuracy on the number of hidden layers.
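    The readout rule behind the playing error counted in panel (d) can be stated in a few lines: the output plane is read as nine intensities, one per board cell, the brightest cell is the predicted move, and a prediction landing on an occupied cell is logged as an error. The board encoding below (a length-9 list with 0 for empty) is an assumption for illustration.

```python
# Minimal sketch of the tic-tac-toe readout, assuming nine output regions
# (one per cell) and a board encoded as a length-9 list with 0 = empty.

def select_move(intensities, board):
    move = max(range(9), key=lambda i: intensities[i])  # brightest region
    occupied = board[move] != 0   # occupied cell -> counted as a playing error
    return move, occupied

# Example: the brightest region (index 4, the center) is still empty.
move, error = select_move([0.1, 0.2, 0.1, 0.3, 0.9, 0.2, 0.1, 0.1, 0.2],
                          [1, 0, 0, 0, 0, 0, 0, 2, 0])
assert (move, error) == (4, False)
```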
    Fig. 3. Playing Super Mario Bros. (a) The layout of the designed network for playing Super Mario Bros. (b) and (c) Snapshots of Mario’s jumping and crouching actions, obtained by comparing the output intensities of the actions. The output intensity of the jump is maximal at the 201st frame, so the predicted action is jump and Mario is controlled accordingly, as shown in panel (b). A similar sequence of prediction and control for the crouch action is shown in panel (c). (d) The inverse prediction result. Because the crouch predicted at the current state is crucial for updating Mario’s action, we use the maximized output intensity of the crouch as input, ignoring the simultaneous outputs of the other actions (Video 1, MP4, 19.8 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s1]).
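    The per-frame decision in panels (b) and (c) amounts to an argmax over the action-region intensities at each game frame. The loop below sketches that control cycle; read_action_intensities and press are hypothetical stand-ins for the optical readout and the emulator interface, and the action set is assumed.

```python
# Hypothetical per-frame control loop: compare the DON output intensities of
# the preset action regions and send the maximum-intensity action to the game.
ACTIONS = ["right", "jump", "crouch", "noop"]  # assumed action set

def control_step(frame, read_action_intensities, press):
    intensities = read_action_intensities(frame)      # one value per action
    action = ACTIONS[max(range(len(ACTIONS)), key=intensities.__getitem__)]
    press(action)  # e.g., "jump" has the maximum intensity at the 201st frame
    return action
```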
    Fig. 4. Playing Car Racing. (a) The layout of the designed network for playing Car Racing. (b) The control of the steering direction and angle of the car according to the difference between the output intensities at the current state, normalized between −1 and 1. (c)–(f) Snapshots of controlling the car steering. When the car faces a left-turn track in panel (c), the output intensity on the left remains greater than that on the right, allowing continuous control in updating the rotation angle of the left-turn action. A similar control process is performed for the right-turn track in panel (e). In addition, the robustness of the network to disturbances is validated by introducing (d) Gaussian blur and (f) Gaussian noise to the game images (Video 2, MP4, 8.36 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s2]; Video 3, MP4, 6.78 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s3]; Video 4, MP4, 16.8 MB [URL: https://doi.org/10.1117/1.APN.3.4.046003.s4]).
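    Panel (b) defines a continuous control signal rather than a discrete action: the steering command is the difference between the left and right output intensities, normalized to lie between −1 and 1. The sketch below assumes normalization by the total intensity and a sign convention where negative values steer left; both are illustrative choices.

```python
# Sketch of the continuous steering rule, assuming normalization by the total
# intensity and the convention that negative commands steer left.

def steering_command(left_intensity, right_intensity):
    total = left_intensity + right_intensity
    if total == 0:
        return 0.0
    return (right_intensity - left_intensity) / total  # already in [-1, 1]

# Example: a brighter left region keeps the car turning left, as in panel (c).
assert steering_command(0.75, 0.25) == -0.5
```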
    Fig. 5. Experimental demonstration of the DON for tic-tac-toe. (a) A photograph of the experimental system, in which the unlabeled devices are lenses, a spatial filter removes the unwanted multiple-order energy peaks, and a filter is mounted on the camera. (b) The output of the first layer of the sample in Fig. 2(a); the red arrows represent the polarization direction of the incident light. (c) and (d) The sequential control of the DON in playing the same two games as in Figs. 2(b) and 2(c), respectively. The experimental results are normalized to the simulation results. Sim., simulation result; Exp., experimental result.

    Citation

    Jumin Qiu, Shuyuan Xiao, Lujun Huang, Andrey Miroshnichenko, Dejian Zhang, Tingting Liu, Tianbao Yu, "Decision-making and control with diffractive optical networks," Adv. Photon. Nexus 3, 046003 (2024)

    Paper Information

    Category: Research Articles

    Received: Feb. 7, 2024

    Accepted: May 13, 2024

    Published Online: May 31, 2024

    Author Emails: Lujun Huang (ljhuang@phy.ecnu.edu.cn), Tingting Liu (ttliu@ncu.edu.cn), Tianbao Yu (yutianbao@ncu.edu.cn)

    DOI: 10.1117/1.APN.3.4.046003

    CSTR: 32397.14.1.APN.3.4.046003
