Screening COVID-19 from chest X-ray images by an optical diffractive neural network with the optimized F number

Jialong Wang; Shouyu Chai; Wenting Gu; Boyi Li; Xue Jiang; Yunxiang Zhang; Hongen Liao; Xin Liu; Dean Ta

doi:10.1364/PRJ.513537

1. INTRODUCTION

Coronavirus disease 2019 (COVID-19) has evolved into a global pandemic, with more than 750 million confirmed cases and more than 6.5 million deaths at the time of writing [1]. It is therefore imperative to detect COVID-19 accurately and promptly in order to mitigate its transmission. Biological detection methods include the reverse transcriptase-polymerase chain reaction (RT-PCR) technique, which is regarded as the prevailing gold standard [2], and the lateral flow test, an alternative that is more convenient and faster than RT-PCR. Radiological imaging methods, e.g., chest X-ray (CXR) and computed tomography (CT) scans, can detect COVID-19-related radiographic abnormalities with stable sensitivity [3–6]. Compared with CT, CXR scans are faster, more convenient, and involve a lower radiation dose, making CXR a good choice for COVID-19 detection [7,8], particularly when medical resources are scarce and infections are widespread. However, manual interpretation of CXR images with suspected COVID-19 characteristics is tedious and time-consuming, and manual diagnosis may also introduce human bias and mistakes.

Given the limitations of the aforementioned detection methods, deep learning (DL) frameworks have been explored as a feasible alternative for COVID-19 detection. DL methods have achieved significant breakthroughs in diverse domains, including speech recognition [9], image classification [10], and reconstruction [11]. Inspired by these advances, DL frameworks, e.g., deep convolutional neural networks, weakly supervised learning, and transfer learning methods, have been specifically designed to extract relevant, quantitative, and high-level imaging features from CXR images in a high-throughput manner, notably improving the detection of COVID-19 [12–19]. However, a well-performing DL method commonly requires large-scale, high-throughput computing on digital computers, resulting in a high computational burden and energy consumption.

To overcome these limitations, optical computing offers a promising route that exploits the inherent advantages of light propagation, including high power efficiency, speed, parallelization, and throughput [20–23]. Recently, diffractive deep neural networks (D$^{2}$NNs) [24] based on optical computing have been proposed and applied to various tasks, such as image classification [25,26], imaging [27], holography [28,29], and logic operation [30]. Inspired by D$^{2}$NN, in this work we propose an alternative COVID-19 detection method based on an optical diffractive neural network framework, termed ODNN-COVID. ODNN-COVID processes the diffracted light wave derived from a CXR image through the neurons/pixels [24] on a series of transmissive/reflective diffractive layers. These neurons collectively modulate the phase of the light field through light–matter interaction, and eventually project the diagnosis result of the patient CXR image onto an output plane. According to the specific COVID-19 diagnosis task (see Section 4), the phase modulation value of each neuron is designed through computer-based training with DL methods. Once training is completed, this passive optical diffractive neural network modulates the phase of the light field in parallel and executes the COVID-19 diagnosis task between an input plane and an output plane at the speed of light propagation. Notably, the entire inference requires little additional power consumption, making the approach energy-efficient, fast, and high-throughput.

Meanwhile, in our work, the F number is used as an overall characterization of the physical parameters (including the axial distance, the pixel size, the wavelength of the illumination light, etc.) in the ODNN-COVID framework. Through a series of theoretical derivations and comparative simulations, the optimized range of the F number is investigated to improve the performance of the network without affecting the spatial complexity. The comprehensive examination of these physical parameters provides insight into how they impact the performance of optical diffractive networks and guides the design and configuration of the ODNN-COVID system, thereby facilitating its practical application in real-world scenarios.

2. MAIN

A. Optical Diffractive Neural Network for COVID-19 Detection

The schematic of the ODNN-COVID framework is depicted in Fig. 1. Briefly, ODNN-COVID consists of several successive planes, including the input plane, the diffractive layers, and the output plane. Before being fed into the optical system, the grayscale pixel information of the CXR images is first optically encoded. In the input plane, the optically encoded CXR image is illuminated by a plane light wave with the wavelength $λ$. The resulting light field in the input plane then passes, layer by layer, through the subsequent diffractive layers. These spatially distributed, successive diffractive layers are designed by using DL techniques and are connected through diffraction propagation.

Figure 1. Architecture of the ODNN-COVID framework for COVID-19 detection based on the CXR images. An optically encoded CXR image is illuminated by a plane wave with the wavelength $λ$ in the input plane. The incident light propagates sequentially through the diffractive layers. Finally, the output light intensity is concentrated, to the greatest extent possible, in the specific region of the output plane that indicates the final COVID-19 diagnosis result. The axial distances among these planes and layers are denoted by $d_{i}, i = 0, \dots, L$, where $L$ is the number of diffractive layers in the network.

In detail, each diffractive layer is composed of independent, trainable neurons, and each neuron can independently modulate the phase of the light field. When the incident light passes through these neurons, its phase changes according to the modulation strength of each neuron. Thus, based on light–matter interaction and Rayleigh–Sommerfeld diffraction theory (see Section 4), the diffractive layers act on a diffracted wave that carries the CXR image information and implement COVID-19 diagnosis by optical computation. Ultimately, the diagnosis (classification) result of the CXR image is shown in a specific region of the output plane. During both the training and the testing phases, several small detector regions are defined in the output plane, each representing a distinct diagnosis outcome. Moreover, the inherent advantages of optical computing enable ODNN-COVID to perform COVID-19 detection rapidly and power-efficiently.
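The forward pass described above can be sketched numerically. The following Python snippet is a minimal illustration under stated assumptions, not the implementation used in this work: it substitutes the angular spectrum method for the Rayleigh–Sommerfeld formulation of Section 4, uses random (untrained) phase maps, and places two detector regions at hypothetical positions. All function and variable names are illustrative.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pixel_size, distance):
    """Propagate a square complex field over `distance` (angular spectrum method)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pixel_size)            # spatial frequencies (1/m)
    fxx, fyy = np.meshgrid(fx, fx, indexing="ij")
    # Squared longitudinal frequency; non-positive values are evanescent waves.
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.where(arg > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def odnn_forward(image, phase_maps, wavelength, pixel_size, distance):
    """Amplitude-encode `image`, apply phase-only layers, return output intensity."""
    field = image.astype(np.complex128)
    for phase in phase_maps:
        field = angular_spectrum_propagate(field, wavelength, pixel_size, distance)
        field = field * np.exp(1j * phase)          # phase-only modulation
    field = angular_spectrum_propagate(field, wavelength, pixel_size, distance)
    return np.abs(field) ** 2

def detector_scores(intensity, regions):
    """Sum the intensity in each (row, col, size) region and normalize to 1."""
    sums = np.array([intensity[r:r + s, c:c + s].sum() for r, c, s in regions])
    return sums / sums.sum()

rng = np.random.default_rng(0)
image = rng.random((200, 200))                      # stand-in for a CXR image
phase_maps = [rng.uniform(0, 2 * np.pi, (200, 200)) for _ in range(3)]  # untrained
intensity = odnn_forward(image, phase_maps, 670e-9, 8e-6, 0.10)
regions = [(90, 60, 20), (90, 120, 20)]             # hypothetical detector layout
scores = detector_scores(intensity, regions)
predicted_class = int(np.argmax(scores))            # brightest region wins
```

In a trained network the phase maps would be the learned parameters. The sketch also omits zero padding of the field, so the FFT-based propagation wraps energy around the grid edges; a practical simulation would typically pad the field first.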

B. Diagnosis Tasks Implemented by ODNN-COVID

First, numerical simulations of the binary-classification diagnosis task are conducted to demonstrate the capability of the proposed ODNN-COVID method in diagnosing COVID-19. Three-classification and four-classification diagnosis tasks are then implemented to demonstrate the feasibility of using ODNN-COVID to distinguish COVID-19 from other types of pneumonia in CXR images.

In detail, when constructing ODNN-COVID, the number of diffractive layers is set to $L = 3$, with each layer separated by a distance of $d_{i} = 10$ cm and an illumination wavelength of $λ = 670$ nm. Each diffractive layer consists of $200 \times 200$ neurons, so the three successive diffractive layers collectively comprise 0.12 million neurons. This configuration keeps the simulations computationally efficient and controlled. Each neuron has a physical size of $8 μm \times 8 μm$. All detection regions, situated in the middle of the output plane, encompass $20 \times 20$ neurons. The above parameters are determined by the optimized F number; details are given in Section 2.D and Fig. 9 in Appendix A.

The CXR images from the Curated X-Ray Dataset [31] and the CC-CXRI-P Dataset [32] are used to train and test the ODNN-COVID network in this work. Briefly, the Curated X-Ray Dataset is constructed by collating 15 different available datasets [16,33–35] and contains a total of 9208 CXR images. Among them, 1281 images are from COVID-19 cases, 4657 images are from other types of pneumonia (1656 viral pneumonia and 3001 bacterial pneumonia), and 3270 images are normal. The CC-CXRI-P Dataset is constructed from the China Consortium of Chest X-ray Image Investigation (CC-CXRI), which releases 612 COVID-19 CXR images, 1659 viral pneumonia CXR images, 2021 bacterial pneumonia CXR images, and 3629 normal CXR images. To avoid the unbalanced data problem [12], a total of 1200 CXR images from the Curated X-Ray Dataset and 600 CXR images from the CC-CXRI-P Dataset are randomly selected for each image class. In the binary-classification task, all CXR images are divided into COVID-19 and non-COVID-19, where non-COVID-19 represents samples that do not contain COVID-19 features. The three-classification task consists of three classes: COVID-19, pneumonia, and normal, where pneumonia indicates types of pneumonia other than COVID-19. In the four-classification task, pneumonia is further subdivided into bacterial pneumonia and viral pneumonia. The training set and test set are divided according to a ratio of 8:2. Table 1 summarizes the training and testing datasets. Before being fed into the network, all CXR images are converted into single-channel grayscale images and resized to $200 \times 200$ to match the input size of the model. After ODNN-COVID is trained on the training set, it is blindly tested on the optically encoded testing set images. None of the images in the training set are used for the final blind testing of the ODNN-COVID model.

Table 1. Training and Testing Datasets for COVID-19 Diagnosis Tasks^a

Classification Task	Dataset Split	COVID-19	Non-COVID-19	Pneumonia	Normal	Bacterial Pneumonia	Viral Pneumonia
Binary-classification diagnosis	Training set	960 + 480	960 + 480	×	×	×	×
Binary-classification diagnosis	Testing set	240 + 120	240 + 120	×	×	×	×
Three-classification diagnosis	Training set	960 + 480	×	960 + 480	960 + 480	×	×
Three-classification diagnosis	Testing set	240 + 120	×	240 + 120	240 + 120	×	×
Four-classification diagnosis	Training set	960 + 480	×	×	960 + 480	960 + 480	960 + 480
Four-classification diagnosis	Testing set	240 + 120	×	×	240 + 120	240 + 120	240 + 120

^a The number on the left of the plus sign is the number of images selected from the Curated X-Ray Dataset, and the number on the right is the number of images selected from the CC-CXRI-P Dataset. A × marks a class that is not used in the task.
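The entries of Table 1 follow directly from the 8:2 split of the 1200 + 600 images selected per class. The short Python sketch below reproduces that arithmetic; the helper name `split_counts` and the dictionary layout are illustrative and not part of the original pipeline.

```python
def split_counts(n_images, train_parts=8, test_parts=2):
    """Split n_images according to the ratio train_parts:test_parts."""
    n_train = n_images * train_parts // (train_parts + test_parts)
    return n_train, n_images - n_train

# Images randomly selected per class from each dataset (from the text).
selected_per_class = {"Curated X-Ray": 1200, "CC-CXRI-P": 600}
classes_per_task = {"binary": 2, "three-class": 3, "four-class": 4}

per_class_split = {name: split_counts(n) for name, n in selected_per_class.items()}
train_per_class = sum(train for train, _ in per_class_split.values())
test_per_class = sum(test for _, test in per_class_split.values())

# Total training/testing images for each task (classes are balanced).
task_totals = {
    task: (k * train_per_class, k * test_per_class)
    for task, k in classes_per_task.items()
}

for task, (n_train, n_test) in task_totals.items():
    print(f"{task}: {n_train} training images, {n_test} testing images")
```

Each class therefore contributes 960 + 480 = 1440 training images and 240 + 120 = 360 testing images, regardless of the task.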

Figure 2 shows the diagnosis results of ODNN-COVID from simulations of the binary-, three-, and four-classification tasks. In Fig. 2, the patient CXR images from different classes, which are used as the input data of ODNN-COVID, are shown at the input plane. The corresponding diagnosis results (i.e., the light intensity distributions after phase modulation by each diffractive layer) are shown at the output planes. To present the diagnosis results of the multi-classification tasks more clearly, histograms of the normalized light intensity for each detection region are also depicted. Moreover, the phase modulation maps ( $0 - 2 π$ ) of the three diffractive layers after training are shown; the light field is modulated by the phase value at the corresponding pixel position. After the phase modulation and the diffraction propagation, the output light intensity values (brightness) within the pre-defined detection regions differ noticeably for CXR images of different classes, indicating that good diagnosis accuracy can be obtained by ODNN-COVID.

Figure 2. Diagnosis results implemented by ODNN-COVID from numerical simulations. (a) Diagnosis results of the binary-classification task. The first row depicts the phase modulation map ( $0 - 2 π$ ) of three diffractive layers of ODNN-COVID after training. The second through fifth rows report the patient CXR samples from COVID-19 and non-COVID-19 (at the input plane) and the corresponding optical diagnosis results (at the output plane). The sixth and seventh rows show the intensity distribution of the light fields just after the phase modulation of each diffractive layer (the second through fourth columns, respectively) and the final outputs (the last column), when taking a COVID-19 and a non-COVID-19 as input data, respectively. (b), (c) Diagnosis results of three-classification and four-classification diagnosis tasks. Examples of input CXR images for each class, the corresponding optical diagnosis results, the normalized results of the output light intensity in the detection regions, and phase modulation maps are shown, respectively.

The quantitative results in Fig. 3 further demonstrate the capability of ODNN-COVID for COVID-19 diagnosis tasks. An overall accuracy of 92.64% is achieved in the blind test of the binary diagnosis task. Specifically, the accuracy for correctly diagnosing COVID-19 is 89.44%, and for non-COVID-19 it reaches 95.83% [Fig. 3(a)]. The three-classification diagnosis task yields an overall accuracy of approximately 88.89%, with a COVID-19 diagnosis accuracy of 90.28%, a pneumonia diagnosis accuracy of 85.28%, and a normal diagnosis accuracy of 91.11% [Fig. 3(b)]. As also reported in other studies [17,35–38], the overall accuracy decreases markedly in the four-classification diagnosis task. Despite this decline to 75.49%, ODNN-COVID still identifies COVID-19 with a relatively high accuracy of 91.94% and normal cases with an accuracy of 85.28% [Fig. 3(c)].
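Because the test classes are balanced, the overall accuracy is the mean of the per-class accuracies. The Python sketch below checks this for the binary and three-classification tasks. The per-class counts of correct predictions are inferred here from the reported percentages and the 360 test images per class in Table 1; they are an assumption, not values read from the confusion matrices in Fig. 3.

```python
TEST_IMAGES_PER_CLASS = 240 + 120  # from Table 1

# Per-class accuracies (%) reported in the text.
reported = {
    "binary": {"COVID-19": 89.44, "non-COVID-19": 95.83},
    "three-class": {"COVID-19": 90.28, "pneumonia": 85.28, "normal": 91.11},
}

def overall_accuracy(per_class_acc, n_per_class=TEST_IMAGES_PER_CLASS):
    """Recover integer correct counts (assumed) and return overall accuracy in %."""
    correct = [round(acc / 100 * n_per_class) for acc in per_class_acc.values()]
    return 100 * sum(correct) / (n_per_class * len(correct))

overall = {task: round(overall_accuracy(acc), 2) for task, acc in reported.items()}
for task, value in overall.items():
    print(f"{task}: overall accuracy {value}%")
```

Under these assumptions the recomputed values agree with the reported overall accuracies of 92.64% and 88.89%.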

Figure 3. Quantitative analysis of diagnosis results implemented by ODNN-COVID. (a)–(c) Confusion matrices of all classification tasks. (d) Loss and accuracy curves of all classification tasks. The values in dashed boxes represent the accuracy and the standard deviation. It can be observed that between the 20th and 25th epochs, the loss and accuracy of all models have approached convergence.

C. Experimental Results of COVID-19 by ODNN-COVID

To confirm the diagnosis results obtained from the above simulations, ODNN-COVID is experimentally demonstrated with a custom-built ODNN system based on spatial light modulators (SLMs). Figure 4 shows the schematic diagram and photograph of the optical system. Briefly, an amplitude SLM (ASLM) modulates the amplitude of the input plane wave and serves as the optical input, and a phase SLM (PSLM) serves as a diffractive layer for the phase modulation (see Section 4). As a proof of concept and for the convenience of observing experimental results, the binary- and three-classification tasks are implemented with a single-layer ODNN-COVID framework. The SLMs used in the optical system have $1920 \times 1200$ pixels. To minimize distortion and calculation error, all input CXR images are resized to $800 \times 800$ and then zero-padded to $1920 \times 1200$ before being loaded onto the ASLM. The neuron size of both SLMs is $8 μm \times 8 μm$, the wavelength of the illumination light is 670 nm, and the initial distance between any two consecutive planes is set to 20 cm. These parameters are also determined by the optimized F number. Each detection region contains $20 \times 20$ neurons; this relatively small size better displays the difference in light intensity between detection regions in the single-layer ODNN-COVID. The diagnosis result is determined from the differences in light intensity detected in the pre-defined detection regions. The light intensity captured by the CMOS sensor is expected to be directly identifiable by the naked eye, which avoids extra effort in subsequent data collection and processing and allows convenient interpretation and evaluation.
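The zero-padding step can be illustrated with NumPy. The sketch below assumes that the $800 \times 800$ image is centered on the $1920 \times 1200$ SLM (stored as 1200 rows by 1920 columns); the text does not state where the image is placed, and the function name `pad_to_slm` is illustrative.

```python
import numpy as np

def pad_to_slm(image, slm_shape=(1200, 1920)):
    """Zero-pad a 2D image to the SLM resolution, keeping it centered (assumed)."""
    rows, cols = image.shape
    top = (slm_shape[0] - rows) // 2
    left = (slm_shape[1] - cols) // 2
    bottom = slm_shape[0] - rows - top
    right = slm_shape[1] - cols - left
    return np.pad(image, ((top, bottom), (left, right)),
                  mode="constant", constant_values=0)

rng = np.random.default_rng(0)
cxr = rng.random((800, 800))        # stand-in for a resized grayscale CXR image
slm_input = pad_to_slm(cxr)
print(slm_input.shape)              # (1200, 1920)
```

With centered placement, the image occupies rows 200 to 999 and columns 560 to 1359 of the SLM, and all remaining pixels carry zero amplitude.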

Figure 4. Diagram and photograph of the experimental setup for ODNN-COVID. (a) Diagram of the experimental setup. L1 to L4 are lenses. P1 to P4 are polarizers. BS1 and BS2 are beam splitters. M1 and M2 are 4f systems, which are used for collimation and pixel matching. The plane wave emitted by the laser is coded by the ASLM and then phase modulated by the PSLM, and the final diagnosis result is captured by the CMOS camera. (b) Photograph of the experimental setup. The ASLM, PSLM, and CMOS camera form the ODNN system as the main components.

Note that precise pixel-to-pixel alignment is challenging to achieve in the experiments, and misalignment may introduce errors that degrade the detection performance of the optical diffractive neural network system. To mitigate the impact of mechanical misalignments, random displacements are intentionally incorporated into the training process to make the ODNN-COVID network more robust [39,40]. Specifically, random lateral perturbations in the ranges $(- Δ x, Δ x)$ and $(- Δ y, Δ y)$ are applied to the diffractive layers, and a random axial perturbation in the range $(- Δ z, Δ z)$ is applied between any two consecutive planes. Here, the lateral perturbation ranges $Δ x$ and $Δ y$ are both set to 0, 10, 20, and 30 pixel sizes, and the axial perturbation range $Δ z$ is set to 0, 1, 2, and 3 cm. Figure 5 shows the diagnosis performance of ODNN-COVID with this anti-perturbation strategy, where the $y$ axis represents the perturbation range introduced during the training phase (orange dots) and the $x$ axis represents the perturbation range present during blind testing (purple dots). The results show that although increasing the error range during training may reduce the accuracy, ODNN-COVID maintains stable performance in terms of accuracy and distinguishability of light intensity within the detection regions. When the testing error range exceeds the training error range, the diagnosis accuracy drops significantly, and the differences in light intensity between detection regions become smaller. This effect may be more pronounced in multi-classification diagnosis tasks.
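One way to picture this training-time perturbation is the Python sketch below. It assumes that the lateral shifts are drawn uniformly as whole pixels, that the axial offset is drawn uniformly in centimeters, and that pixels shifted in from outside the layer carry zero phase; none of these details is specified in the text, and all names are illustrative.

```python
import numpy as np

def sample_perturbation(rng, max_shift_px, max_dz_cm):
    """Draw one random misalignment within the given ranges (uniform, assumed)."""
    dx = int(rng.integers(-max_shift_px, max_shift_px + 1))
    dy = int(rng.integers(-max_shift_px, max_shift_px + 1))
    dz = float(rng.uniform(-max_dz_cm, max_dz_cm))
    return dx, dy, dz

def shift_layer(phase, dx, dy):
    """Shift a phase map by (dx, dy) pixels; vacated pixels are set to zero."""
    rows, cols = phase.shape
    shifted = np.zeros_like(phase)
    src = phase[max(0, -dy):rows - max(0, dy), max(0, -dx):cols - max(0, dx)]
    r0, c0 = max(0, dy), max(0, dx)
    shifted[r0:r0 + src.shape[0], c0:c0 + src.shape[1]] = src
    return shifted

rng = np.random.default_rng(0)
phase = rng.uniform(0, 2 * np.pi, (200, 200))   # stand-in for a trainable layer
nominal_distance_cm = 20.0

# One training step with a perturbation range of 10 pixels and 1 cm.
dx, dy, dz = sample_perturbation(rng, max_shift_px=10, max_dz_cm=1.0)
perturbed_phase = shift_layer(phase, dx, dy)
perturbed_distance_cm = nominal_distance_cm + dz
```

Drawing a fresh perturbation at every training step exposes the network to the misalignments it may meet in the experiment, which is the intent of the strategy described above.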

Figure 5. Anti-perturbation strategy in the experimental system. The upper left corner shows the diagram of a single-layer ODNN-COVID and CXR images with COVID-19 and non-COVID-19 features. The $y$ axis represents the perturbation range introduced during the training (orange dots), and the $x$ axis represents the range of perturbations present in practice during the blind testing (purple dots). The middle part shows the light intensity output with the same CXR input and the corresponding overall accuracy and standard deviation (blue dashed boxes).

With the help of the anti-perturbation strategy, the CXR images are loaded on the ASLM and the phase modulation map on the PSLM, and the diagnosis results produced by ODNN-COVID are captured by the CMOS camera. Figure 6(b) shows the phase modulation map ( $0 - 2 π$ ) of all $1920 \times 1200$ neurons in ODNN-COVID, which is the image loaded on the PSLM. Two patient samples, one COVID-19 and one non-COVID-19, are shown with the corresponding simulation and experimental results in the first to third rows of Fig. 6(c). When a COVID-19 patient CXR image is loaded on the ASLM, the detection region on the left (i.e., the pre-defined region indicating COVID-19) receives a higher light intensity and appears brighter. Conversely, when a non-COVID-19 CXR image is loaded on the ASLM, the detection region on the right (i.e., the pre-defined region corresponding to non-COVID-19) is brighter. The normalized experimental light intensities in the detection regions are shown in the last row of Fig. 6(c). Figure 6(d) shows the confusion matrix of the simulation results of the single-layer ODNN-COVID, which achieves a simulation accuracy of 84.17%. Figure 6(e) shows the confusion matrix of the experimental results of this single-layer network, for which 60 images are chosen from each CXR image class. An accuracy of 80.83% is obtained in the experiment.

Figure 6. Experimental results of the binary-classification task implemented by ODNN-COVID. (a) Diagram of a single-layer ODNN-COVID architecture. (b) Phase modulation map ( $0 - 2 π$ ) of the diffractive layer containing $1920 \times 1200$ neurons, which is loaded on the PSLM. (c) The diagnosis results of simulation and experiments. The first row shows the patient CXR sample images, which are loaded on the ASLM. All CXR images are resized to $800 \times 800$ and then zero-padded to the size of $1920 \times 1200$. The second and third rows show the simulation results and the final diagnosis images captured experimentally by the CMOS camera, where the light intensity is focused on the pre-defined detection regions in a way that can be visually discerned. The fourth row shows the normalized experimental results of the sum of output light intensity within the specific detection regions. (d), (e) Confusion matrices and accuracies of the simulation (with the anti-perturbation strategy) and experimental results.

For the three-classification task, Fig. 7(a) shows the phase modulation map ( $0 - 2 π$ ) loaded on the PSLM. Figure 7(b) shows one sample each of COVID-19, normal, and pneumonia with the corresponding simulation and experimental results, together with the normalized experimental light intensities of the detection regions. When CXR images of different classes are sequentially input into ODNN-COVID, the light intensity reaches its maximum in the corresponding one of the three pre-set detection regions. As in the binary task, 60 images per class are chosen for the experiment. The network achieves accuracies of 80.19% in simulation and 74.44% in the experiment. These results demonstrate that the designed ODNN-COVID framework can optically identify COVID-19 cases from acquired CXR images at the speed of light propagation, obviating the need for an external power source for computation.

Figure 7. Experimental results of the three-classification task implemented by ODNN-COVID. (a) Phase modulation map ( $0 - 2 π$ ) for the three-classification task, which is loaded on the PSLM. (b) Diagnosis results of simulation and experiments. The patient CXR sample, simulation, and experimental results, as well as the normalized experimental results of the sum of output light intensity within the detection regions. (c), (d) Confusion matrices and accuracies of the simulation (with the anti-perturbation strategy) and experimental results are shown.

D. Control Connectivity-Related Fresnel Numbers to Optimize the Performance of ODNN-COVID

The simulations and experiments above demonstrate that the proposed ODNN-COVID method can use light as the computing medium and recognize COVID-19 features in the input patient CXR images. However, when implementing ODNN-COVID, the adopted physical parameters (including the axial distance, the pixel size, the wavelength of the illumination light, etc.) are critical and directly influence the diagnosis performance. Hence, it is important to explore how to select these parameters to achieve the optimal performance of ODNN-COVID within a limited spatial complexity, especially for practical application in real-world scenarios.

The diffractive propagation of the light field between two consecutive diffractive layers can be mathematically expressed by the following formula: ${\hat{U}}_{m, n}^{l + 1} = U_{m, n}^{l} \times M,$ (1) where $U_{m, n}^{l}$ represents the vectorized light field after the phase modulation of the $l$th diffractive layer, and ${\hat{U}}_{m, n}^{l + 1}$ represents the vectorized light field before the phase modulation of the $(l + 1)$th diffractive layer. $M$ represents a symmetric complex-valued matrix.

ODNN-COVID is analogous to a fully connected neural network. Therefore, an implicit physical parameter, the Connectivity between two consecutive layers, needs to be considered in advance. The Connectivity plays an important role in neural networks [24]: high Connectivity can enhance the transmission, expression, and inference capabilities of the network, allowing it to learn more complex and abstract features, but it may also introduce over-fitting. The Connectivity of ODNN-COVID is generally related to two parameters, i.e., the distance $d$ between layers and the maximum half-cone diffraction angle $φ_{\max}$. The parameter $φ_{\max}$ is in turn determined by the illumination wavelength $λ$ and the side length $a$ of the neurons, as follows [24,41]: $φ_{\max} = \arcsin \left( \frac{λ}{2 a} \right).$ (2)

If every neuron can transmit information to all neurons in the next layer, the network can be regarded as a fully connected network. It is obvious that the neurons at the corners of the diffractive layer have difficulty affecting all neurons of the next layer. Hence, the following formula is used to define the degree of the network’s Connectivity: $\mathrm{Connectivity} = \min \left( 1, \frac{π {(d \tan (φ_{\max}))}^{2}}{4 {(N a)}^{2}} \right),$ (3) where $N$ is the number of neurons along one side of the diffractive layer, and the Connectivity is set to 1 whenever the computed ratio exceeds 1.

As an example, the effect of Connectivity on the binary COVID-19 diagnosis task is explored through simulations. With $φ_{\max}$ fixed by $λ = 670$ nm and $a = 8$ μm, and with $N = 200$, the distance $d$ is set to 60 mm, 30 mm, 10 mm, and 1 mm to adjust the Connectivity of a three-layer ODNN-COVID to about 100% (full connection), 50%, 5%, and 0.05%, respectively. Figure 8(a) depicts the corresponding results. When the Connectivity reaches 100%, the ODNN achieves the highest accuracy and produces outputs with a high signal-to-noise ratio. As the Connectivity drops to 50% and 5%, the accuracy changes only slightly, but the output contains more stray light, which lowers the signal-to-noise ratio. When the Connectivity drops as low as 0.05%, the transfer matrix $M$ in Eq. (1) approaches a diagonal matrix, which means that very little information is transmitted and changed through diffraction. Correspondingly, the accuracy of the ODNN degrades significantly, and the outline of the CXR images can be vaguely observed on the output plane [see the fifth column of Fig. 8(a)]. Therefore, keeping appropriate Connectivity between successive layers is beneficial for the detection performance of the ODNN-COVID network. To ensure appropriate Connectivity, the following condition should be satisfied: $N \times a \leq 4 \times d \times \tan (φ_{\max}).$ (4)
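The Connectivity values quoted above can be reproduced from Eqs. (2) and (3). The following Python sketch evaluates them for the four distances and also computes the smallest distance allowed by Eq. (4); the function names are illustrative.

```python
import math

def max_half_cone_angle(wavelength, a):
    """Eq. (2): maximum half-cone diffraction angle in radians."""
    return math.asin(wavelength / (2 * a))

def connectivity(d, wavelength, a, n):
    """Eq. (3): Connectivity between two consecutive layers, clipped at 1."""
    phi = max_half_cone_angle(wavelength, a)
    ratio = math.pi * (d * math.tan(phi)) ** 2 / (4 * (n * a) ** 2)
    return min(ratio, 1.0)

def min_distance(wavelength, a, n):
    """Smallest d that satisfies Eq. (4): N a <= 4 d tan(phi_max)."""
    phi = max_half_cone_angle(wavelength, a)
    return n * a / (4 * math.tan(phi))

WAVELENGTH = 670e-9   # m
NEURON_SIZE = 8e-6    # m
N = 200               # neurons along one side

values = {
    d_mm: connectivity(d_mm * 1e-3, WAVELENGTH, NEURON_SIZE, N)
    for d_mm in (60, 30, 10, 1)
}
for d_mm, c in values.items():
    print(f"d = {d_mm} mm: Connectivity = {100 * c:.3g}%")

d_min_mm = 1e3 * min_distance(WAVELENGTH, NEURON_SIZE, N)
print(f"Eq. (4) requires d >= {d_min_mm:.2f} mm")
```

For these parameters, Eq. (4) gives a minimum distance of roughly 9.5 mm. Substituting the equality case of Eq. (4) into Eq. (3) gives a Connectivity of $π/64$, i.e., about 5%, which is the lowest of the simulated levels at which the accuracy remains largely unaffected.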

On the other hand, the Fresnel number (F number) is an important dimensionless parameter in optics, which describes the regime of diffraction effects. Following Zheng et al. [42], the F number is calculated as follows: $F = \frac{a^{2}}{λ d}.$ (5)

The F number is straightforward to calculate. It depends only on three key physical parameters used to construct ODNN-COVID, i.e., the neuron size, the axial distance between layers, and the wavelength of the illumination light. Therefore, in this work, the F number is used to characterize the combined effect of these physical parameters in ODNN-COVID. By controlling the F number within a reasonable range, the network can achieve more suitable Connectivity and improved performance. By combining Eqs. (4) and (5), the following formula is obtained: $F = \frac{a^{2}}{λ d} \leq \frac{4 a \tan (φ_{\max})}{λ N}.$ (6)

Further, by combining Eqs. (2) and (6), the following inequality is obtained: $F \leq \frac{2}{N \cos (φ_{\max})}.$ (7)
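For completeness, the intermediate steps from Eqs. (2), (4), and (5) to Eq. (7) can be written out as follows. This is a worked restatement of the derivation above, with $\varphi_{\max}$ denoting the same angle $φ_{\max}$.

```latex
\begin{align*}
  N a \le 4 d \tan(\varphi_{\max})
    \;&\Longrightarrow\;
    \frac{1}{d} \le \frac{4 \tan(\varphi_{\max})}{N a}
    && \text{from Eq. (4)} \\
  F = \frac{a^{2}}{\lambda d}
    \;&\le\; \frac{a^{2}}{\lambda} \cdot \frac{4 \tan(\varphi_{\max})}{N a}
    = \frac{4 a \tan(\varphi_{\max})}{\lambda N}
    && \text{Eq. (6)} \\
  \sin(\varphi_{\max}) = \frac{\lambda}{2 a}
    \;&\Longrightarrow\;
    \frac{a}{\lambda} = \frac{1}{2 \sin(\varphi_{\max})}
    && \text{from Eq. (2)} \\
  F \;&\le\; \frac{4 \tan(\varphi_{\max})}{2 N \sin(\varphi_{\max})}
    = \frac{2}{N \cos(\varphi_{\max})}
    && \text{Eq. (7)}
\end{align*}
```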

Equation (7) determines the upper bound of the optimized F number. In particular, when the illumination light is in the visible band (e.g., 670 nm), $φ_{\max}$ is generally small and $\cos (φ_{\max})$ is close to 1, so the upper bound of the F number can be considered to depend only on the number of neurons along one side, $N$. This is consistent with previous conclusions [24,41,42]. If the illumination light is in the terahertz band (e.g., 0.75 mm), however, $φ_{\max}$ is usually large and the factor $\cos (φ_{\max})$ can no longer be neglected, as described in this work. There is also a lower bound on the optimized range of the F number, which in this work is determined by referring to previous studies [42].
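The role of the cosine factor can be illustrated numerically. The Python sketch below evaluates Eq. (5) and the bound of Eq. (7) for the simulated visible-light configuration ($a = 8$ μm, $λ = 670$ nm, $d = 10$ cm, $N = 200$). For the terahertz case, the neuron size of 0.4 mm is a hypothetical value chosen only for illustration, since the text gives only the wavelength; the function names are likewise illustrative.

```python
import math

def fresnel_number(a, wavelength, d):
    """Eq. (5): F = a^2 / (lambda d)."""
    return a ** 2 / (wavelength * d)

def fresnel_upper_bound(a, wavelength, n):
    """Eq. (7): F <= 2 / (N cos(phi_max)), with phi_max from Eq. (2)."""
    phi_max = math.asin(wavelength / (2 * a))
    return 2 / (n * math.cos(phi_max))

N = 200
small_angle_bound = 2 / N   # Eq. (7) with cos(phi_max) approximated by 1

# Visible-light configuration used in the simulations.
f_visible = fresnel_number(a=8e-6, wavelength=670e-9, d=0.10)
bound_visible = fresnel_upper_bound(a=8e-6, wavelength=670e-9, n=N)

# Terahertz illumination; a = 0.4 mm is a HYPOTHETICAL neuron size.
bound_thz = fresnel_upper_bound(a=0.4e-3, wavelength=0.75e-3, n=N)

print(f"visible: F = {f_visible:.3e}, bound = {bound_visible:.5f}")
print(f"bound / (2/N): visible = {bound_visible / small_angle_bound:.4f}, "
      f"terahertz = {bound_thz / small_angle_bound:.2f}")
```

In the visible case the bound differs from $2/N$ by less than 0.1%, and the simulated configuration lies well below it. With the hypothetical terahertz parameters, the bound is nearly three times $2/N$, so the diffraction angle has to be taken into account.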

Figure 8. Effect of Connectivity and F number on ODNN-COVID’s performance. (a) Performance of ODNN-COVID with different Connectivity. A diagram of a three-layer ODNN-COVID and the used illumination light are shown in the upper left corner. The first row shows the diagrams of Connectivity, which are set to 100%, 50%, 5%, and 0.05%, respectively. The second to fifth rows show the results of the output light intensity (i.e., diagnosis results) by ODNN-COVID with the varying Connectivity when facing different types of CXR images. The corresponding overall accuracy and standard deviation are also shown in the blue dashed box. (b) Effect of F number on accuracy performance of a three-layer ODNN-COVID with different illumination light sources. (c) Light intensity output by the ODNN-COVID models with different F numbers and diffraction angles for the same CXR images. The output results highlighted by the black dashed box show that large diffraction angle indeed increased the upper bound of the optimized range of F number. The corresponding overall accuracy and standard deviation are also shown in the blue dashed box.

To demonstrate the impact of the F number on the performance of ODNN-COVID, we construct three-layer ODNNs with different F numbers ($10^{-5}$ to $10^{-1}$) by changing the axial distances and conduct comparative simulations (see Fig. 9 in Appendix A). At the same time, in order to highlight the role of the maximum half-cone diffraction angle $φ_{\max}$, visible light with a wavelength of 670 nm and terahertz light with a wavelength of 0.75 mm are both used as illumination light sources, corresponding to neuron side sizes of 8 μm and 0.38 mm, respectively. Correspondingly, the values of $φ_{\max}$ are 2.4° and 80.7°, and the upper bounds of the optimized ranges of the F number are approximately $10^{-2}$ and $5 \times 10^{-2}$.
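
The quantities above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that Eq. (2) has the form $\sin (φ_{\max}) = λ / (2 a)$ and that $N = 200$ applies to both configurations; the function names are hypothetical. Under these assumptions it reproduces the quoted angles of 2.4° and 80.7°, gives an upper bound of about $10^{-2}$ for visible light, and gives a bound of about $6 \times 10^{-2}$ for terahertz light, which is of the same order as the approximate value quoted above.

```python
import math

def max_half_cone_angle(wavelength, a):
    """phi_max in radians, ASSUMING Eq. (2) is sin(phi_max) = wavelength / (2 a)."""
    return math.asin(wavelength / (2.0 * a))

def fresnel_number(a, wavelength, d):
    """F number of Eq. (5): F = a^2 / (wavelength * d)."""
    return a ** 2 / (wavelength * d)

def fresnel_upper_bound(n_side, phi_max):
    """Upper bound of Eq. (7): F <= 2 / (N cos(phi_max))."""
    return 2.0 / (n_side * math.cos(phi_max))

def min_distance(n_side, a, phi_max):
    """Smallest axial distance allowed by Eq. (4): d >= N a / (4 tan(phi_max))."""
    return n_side * a / (4.0 * math.tan(phi_max))

N = 200  # neurons per side (assumed identical for both illumination sources)
configs = {
    "visible": (670e-9, 8e-6),         # wavelength, neuron side size (metres)
    "terahertz": (0.75e-3, 0.38e-3),
}

results = {}
for label, (wavelength, a) in configs.items():
    phi = max_half_cone_angle(wavelength, a)
    bound = fresnel_upper_bound(N, phi)
    d_min = min_distance(N, a, phi)
    results[label] = (phi, bound, d_min)
    print(f"{label}: phi_max = {math.degrees(phi):.1f} deg, "
          f"F upper bound = {bound:.4f}, d_min = {d_min * 1e3:.2f} mm")
```

At the smallest distance allowed by Eq. (4), the F number from Eq. (5) equals the bound of Eq. (7), which is a quick consistency check on the derivation.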

Figure 8(b) shows that as the F number decreases from $10^{-4}$ to $10^{-5}$, the accuracy of the ODNN systems with both illumination sources remains low. Meanwhile, the network’s output becomes increasingly blurry until the ODNN completely loses its diagnostic ability ($F = 10^{-5}$). This is because an excessively small F number causes the elements of the transfer matrix $M$ to converge [see the second to fourth columns of Fig. 8(c)]. As the F number decreases from $10^{-2}$ to $5 \times 10^{-4}$, the Connectivity within the ODNN systems illuminated by visible and terahertz light is in an appropriate range, so the accuracy fluctuates around its peak. The positions, shapes, and light intensity differences of the two detection regions can be easily distinguished [see the fifth to eighth columns of Fig. 8(c)]. When the F number equals $5 \times 10^{-2}$, the Connectivity of the terahertz-based ODNN remains in a relatively suitable range owing to its large diffraction angle; thus the accuracy can still be maintained above 90%. However, the accuracy of the visible light-based ODNN decreases, which is attributed to its small diffraction angle and weak Connectivity. The terahertz-based ODNN indeed produces an output that better supports diagnostic judgment than the visible light-based ODNN [see the ninth column of Fig. 8(c)]. Finally, when the F number equals $10^{-1}$, the ODNN systems based on both illumination light sources have low Connectivity and exhibit relatively low accuracy. The limited information transfer through diffractive propagation also causes the rough outline of the CXR images to appear in the output [see the tenth column of Fig. 8(c)]. It is also anticipated that as the network undertakes more intricate tasks, the influence of the F number on the network’s performance will become more significant.

3. DISCUSSION AND CONCLUSION

Accurate and efficient COVID-19 detection methods play a vital role in the prevention and control of infectious diseases. Compared to other COVID-19 detection methods, DL-based methods offer strong detection capability with the help of deep neural networks. However, the existing DL-based detection methods require large computing resources and consume considerable energy to obtain diagnosis results that are as accurate as possible. In this work, we propose a novel computational method for COVID-19 detection (ODNN-COVID) that utilizes visible light as the computing medium. Correspondingly, ODNN-COVID enjoys the advantages of optical computing such as low power consumption, high parallelization, and fast computational speed (at the speed of light) while maintaining a high diagnosis accuracy.

The results from the simulations and experiments demonstrate the capability of the proposed ODNN-COVID method. In the numerical simulations, the three-layer system achieves overall accuracies of 92.64% and 88.89% in the binary- and three-classification diagnostic tasks, respectively. Although there is a noticeable decline in the simulation accuracy of the four-classification task, this phenomenon has also been reported in other works. For the single-layer system with the same configuration, a simulation accuracy of 84.17% and an experimental accuracy of 80.83% are obtained for the binary-classification task, and a simulation accuracy of 80.19% and an experimental accuracy of 74.44% are obtained for the three-classification task. It should also be mentioned that the performance of the single-layer network in the four-classification task is limited, with simulation and experimental accuracies of 64.58% and 54.17%, respectively. Therefore, this part is not specifically shown in Section 2.

Furthermore, we also investigate the effects of the physical parameters of ODNN-COVID on the diagnosis performance, and find that the F number can be used as a key parameter to characterize the overall detection performance of the optical diffractive neural network. To further verify that the effect of each physical parameter on the detection performance of ODNN-COVID actually depends on the change of the F number and Connectivity, multiple groups of controlled simulations are conducted by independently altering specific parameters while retraining ODNN-COVID. Details are presented in Fig. 9 of Appendix A. The effect of the number of diffractive layers and the versatility of ODNN-COVID are also presented in Fig. 10 of Appendix B and Fig. 11 of Appendix C, respectively. Moreover, in ODNN experiments, the F number can usually be adjusted easily by changing the axial distance $d$, since the pixel side size $a$ and the wavelength of the illumination light $λ$ are generally fixed. Therefore, keeping the F number in the optimized range can be understood as finding the optimized axial distance $d$. Correspondingly, the axial distance $d$ between any two planes can also be iteratively updated as a training parameter, fine-tuned within the optimized range, in order to further improve the performance of ODNN-COVID. At the same time, neurons with side sizes of the same order of magnitude as the wavelength of the illumination light can reach a larger diffraction angle, so a smaller axial distance suffices to reach the appropriate F number, which is beneficial for building a more compact ODNN system with high performance (see Fig. 8).

On the other hand, the optical implementation of ODNN-COVID exploits the large connectivity and parallelism of free-space optical diffraction and the massive number of parameters of layer-by-layer optical processing. Correspondingly, we believe that, benefiting from their power efficiency and near-instantaneous computation, optical processors such as the ODNN-COVID demonstrated in this work can make a considerable contribution to the detection of highly contagious diseases (e.g., COVID-19) in emergency situations. The optical computing approach also has potential for other medical applications. Furthermore, new single-pixel imaging or microscope architectures, autonomous driving, imaging through highly scattering media, logic operations, holography, etc., can also benefit from its extreme parallelism, power efficiency, and very high computing speed.

It should be pointed out that although the anti-perturbation strategy allows a certain amount of perturbation in the relative positions of the optical elements in the experimental system, the alignment among the ASLM, PSLM, and CMOS camera and the collimation of the illumination light are still critical and may directly affect the detection performance of ODNN-COVID. The complexity and footprint of the experimental system can be reduced by methods such as the optimized range of the F number proposed here, transmissive SLMs, and ultra-precision micro–nano printing technology. Moreover, in addition to SLMs, the phase modulation of the light field can also be realized by planar structures fabricated by 3D printing [24], where the phase value that each neuron needs to modulate is converted to a pixel thickness. When light passes through these neurons with different thicknesses, an optical path difference is generated, thereby achieving phase modulation. However, once the thickness of each neuron in the diffractive layers is determined and printed, it cannot be changed. In other words, a 3D-printed optical diffractive network cannot be reconfigured or scaled. Meanwhile, if the incident light is in the visible band, the diffractive layer needs to be fabricated with a finer printing technology [41] (e.g., a photolithography system), which increases the fabrication complexity. The above problems will be systematically investigated in the future.

In summary, we introduce an optical diffractive neural network, named ODNN-COVID, for COVID-19 detection. ODNN-COVID functions as a passive optical processor, enabling the detection of COVID-19 across various types of CXR images. Comprehensive simulations and experimental findings substantiate the efficacy of ODNN-COVID in diagnosing COVID-19, characterized by its low power consumption, high parallelization, and fast computing capabilities. As we move forward, our focus will be on further refining the optical diffractive network’s architecture and exploring its applications in other medical domains.

4. MATERIALS AND METHODS

A. Light–Matter Interactions and Rayleigh–Sommerfeld Diffraction Theory

When the light field of a plane wave reaches the diffractive layer, the phase modulation of the whole light field is carried out by the neurons, each with its own phase modulation capability. The phase modulation capability of each neuron in the diffractive layer, $Δ θ_{m, n}^{l}$, is the parameter that the network needs to train, which can be described as follows: $e^{i {\hat{θ}}_{m, n}^{l}} = e^{i (θ_{m, n}^{l} + Δ θ_{m, n}^{l})}, \quad {\hat{U}}_{m, n}^{l} = U_{m, n}^{l} Δ U_{m, n}^{l},$ (8) where $m$, $n$ denote the neuron located in the $m$th row and the $n$th column, and $l$ denotes the $l$th diffractive layer. $θ_{m, n}^{l}$ represents the phase value of the neuron at the corresponding $(m, n)$ position when the light wave has just reached the $l$th diffractive layer. ${\hat{θ}}_{m, n}^{l}$ represents the phase value of the corresponding neuron at the moment when the light wave has just passed through the diffractive layer. $Δ θ_{m, n}^{l}$ represents the phase modulation obtained when the light wave passes through the neuron. $U$ represents the complex-valued light field.

According to the Rayleigh–Sommerfeld diffraction equation [43] and the Huygens–Fresnel principle [44], each point of the wavefront can be regarded as a point source that emits spherical secondary waves, and the wavefront at any later time can be regarded as the envelope of these secondary waves. Therefore, the propagation of a coherent plane wave between adjacent diffractive layers can be described as follows: $U_{m, n}^{l + 1} = \sum_{x = 1}^{N} \sum_{y = 1}^{N} {\hat{U}}_{x, y}^{l} \frac{e^{i k r}}{r} \frac{d}{r} \left(\frac{1}{2 π r} + \frac{1}{i λ}\right),$ (9) where $k = 2 π / λ$ and $e$ is the base of the natural logarithm. The sums run over all $N \times N$ neurons of the $l$th diffractive layer, with $x$, $y$ denoting the neuron in the $x$th row and $y$th column. $i$ represents the imaginary unit. $r$ represents the Euclidean distance between neurons in adjacent diffractive layers, with $r = \sqrt{{(m - x)}^{2} + {(n - y)}^{2} + d^{2}}$.

To reduce the computational complexity, the angular spectrum approach [45], which transforms the distribution of the light field from the spatial domain to the frequency domain through the Fourier transform, is applied in this work: $U_{m, n}^{l + 1} = \mathcal{F}^{- 1} \left(\mathcal{F} ({\hat{U}}_{m, n}^{l}) \, \mathcal{F} \left(\frac{e^{i k r^{'}}}{r^{'}} \frac{d}{r^{'}} \left(\frac{1}{2 π r^{'}} + \frac{1}{i λ}\right)\right)\right),$ (10) where $r^{'} = \sqrt{m^{2} + n^{2} + d^{2}}$. $\mathcal{F}$ represents the Fourier transform, and $\mathcal{F}^{- 1}$ represents the inverse Fourier transform. In Eq. (10), $\mathcal{F} \left(\frac{e^{i k r^{'}}}{r^{'}} \frac{d}{r^{'}} \left(\frac{1}{2 π r^{'}} + \frac{1}{i λ}\right)\right)$ can also be replaced by the frequency domain transfer function $H_{i} (f_{m}, f_{n})$, which gives: $U_{m, n}^{l + 1} = \mathcal{F}^{- 1} \left(\mathcal{F} ({\hat{U}}_{m, n}^{l}) H_{i} (f_{m}, f_{n})\right), \quad H_{i} (f_{m}, f_{n}) = \begin{cases} e^{i 2 π d {(λ^{- 2} - f_{m}^{2} - f_{n}^{2})}^{1 / 2}}, & f_{m}^{2} + f_{n}^{2} \leq λ^{- 2} \\ 0, & f_{m}^{2} + f_{n}^{2} > λ^{- 2} \end{cases},$ (11) where $f_{m}, f_{n}$ are the Fourier (spatial) frequencies.

During the experiments, the expanded and collimated beam passes through the ASLM loaded with the input CXR image and then serves as the input light field of the network: ${\hat{U}}_{m, n}^{0} = A_{m, n} U_{m, n}^{pw},$ (12) where $A_{m, n}$ represents the 2D distribution of amplitude modulation that is determined by the pixel values of the input CXR image and realized by the ASLM. $U_{m, n}^{pw}$ represents the complex-valued light field of the plane wave originally emitted by the laser. ${\hat{U}}_{m, n}^{0}$ represents the complex-valued light field of the plane wave after ASLM modulation.

When the incident plane wave enters an ODNN system, it undergoes $l$ phase modulations and $l + 1$ free-space propagations, after which the CMOS camera captures the output light intensity distribution, that is, the network output $I_{m, n}$. In this work, $I_{m, n}$ is used as the basis for COVID-19 diagnosis and described as follows: $I_{m, n} = U_{m, n}^{l + 1} U_{m, n}^{* l + 1},$ (13) where $U_{m, n}^{* l + 1}$ is the complex conjugate of $U_{m, n}^{l + 1}$.
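
The forward model of Eqs. (8) and (11)–(13) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy FFTs without zero-padding (so the convolution is circular), assumes a unit-amplitude plane wave and the same axial distance between all planes, and the function names, the random test image, and the random phase mask are hypothetical.

```python
import numpy as np

def transfer_function(n, pitch, wavelength, distance):
    """Angular-spectrum transfer function H of Eq. (11) on an n x n grid."""
    f = np.fft.fftfreq(n, d=pitch)                # spatial frequencies (1/m)
    fm, fn = np.meshgrid(f, f, indexing="ij")
    arg = wavelength ** -2 - fm ** 2 - fn ** 2    # >= 0 for propagating waves
    kz = np.sqrt(np.maximum(arg, 0.0))
    h = np.exp(1j * 2 * np.pi * distance * kz)
    return np.where(arg >= 0, h, 0)               # evanescent components set to 0

def propagate(field, h):
    """Free-space propagation: U^{l+1} = F^{-1}(F(U^l) * H)."""
    return np.fft.ifft2(np.fft.fft2(field) * h)

def forward(image, phase_masks, pitch, wavelength, distance):
    """l phase modulations and l + 1 propagations, then intensity readout."""
    h = transfer_function(image.shape[0], pitch, wavelength, distance)
    field = image.astype(complex)                 # Eq. (12), unit plane wave assumed
    for delta_theta in phase_masks:
        field = propagate(field, h)
        field = field * np.exp(1j * delta_theta)  # phase modulation, Eq. (8)
    field = propagate(field, h)
    return np.abs(field) ** 2                     # Eq. (13): I = U U*

# Hypothetical single-layer example using the parameters quoted in this paper:
# 8 um pixels, 670 nm illumination, 20 cm between planes.
rng = np.random.default_rng(0)
n = 200
image = rng.random((n, n))                        # stand-in for a CXR image
masks = [rng.uniform(0.0, 2.0 * np.pi, (n, n))]   # untrained random phase layer
intensity = forward(image, masks, pitch=8e-6, wavelength=670e-9, distance=0.2)
print(intensity.shape, intensity.sum(), (image ** 2).sum())
```

Because a phase-only layer and the propagating part of $H$ both have unit modulus, the total output intensity equals the total input intensity in this sketch, which serves as a simple sanity check.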

B. Experimental System

After beam expansion and collimation, the illumination light from the laser (MRL-FN-671-300 mW, 670 nm) is incident as a plane wave on the ASLM (HDSLM80R, UPOLabs). The ASLM loads the patient CXR images, expressed in 0–255 gray levels, which serve as the optical encoding of the input images. The light then travels 20 cm to reach the PSLM (E-Series $1920 \times 1200$ SLM, Meadowlark), where it undergoes phase modulation. After the PSLM, it propagates freely for another 20 cm to reach the CMOS camera (Zyla 4.2 Plus sCMOS, Andor), which captures the network output. In this work, the pixel sizes of the ASLM and PSLM are both $8 μm \times 8 μm$, and the pixel size of the CMOS camera is $6.5 μm \times 6.5 μm$. Correspondingly, two lenses [i.e., L3 and L4 in Fig. 4(a)] with focal lengths of 18 cm and 15 cm are used to form a 4f system and eliminate possible errors caused by the pixel size mismatch. Moreover, linear polarizers are used to set the polarization angle of the light field so that the SLMs operate in their best working state.

C. Network Training

In the training process of the ODNN-COVID, the error backpropagation is used to update the modulation ability of each neuron. In this work, ODNN-COVID is trained with stochastic gradient descent and optimized by Adam with an initial learning rate of 0.01. The learning rate gradually decreases as the epoch increases. All the optical diffractive networks demonstrated in this work are trained for 25 epochs with a batch size of 10, and with shuffling of samples between every epoch.

The applied loss function is constructed to achieve two objectives. One is to focus the input light intensity distribution as much as possible within the pre-defined detection regions, and the other is to make the differences of the light intensity values within the detection regions as strong as possible. Therefore, the loss function is divided into two parts, i.e., the mean squared error (MSE) function and the cross-entropy function, which are described as follows: $\begin{aligned} \mathrm{loss} &= φ_{1} l_{\mathrm{MSE}} (I^{o}, I_{gt}^{o}) + l_{\mathrm{crossentropy}} (T, φ_{2} D), \\ l_{\mathrm{MSE}} (I^{o}, I_{gt}^{o}) &= {‖ I^{o} - I_{gt}^{o} ‖}_{2}^{2}, \\ l_{\mathrm{crossentropy}} (T, φ_{2} D) &= - \sum_{i = 1}^{C} T_{i} \log (\mathrm{softmax} (φ_{2} D_{i})), \\ \mathrm{softmax} (D_{i}) &= e^{D_{i}} / \sum_{j = 1}^{C} e^{D_{j}}, \end{aligned}$ (14) where $I^{o}$ denotes the sum of the light intensity outside the detector regions. $I_{gt}^{o}$ denotes the ground truth value. $T$ denotes the true value label after one-hot encoding. $D$ denotes the sum of the light intensity of each pixel in the detection regions. $C$ represents the number of classification categories. $φ_{1}$ and $φ_{2}$ represent the weight parameters.
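
As an illustration of Eq. (14), the loss can be evaluated with a few lines of NumPy. This is a sketch under assumptions rather than the training code used in this work: the function name `odnn_loss` is hypothetical, the default weights `phi1 = phi2 = 1.0` are placeholders (the values used in this work are not stated here), and the softmax is computed in the numerically stable log-sum-exp form, which is mathematically equivalent to the definition in Eq. (14).

```python
import numpy as np

def odnn_loss(i_out, i_out_gt, d, t, phi1=1.0, phi2=1.0):
    """Eq. (14): phi1 * MSE(I^o, I^o_gt) + cross-entropy(T, phi2 * D).

    i_out    -- light intensity outside the detector regions (scalar or array)
    i_out_gt -- its ground-truth value
    d        -- summed light intensity in each of the C detection regions
    t        -- one-hot label of length C
    phi1, phi2 -- weight parameters (placeholder defaults, not from the paper)
    """
    diff = np.asarray(i_out, dtype=float) - np.asarray(i_out_gt, dtype=float)
    mse = np.sum(diff ** 2)                        # squared L2 norm

    z = phi2 * np.asarray(d, dtype=float)
    z = z - z.max()                                # shift for numerical stability
    log_softmax = z - np.log(np.sum(np.exp(z)))
    cross_entropy = -np.sum(np.asarray(t, dtype=float) * log_softmax)

    return phi1 * mse + cross_entropy

# Binary task, true class 0: more light in region 0 should give a lower loss.
good = odnn_loss(0.0, 0.0, d=[5.0, 1.0], t=[1, 0])
bad = odnn_loss(0.0, 0.0, d=[1.0, 5.0], t=[1, 0])
print(good, bad)
```

In this form, $φ_{2}$ acts as a scale on the detector sums before the softmax, so a larger $φ_{2}$ penalizes small intensity differences between the detection regions more sharply.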

Category: Image Processing and Image Analysis

Received: Nov. 20, 2023

Accepted: Apr. 7, 2024

Published Online: Jun. 17, 2024

The Author Email: Xin Liu (xin_liu@fudan.edu.cn)

DOI:10.1364/PRJ.513537

CSTR:32188.14.PRJ.513537