Chinese Optics Letters, Volume 19, Issue 8, 081101 (2021)
High-speed multimode fiber imaging system based on conditional generative adversarial network
The multimode fiber (MMF) has great potential to transmit high-resolution images with less invasive methods in endoscopy due to its large number of spatial modes and small core diameter. However, spatial mode crosstalk inevitably occurs in MMFs, which turns the transmitted images into speckle patterns. A conditional generative adversarial network (GAN) composed of a generator and a discriminator is utilized to reconstruct the received speckle patterns. We build an MMF imaging experimental system that transmits images over a 1-m-long MMF with a 50 μm core. Compared with the conventional U-net method, this conditional GAN can reconstruct images from fewer training samples while achieving the same performance and shows a higher feature extraction capability.
1. Introduction
Endoscopes are devices that acquire images or other information through thin tubular structures and have gradually developed into popular tools in medical and industrial fields[1,2].
Several approaches developed from computational imaging, such as digital phase conjugation[7] and transmission-matrix or hologram-based methods[3,8,10], have been proposed to reconstruct images transmitted through MMFs. More recently, deep neural network (DNN) methods have been used to learn the mapping from the output speckle patterns back to the input images[11,12].
Generative adversarial networks (GANs) are utilized to optimize the DNN method. The GAN was first proposed to learn the natural image distribution from a random vector[13]. In a conditional GAN, the generation is additionally conditioned on an input image, which makes the framework well suited to image-to-image translation tasks such as reconstructing an image from its speckle pattern[14].
2. Operating Principle
The structure of the conditional GAN is shown in Fig. 1. Figure 1(a) shows the network of the generator, which is a U-net in essence. The sizes of the input and output layers are determined by the resolution of the input image. The generator is divided into a contraction path and an expansion path. The discriminator is a convolutional neural network. The input of the discriminator is the ground truth or the output image of the generator, concatenated with the corresponding speckle pattern, which follows the idea of conditional GANs, as shown in Fig. 1(b). The generator is trained to confuse the discriminator, i.e., to make the discriminator fail to distinguish the output of the generator from the real images, while the discriminator is trained to identify the output of the generator as fake as reliably as possible. Unlike the discriminator in a conventional GAN, which determines whether the image generated by the generator matches the distribution of the real sample set, the discriminator in this conditional GAN determines whether the generated image matches the corresponding label image of the real sample set. The objective of the conditional GAN is defined as[14]

$$\mathcal{L}_{\mathrm{cGAN}}(G,D)=\mathbb{E}_{x,y}[\log D(x,y)]+\mathbb{E}_{x,z}\{\log[1-D(x,G(x,z))]\},\tag{1}$$

where x is the input speckle pattern, y is the ground-truth image, and z is a random noise vector; the generator G tries to minimize this objective, while the discriminator D tries to maximize it.
Figure 1. Structure of the conditional GAN: (a) architecture of the generator; (b) principle of the discriminator. G, generator; D, discriminator.
Therefore, the final objective is[14]

$$G^{*}=\arg\min_{G}\max_{D}\mathcal{L}_{\mathrm{cGAN}}(G,D)+\lambda\mathcal{L}_{L1}(G),\tag{2}$$

where $\mathcal{L}_{L1}(G)=\mathbb{E}_{x,y,z}[\lVert y-G(x,z)\rVert_{1}]$ is the L1 distance between the ground truth and the generated image, and the weight λ balances the adversarial term against the L1 term.
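The two objectives above translate directly into code. The following is a minimal PyTorch-style sketch of Eqs. (1) and (2) in their common binary cross-entropy form; the paper does not state which framework or λ value was used, so the weight lam=100.0 is an assumption borrowed from the conditional GAN literature[14].

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial term, evaluated on discriminator logits
l1 = nn.L1Loss()              # pixel-wise L1 term of Eq. (2)

def generator_loss(d_fake_logits, fake_img, real_img, lam=100.0):
    # The generator tries to make D label the fake pair as real (all ones)
    # while staying close to the ground truth in L1 distance.
    adv = bce(d_fake_logits, torch.ones_like(d_fake_logits))
    return adv + lam * l1(fake_img, real_img)

def discriminator_loss(d_real_logits, d_fake_logits):
    # The discriminator labels real pairs with ones and fake pairs with zeros.
    real = bce(d_real_logits, torch.ones_like(d_real_logits))
    fake = bce(d_fake_logits, torch.zeros_like(d_fake_logits))
    return 0.5 * (real + fake)
```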
3. Experimental Setup and Results
The experimental setup is built as shown in Fig. 2 in order to acquire training data. A He–Ne laser (Thorlabs HNL210LB) operating at 632.8 nm generates a narrow laser beam, which illuminates a digital micromirror device (DMD, V-7001, VIALUX) after passing through an optical expander (Edmund #2186) and an attenuator (NDC-100C-4M). The DMD modulates the gray-scale images onto the incident beam in amplitude by controlling the deflection of its internal micromirrors. The beam is then coupled into a stable 1-m-long MMF (50 µm core, Thorlabs) by a microscope objective lens (OBJ1, PLN objective and single-port tube lens, Olympus), and the fiber output is magnified by OBJ2 (identical to OBJ1). Finally, the output speckle patterns are captured by a CCD camera (PL-D721, PixeLINK).
Figure 2. Experimental setup. DMD, digital micromirror device; OBJ, microscope objective lens; MMF, multimode fiber; CCD, charge-coupled device.
The Modified National Institute of Standards and Technology (MNIST) database and the Fashion-MNIST database are used as input datasets, respectively. The original 28 × 28 pixel images are zero-padded to a larger resolution as the ground truth. The input images are then adapted to the resolution of the DMD by zero-padding and up-sampling. One speckle pattern is acquired for every single image displayed on the DMD. Finally, the speckle patterns are resized to lower resolutions (a smaller size for the MNIST dataset and a larger one for the Fashion-MNIST dataset) to facilitate subsequent processing.
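As an illustration of this preprocessing pipeline, the sketch below zero-pads the native 28 × 28 images and resizes the captured speckle patterns; the target sizes PAD_SIZE and SPECKLE_SIZE are hypothetical placeholders, since the exact resolutions were lost from the text, and only the 28 × 28 native size of MNIST/Fashion-MNIST is certain.

```python
import numpy as np
from PIL import Image

# Hypothetical target sizes; the paper's exact values are not recoverable here.
PAD_SIZE = 32        # ground-truth size after zero-padding the 28 x 28 image
SPECKLE_SIZE = 128   # size the captured speckle pattern is resized to

def make_ground_truth(img28: np.ndarray) -> np.ndarray:
    """Zero-pad a native 28 x 28 image to the ground-truth resolution."""
    out = np.zeros((PAD_SIZE, PAD_SIZE), dtype=img28.dtype)
    off = (PAD_SIZE - 28) // 2
    out[off:off + 28, off:off + 28] = img28
    return out

def resize_speckle(speckle: np.ndarray) -> np.ndarray:
    """Resize a captured speckle pattern to the network input resolution."""
    img = Image.fromarray(speckle).resize((SPECKLE_SIZE, SPECKLE_SIZE), Image.BILINEAR)
    return np.asarray(img)
```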
U-net has been shown to solve the image reconstruction problem with the MNIST dataset in MMF imaging systems. In this conditional GAN, the structure of the generator is similar to a U-net, as shown in Fig. 3, and contains a down-sampling unit and an up-sampling unit. The down-sampling unit consists of several strided convolutional layers with rectified linear units (ReLU) as the activation function. The features of the input image are concentrated in the bottleneck layer. The up-sampling unit is distributed symmetrically with the down-sampling unit and consists of several deconvolutional layers. Skip connections are added between down-sampling layers and up-sampling layers of the same size. Due to the higher complexity of the Fashion-MNIST dataset, the resolution of the input speckle pattern and the depth of the network are increased: we use an asymmetric U-net-like network, whose input and output resolutions differ, as the generator. At the same time, the resolution of the bottleneck layer is chosen to prevent overfitting caused by an excessively deep network.
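A minimal PyTorch-style sketch of such a generator is given below: strided convolutions with ReLU down to a bottleneck, transposed convolutions back up, and skip connections between layers of equal size. The depth, channel counts, and final activation are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    """Three-level U-net-like generator (illustrative sizes, e.g., 32 x 32 input)."""
    def __init__(self):
        super().__init__()
        relu = lambda: nn.ReLU(inplace=True)
        # Down-sampling unit: strided convolutions.
        self.d1 = nn.Sequential(nn.Conv2d(1, 64, 4, 2, 1), relu())     # 32 -> 16
        self.d2 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), relu())   # 16 -> 8
        self.d3 = nn.Sequential(nn.Conv2d(128, 256, 4, 2, 1), relu())  # 8 -> 4 (bottleneck)
        # Up-sampling unit: transposed convolutions; inputs double where skips join.
        self.u3 = nn.Sequential(nn.ConvTranspose2d(256, 128, 4, 2, 1), relu())
        self.u2 = nn.Sequential(nn.ConvTranspose2d(256, 64, 4, 2, 1), relu())
        self.u1 = nn.Sequential(nn.ConvTranspose2d(128, 1, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        s1 = self.d1(x)
        s2 = self.d2(s1)
        b = self.d3(s2)
        x = self.u3(b)
        x = self.u2(torch.cat([x, s2], dim=1))      # skip connection at 8 x 8
        return self.u1(torch.cat([x, s1], dim=1))   # skip connection at 16 x 16
```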
Figure 3. Structures of the generator in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.
The discriminator is a convolutional neural network, as shown in Fig. 4. Its input is the speckle pattern concatenated with either the image generated by the generator or the ground truth. The input is followed by several convolutional layers, and a feature matrix is finally output for discrimination. When the input contains the ground truth, the label is set to all ones, indicating that the input is real; otherwise, it is set to all zeros, indicating that the input is fake.
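A sketch of such a conditional discriminator is shown below; it concatenates the speckle pattern with the candidate image along the channel axis and reduces the pair to a matrix of patch-wise real/fake logits. Channel counts and depth are assumptions, and the speckle is assumed to be resized to the same spatial size as the image.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Conditional discriminator sketch outputting a patch-wise score matrix."""
    def __init__(self):
        super().__init__()
        lrelu = lambda: nn.LeakyReLU(0.2, inplace=True)
        self.net = nn.Sequential(
            nn.Conv2d(2, 64, 4, 2, 1), lrelu(),    # 2 input channels: speckle + image
            nn.Conv2d(64, 128, 4, 2, 1), lrelu(),
            nn.Conv2d(128, 1, 4, 1, 1),            # matrix of real/fake logits
        )

    def forward(self, speckle, image):
        # Concatenate along the channel axis, following the conditional GAN idea.
        return self.net(torch.cat([speckle, image], dim=1))
```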
It is known that an image often has a low-frequency part and a high-frequency part. The L1 term $\mathcal{L}_{L1}$ in Eq. (2) could enforce correctness at the low frequencies[14], while the discriminator, which scores local image patches, constrains the high-frequency structure; the size of its receptive field therefore determines the scale of the features it can discriminate.
First, the networks are trained with different output resolutions and receptive fields of the discriminator for 100 epochs. We collect 500 speckle patterns of the MNIST dataset. All of the programs are run in a Python 3.7 environment with an NVIDIA GeForce GTX 1080 graphics processing unit (GPU). We train our networks with an Adam optimizer[17].
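Putting the previous sketches together, a minimal training loop under the stated settings (Adam, 100 epochs) might look as follows; the learning-rate value and the loader object are assumptions, since the paper's exact values were lost from the text.

```python
import torch

G, D = UNetGenerator(), PatchDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)  # lr value is an assumption
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

for epoch in range(100):
    for speckle, truth in loader:  # loader yields (speckle, ground-truth) pairs
        fake = G(speckle)

        # Discriminator step: real pairs labelled ones, fake pairs labelled zeros.
        loss_d = discriminator_loss(D(speckle, truth), D(speckle, fake.detach()))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator step: adversarial term plus the L1 term of Eq. (2).
        loss_g = generator_loss(D(speckle, fake), fake, truth)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```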
In the MNIST experiment, the reconstructed images achieve the best performance in both peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) at a particular output resolution of the discriminator, with a correspondingly sized receptive field, as shown in Fig. 5(a). In the Fashion-MNIST experiment, the images are relatively complex and have more high-frequency features; therefore, the receptive field of the discriminator should be relatively smaller. We also obtain the reconstructed images with different output resolutions, as shown in Fig. 5(b); the best reconstruction performance is again achieved at a specific output resolution and corresponding receptive field size. Setting an appropriate output resolution reduces the network training time while maintaining high-quality results. Accordingly, we set the discriminator to the best-performing resolution in the following experiments; the discriminator network structure is shown in Fig. 4.
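The receptive field that corresponds to a given discriminator configuration follows from its kernel sizes and strides via the standard recurrence; the sketch below computes it for the hypothetical three-layer discriminator sketched earlier, not for the paper's exact network.

```python
def receptive_field(layers):
    """Receptive field of stacked conv layers, each given as (kernel, stride):
    r grows by (kernel - 1) * jump per layer, and jump multiplies by stride."""
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

# The three-layer discriminator sketched above: two stride-2 and one stride-1 layer.
print(receptive_field([(4, 2), (4, 2), (4, 1)]))  # -> 22
```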
Figure 4. Structures of the discriminator in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.
Figure 5. Reconstruction performances with different output resolutions of the discriminator in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.
After training the conditional GAN, we compare its performance with that of U-net. The structure of the U-net used for comparison is similar to that of the generator. In the U-net of the MNIST experiment, the bottleneck layer is reached after only three convolutional layers; in the U-net of the Fashion-MNIST experiment, it is reached after only four convolutional layers. The conditional GAN and U-net in the two experiments are trained with 2800 and 4800 training samples, respectively, for 100 epochs. As shown in Fig. 6, the loss decreases gradually as training proceeds, and both networks finally converge. Due to the adversarial training process, the training of the conditional GAN is less stable than that of U-net. The same images from the test sets are chosen to compare the reconstruction quality in Fig. 7. The conditional GAN shows slightly better reconstruction of typical image features, such as boundaries and brightness changes, than U-net. The advantage of the conditional GAN is more obvious for images with more details, such as Fashion-MNIST, which also validates our previous analysis. U-net reconstructs well simple images with few high-frequency features in the MNIST dataset; however, when the images contain more high-frequency features, its reconstruction performance degrades. The discriminator of the conditional GAN enables the generator to better mine the high-frequency features of the image, improving its feature extraction ability. The conditional GAN thus opens the possibility of reconstructing complicated images from speckle patterns.
Figure 6. Loss during the training process of U-net and the conditional GAN in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.
Figure 7. Reconstruction results of U-net and the conditional GAN in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.
The size of the training dataset is an important parameter that affects network performance. To compare the two networks under different amounts of training data, the conditional GAN and U-net are each trained with different numbers of training samples for 100 epochs. Figure 8 compares the reconstructions of our network and U-net on the test set, evaluated by PSNR and SSIM; the horizontal axis is the training set size. Generally, increasing the number of training samples significantly improves the quality of image reconstruction for both networks. With the same number of training samples, the reconstruction quality of the conditional GAN is better than that of U-net, and the gap is more obvious with small training sets and complex images. In other words, the conditional GAN requires smaller training sets than U-net to achieve the same reconstruction quality, which again indicates that this conditional GAN has a stronger feature extraction capability.
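For completeness, PSNR and SSIM over a test set can be computed with scikit-image as in this short sketch, assuming float images normalized to [0, 1].

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(recon_imgs, truth_imgs):
    """Average PSNR and SSIM over paired reconstructed / ground-truth images."""
    psnrs = [peak_signal_noise_ratio(t, r, data_range=1.0)
             for r, t in zip(recon_imgs, truth_imgs)]
    ssims = [structural_similarity(t, r, data_range=1.0)
             for r, t in zip(recon_imgs, truth_imgs)]
    return float(np.mean(psnrs)), float(np.mean(ssims))
```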
Figure 8. PSNR and SSIM versus training set size for U-net and the conditional GAN in (a) the MNIST experiment and (b) the Fashion-MNIST experiment.
4. Conclusion
Considering the structure of the conditional GAN, the generator is essentially a U-net, while the additional discriminator guides the training of the generator to converge faster. Moreover, the discriminator constrains both the high- and low-frequency parts of the image, improving the feature extraction ability of the generator. Therefore, smaller datasets suffice for reconstruction with the conditional GAN. The experimental results show that this conditional GAN can reconstruct images from fewer training samples and shows a higher feature extraction capability than the conventional U-net method. For both MNIST and Fashion-MNIST, the number of required training images is reduced by hundreds when the conditional GAN is used to achieve the same reconstruction accuracy. However, the network only performs well for samples similar to the training data. In the future, stronger networks or transfer learning methods will be important to improve the generalization ability of the models.
[1] M. Kyrish, T. S. Tkaczyk. Achromatized endomicroscope objective for optical biopsy. Biomed. Opt. Express, 4, 287(2013).
[2] M. Hughes, T. P. Chang, G.-Z. Yang. Fiber bundle endocytoscopy. Biomed. Opt. Express, 4, 2781(2013).
[3] T. Čižmár, K. Dholakia. Exploiting multimode waveguides for pure fibre-based imaging. Nat. Commun., 3, 1027(2012).
[4] C. Liu, L. Deng, D. Liu, L. Su. Modeling of a single multimode fiber imaging system (2016).
[5] H. Shen, J. Gao. Deep learning virtual colorful lens-free on-chip microscopy. Chin. Opt. Lett., 18, 121705(2020).
[6] X. Wang, H. Liu, M. Chen, Z. Liu, S. Han. Imaging through dynamic scattering media with stitched speckle patterns. Chin. Opt. Lett., 18, 042604(2020).
[7] I. N. Papadopoulos, S. Farahi, C. Moser, D. Psaltis. Focusing and scanning light through a multimode optical fiber using digital phase conjugation. Opt. Express, 20, 10583(2012).
[8] Y. Choi, C. Yoon, M. Kim, T. D. Yang, C. Fang-Yen, R. R. Dasari, K. J. Lee, W. Choi. Scanner-free and wide-field endoscopic imaging by using a single multimode optical fiber. Phys. Rev. Lett., 109, 203901(2012).
[9] D. B. Conkey, E. Kakkava, T. Lanvin, D. Loterie, N. Stasio, E. Morales-Delgado, C. Moser, D. Psaltis. High power, ultrashort pulse control through a multi-core fiber for ablation. Opt. Express, 25, 11491(2017).
[10] R. Di Leonardo, S. Bianchi. Hologram transmission through multi-mode optical fibers. Opt. Express, 19, 247(2011).
[11] B. Rahmani, D. Loterie, G. Konstantinou, D. Psaltis, C. Moser. Multimode optical fiber transmission with a deep learning network. Light: Sci. Appl., 7, 69(2018).
[12] N. Borhani, E. Kakkava, C. Moser, D. Psaltis. Learning to see through multimode fibers. Optica, 5, 960(2018).
[13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio. Generative adversarial nets. 27th International Conference on Neural Information Processing Systems, 2672(2014).
[14] P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros. Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125(2017).
[15] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, O. Winther. Autoencoding beyond pixels using a learned similarity metric. International Conference on Machine Learning, 1558(2016).
[16] V. Dumoulin, F. Visin. A guide to convolution arithmetic for deep learning (2016).
[17] D. P. Kingma, J. Ba. Adam: a method for stochastic optimization (2014).
Zhenming Yu, Zhenyu Ju, Xinlei Zhang, Ziyi Meng, Feifei Yin, Kun Xu, "High-speed multimode fiber imaging system based on conditional generative adversarial network," Chin. Opt. Lett. 19, 081101 (2021)
Category: Imaging Systems and Image Processing
Received: Dec. 19, 2020
Accepted: Feb. 2, 2021
Posted: Feb. 3, 2021
Published Online: May 25, 2021
The Author Email: Kun Xu (xukun@bupt.edu.cn)