Photonics Research, Volume 12, Issue 11, 2418 (2024)

Speckle-free holography with a diffraction-aware global perceptual model

Yiran Wei1, Yiyun Chen1, Mi Zhou1, Mu Ku Chen2, Shuming Jiao3, Qinghua Song1, Xiao-Ping Zhang1, and Zihan Geng1,*
Author Affiliations
  • 1Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
  • 2Department of Electrical Engineering, City University of Hong Kong, Hong Kong 999077, China
  • 3Department of Engineering, Shenzhen MSU-BIT University, Shenzhen 518172, China

    Computer-generated holography (CGH) based on neural networks has been actively investigated in recent years, and convolutional neural networks (CNNs) are frequently adopted. A convolutional kernel captures local dependencies between neighboring pixels. However, in CGH, each pixel on the hologram influences all the image pixels on the observation plane, thus requiring a network capable of learning long-distance dependencies. To tackle this problem, we propose a CGH model called Holomer. Its single-layer receptive field is 43 times larger than that of a widely used 3×3 convolutional kernel, thanks to embedding-based feature dimensionality reduction and multi-head sliding-window self-attention. In addition, we propose a metric to measure a network's ability to learn the inverse diffraction process. In simulation, our method demonstrates noteworthy performance on the DIV2K dataset at a resolution of 1920×1024, achieving a PSNR of 35.59 dB and an SSIM of 0.93. Optical experiments reveal that our results retain excellent image detail with no observable background speckle noise. This work paves the way for high-quality hologram generation.

    1. INTRODUCTION

    A holographic display retains the amplitude and phase information of the object and can physically reproduce the complete light wave emitted by the object, resulting in realistic displays [1–10]. Computer-generated holography (CGH), which eliminates the need for complex optical-interference-based recording processes and enables the display of physically non-existent objects, is the dominant technology for holographic displays [11–18].

    Contemporary mainstream CGH algorithms utilizing artificial neural networks mainly employ convolutional neural networks (CNNs) [5,19–34]. CNNs have demonstrated robust performance in image processing tasks owing to their translational invariance and the inductive bias of local receptive fields [35]. However, in the hologram generation process, as each pixel on the hologram can act as a point light source affecting the entire reconstructed image, the network's capacity to learn non-local features significantly impacts the quality of the generated hologram [31]. Unfortunately, the receptive fields of current CNN-based CGH models are typically much smaller than the image size, and thus cannot effectively learn long-range dependencies.

    A more versatile neural network model, the Transformer, has been proposed and has demonstrated superior performance over CNNs in a multitude of computer vision tasks. This superiority is primarily attributed to the Transformer's capacity to capture long-range dependencies in images through a global self-attention mechanism [36–38]. With global self-attention, the Transformer can learn the comprehensive relationships between holographic images and target images within a single network layer. Despite the potent performance of Transformers, however, their computational demands far exceed those of CNNs, limiting their applicability to high-resolution images.

    In this work, we propose a diffraction-aware CGH model called Holomer. Through the sliding-window self-attention mechanism and embedding-based feature dimensionality reduction, Holomer is capable of learning the complex relationship between target images and holograms with fewer layers than some existing methods [10,24,34]. We also propose a metric, the perceptive index (PI), which evaluates a network-based CGH model's ability to learn the inverse diffraction process. Holomer achieves a PSNR of 35.59 dB and an SSIM of 0.93 on the DIV2K dataset. The fidelity of the generated holograms, at a resolution of 1920×1024, is evaluated through numerical simulations and experimental validation.

    2. METHOD

    Holomer generates holograms using a two-stage architecture as shown in Fig. 1. The target phase generator estimates the phase of the target amplitude. Afterward, the predicted phase and the target amplitude are propagated to the spatial light modulator (SLM) plane using the angular spectrum method (ASM) [39,40]. The amplitude and phase propagated to the SLM plane are converted to a phase-only hologram by a phase-only hologram encoder. During training, the generated phase-only hologram is further propagated to the imaging plane through the ASM. The resulting reconstructed amplitude is compared to the target amplitude, and the mean squared error (MSE) between the two amplitudes is computed as the loss. This loss is used to update the parameters of both the target phase generator and the phase-only hologram encoder. The MSE loss is calculated as

$$\mathrm{Loss} = \frac{1}{N}\sum_{i=1}^{N}\left(A_{\mathrm{rec},i} - A_{\mathrm{tar},i}\right)^{2},$$

where $A_{\mathrm{rec},i}$ and $A_{\mathrm{tar},i}$ represent the amplitude values of the reconstructed and target images at the $i$th pixel, respectively, and $N$ is the total number of pixels.
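    To make the pipeline concrete, the following is a minimal PyTorch sketch of one training step; it mirrors Fig. 1(a) but is not the authors' implementation. The modules `phase_gen` and `encoder` are hypothetical stand-ins for the two Holomer networks, and `asm_propagate` is an assumed helper implementing angular spectrum propagation on batched complex tensors (a PyTorch analogue of the NumPy version sketched later in this section).

```python
import torch

def train_step(phase_gen, encoder, asm_propagate, optimizer,
               target_amp, z=0.20, wavelength=520e-9, pitch=6.4e-6):
    """One training step; target_amp has shape (B, 1, H, W)."""
    # Stage 1: predict a phase for the target amplitude (Fig. 1(a)).
    target_phase = phase_gen(target_amp)
    field = target_amp * torch.exp(1j * target_phase)

    # Propagate the complex field from the image plane to the SLM plane
    # (hypothetical helper; -z denotes backward propagation here).
    slm_field = asm_propagate(field, -z, wavelength, pitch)

    # Stage 2: encode the propagated amplitude and phase (dual channel)
    # into a phase-only hologram.
    holo_phase = encoder(torch.cat([slm_field.abs(), slm_field.angle()], dim=1))

    # Propagate the phase-only hologram back to the imaging plane.
    rec_amp = asm_propagate(torch.exp(1j * holo_phase), z, wavelength, pitch).abs()

    # MSE between reconstructed and target amplitudes updates both networks.
    loss = torch.mean((rec_amp - target_amp) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```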

    Figure 1. (a) Schematic of Holomer’s (our method) two-stage architecture and its training pipeline. (b) Network architecture schematic for the target phase generator and the phase-only hologram encoder. The target phase generator receives a single-channel target amplitude as input. The input of the phase-only hologram encoder is dual-channel amplitude and phase images. The U-shaped network architecture ensures the consistency between output and input dimensions, enhancing the network’s capability to learn multi-level features. (c) Holomer block schematic; each Holomer block contains two modules, window multihead self-attention (W-MSA) and sliding window multihead self-attention (SW-MSA). The two diagrams correspond to the receptive field of the two modules. By employing the sliding window self-attention mechanism, the Holomer block significantly enhances its receptive field, thereby improving its ability to learn long-range features of the diffraction process.

    Holomer maintains resolution consistency between the generated hologram and the input image through a Unet architecture, as depicted in Fig. 1(b). The model incorporates a residual connection from Unet [35] for efficient training. The Holomer block, with window self-attention and sliding-window self-attention, first calculates the self-attention of each independent window. Following the self-attention process, the windows are shifted by half the window size, both vertically and horizontally, introducing cross-window connections as shown in Fig. 1(c). This shifting mechanism ensures that subsequent layers capture interactions beyond the initially defined windows, enabling the model to effectively integrate local and global information.
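    The shift can be implemented as a cyclic roll of the feature map by half the window size before window partitioning, following the Swin Transformer convention [38]. Below is a minimal sketch on an illustrative feature map (the tensor shapes are assumptions, not the paper's configuration):

```python
import torch

def window_partition(x, win):
    """Split (B, H, W, C) into non-overlapping (win, win) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // win, win, W // win, win, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win, win, C)

win = 16                                  # window size used by Holomer
x = torch.randn(1, 64, 128, 32)           # toy (B, H, W, C) feature map

# Cyclically roll by half the window size so each new window straddles
# the boundaries of the previous (unshifted) windows.
shifted = torch.roll(x, shifts=(-win // 2, -win // 2), dims=(1, 2))
windows = window_partition(shifted, win)  # self-attention runs per window
print(windows.shape)                      # torch.Size([32, 16, 16, 32])
```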

    According to angular spectrum theory [41], the reconstructed image $I(x,y,z)$ of a hologram at a specific depth $z$ can be obtained by convolving the hologram $h(x,y)$ with the diffraction impulse response (DIR) $t(x,y,k,z)$, where $k = 2\pi/\lambda$ is the wave number and $\lambda$ is the wavelength of the light source. The computational process is illustrated in Fig. 2:

$$I(x,y,z) = h(x,y) * t(x,y,k,z).$$

    Figure 2. (a), (b) Illustration of the amplitude and phase distribution of diffraction impulse response (DIR), respectively. In the computation of the DIR, the wavelength employed is 520 nm, the pixel size is 6.4 μm, and the propagation distance is 20 cm. (c) The schematic illustration for image reconstruction through the convolution of DIR with a hologram. To fulfill the conditions of linear convolution, it is necessary to zero-pad the hologram to twice its original size before proceeding with the convolution. (d) A comparative schematic illustration contrasting the receptive fields of Holomer (ours) and a single-layer CNN, along with their respective coverage of the DIR region size. (e) The curve illustrating the variation in the proportion of the sum of intensities over the region covered by the receptive field to the total DIR intensity with respect to changes in the receptive field size, annotated with markers for a single-layer CNN and Holomer. The DIR coverage of Holomer exhibits a notable improvement compared to CNN.

    Since the angular spectrum method is a rigorous solution to the Rayleigh–Sommerfeld diffraction formula, we can derive the diffraction impulse response (DIR) $t(x,y,k,z)$ as follows:

$$t(x,y,k,z) = \frac{1}{2\pi}\,\frac{z}{r}\left(\frac{1}{r} - jk\right)\frac{e^{jkr}}{r},$$

where $r = \sqrt{x^{2} + y^{2} + z^{2}}$ denotes the distance between the source point and the observation point.
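    For reference, a minimal NumPy sketch of angular spectrum propagation consistent with the formulas above; it omits the zero-padding to twice the size required for linear convolution (Fig. 2(c)) and any band-limiting that a production implementation would include:

```python
import numpy as np

def asm_propagate(field, z, wavelength, pitch):
    """Angular spectrum propagation of a 2-D complex field over distance z."""
    H_px, W_px = field.shape
    k = 2 * np.pi / wavelength
    fx = np.fft.fftfreq(W_px, d=pitch)          # spatial frequencies (1/m)
    fy = np.fft.fftfreq(H_px, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Free-space transfer function; evanescent components are masked out.
    arg = 1 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    H = np.where(arg > 0,
                 np.exp(1j * k * z * np.sqrt(np.maximum(arg, 0))), 0)
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Example: propagate a random phase-only hologram 20 cm at 520 nm, 6.4 um pitch.
holo = np.exp(1j * np.random.uniform(0, 2 * np.pi, (256, 256)))
image_amp = np.abs(asm_propagate(holo, 0.20, 520e-9, 6.4e-6))
```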

    Current CNN-based CGH models are limited by the small receptive field of convolutions and cannot learn the whole diffraction process in a single layer. This, however, can be achieved with a global self-attention mechanism:

$$\mathrm{Attention}(Q,K,V) = \mathrm{Softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V,$$

where $Q$, $K$, and $V$ are the query, key, and value matrices obtained by multiplying the input features $X$ with three learnable parameter matrices $W_Q$, $W_K$, and $W_V$, and $d_k$ is the dimensionality of the key. Despite the extraordinary performance of the Transformer in computing holograms, its computational demands are substantial: calculating global self-attention for a 1920×1080 image requires a staggering 17,060 GB of GPU memory, which is infeasible.
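    The memory figure can be reproduced to order of magnitude with a quick calculation, assuming one token per pixel and a single float32 attention score matrix:

```python
# N tokens for a 1920x1080 image, one token per pixel.
n_tokens = 1920 * 1080
# The QK^T score matrix alone holds N x N float32 values (4 bytes each).
attn_bytes = n_tokens ** 2 * 4
print(attn_bytes / 1e9, "GB")   # ~1.7e4 GB, the order of magnitude quoted above
```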

    Therefore, there is a trade-off between computational efficiency and the receptive field in model design. Our proposed Holomer employs two designs to address this issue. First, an embedding layer at the input reduces the dimensionality of the target image, so that each pixel in the low-dimensional feature map aggregates information from 16 pixels of the original image. Second, the main body of the model employs window self-attention layers and sliding-window self-attention layers with a window size of 16×16. After the window self-attention is computed, the windows are shifted and sliding-window self-attention is computed, which expands the receptive field by a factor of 1.5. The receptive field of each Holomer block thus covers 16×16×1.5 = 384 feature positions, roughly 43 times the 9 positions covered by a 3×3 convolutional kernel.
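    A common way to realize such embedding-based reduction is a strided convolution that folds each 4×4 patch (16 pixels) into one feature vector; a minimal sketch follows, where the channel count is an assumption for illustration only:

```python
import torch
import torch.nn as nn

# Patch embedding: each 4x4 patch of the input collapses to one feature
# vector, so a 16x16 attention window in feature space spans 64x64 pixels
# of the original image.
embed = nn.Conv2d(1, 96, kernel_size=4, stride=4)   # 96 channels: illustrative
x = torch.randn(1, 1, 1024, 1920)                   # target amplitude
feat = embed(x)
print(feat.shape)                                   # torch.Size([1, 96, 256, 480])

# Receptive-field arithmetic from the text: 16*16*1.5 = 384 feature
# positions versus the 9 positions of a 3x3 convolution.
print(16 * 16 * 1.5 / (3 * 3))                      # ~42.7, i.e., ~43x
```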

    To directly compare the learning capabilities of CNN and Holomer with respect to the diffraction process, we propose a metric called the perceptive index (PI), defined as the ratio of the intensity integral of the DIR covered by the receptive field of a single module, $I_{\mathrm{recep}}$, to the total intensity integral of the DIR, $I_{\mathrm{total}}$:

$$\mathrm{PI} = \frac{I_{\mathrm{recep}}}{I_{\mathrm{total}}}.$$

    A higher PI indicates that the receptive field of the module covers a larger area of the DIR, signifying a stronger ability to learn the diffraction process. The calculated results are shown in Fig. 2(e). The DIR used for this computation is based on a pixel size of 6.4 μm, a holographic image resolution of 256×256 pixels, and a propagation distance of 20 cm. Notably, a larger receptive field can indeed enhance learning capability, but it also increases computational complexity. Additionally, owing to the spherical-wave character of the energy distribution of the DIR, the intensity at positions far from the center is quite low and has minimal impact on the reconstructed image. The performance gain from enlarging the receptive field therefore has diminishing returns. Considering these trade-offs, we set the window size of Holomer to 16×16.
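    A sketch of how the PI can be computed numerically under the stated parameters follows. The receptive-field sizes passed in are illustrative, and the exact values reported in Table 1 depend on implementation details (e.g., how a feature-space window maps back to image pixels) that this sketch does not reproduce:

```python
import numpy as np

def perceptive_index(receptive_px, n=256, pitch=6.4e-6, z=0.20, wavelength=520e-9):
    # Sample the DIR t(x, y, k, z) on an n x n grid at the given pixel pitch.
    k = 2 * np.pi / wavelength
    coords = (np.arange(n) - n // 2) * pitch
    X, Y = np.meshgrid(coords, coords)
    r = np.sqrt(X**2 + Y**2 + z**2)
    t = (1 / (2 * np.pi)) * (z / r) * (1 / r - 1j * k) * np.exp(1j * k * r) / r
    intensity = np.abs(t) ** 2

    # PI: fraction of total DIR intensity inside a centered square
    # receptive field of side receptive_px.
    start = n // 2 - receptive_px // 2
    recep = intensity[start:start + receptive_px, start:start + receptive_px]
    return recep.sum() / intensity.sum()

print(perceptive_index(3))    # receptive field of a single 3x3 convolution
print(perceptive_index(96))   # an assumed pixel-space window for one Holomer block
```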

    3. RESULTS

    A. Performance Comparison

    To validate Holomer’s performance in generating holograms, we demonstrate the reconstructed images through numerical simulations and optical experiments. The dataset used for training and testing is DIV2K [42]; the training and test sets contain 800 and 100 high-resolution images, respectively. The network is trained on an NVIDIA RTX 3090 GPU. We compare the performance of Holomer with an iterative algorithm and model-driven self-supervised models: the Wirtinger algorithm [10], HoloNet [24], and CCNN [34].

    Table 1 compares the performance of Wirtinger, HoloNet, CCNN, and Holomer on the DIV2K test dataset. In the simulation, we trained three separate networks to generate holograms at wavelengths of 450, 520, and 638 nm. The reconstruction distance is 20 cm. Each network is trained using the images from the training set corresponding to its color channel. Holomer’s PSNR improves by 1.76 dB to 6.14 dB over the compared algorithms, and its SSIM improves by 0.02 to 0.11. These metrics demonstrate a clear gain in holographic fidelity over the other algorithms. Figure 3 compares the results of the four algorithms, alongside Holomer’s full-color simulation, using images from the test set of the DIV2K dataset. The reconstructions generated by the Wirtinger algorithm retain fine detail but are compromised by pronounced speckle noise arising from its stochastic initialization. Holomer produces the highest reconstruction quality among the learning-based models and surpasses all presented algorithms in the reported metrics.

    Table 1. Performance Comparison of CGH Algorithms

    Algorithm         Perceptive Index    PSNR (dB)    SSIM
    Wirtinger         None                31.88        0.82
    HoloNet           0.6%                33.83        0.91
    CCNN              0.6%                29.45        0.84
    Holomer (ours)    32.6%               35.59        0.93
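    The paper does not state which implementation was used for PSNR and SSIM; a typical choice is scikit-image, sketched here on stand-in arrays at the paper's 1920×1024 resolution:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(rec_amp, tar_amp):
    # Amplitudes are assumed normalized to [0, 1].
    psnr = peak_signal_noise_ratio(tar_amp, rec_amp, data_range=1.0)
    ssim = structural_similarity(tar_amp, rec_amp, data_range=1.0)
    return psnr, ssim

tar = np.random.rand(1024, 1920)                            # stand-in target
rec = np.clip(tar + 0.01 * np.random.randn(1024, 1920), 0.0, 1.0)
print(evaluate(rec, tar))
```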

    Figure 3. A comparative demonstration of numerical simulation results encompasses Wirtinger, HoloNet, CCNN, and our proposed Holomer, utilizing images sourced from the DIV2K validation set. In specific experiments, Wirtinger underwent 200 iterations, while HoloNet, CCNN, and Holomer underwent training for an equivalent duration on the DIV2K training set.

    B. Experimental Demonstration

    In the experiments, the laser source is a FISBA READYBeam operating at wavelengths of 450, 520, and 638 nm. The spatial light modulator is a HOLOEYE LETO-3-CFS-127 SLM with a pixel size of 6.4 μm. The experimental setup is shown in Fig. 5(a).

    The display efficacy of holograms generated by Holomer was validated through optical reconstruction, with the results shown in Fig. 4. The reconstructed images indicate that the iterative algorithm is severely affected by speckle noise. CGH models based on CNNs are constrained by small receptive fields and cannot eliminate the influence of speckle noise. In contrast, Holomer, leveraging its diffraction-inspired module, successfully removes background speckle noise. Furthermore, Holomer exhibits superior local learning capability, such as enhanced detail representation and brightness fidelity, compared to the other methods. Experimental results demonstrate that Holomer achieves higher resolution in the reconstructed holographic images, with improved detail texture and enhanced contrast in darker regions.

    Figure 4. Captured reconstructed images of the green channel (520 nm) for Wirtinger, HoloNet, CCNN, and Holomer methods, along with the corresponding ground truth image.

    Figure 5. (a) Schematic diagram of the experimental setup used to verify the optical reconstruction of Holomer-generated holograms, using an aperture to eliminate diffracted images from other orders. (b), (c) Optical reconstruction images of our method in color channels directly captured by a camera. (d), (e) The corresponding color holograms loaded on the SLM.

    To further substantiate the practical utility of our proposed methodology, the holograms generated in simulation are synthesized into a full-color holographic image via a temporal multiplexing approach. The holograms of the three RGB channels are sequentially loaded on the SLM and the light sources are synchronized to the corresponding wavelengths to achieve full-color display. Experimental results, as illustrated in Fig. 5, utilize images sourced from the DIV2K dataset. It is observable that the full-color display rendered by our method exhibits high contrast and clarity. The ringing artifacts at the edge occur because of signal truncation at the edges of the SLM and the SLM’s limited aperture [43].

    4. CONCLUSION

    In summary, we propose Holomer, a diffraction-aware deep learning model for the generation of high-quality phase-only holograms. Leveraging the sliding window self-attention mechanism and embedding-based feature dimensionality reduction, the model’s receptive field is significantly enhanced. A larger receptive field in Holomer aligns the inference with the diffraction process of holography, resulting in higher-quality holograms. Meanwhile, we propose a metric called perceptive index for evaluating the learning ability of network-based CGH models for diffraction processes. On the DIV2K dataset, Holomer achieved a PSNR of 35.59 dB and an SSIM of 0.93. Optical reconstruction validates that our method surpasses traditional iterative algorithms and those based on convolutional neural networks in terms of reconstruction quality. The effectiveness of full-color reconstruction demonstrates Holomer’s practical utility in the application domain of computer-generated holography.

    [4] P.-A. Blanche. Holography, and the future of 3D display. Light Adv. Manuf., 2, 446-459(2021).

    [8] R. W. Gerchberg. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik, 35, 237-246(1972).

    [16] J. Xi, J. Shen, M. T. Chow. Deep-learning assisted polarization holograms. Adv. Opt. Mater., 12, 2202663(2023).

    [23] N. Muramatsu, C. W. Ooi, Y. Itoh. Deepholo: recognizing 3D objects using a binary-weighted computer-generated hologram. SIGGRAPH Asia 2017 Posters, 1-2(2017).

    [28] W. J. Dallas. Computer-generated holograms. Digital Holography and Three-Dimensional Display: Principles and Applications, 1-49(2006).

    [35] O. Ronneberger, P. Fischer, T. Brox. U-net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Proceedings, Part III, 234-241(2015).

    [36] A. Vaswani, N. Shazeer, N. Parmar. Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems, 6000-6010(2017).

    [38] Z. Liu, Y. Lin, Y. Cao. Swin transformer: hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, 10012-10022(2021).

    [41] J. W. Goodman. Introduction to Fourier Optics(2005).

    [42] E. Agustsson, R. Timofte. NTIRE 2017 challenge on single image super-resolution: dataset and study. Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 1122-1131(2017).

    Paper Information

    Category: Holography, Gratings, and Diffraction

    Received: Mar. 14, 2024

    Accepted: Jul. 21, 2024

    Published Online: Oct. 10, 2024

    The Author Email: Zihan Geng (geng.zihan@sz.tsinghua.edu.cn)

    DOI:10.1364/PRJ.523650
