The rapid advancement of computer-generated holography has bridged deep learning with traditional optical principles in recent years. However, a critical challenge in this evolution is the efficient and accurate conversion from the amplitude to phase domain for high-quality phase-only hologram (POH) generation. Existing computational models often struggle to address the inherent complexities of optical phenomena, compromising the conversion process. In this study, we present the cross-domain fusion network (CDFN), an architecture designed to tackle the complexities involved in POH generation. The CDFN employs a multi-stage (MS) mechanism to progressively learn the translation from amplitude to phase domain, complemented by the deep supervision (DS) strategy of middle features to enhance task-relevant feature learning from the initial stages. Additionally, we propose an infinity phase mapper (IPM), a phase-mapping function that circumvents the limitations of conventional activation functions and encapsulates the physical essence of holography. Through simulations, our proposed method successfully reconstructs high-quality 2K color images from the DIV2K dataset, achieving an average PSNR of 31.68 dB and SSIM of 0.944. Furthermore, we realize high-quality color image reconstruction in optical experiments. The experimental results highlight the computational intelligence and optical fidelity achieved by our proposed physics-aware cross-domain fusion.
1. INTRODUCTION
Holographic technology, with its unique ability to reconstruct images, stands at the forefront of advancing optical displays, offering realism and immersive qualities [1–10]. The quest for dynamic, high-quality computer-generated holography (CGH) has long been driven by research in both optical engineering and computational science [11–15]. Despite remarkable progress [16,17], the challenge of efficiently generating phase-only holograms (POHs) remains, primarily due to the intricate nature of modulating light with spatial light modulators (SLMs) to achieve the desired visual effects without compromising quality or computational demand [17].
To date, typical methods for generating POHs can be divided into iterative and non-iterative ones. Iterative methods, including the Gerchberg–Saxton (GS) algorithm [18–20], non-convex optimization algorithms [21,22], and the stochastic gradient descent (SGD) method [23,24], facilitate accurate hologram generation but fall short in efficiency, hindering real-time applications. In contrast, non-iterative methods such as double phase-amplitude coding (DPAC) and its variants [22,25–28] enable quicker hologram generation in fewer steps, but this comes at the cost of a significant decline in quality. A compromise between fidelity and efficiency has therefore been unavoidable.
Recent advancements in computational techniques, particularly deep learning, have provided promising solutions to longstanding problems in CGH [29–35]. These approaches, leveraging the vast computational power and flexibility of neural networks, have shown potential in transcending traditional barriers, enabling more control over the hologram generation process. By automating the intricate phase modulation required for POHs, deep learning models have facilitated significant strides towards achieving real-time, high-fidelity holographic displays.
During the exploration of combining neural networks with CGH, the UNet architecture [36] has emerged as the processing backbone in most cases due to its effective feature-extraction capabilities and encoder–decoder design [29–31,34,35,37,38]. However, adapting UNet to the specific requirements of CGH presents a significant challenge. The fundamental issue is that UNet's architecture may not be optimally designed for the intricate cross-domain transformations inherent in CGH. These transformations convert color images from the amplitude domain to phase-only holograms in the phase domain, a critical process that demands precise handling to ensure fidelity and accuracy in hologram reconstruction [39]. Furthermore, traditional activation functions, borrowed from 2D image processing, map output values to a fixed interval through nonlinear variations. This oversimplification overlooks the intrinsic periodic nature of phase values, leading to potential inaccuracies in the reconstructed holograms.
In recognition of these challenges, our study introduces an approach that combines the computational power of neural networks with the rigorous requirements of optical physics. The cross-domain fusion network (CDFN) is designed to facilitate the efficient and accurate conversion from the amplitude domain to the phase domain. By dissecting the cross-domain conversion process into multiple stages and incorporating the physics-aware mapping function, CDFN significantly enhances the quality and efficiency of hologram generation. This innovation addresses the technical hurdles inherent in CGH and aligns with the pursuit of optical excellence. Figure 1 presents a visual comparison between conventional approaches and our approach. The results of various experiments demonstrate the capability of our method to reconstruct high-quality 2K color images both in simulation and real-world experiments.
Figure 1. (a) Workflow of conventional methods based on the vanilla UNet architecture. (b) Our proposed cross-domain fusion network highlights the role of our multi-stage conversion architecture and the infinity phase mapper (IPM) in accomplishing the cross-domain transformation task. The multi-stage conversion architecture employs multiple feature maps to facilitate a gradual transformation between the two domains. Concurrently, the infinity phase mapper is designed to accommodate the periodic nature of phase values, ensuring the preservation of physical consistency.
2. METHOD

A. Overall Workflow

Figure 2 delineates the workflow of our proposed method. The algorithm begins by feeding the amplitude image into the first sub-network. The amplitude image is then combined with the initial phase image output by the first sub-network to represent the complex hologram at the object plane. Next, the algorithm applies the angular spectrum method (ASM) to compute the complex hologram at the SLM plane, simulating the physical process of forward light propagation and bridging the digital representation with real-world holograms. The complex hologram at the SLM plane is then fed into the second sub-network, which is trained to translate it into the final POH that can be loaded onto the SLM.
Figure 2. Hologram generation workflow of our proposed method and the corresponding network architecture. The feature maps representing the various conversion stages are color-coded. The first sub-network predicts the initial phase, and the second sub-network predicts the POH.
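To make the propagation step concrete, below is a minimal sketch of ASM forward propagation in PyTorch. The function name asm_propagate and the simple evanescent-wave cutoff are our illustrative assumptions, not the authors' exact implementation:

```python
import torch

def asm_propagate(field: torch.Tensor, wavelength: float,
                  pitch: float, distance: float) -> torch.Tensor:
    """Propagate a complex field of shape (..., H, W) over `distance` meters."""
    H, W = field.shape[-2:]
    # Spatial frequency grids (cycles per meter)
    fx = torch.fft.fftfreq(W, d=pitch, device=field.device)
    fy = torch.fft.fftfreq(H, d=pitch, device=field.device)
    FY, FX = torch.meshgrid(fy, fx, indexing="ij")
    # Free-space transfer function; evanescent components are zeroed out
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * torch.pi / wavelength * torch.sqrt(torch.clamp(arg, min=0.0))
    transfer = torch.exp(1j * kz * distance) * (arg > 0)
    return torch.fft.ifft2(torch.fft.fft2(field) * transfer)

# Complex hologram at the object plane: amplitude image + predicted phase
# obj_field = amplitude * torch.exp(1j * initial_phase)
# slm_field = asm_propagate(obj_field, wavelength=520e-9, pitch=6.4e-6, distance=0.2)
```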
The two sub-networks in our algorithm share a similar structure, derived from the UNet used in HoloNet [30]. However, to address the cross-domain conversion problem in CGH, we decompose the conversion process into multiple stages by constructing additional intermediate feature maps, yielding a multi-stage network architecture.
B. Multi-Stage Architecture

The multi-stage architecture is motivated by the need to map from the amplitude domain to the distinct phase domain in POH generation. The traditional UNet architecture directly forwards high-resolution feature maps from the encoder to the decoder. While this approach is efficient in many applications, it is less effective in the CGH context because the encoder operates on amplitude values while the decoder must produce phase values [39].
To tackle this issue, our network introduces additional intermediate feature maps representing multiple stages, which serve as a buffer and enable a more gradual conversion. These stages resolve the discrepancy between the encoder inputs (amplitude values) and the final decoder outputs (phase values). As depicted in Fig. 2, the feature maps representing the various conversion stages are color-coded. This design ensures a smoother and more cohesive conversion from the amplitude domain to the phase domain, minimizing abrupt changes and improving the results.
To prove the effectiveness of the multi-stage design, we exclude the influence of parameter count by keeping the number of parameters similar to that of HoloNet [30]. Specifically, we downsize the top-down feature maps' channel numbers from [32, 64, 128, 256] to [16, 32, 64, 128]. As shown in Table 2 presented later, our algorithm's network has fewer parameters than HoloNet.
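To convey the idea in code, the sketch below buffers a UNet skip connection with a short chain of residual conversion blocks. The class name MultiStageSkip, the block layout, and the stage count are hypothetical, meant only to illustrate how intermediate feature maps can smooth the amplitude-to-phase transition:

```python
import torch.nn as nn

def conv_block(channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )

class MultiStageSkip(nn.Module):
    """Skip path buffered by `num_stages` intermediate feature maps."""
    def __init__(self, channels: int, num_stages: int = 3):
        super().__init__()
        self.stages = nn.ModuleList(conv_block(channels) for _ in range(num_stages))

    def forward(self, enc_feat):
        x = enc_feat
        intermediates = []          # also usable for deep supervision
        for stage in self.stages:
            x = x + stage(x)        # residual update keeps the transition gradual
            intermediates.append(x)
        return x, intermediates
```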
C. Infinity Phase Mapper
In CGH, the network must predict phase values when generating a POH. To obtain a phase value in the range $(-\pi, \pi]$, existing algorithms append a traditional activation function, e.g., the hardTanh function in HoloNet [30], after the final output layer of the network to restrict values to the required range. However, this approach introduces inconsistencies in the predicted phase values. Ideally, the phase angle, represented by the network's raw output value, can span the infinite range $(-\infty, +\infty)$. A well-defined mapping function should account for the periodicity of the phase angle and map any value in $(-\infty, +\infty)$ to a corresponding value in $(-\pi, \pi]$. Most activation functions, however, simply truncate, mapping values beyond $[-\pi, \pi]$ to the endpoints $-\pi$ and $\pi$, resulting in a misrepresented mapping of phase values that significantly impacts the results.
To address this problem, we propose a physics-aware mapping function. The primary objective of this function is to ensure an accurate mapping of output values, carefully mapping each value into the desired range. The function is formulated as

$$\varphi_{\mathrm{out}} = \mathcal{M}(\varphi_{\mathrm{in}}) = \operatorname{atan2}\!\big(\sin \varphi_{\mathrm{in}},\, \cos \varphi_{\mathrm{in}}\big),$$

where the phase mapping function $\mathcal{M}$ takes the input phase value $\varphi_{\mathrm{in}}$ and produces the output phase value $\varphi_{\mathrm{out}}$. The $\operatorname{atan2}$ function is a modified version of the arctan function that considers all four quadrants when computing the angle from the positive $x$-axis to the point $(\cos \varphi_{\mathrm{in}}, \sin \varphi_{\mathrm{in}})$.
This specific phase mapping function, named the infinity phase mapper, translates any phase value in the infinite range $(-\infty, +\infty)$ into its corresponding value in $(-\pi, \pi]$. This approach avoids the limitations of hard truncation. The graphical representation of the infinity phase mapper is shown in Fig. 3.
Figure 3. Schematic of the infinity phase mapper (IPM), which maps infinite phase values to their corresponding points within the $(-\pi, \pi]$ interval.
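In code, the mapper reduces to a single composition of elementary functions. The following is a minimal PyTorch sketch, assuming the network emits an unbounded raw phase tensor (the function name is ours, not from the paper):

```python
import torch

def infinity_phase_mapper(raw_phase: torch.Tensor) -> torch.Tensor:
    # atan2(sin x, cos x) wraps any real x into (-pi, pi] while
    # respecting the 2*pi periodicity of the phase angle.
    return torch.atan2(torch.sin(raw_phase), torch.cos(raw_phase))

# Unlike hard truncation, the mapping is periodic:
# infinity_phase_mapper(torch.tensor([3.5 * torch.pi]))  # ~ -0.5 * pi
```

A convenient side effect of this formulation is that its derivative with respect to the raw output equals 1 almost everywhere, so the wrap does not attenuate gradients during backpropagation.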
D. Deep Supervision

To improve our method's performance, we incorporate a deep supervision strategy into the training process. When training the network, we supervise the loss between the reconstructed image from the POH and the target image. Additionally, we supervise the losses between the target image and the images reconstructed from the top multi-stage feature maps. By applying supervision to the intermediate feature maps in the initial stages of the network, the network is encouraged to learn task-relevant features early in the model. This is critical because features learned in the initial stages form the foundation for subsequent stages. Deep supervision ensures that these foundational features are sensitive to the cross-domain task, enhancing the overall performance of the network for cross-domain transformations. The loss function employed under this deep supervision paradigm can be formulated as

$$\mathcal{L}_{\mathrm{DS}} = \sum_{i=1}^{N} \lambda_i \, \mathcal{L}\big(\hat{A}_i, A\big),$$

where the target amplitude image is denoted as $A$ and the amplitude image reconstructed at the $i$th level is represented as $\hat{A}_i$. $N$ equals the network depth, which is 4 in the implementation. The weighting coefficient $\lambda_i$ balances the losses at different levels, with $i$ indexing the intermediate feature images; $\lambda_i$ is set to 0.25, 0.5, 0.75, and 1.0 in the implementation.
Moreover, because of the importance of the final POH, special attention is given to it during training. An extra perceptual loss [40] is computed between the image reconstructed from the final POH and the target image. Therefore, the overall loss function is formulated as

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{DS}} + \mathcal{L}_{\mathrm{perc}}\big(\hat{A}_{\mathrm{POH}}, A\big),$$

where $\hat{A}_{\mathrm{POH}}$ denotes the amplitude image reconstructed from the final POH.
This overall loss, taking into account both the deep supervision loss from multi-stage feature maps and the perceptual loss related to the final POH, guarantees the algorithm’s effectiveness in producing high-quality POHs.
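A hedged sketch of how this objective could be assembled in PyTorch is given below. The use of MSE as the per-level distance and the perceptual_loss interface are our assumptions; the per-level weights follow the values stated above:

```python
import torch.nn.functional as F

LEVEL_WEIGHTS = [0.25, 0.5, 0.75, 1.0]  # lambda_i for N = 4 levels

def total_loss(level_recons, final_recon, target, perceptual_loss):
    """level_recons: amplitude images reconstructed from the intermediate
    stages; final_recon: amplitude reconstructed from the final POH."""
    ds = sum(w * F.mse_loss(r, target)
             for w, r in zip(LEVEL_WEIGHTS, level_recons))
    return ds + perceptual_loss(final_recon, target)
```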
3. EXPERIMENT
The experiments are performed on a 24 GB NVIDIA RTX 3090 GPU. The DIV2K dataset [41] is used for both training and testing; the training set consists of 800 images and the testing set of 100. During training, data augmentation techniques such as image flipping and rotation are employed to facilitate learning, similar to HoloNet [30]. Training is limited to a maximum of 20 epochs, and the learning rate is set to 0.001. For the ASM model used in our algorithm, the parameters are as follows: the diffraction distance is 0.2 m, the SLM's pixel pitch is 6.4 μm, and the wavelengths for the red, green, and blue lights are 638, 520, and 450 nm, respectively.
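For reference, these settings can be collected as plain constants; the pixel pitch is taken from the SLM described in the optical experiments below:

```python
WAVELENGTHS = {"red": 638e-9, "green": 520e-9, "blue": 450e-9}  # meters
PIXEL_PITCH = 6.4e-6   # SLM pixel pitch in meters (HOLOEYE LETO-3)
DISTANCE = 0.2         # diffraction distance in meters
LEARNING_RATE = 1e-3
MAX_EPOCHS = 20
```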
A. Numerical Simulation
As shown in Fig. 4, we first compare the numerical simulation results of our proposed CDFN with those of the non-iterative method DPAC [26], the iterative method SGD [23], and the learning-based methods HoloNet [30] and CCNN [38] on the DIV2K dataset. For consistency, RGB images are first resized and then zero-padded to the hologram resolution of 1920 × 1080 pixels. Figure 4 shows enlarged sections to better highlight the differences between the methods. Our results demonstrate CDFN's ability to capture fine details and preserve structural integrity, proving its effectiveness and superiority.
Figure 4. Comparison of numerically reconstructed color images. From left to right: results of double phase-amplitude coding (DPAC), stochastic gradient descent method (SGD), HoloNet, CCNN, and our proposed CDFN, respectively (PSNR in dB).
Table 1 presents the quantitative comparison of the various methods. Among all methods, SGD achieves the best PSNR, while CDFN excels in SSIM. Among the learning-based methods, under the same training conditions, our proposed CDFN achieves the highest performance, surpassing the current state-of-the-art method CCNN. This demonstrates that our method holds a competitive advantage over current CGH approaches.
Table 1. Quantitative Results of Different Methods Tested on the DIV2K Testing Dataset, Which Consists of 100 Images (Color Channels)

Method       DPAC         SGD          HoloNet      CCNN         CDFN (Ours)
PSNR/SSIM    19.97/0.689  32.69/0.942  29.87/0.926  30.72/0.927  31.68/0.944

Evaluation metrics include PSNR (dB) and SSIM.
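As an illustration of how such numbers are obtained, the sketch below computes PSNR and SSIM with scikit-image (version 0.19 or later); this is our example, not the authors' evaluation script:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(recon: np.ndarray, target: np.ndarray):
    """Both images as floats in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(target, recon, data_range=1.0)
    ssim = structural_similarity(target, recon, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```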
Table 2 compares the computational efficiency of SGD, HoloNet, and our CDFN method in terms of computation time, parameter count, and floating-point operations (FLOPs). As learning-based methods, CDFN and HoloNet show a significant speed advantage over the traditional SGD algorithm. Although CDFN uses a greater number of intermediate feature maps, which increases its computational demand, it compensates by reducing the number of convolution channels. Consequently, CDFN has fewer parameters (210,000) than HoloNet (287,000), underscoring that its enhanced performance stems from effective network design rather than mere parameter count.
Table 2. Efficiency Comparison of Three CGH Methods Tested on the DIV2K Testing Dataset, Which Consists of 100 Images with a Resolution of 1920 × 1080 Pixels

Metric               SGD       HoloNet    CDFN (Ours)
Time (s)             16.851    0.010      0.012
Parameter quantity   –         287,000    210,000
FLOPs                –

Evaluation metrics include the methods' average inference time, number of parameters, and floating-point operations (FLOPs). The evaluation is performed on a 24 GB NVIDIA RTX 3090 GPU.
Our CDFN method represents a meaningful advancement in CGH, striking a balance between reconstruction accuracy and computational efficiency, as evidenced by its PSNR and SSIM evaluations. By employing multi-stage feature maps, CDFN outperforms state-of-the-art learning-based methods, establishing itself as a sophisticated and effective CGH approach.
B. Ablation Study
In our ablation study, we evaluate each of the strategies integral to our proposed method, beginning with the pure multi-stage (MS) architecture, illustrated in Fig. 5. Even in this most basic form, our approach achieves high reconstruction accuracy, outperforming HoloNet.
Figure 5. Comparison of simulated reconstruction images in our ablation study (green channel). From left to right: reconstruction results of HoloNet, multi-stage architecture (MS), multi-stage architecture with infinity phase mapper (MS w IPM), multi-stage architecture with deep supervision (MS w DS), and multi-stage architecture with infinity phase mapper and deep supervision (MS w IPM&DS) (PSNR in dB).
Quantitative evidence, as detailed in Table 3, underscores this performance advantage. The MS architecture attained a PSNR of 30.88 dB and an SSIM of 0.932, surpassing HoloNet's corresponding metrics of 30.15 dB and 0.926.
Table 3. Ablation Study with Average PSNR (dB) and SSIM Metrics on the DIV2K Testing Dataset, Which Consists of 100 Images (Green Channel)

Metric      HoloNet      MS           MS w IPM     MS w DS      MS w IPM&DS
PSNR/SSIM   30.15/0.926  30.88/0.932  32.13/0.948  31.65/0.940  32.26/0.948
Our analysis extends to evaluating enhancements within the multi-stage architecture through the incorporation of an infinity phase mapper (MS w IPM) and deep supervision (MS w DS), individually and in combination (MS w IPM&DS). The addition of the infinity phase mapper alone elevates the reconstruction accuracy, achieving a PSNR of 32.13 dB and an SSIM of 0.948. This improvement underscores the crucial contribution of the phase mapping function to the algorithm’s performance.
Integrating deep supervision into the multi-stage architecture (MS w DS) also improves reconstruction quality over the pure MS baseline, with PSNR increasing to 31.65 dB and SSIM to 0.940. This indicates that deep supervision improves reconstruction quality on its own.
The peak performance is realized when both the phase mapping function and deep supervision are applied together (MS w IPM&DS), culminating in a PSNR of 32.26 dB and an SSIM of 0.948. These outstanding results not only highlight the individual strengths of each strategy but also the superior reconstruction accuracy achieved through their combined implementation, showcasing a significant synergistic effect.
To further explore the impact of the IPM on reconstruction quality, we analyze the distribution of phase values in POHs generated under different experimental conditions, as shown in Fig. 6. Notably, the use of the hardTanh activation function, as implemented in HoloNet and CDFN w/o IPM, introduces a discontinuity around $\pm\pi$. This discontinuity arises from hardTanh's insensitivity to phase periodicity. In CDFN w IPM, this discontinuity is eliminated, leading to enhanced reconstruction quality. The improved metrics obtained with the IPM, as shown in Table 3, support this finding and reinforce our hypothesis that the IPM enhances the accuracy of POH generation by respecting phase periodicity. These results also illustrate the unique properties of holographic data compared with traditional 2D image data, emphasizing the need for specialized treatment.
Figure 6. Normalized phase value distribution of a randomly picked POH. From left to right: results of HoloNet, CDFN without the infinity phase mapper (CDFN w/o IPM), and CDFN with the infinity phase mapper (CDFN w IPM). Note that values in $(-\pi, \pi]$ are shifted to $[0, 2\pi)$ in this figure for easier observation of values around $\pi$. The arrows highlight that the IPM generates a continuous phase value distribution around $\pi$, where the conventional activation function generates a gap.
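A minimal sketch of this analysis, assuming the POH phase is available as a NumPy array; the shift from $(-\pi, \pi]$ to $[0, 2\pi)$ mirrors the presentation in Fig. 6:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_phase_histogram(poh_phase: np.ndarray, bins: int = 256):
    shifted = np.mod(poh_phase, 2 * np.pi)   # map (-pi, pi] -> [0, 2*pi)
    plt.hist(shifted.ravel(), bins=bins, range=(0.0, 2 * np.pi), density=True)
    plt.xlabel("Phase (rad)")
    plt.ylabel("Normalized frequency")
    plt.show()
```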
C. Optical Experiment

Our algorithm's effectiveness is further validated through optical reconstruction experiments. In the experiments, we load the computed 1920 × 1080 pixel POHs onto an SLM and capture images using a Sony A7 Mark III camera. The optical display system setup is illustrated in Fig. 7. The laser source is a FISBA READYBeam, which emits light at wavelengths of 638, 520, and 450 nm for the red, green, and blue channels, respectively. After passing through a collimating lens, the beam is split into two paths by a beam splitter. The incident beam is then modulated by a HOLOEYE LETO-3-CFS-127 SLM, which has a resolution of 1920 × 1080 pixels and a pixel pitch of 6.4 μm. The modulated beam is reflected, and an aperture filters out undesired high diffraction orders. The target plane is situated at a distance of 0.2 m from the SLM, and the image plane is determined by the thin lens equation.
Figure 7. Holographic display setup. (a) Schematic diagram of the optical display system setup. (b) Photograph of the optical display system setup.
Figure 8 illustrates the optical reconstruction results obtained with green light. To enhance clarity, magnified patches of the reconstructed images are also presented. In this experiment, we evaluate the performance of our CDFN against GS [18], DPAC [26], SGD [23], and HoloNet [30]. The optical results are consistent with our simulations. DPAC's reconstructions struggle to depict fine details accurately, as seen in the parrot's feathers and the butterfly's intricate patterns. The SGD reconstructions are marred by significant speckle noise that obscures delicate features. In contrast, our CDFN method overcomes these issues, producing reconstructions with enhanced clarity.
Figure 8. Optical reconstruction images in the green channel. From left to right: results of the Gerchberg–Saxton algorithm (GS), double phase-amplitude coding (DPAC), HoloNet, CCNN, and CDFN, respectively.
For full-color imaging, as shown in Fig. 9, we sequentially load the SLM with the POH for each color channel while synchronously activating the corresponding laser color. The detailed results, showcased in Fig. 10, affirm our method's capability to reconstruct full-color images with high fidelity. The experimental outcomes underscore not only the practical effectiveness of our approach but also its potential for advancing optical reconstruction technologies.
Figure 9. Optical reconstruction images from 1920 × 1080 pixel POHs on an SLM in the red, green, blue, and color channels. The images are directly captured by a camera. The color image is obtained by synchronizing the three-color laser source and sequentially loading different POHs. From left to right: results in the red, green, blue, and color channels, respectively.
Figure 10. Optical reconstruction images of our method in the color channel, directly captured by a camera, and the corresponding phase-only holograms. (a) Cropped and zoomed patch of the first color image in (b) for visualization. (b) Captured color images. (c) The corresponding phase-only holograms of (b).
To evaluate our method's generalization across different data types, we test the model trained on the color DIV2K dataset using binary USAF images. Figure 11 shows the results from both simulation and optical experiments. The PSNR and SSIM results for the three images are 18.37 dB and 0.604, 17.03 dB and 0.643, and 20.71 dB and 0.781, respectively. While these results confirm that our method extends from color to binary images, they also highlight certain challenges. Notably, some speckle noise can be observed in the reconstructions. This occurs because binary images, with their high contrast between black and white regions, introduce significant high-frequency components that are difficult for a network trained on color images to reconstruct accurately. Despite this, our method demonstrates robustness, as identifiable areas are evident in the reconstruction results. These findings have prompted us to explore further refinements to our method to enhance its generalizability across data types.
Figure 11. Results of our method applied to binary images. (a) Simulated images. (b) Zoomed-in patches. (c) Experimental images. Note that the target binary images are not in our training set.
4. CONCLUSION

In conclusion, our study advances the field of computer-generated holography by introducing the cross-domain fusion network (CDFN) as a solution for generating high-quality phase-only holograms efficiently. By addressing the complexities of cross-domain conversion, CDFN demonstrates the potential of integrating computational methods with traditional optical principles. The inclusion of the infinity phase mapper, which encodes an understanding of optical physics, ensures that the generated holograms maintain a high level of fidelity to the underlying optical phenomena. This work signifies not only technological progress but also a deeper integration between the realms of computation and physical reality, promising a future where holographic displays achieve unprecedented levels of realism.
The proposed physics-aware CDFN encounters scalability limitations at higher resolutions and requires substantial computational resources, specifically over 24 GB of GPU memory. Additionally, the infinity phase mapper, while effective, may not completely model all elements of holographic displays. Future efforts will focus on exploring computational models for greater efficiency [42], supporting higher resolutions [43], and refining physical assumptions to broaden CDFN's applicability in CGH. These efforts aim not only to improve current metrics but also to expand the network's utility in broader holographic imaging and display challenges. Finally, while this work does not delve into the details of 3D CGH [44], we note that the proposed method could potentially extend to such settings and leave this question for future investigation.
[14] M. Zhou, S. Jiao, P. Chakravarthula. Point spread function-inspired deformable convolutional network for holographic displays. Proc. SPIE, 13104, 131042M(2024).
[16] G. A. Koulieris, K. Akşit, M. Stengel. Near-eye display and tracking technologies for virtual and augmented reality. Computer Graphics Forum, 38, 493-519(2019).
[36] O. Ronneberger, P. Fischer, T. Brox. U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 234-241(2015).
[40] J. Johnson, A. Alahi, F.-F. Li. Perceptual losses for real-time style transfer and super-resolution. Proceedings of European Conference of Computer Vision, 694-711(2016).
[41] E. Agustsson, R. Timofte. NTIRE 2017 challenge on single image super-resolution: dataset and study. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 126-135(2017).
[42] Z. Dong, J. Jia, Y. Li. Divide-conquer-and-merge: memory- and time-efficient holographic displays. IEEE Conference Virtual Reality and 3D User Interfaces (VR), 493-501(2024).