Optimized binary computer holography via convolutional neural network-based differentiable binarization

Jiadi Shi; Shuqing Cao; Xian Ding; Bo Dai; Qi Wang; Songlin Zhuang; Dawei Zhang; Chenliang Chang

doi:10.3788/COL202523.100501

1. Introduction

Computer-generated holography (CGH)^[1–3] is a technology that uses computer simulation to generate holograms. It can record both real and virtual objects and is widely applied in wavefront shaping^[1,2], holographic projection and display^[4,5], optical fiber communication^[6], microscopy^[7], and optical tweezers^[8]. Holographic 3D display can reproduce depth, parallax, and other 3D information of objects, thereby providing users with a more realistic visual experience. A spatial light modulator (SLM) is the core device in holographic displays, responsible for modulating the phase or amplitude of light to reconstruct a 3D wavefront. Common types of SLMs include liquid crystal-based SLMs and digital micromirror devices (DMDs). Among these, DMDs offer significant advantages in resolution, speed, and reliability. However, because the modulation depth of a DMD is confined to binary levels, the quality of the reconstructed images is often poor. Research on generating high-quality binary holograms is therefore particularly important.

The methods for generating binary holograms have evolved continuously with advances in computational techniques and optical theory. In the early stages, the direct binarization method^[9–12], which converts grayscale holograms into binary holograms through thresholding, was widely adopted due to its simplicity and low computational cost. However, this method introduces significant quantization noise, resulting in poor reconstructed image quality^[13,14]. An error diffusion algorithm was proposed^[15,16] to provide more accurate binarization, reducing noise and distortion by diffusing quantization errors to neighboring pixels. This method significantly enhances the quality of binary holograms while maintaining moderate computational complexity, which made it suitable for early computer processing capabilities. However, its optimization effect on complex holograms is limited^[17,18], and it still introduces a certain level of noise. Later, the iterative Fourier transform algorithm (IFTA)^[19] was introduced, which iterates between the frequency and spatial domains to gradually approximate the target hologram. IFTA can generate high-quality binary holograms and is particularly suitable for complex scenes and phase modulation, but it suffers from high computational complexity, numerous iterations, and lengthy processing time, imposing higher demands on hardware. Recently, stochastic gradient descent (SGD)-based optimization algorithms for binary holograms have been proposed^[20,21]. These algorithms replace hologram binarization with a differentiable function so that gradients can be calculated, and then obtain the optimized binary hologram by gradient descent. However, such binary-similar functions only approximate the true binarization process and cannot reproduce it precisely.

In this paper, we propose an SGD-based binary CGH optimization framework in which a convolutional neural network (CNN) performs the differentiable hologram binarization operation. Compared to conventional binary CGH generation methods, our CNN-based approach calculates the binary CGH more accurately and significantly reduces the binary quantization noise. Experimental results demonstrate that the binary CGHs generated by our proposed optimization framework achieve superior holographic reconstructions in both 2D and 3D cases.

2. Methods

A holographic near-eye display configuration based on a DMD is presented in Fig. 1. When illuminated by a plane wave, a binary CGH displayed on the DMD passes through a band-limited diffraction configuration to reproduce the target 2D or 3D images. Specifically, the beam modulated by the DMD propagates to the back focal plane of the Fourier lens through the optical Fourier transform, as described by the following equation:

$P (x_{p}, y_{p}) = \iint \mathrm{Binary} [A (x_{h}, y_{h})] \, e^{- j (x_{p} x_{h} + y_{p} y_{h})} \, d x_{h} \, d y_{h},$ (1)

where $\mathrm{Binary} [A (x_{h}, y_{h})] = B (x_{h}, y_{h})$ denotes the binary hologram displayed on the DMD, which is obtained by applying a binarization operation to an initial real-valued amplitude-type hologram $A (x_{h}, y_{h})$, and $P (x_{p}, y_{p})$ is the wavefront on the back focal plane of the Fourier lens, obtained by Fourier transforming the binary hologram. The wavefront $P (x_{p}, y_{p})$ is then filtered using a single-side band (SSB) filter mask to block the zero-order (also known as DC) and conjugate noise^[22]. Subsequently, the reconstructed images can be observed by the eye in front of the SSB window, which is equivalent to virtually reconstructing the images through back-propagation of Fresnel diffraction:

$I (x, y) = \frac{e^{j k z}}{j λ z} \iint M (x_{p}, y_{p}) P (x_{p}, y_{p}) \, e^{j \frac{k}{2 z} [(x - x_{p})^{2} + (y - y_{p})^{2}]} \, d x_{p} \, d y_{p},$ (2)

where $I (x, y)$ represents the wavefront on the image plane, $λ$ is the wavelength, $k = 2π / λ$ is the wavenumber, $z$ is the distance from the filtering plane to the image plane, and $M (x_{p}, y_{p})$ denotes the SSB mask applied on the filtering plane.
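
The double-step model of Eqs. (1) and (2) can be sketched numerically as follows. This is only an illustrative NumPy sketch, not the authors' implementation: the function names, the 64 × 64 random hologram, the half-plane shape of the SSB mask, the propagation distance of 0.4 m, and the choice to evaluate Eq. (2) with the Fresnel transfer function are all assumptions.

```python
import numpy as np

def fourier_plane(binary_hologram):
    """Eq. (1): discrete stand-in for the optical Fourier transform of the lens."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(binary_hologram)))

def ssb_mask(n_rows, n_cols):
    """Assumed SSB mask: pass only the rows above the centre row.

    After fftshift the zero order (DC) sits at row n_rows // 2, so the DC term
    and the conjugate half of the spectrum are both blocked.
    """
    mask = np.zeros((n_rows, n_cols))
    mask[: n_rows // 2, :] = 1.0
    return mask

def fresnel_propagate(field, wavelength, pitch, z):
    """Eq. (2) evaluated as a convolution via the Fresnel transfer function
    H = exp(jkz) * exp(-j * pi * wavelength * z * (fx^2 + fy^2))."""
    n_rows, n_cols = field.shape
    fy = np.fft.fftfreq(n_rows, d=pitch)[:, None]
    fx = np.fft.fftfreq(n_cols, d=pitch)[None, :]
    k = 2.0 * np.pi / wavelength
    transfer = np.exp(1j * k * z) * np.exp(-1j * np.pi * wavelength * z * (fx**2 + fy**2))
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Illustrative values: wavelength, pitch, and focal length follow the experiment
# section; the hologram size and the distance z are assumptions.
wavelength, dmd_pitch, focal_length, z = 520e-9, 7.56e-6, 150e-3, 0.4
rng = np.random.default_rng(0)
hologram = (rng.random((64, 64)) > 0.5).astype(float)  # random binary pattern

P = fourier_plane(hologram)
M = ssb_mask(*P.shape)
# Sample spacing in the back focal plane of the lens for an N-pixel hologram.
filter_pitch = wavelength * focal_length / (hologram.shape[0] * dmd_pitch)
image_field = fresnel_propagate(M * P, wavelength, filter_pitch, z)
intensity = np.abs(image_field) ** 2
print(intensity.shape, float(intensity.sum()))
```

Because the transfer function has unit magnitude, the propagation step conserves the energy that passes the mask, which gives a quick sanity check on the sketch.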

Figure 1. DMD holographic near-eye display scheme based on double-step band-limited diffraction.

Figure 2(a) presents the SGD optimization flowchart based on the above band-limited diffraction optics. Taking the display of a 2D target image as an example, this process aims to find the optimized binary CGH that minimizes the loss between the holographic reconstructed image and the target image. We first initialize a real-valued amplitude hologram on the DMD plane and then convert it to a binary hologram using the forward binary operator. The focal image is then numerically reconstructed from the binary hologram via the double-step band-limited diffraction using Eqs. (1) and (2), and the intensity loss of the reconstructed image is calculated. For the 3D case, we calculate the loss for all depth layer images and sum those losses to obtain the total 3D loss. To achieve the optimized binary hologram, we pose the problem as finding the optimum amplitude hologram $A (x_{h}, y_{h})$, as well as the optimum parameters of the CNN model, that result in minimum loss after binarization. This problem setting allows the binary hologram to be updated during gradient-based optimization. The optimization process can be summarized by the following optimization problem:

$\underset{A (x_{h}, y_{h})}{\mathrm{minimize}} \sum_{\forall (x, y)} L [σ \cdot I (x, y), T (x, y)],$ (3)

where $T (x, y)$ represents the intensity of the target image, and $σ$ is an energy scale factor. We set $L$ to the $\ell_{2}$ mean square error (MSE), because the MSE is sensitive to differences in mean value rather than in variance, which makes it suitable for reducing the background noise of binary holograms. To solve Eq. (3), we use an SGD-based iterative optimization that updates the amplitude hologram $A (x_{h}, y_{h})$. The update rule in iteration $k$ with step $α$ is as follows:

$A_{k} = A_{k - 1} - α \left( \frac{\partial L}{\partial A} \right) .$ (4)
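
As a minimal sketch of the update rule in Eq. (4), the loop below minimizes the MSE of Eq. (3) for a toy forward model in which the reconstruction is simply the hologram itself (I = A), so that the gradient can be written by hand. The identity forward model, the four-pixel target, and σ = 1 are illustrative assumptions; in the actual pipeline the gradient flows back through the diffraction model and the binarization step.

```python
def mse(recon, target, sigma=1.0):
    """Mean squared error between the scaled reconstruction and the target."""
    return sum((sigma * r - t) ** 2 for r, t in zip(recon, target)) / len(target)

def mse_grad(recon, target, sigma=1.0):
    """Analytic gradient of mse() with respect to each reconstruction value."""
    n = len(target)
    return [2.0 * sigma * (sigma * r - t) / n for r, t in zip(recon, target)]

def sgd(amplitude, target, alpha=0.1, iterations=100, sigma=1.0):
    """Apply A_k = A_{k-1} - alpha * dL/dA for a fixed number of iterations."""
    history = []
    for _ in range(iterations):
        history.append(mse(amplitude, target, sigma))
        grad = mse_grad(amplitude, target, sigma)  # toy forward model: I = A
        amplitude = [a - alpha * g for a, g in zip(amplitude, grad)]
    return amplitude, history

target = [0.0, 1.0, 1.0, 0.0]   # hypothetical four-pixel target
start = [0.5, 0.5, 0.5, 0.5]    # initial amplitude hologram
optimized, losses = sgd(start, target)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.2e}")
```

The loss recorded at each iteration decreases monotonically for this convex toy problem; the real objective is non-convex, so no such guarantee carries over.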

Figure 2. (a) Flowchart of the SGD optimization algorithm for generating binary CGH. (b1), (b2) The binarization network architecture.

In this paper, the number of iterations and the step size are held constant, with $k = 100$ and $α = 0.1$. The optimized binary hologram is acquired by binarizing the optimized amplitude hologram as $B_{k} (x_{h}, y_{h}) = \mathrm{Binary} [A_{k} (x_{h}, y_{h})]$. Unlike in phase-only SGD methods, the backward gradient of the loss cannot be obtained directly, because binary quantization with a simple threshold in the forward propagation is non-differentiable. To address this issue, traditional methods employ binary-similar differentiable functions, such as hard hyperbolic tangent functions, to estimate the binary calculation^[20,21], thereby making the whole forward calculation differentiable. However, such binary-similar functions fail to produce precise results because a large truncation error arises between them and the true binarization. We therefore design a fully convolutional CNN to perform the differentiable hologram binarization operation instead of using binary-similar functions. Our CNN model is not only differentiable but also binarizes the hologram more accurately.
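
The two problems described above can be illustrated with a short sketch: a hard threshold has zero gradient almost everywhere, while the binary-similar mappings have usable gradients but deviate from true binary values. The clip and curve mappings are written as given in Section 3 of this paper; the 0.5 threshold, the [0, 1] amplitude range, and the helper names are assumptions made only for this illustration.

```python
import math

def hard_threshold(x, level=0.5):
    """True binarization; the 0.5 threshold is an assumed value."""
    return 1.0 if x >= level else 0.0

def clip_mapping(x):
    """Htanh(x) = max(-1, min(1, x)), the clip mapping of Ref. [20]."""
    return max(-1.0, min(1.0, x))

def curve_mapping(x):
    """curve(x) = arctan(x - 1/2) / pi + 1/2, the mapping of Ref. [21]."""
    return math.atan(x - 0.5) / math.pi + 0.5

def numerical_gradient(f, x, eps=1e-4):
    """Central finite difference, used here in place of automatic differentiation."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def mean_gap(f, samples):
    """Average absolute difference between a surrogate and the hard threshold."""
    return sum(abs(f(x) - hard_threshold(x)) for x in samples) / len(samples)

samples = [i / 100.0 for i in range(101)]  # amplitudes assumed to lie in [0, 1]
for name, f in [("threshold", hard_threshold),
                ("clip", clip_mapping),
                ("curve", curve_mapping)]:
    print(f"{name:9s} gradient at 0.3 = {numerical_gradient(f, 0.3):.3f}, "
          f"mean gap to binary = {mean_gap(f, samples):.3f}")
```

The threshold row shows a zero gradient (nothing to back-propagate), whereas the clip and curve rows show a non-zero gradient together with a sizeable gap to the binary values, which is the truncation error referred to in the text.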

A diagram of the CNN architecture is shown in Fig. 2(b). The role of this CNN is to closely approximate the binarization of the input real-valued amplitude hologram, and thereby output a near-binary hologram whose pixel values are very close to zero or one. In the network architecture, we employ a fully convolutional block referred to as “Convlayer”, which consists of two $3 \times 3$ convolutional layers, two batch normalization layers, and two ReLU activation layers. Prior to the “output” layer, the activation function is given by $y = 1 / [1 + \exp (- 20 x + 10)]$. During the SGD optimization process, the CNN is jointly trained to find its optimized parameters. After the SGD optimization process finishes, we obtain the final binary hologram from the output of the CNN, which is then displayed on the DMD for precise binary holographic wavefront modulation.
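
The output activation quoted above is a sigmoid with slope 20 centred at x = 0.5. The short sketch below evaluates it and its derivative at a few points to show how it pushes values toward zero or one while remaining differentiable; the function names are illustrative and the sample points are arbitrary.

```python
import math

def output_activation(x):
    """y = 1 / (1 + exp(-20x + 10)): a sigmoid of slope 20 centred at x = 0.5."""
    return 1.0 / (1.0 + math.exp(-20.0 * x + 10.0))

def output_activation_grad(x):
    """Derivative dy/dx = 20 * y * (1 - y), which is non-zero for finite x."""
    y = output_activation(x)
    return 20.0 * y * (1.0 - y)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}   y = {output_activation(x):.5f}   "
          f"dy/dx = {output_activation_grad(x):.5f}")
```

Inputs of 0 and 1 map to within about 5e-5 of the binary levels, and the gradient peaks at 5 for x = 0.5, so the steep transition is what keeps the output nearly binary without cutting off back-propagation.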

3. Experiment

We first compare the quality of binary CGHs generated by our CNN-based binary SGD (CNN-B-SGD) optimization method with that of binary CGHs obtained using the traditional binary-similar function-based binary SGD (Func-B-SGD) method. For the traditional Func-B-SGD, we employ two types of binary-similar functions. The first is the clip mapping $\mathrm{Htanh} (x) = \max [- 1, \min (1, x)]$ reported in Ref. [20], and the second is the differentiable mapping $\mathrm{curve} (x) = \arctan (x - 1 / 2) / π + 1 / 2$ reported in Ref. [21]. To further suppress speckle noise, we employ the temporal multiplexing (TM) method, in which 20 binary holograms are generated using 20 different random initial amplitudes in the SGD iteration. The holograms are then displayed sequentially on the DMD at a binary frame rate of 9.52 kHz. Figure 3 shows the simulated results for the binary CGHs generated using different methods. The peak signal-to-noise ratio (PSNR) and root mean square error (RMSE) are given in each image. Figure 3(a) illustrates five test target images. Figure 3(b) shows the simulation results for binary holograms generated by direct binarization of the SGD-optimized amplitude hologram, where the reconstruction quality is low, with poor PSNR and RMSE, due to direct binary quantization noise. Figures 3(c) and 3(d) show the simulated results for binary holograms generated by Func-B-SGD using the clip mapping and curve mapping binary-similar functions, respectively. Figure 3(e) shows the simulated results for binary holograms generated by our CNN-B-SGD method. Our CNN-B-SGD optimization method provides better reconstructions, with higher PSNR and lower RMSE, confirming the ability of the proposed method to produce higher-quality DMD holographic displays.
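
The effect of temporal multiplexing on the two reported metrics can be sketched as follows. The speckle model (target intensity multiplied by independent unit-mean exponential noise), the 256-pixel synthetic target, and the peak value of 1 used for PSNR are assumptions for illustration only; they are not the simulation used to produce Fig. 3.

```python
import math
import random

def rmse(image, target):
    """Root mean square error between two equally sized intensity lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(image, target)) / len(target))

def psnr(image, target, peak=1.0):
    """PSNR in dB for images whose maximum value is `peak`."""
    return 20.0 * math.log10(peak / rmse(image, target))

def speckled_frame(target, rng):
    """Assumed speckle model: intensity times unit-mean exponential noise."""
    return [t * rng.expovariate(1.0) for t in target]

def temporal_multiplex(frames):
    """Average the intensities of sequentially displayed frames."""
    n = len(frames)
    return [sum(pixel) / n for pixel in zip(*frames)]

rng = random.Random(0)
target = [0.2 + 0.6 * (i % 8) / 7 for i in range(256)]     # synthetic target
frames = [speckled_frame(target, rng) for _ in range(20)]  # 20 TM frames
single, averaged = frames[0], temporal_multiplex(frames)

print(f"single frame : RMSE {rmse(single, target):.3f}, PSNR {psnr(single, target):.2f} dB")
print(f"20-frame TM  : RMSE {rmse(averaged, target):.3f}, PSNR {psnr(averaged, target):.2f} dB")

# 20 binary frames at 9.52 kHz give one multiplexed image at 476 Hz.
effective_rate_hz = 9520 / 20
print(f"effective image rate: {effective_rate_hz:.0f} Hz")
```

Under this noise model, averaging N independent frames lowers the noise standard deviation by roughly a factor of √N, which is why the multiplexed image scores a lower RMSE and a higher PSNR than any single frame.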

Figure 3. (a) Target images. (b)–(e) Simulated comparison of different binary CGH methods, including (b) direct binarization of the SGD-optimized amplitude hologram, (c) Func-B-SGD optimization with the clip mapping binary-similar function, (d) Func-B-SGD optimization with the curve mapping binary-similar function, and (e) our proposed CNN-B-SGD optimization.

We also evaluated the performance of the binary holograms generated by our CNN-B-SGD method through holographic near-eye display experiments, with the experimental optical setup shown in Fig. 4(a). In this study, a DMD with a resolution of $1920 \times 1080$ and a pixel pitch of 7.56 µm is utilized to display holograms. A 520 nm laser beam is collimated by means of a collimator lens and subsequently directed onto the DMD through a mirror. The optimized binary CGH is loaded onto the DMD and illuminated by the laser beam at an incident angle of 24°. Following modulation by the DMD, the beam is transmitted through a filter positioned at the back focal plane of a Fourier lens with a focal length of 150 mm. This filter is designed to remove the DC and conjugate noise. Charge-coupled devices (CCDs) coupled with zoom lenses are employed to capture the final image.
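
The setup parameters above fix a few derived quantities that are useful when sizing the filter and the field of view. The sketch below applies the standard grating and Fourier-lens sampling relations to the stated wavelength, pixel pitch, resolution, and focal length; the derived values are not quoted in the paper, and the variable names are illustrative.

```python
import math

wavelength = 520e-9        # m, laser wavelength stated in the text
pixel_pitch = 7.56e-6      # m, DMD pixel pitch stated in the text
focal_length = 150e-3      # m, Fourier lens focal length stated in the text
resolution = (1920, 1080)  # DMD pixels (columns, rows)

# Maximum diffraction half-angle supported by the pixel pitch (grating equation).
max_angle_deg = math.degrees(math.asin(wavelength / (2.0 * pixel_pitch)))

# Spacing between adjacent diffraction orders in the back focal plane of the lens.
order_spacing_mm = wavelength * focal_length / pixel_pitch * 1e3

# Active area of the DMD.
width_mm = resolution[0] * pixel_pitch * 1e3
height_mm = resolution[1] * pixel_pitch * 1e3

print(f"maximum diffraction half-angle: {max_angle_deg:.3f} deg")
print(f"diffraction order spacing at the filter plane: {order_spacing_mm:.3f} mm")
print(f"DMD active area: {width_mm:.3f} mm x {height_mm:.3f} mm")
```

Since a single-sideband filter must block the zero order and the conjugate term, its opening can span at most about half of one order spacing along the filtered direction.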

Figure 4. (a) Optical setup of DMD holographic near-eye display. (b) Comparison of experimental results of different methods. (c) Experimental results of holographic display using our CNN-B-SGD-based binary CGHs for the 2D image case. (d) Experimental results of holographic display using our CNN-B-SGD-based binary CGHs for a true 3D case when the camera is focusing from 400 to 700 mm. Intensity image and corresponding depth map of the 3D scene are illustrated.

Figure 4(b) shows the optical experimental results for the binary CGHs generated using different methods. From left to right, the results correspond to direct binarization of the SGD-optimized amplitude hologram, Func-B-SGD optimization with the clip mapping binary-similar function, Func-B-SGD optimization with the curve mapping binary-similar function, and our CNN-B-SGD method. The experimental results are highly consistent with the simulation results in Fig. 3.

Figure 4(c) shows the experimental results for four test target images, which are consistent with the simulation results in Fig. 3(e). To further demonstrate the capability for true 3D holographic display, we calculate the binary CGH by modifying the optimization of Eq. (4) to use a 3D loss function. The target 3D volume is rendered into multiple focal stack images^[20], and we calculate the amplitude loss for all depth layers and sum those losses to obtain the total 3D loss. Figure 4(d) shows an example of a 3D target with its intensity image and depth map, as well as the experimentally captured focal images at different distances from the camera. The in-focus and out-of-focus contents of the reconstructions change correctly between sharp and blurred in each case, confirming that the binary holograms generated by our CNN-B-SGD method can provide high-contrast and speckle-free 3D images with realistic focus cues.
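
The 3D loss described above is a sum of per-layer losses over the focal stack. A minimal sketch, assuming an MSE per layer and equal weighting of the layers (the paper does not state any weighting), with a hypothetical three-plane stack of four pixels each:

```python
def layer_loss(recon, target):
    """MSE between one reconstructed focal image and its target layer."""
    return sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)

def total_3d_loss(recon_stack, target_stack):
    """Sum the per-layer losses over all depth planes of the focal stack."""
    if len(recon_stack) != len(target_stack):
        raise ValueError("stacks must contain the same number of depth layers")
    return sum(layer_loss(r, t) for r, t in zip(recon_stack, target_stack))

# Hypothetical focal stack with three depth planes, one list per plane.
target_stack = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
recon_stack = [
    [0.5, 0.0, 0.0, 0.0],   # first plane: one pixel too dim
    [0.0, 1.0, 0.0, 0.0],   # second plane: perfect match
    [0.0, 0.0, 0.5, 0.5],   # third plane: dim pixel plus stray light
]
print(total_3d_loss(recon_stack, target_stack))  # 0.0625 + 0.0 + 0.125 = 0.1875
```

In an actual optimization, each entry of the reconstruction stack would come from propagating the same binary hologram to a different distance z, so a single hologram is penalized for errors at every depth simultaneously.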

Moreover, Table 1 compares the computation time required to generate a single binary CGH using different methods. In this computation, we applied 40 iterations in the SGD optimization, since the loss had already converged to a small value after 40 iterations. Because of the additional CNN model in the binarization operation, our CNN-B-SGD method takes somewhat longer than the other three approaches. The test is performed on an NVIDIA GeForce RTX 4070 GPU for CNN inference and an Intel Core i7-14700KF CPU for the other, non-CNN computations. It should be noted that generating binary CGHs with such an iterative strategy still faces challenges in meeting the requirements of real-time holographic display. This might be addressed by employing a more advanced computation platform^[23].

Table 1. Runtime for Generating Single Binary CGH Using Different Methods

| | Direct binarization | Func-B-SGD with clip mapping | Func-B-SGD with curve mapping | Proposed method |
| --- | --- | --- | --- | --- |
| Time (s) | 1.97 | 2.05 | 2.14 | 3.60 |
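
To put the Table 1 figures in context, the short calculation below derives the relative overhead of the proposed method, its average time per iteration, and the time needed for a 20-frame temporally multiplexed image. It assumes that the whole runtime is spent in the 40 SGD iterations and that the 20 holograms are computed one after another; neither assumption is stated in the paper.

```python
runtime_s = {
    "Direct binarization": 1.97,
    "Func-B-SGD with clip mapping": 2.05,
    "Func-B-SGD with curve mapping": 2.14,
    "Proposed method": 3.60,
}
iterations = 40   # SGD iterations used in the timing test
tm_frames = 20    # binary holograms per temporally multiplexed image

baseline = runtime_s["Direct binarization"]
for method, seconds in runtime_s.items():
    print(f"{method}: {seconds:.2f} s ({seconds / baseline:.2f}x direct binarization)")

overhead = runtime_s["Proposed method"] / baseline
per_iteration_ms = runtime_s["Proposed method"] / iterations * 1e3
tm_total_s = runtime_s["Proposed method"] * tm_frames  # sequential computation assumed

print(f"proposed method: about {per_iteration_ms:.0f} ms per iteration on average")
print(f"20-frame TM set: about {tm_total_s:.0f} s if computed sequentially")
```

Under these assumptions a full multiplexed image takes on the order of a minute to compute, which illustrates the gap to real-time operation noted in the text.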

In addition, we carry out an experimental demonstration of a full-color holographic near-eye display using our CNN-B-SGD binary CGHs. In this case, the DMD is synchronized with an RGB fiber-coupled laser consisting of three precisely aligned beams operating at 488, 520, and 637 nm. The binary CGHs for the red, green, and blue channels are calculated separately using the proposed CNN-B-SGD optimization at the corresponding wavelengths and displayed on the DMD sequentially for color mixing. The wavelength dispersion of the DMD is compensated for each color by multiplying by the wavelength-dependent tilt phase in each binary CGH optimization^[24]. Figure 5 shows the captured reconstructed images, demonstrating that our method can also successfully deliver full-color content for DMD holographic near-eye displays.

Figure 5. Experimental results of full-color holographic display using our CNN-B-SGD-based binary CGHs.

4. Conclusion

In summary, we propose an SGD-based binary CGH optimization framework in which a CNN performs the differentiable binarization operation. Our method significantly reduces quantization noise compared to conventional binary CGH generation approaches. Binary holography holds great promise for future VR/AR 3D display optics, and our work advances the prospects for high-fidelity, low-cost DMD holographic VR/AR displays with superior true 3D viewing experiences.

Category: Diffraction, Gratings, and Holography

Received: Apr. 11, 2025

Accepted: Jun. 10, 2025

Published Online: Sep. 17, 2025

The Author Email: Chenliang Chang (changchenliang@hotmail.com)

DOI:10.3788/COL202523.100501

CSTR:32184.14.COL202523.100501
