Holographic near-eye augmented reality (AR) displays featuring tilted inbound/outbound angles on compact optical combiners hold significant potential yet often struggle to deliver satisfying image quality. This is primarily attributed to two reasons: the lack of a robust off-axis-supported phase hologram generation algorithm; and the suboptimal performance of ill-tuned hardware parts such as imperfect holographic optical elements (HOEs). To address these issues, we incorporate a gradient descent-based phase retrieval algorithm with spectrum remapping, allowing for precise hologram generation with wave propagation between nonparallel planes. Further, we apply a camera-calibrated propagation scheme to iteratively optimize holograms, mitigating imperfections arising from the defects in the HOE fabrication process and other hardware parts, thereby significantly lifting the holographic image quality. We build an off-axis holographic near-eye display prototype using off-the-shelf light engine parts and a customized full-color HOE, demonstrating state-of-the-art virtual reality and AR display results.
1. INTRODUCTION
Augmented reality (AR) creates a novel way for human beings to experience the world by integrating virtual scenes with real environments [1]. Currently, AR display methods can be categorized into two primary types [2]. One is the video see-through AR display, which overlays virtual information directly onto real-world scenes captured by cameras, as exemplified by Apple Vision Pro [3]. The other is an optical see-through AR display, which uses optical components to merge real ambient light with virtual content, such as Google Glass [4]. Despite advancements in imaging quality, many AR products still struggle with the problem of vergence-accommodation conflict (VAC), which may cause visual fatigue and discomfort [5,6].
Unlike traditional 3D displays that rely on stereoscopic methods to simulate depth cues [7], holographic displays provide a more immersive and strain-free viewing experience by accurately reproducing the light wavefront using diffraction, thereby addressing the VAC problem by offering human eyes’ natural depth cues for proper focus adjustments [8,9]. Remarkably, computer-generated holography (CGH) has recently become mainstream, simulating the propagation of light beams and generating hologram patterns digitally. The generated holograms are loaded onto a spatial light modulator (SLM) to reproduce the light wavefront and create a 3D scene consistent with the original object [10,11].
Existing CGH algorithms used in holographic near-eye displays are mostly applicable only to the propagation of light between parallel planes, requiring the source and target planes to be aligned in parallel; examples include the angular spectrum method (ASM) [12,13]. However, when the target plane is not parallel with the source plane, these traditional methods produce a mismatch that degrades imaging quality. Off-axis aberration must therefore be considered in propagation between nonparallel planes, which poses significant challenges in practice. For instance, in the practical use of wearable holographic near-eye displays, off-axis projection of holograms using a holographic optical element (HOE) combiner often involves propagation between nonparallel planes [14]. This complicates the accurate generation of holograms for displays, making it crucial to develop methods that effectively address these propagation challenges.
As a lightweight optical combiner, an HOE [15] offers advantages such as high diffraction efficiency, minimal diffraction orders, spectral selectivity, and optical see-through capability, making it suitable for AR products [16]. Li et al. simplified their system considerably by replacing the beam splitter and lens with HOE [17]. Subsequently, Microsoft Research developed a compact, eyeglasses-style display with a wide field of view (FOV) [18]. To enlarge the eyebox, Jang et al. [19] implemented a compact pupil-shifting HOE (PS-HOE), enabling exit-pupil shifting without bulky mechanisms. Similarly, Xia et al. expanded the eyebox of holographic displays using a lenslet array to fabricate HOE [20]. Notably, waveguides are also used to miniaturize AR displays. For instance, metasurface gratings and a dispersion-compensating waveguide were employed to eliminate bulky collimation optics, enabling full-color, 3D AR content in a compact form factor [21].
To calculate off-axis propagation between nonparallel planes, one method tilts the angular spectrum of parallel plane waves by coordinate rotation in the Fourier domain [22,23]. Subsequently, a nonuniform fast Fourier transform method was proposed to overcome the sampling limitations of the traditional fast Fourier transform on a tilted plane [24]. As a powerful iterative method, the stochastic gradient descent (SGD) algorithm demonstrates superior performance in hologram retrieval when combined with the traditional angular spectrum method (SGD-ASM) [25,26]. However, the SGD-ASM method cannot handle the propagation between nonparallel planes.
This work presents an HOE-empowered, full-color, off-axis holographic AR display design paradigm. Specifically, we propose a novel off-axis-supported hologram generation algorithm incorporating SGD-ASM with spectrum remapping. This algorithm effectively mitigates issues such as distortion and depth mismatches caused by the tilted propagation with the off-axis HOE. To further enhance the display quality, we utilize the camera-in-the-loop (CITL) calibration [25,26] to optimize the holograms with ill-tuned hardware, including negative impacts caused by defects inherent in the HOE fabrication process, nonuniform illumination, and speckles caused by the coherent light source. Our work provides a novel solution to improve the image quality of HOE-based holographic AR displays significantly, and this end-to-end optimization method empowers the application of HOE in wearable near-eye displays.
2. MOTIVATION OF OFF-AXIS CONFIGURATION
Figure 1 illustrates two optical designs for wearable near-eye displays. The traditional coaxial design, shown in Fig. 1(a), primarily consists of an SLM, a beam splitter (BS), and a curved partial mirror. In this configuration, the SLM modulates light, producing an intermediate reconstructed holographic image, which is then magnified by the curved partial mirror and directed to the eye via the BS. The FOV in this coaxial design depends on the focal length of the curved partial mirror. However, the use of a traditional BS limits the FOV and contributes to a bulky form factor. In contrast, the off-axis design features modulated light beams that strike the combiner at an oblique angle before reflecting toward the eye, as depicted in Fig. 1(b). Here, the term “off-axis” indicates the tilted incidence of light on the combiner. As reported in prior work, an HOE can serve as the combiner and integrate multiple optical functions including off-axis projection [18], while offering improved transparency, thus significantly reducing the form factor of holographic AR displays. The basic design consists of an SLM and an HOE. In this off-axis configuration, the light beam modulated by the SLM is reflected by the HOE and then converges to the eye, effectively combining the function of the BS and curved mirror, leading to a more compact design reminiscent of eyeglasses.
Figure 1. Illustration of holographic AR displays with two different types of optical combiners. (a) Traditional coaxial design. Light from the SLM is reflected onto a curved partial mirror by a BS and then reflected toward the human eye. (b) Proposed off-axis design. Light from the SLM hits the HOE at an oblique incidence angle and is then reflected and converged toward the human eye.
Further, in the off-axis design shown in Fig. 1(b), the HOE combiner interacts with off-axis incident beams, providing eye relief similar to that of traditional eyeglasses. As a result, the FOV is significantly larger than that of traditional coaxial designs, such as the birdbath design depicted in Fig. 1(a). For example, when the width of the HOE is 50 mm and the eye relief is 30 mm, the horizontal FOV can reach nearly 80°. The off-axis design is crucial for balancing performance and portability requirements. However, it inherently introduces challenges related to off-axis propagation, such as optical aberrations, including distortion, which degrade the quality of the reconstructed image. In practice, it is necessary to select an appropriate off-axis angle to avoid interference with the wearer’s head to minimize aberrations. Additionally, factors such as the quality of the HOE, specifically, its diffraction uniformity, and laser speckle should be carefully considered and optimized during the hologram generation process.
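The FOV figure quoted above follows from simple geometry: a flat combiner of width w viewed at eye relief r subtends a full horizontal angle of 2·arctan(w/2r). A quick numerical check of the 50 mm / 30 mm example from the text (the helper function name is ours):

```python
import math

def horizontal_fov_deg(hoe_width_mm, eye_relief_mm):
    """Full horizontal FOV subtended by a flat combiner of the given width
    at the given eye relief."""
    half_angle = math.atan((hoe_width_mm / 2) / eye_relief_mm)
    return 2 * math.degrees(half_angle)

fov = horizontal_fov_deg(50, 30)  # ≈ 79.6°, i.e., "nearly 80°"
```

Shrinking the eye relief or widening the HOE both enlarge the FOV, which is why the off-axis combiner outperforms the coaxial birdbath layout in this respect.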
3. OFF-AXIS-SUPPORTED HOLOGRAM GENERATION
To support hologram generation tailored for our off-axis system, we extend the vanilla SGD-ASM by modifying the propagation model and introduce the CITL framework to further improve imaging quality in off-axis configurations.
A. Vanilla SGD-ASM
As shown in Fig. 2(a), suppose that the SLM field is given by $u_s(x, y)$ and its spectrum by $U_s(f_x, f_y)$ on the SLM plane, and that the reference plane field is given by $u_r(x, y)$ and its spectrum by $U_r(f_x, f_y)$ on the reference plane, where $(f_x, f_y)$ are spatial frequencies on the SLM plane.
Figure 2. Overview of tilt-SGD and tilt-CITL. (a) Hologram generation with the tilted plane setting. The initial phase hologram loaded on the SLM is a random phase to avoid getting stuck in a local optimum during iteration. The reference plane is parallel to the SLM and set for propagation between two parallel planes. Two coordinate systems are used in the propagation model: one is the SLM coordinate, while the other is the reference coordinate. Note that the tilted plane can be obtained by rotating the reference plane around either the x axis or the y axis. (b) Camera-calibrated hologram optimization. The phase is represented by a green frame and the amplitude is represented by a black frame. The initial hologram loaded on the SLM is also a random phase. Similar to the pipeline of tilt-SGD, the reference plane field can be obtained through the propagation of the SLM field by ASM, and the tilted plane field is obtained by applying the transformation matrix to the reference plane field. Note that tilt-CITL needs to capture the virtual imagery in a dark environment.
The complex amplitude distribution on the reference plane can be obtained by using the ASM as the wave propagation operator $\mathcal{P}$:
$$u_r(x, y) = \mathcal{P}[u_s(x, y)] = \mathcal{F}^{-1}\{\mathcal{F}[u_s(x, y)]\, H(f_x, f_y)\},\tag{1}$$
where $u_s(x, y) = a_s(x, y)\exp[i\phi_s(x, y)]$, which is the complex amplitude distribution on the SLM plane, $a_s$ is the amplitude on the SLM plane, $\phi_s$ is the phase on the SLM plane, $\mathcal{F}$ denotes the Fourier transform, $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, and $H(f_x, f_y) = \exp\!\left(i 2\pi d \sqrt{1/\lambda^2 - f_x^2 - f_y^2}\right)$ is the transfer function of the propagation distance $d$.
The amplitude on the reference plane and the amplitude of the target image are used to construct a loss function, such as mean squared error (MSE). The wave propagation and the loss function are implemented in PyTorch, with optimization performed using a gradient descent algorithm, such as SGD, to refine the phase that minimizes the loss function.
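The paper's pipeline is implemented in PyTorch; as a library-agnostic sketch, the ASM propagator of Eq. (1) can be written with NumPy FFTs as below (grid size, wavelength, and pixel pitch are illustrative, and evanescent frequencies are simply zeroed). Because the transfer function is pure phase over propagating frequencies, the field energy is preserved:

```python
import numpy as np

def asm_propagate(u, d, wavelength, pitch):
    """Angular spectrum propagation of a sampled field u over distance d."""
    ny, nx = u.shape
    fx = np.fft.fftfreq(nx, pitch)                    # spatial frequencies
    fy = np.fft.fftfreq(ny, pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1 / wavelength**2 - FX**2 - FY**2
    H = np.where(arg > 0,                             # drop evanescent waves
                 np.exp(1j * 2 * np.pi * d * np.sqrt(np.maximum(arg, 0.0))),
                 0.0)
    return np.fft.ifft2(np.fft.fft2(u) * H)

# random phase-only SLM field propagated 10 cm at 532 nm with 4.5 um pitch
rng = np.random.default_rng(0)
u0 = np.exp(1j * 2 * np.pi * rng.random((256, 256)))
u1 = asm_propagate(u0, 0.10, 532e-9, 4.5e-6)
```

In the actual SGD-ASM pipeline this operator is differentiable (PyTorch tensors in place of NumPy arrays), so the phase on the SLM plane can be updated by backpropagation.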
B. Modified Tilt-SGD
We incorporate the Fourier spectrum remapping mentioned above into the SGD-ASM framework, enabling phase retrieval for tilted planes. This phase retrieval algorithm is referred to as tilt-SGD in this work, with its specific process illustrated in Fig. 2(a). The phase hologram loaded onto the SLM is initialized with a random phase, while its amplitude is fixed at 1. The wavefront on the SLM plane then propagates a distance $d$ to the reference plane using ASM. Subsequently, a rotation matrix is applied to the angular spectrum on the reference plane to perform the coordinate transformation from the reference plane to the tilted plane, thus yielding the reconstructed image on the tilted plane. The amplitude of the reconstructed image is extracted and compared with the amplitude of the target image via a loss function $\mathcal{L}$. The gradient descent method is employed as the optimization solver, updating the phase through backpropagation.
The following is a brief derivation of the Fourier spectrum remapping. The complex amplitude on the reference plane can be decomposed into plane waves using a Fourier transform. After a coordinate transformation, these decomposed plane waves are recombined on the tilted plane. The complex amplitude on the tilted plane is then obtained by applying an inverse Fourier transform. This procedure is called the “coordinate rotation in the Fourier domain” [22] (abbreviated as the Fourier spectrum remapping in this paper).
Following a similar routine to the prior work [22], the transformation matrix $T$ used to rotate coordinates around the $y$ axis by the angle $\theta$ is given as
$$T = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}.\tag{2}$$
The complex amplitude on the reference plane can be expressed as
$$u_r(x, y) = \iint U_r(f_x, f_y)\exp[i 2\pi (f_x x + f_y y)]\,df_x\,df_y,\tag{3}$$
where $U_r(f_x, f_y) = \mathcal{F}[u_s(x, y)]\,H(f_x, f_y)$, $\lambda$ is the wavelength, and $d$ is the distance between the SLM plane and reference plane.
By applying matrix $T$ to Eq. (3), we obtain the angular spectrum on the tilted plane. Supposing that the wave field on the tilted plane is given by $u_t(x_t, y_t)$ and its spectrum by $U_t(f_{xt}, f_{yt})$, where $(f_{xt}, f_{yt})$ are spatial frequencies on the tilted plane, the remapped spectrum follows as
$$U_t(f_{xt}, f_{yt}) = U_r\!\left(f_{xt}\cos\theta + f_{zt}\sin\theta,\; f_{yt}\right), \qquad f_{zt} = \sqrt{1/\lambda^2 - f_{xt}^2 - f_{yt}^2}.\tag{4}$$
Therefore, the complex amplitude on the tilted plane is given by the inverse Fourier transform:
$$u_t(x_t, y_t) = \iint U_t(f_{xt}, f_{yt})\,J(f_{xt}, f_{yt})\exp[i 2\pi (f_{xt} x_t + f_{yt} y_t)]\,df_{xt}\,df_{yt},\tag{5}$$
where the Jacobian $J(f_{xt}, f_{yt}) = \left|\partial(f_x, f_y)/\partial(f_{xt}, f_{yt})\right|$ is added to conserve the total energy of the field after rotational transformation.
After rotating the Fourier spectrum from the reference plane to the tilted plane, we obtain the complex amplitude $u_t$ on the tilted plane, whose amplitude is used to calculate the loss function $\mathcal{L}$ against the amplitude of the target image. Then, the phase on the SLM plane can be retrieved by backpropagating the gradients of $\mathcal{L}$ with the gradient descent method. In this work, we use MSE as the loss function.
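Under the notation assumed above (rotation about the y axis by θ), the remapping step can be sketched as follows: frequencies sampled on the tilted plane are mapped back through the rotation to index the reference-plane spectrum, and the Jacobian of this map supplies the energy-conserving weight. The function below is our own illustrative helper; in practice the remapped grid is nonuniform, so the lookup is done by interpolation or NUFFT [24]:

```python
import numpy as np

def remap_frequencies(fxt, fyt, theta, wavelength):
    """Map tilted-plane spatial frequencies back to reference-plane ones
    for a rotation about the y axis by theta, returning the lookup
    coordinates and the Jacobian used for energy conservation."""
    fzt = np.sqrt(np.maximum(1 / wavelength**2 - fxt**2 - fyt**2, 0.0))
    fxr = fxt * np.cos(theta) + fzt * np.sin(theta)
    fyr = fyt
    # |d(fxr, fyr)/d(fxt, fyt)|, using d(fzt)/d(fxt) = -fxt / fzt
    jac = np.abs(np.cos(theta) - (fxt / np.maximum(fzt, 1e-12)) * np.sin(theta))
    return fxr, fyr, jac
```

For θ = 0 the map reduces to the identity with unit Jacobian, recovering ordinary parallel-plane ASM.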
We employ vanilla SGD, a variant method called “perspective-SGD,” and the proposed tilt-SGD to conduct numerical simulations for reconstructing a data set consisting of 20 test images from the DIV2K data set [27]. Perspective-SGD maps target images from a tilted plane to a reference plane based on the geometric perspective relationship between the two planes. The SLM field is propagated to the reference plane using ASM, and the loss function is constructed based on the amplitude of the reference plane field and the mapped target image. The phase on the SLM plane is then optimized via the SGD. However, unlike tilt-SGD, perspective-SGD only accounts for the geometric transformation of image intensities, without preserving depth cues.
To evaluate the performance of these methods, we use the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as the metrics. The quantitative results are summarized in Table 1 with qualitative and quantitative comparisons presented in Fig. 3. Overall, our findings indicate that tilt-SGD significantly outperforms vanilla SGD and perspective-SGD in terms of reconstruction quality.
Table 1. Quantitative Results Indicating Average PSNR↑ and SSIM↑ Metrics of 20 Test Images in Simulation

Method       SGD      Perspective-SGD      Tilt-SGD
PSNR (dB)    13.27    16.24                36.18
SSIM         0.41     0.40                 0.96
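For reference, the PSNR metric reported in Table 1 follows the standard definition (peak value taken as 1 for normalized images):

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((ref - test) ** 2)   # mean squared error
    return 10 * np.log10(peak**2 / mse)
```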
Figure 3. Simulation results (PSNR, in dB). For each set (left to right): simulation results with SGD, perspective-SGD, and tilt-SGD.
C. Camera-Calibrated Hologram Optimization
Tilt-SGD assumes an ideal propagation model, neglecting actual hardware factors such as laser speckle noise, nonuniform illumination, assembly error, and performance degradation caused by the HOE. During the HOE fabrication process, defects such as bubbles, ripples, and uneven diffraction efficiency are often introduced, significantly compromising imaging quality.
To mitigate these hardware-induced issues, we propose a camera-calibrated phase optimization method that incorporates the optimization strategy of CITL into tilt-SGD to enhance imaging quality. This approach is referred to as "tilt-CITL" in this paper. In this method, the propagation model from the SLM plane to the tilted plane remains consistent with the formulations in Eqs. (1) and (5). As illustrated in Fig. 2(b), tilt-CITL captures the output via a camera, establishes a loss function $\mathcal{L}$ between the captured results and the target images, minimizes $\mathcal{L}$ using a gradient descent solver, and updates the phase on the SLM plane through backpropagation. Tilt-CITL effectively accounts for hardware imperfections by directly incorporating them into the captured results, providing a more accurate representation of system deficiencies. This feedback mechanism significantly reduces aberrations and speckle noise while compensating for various hardware factors, including nonuniform illumination and HOE fabrication defects.
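Tilt-CITL itself backpropagates the loss between camera captures and targets through the differentiable tilt-SGD model. As a deliberately simplified toy (not the paper's gradient-based method; the names and the feedback rule below are ours), the loop shows why closing the loop over real hardware compensates unknown nonuniformity: a simulated display with a hidden gain field is driven until the "captured" image matches the target:

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.ones(64)                 # desired uniform image
gain = 0.5 + rng.random(64)          # hidden hardware nonuniformity (e.g., HOE defects)

def capture(drive):
    """Stand-in for the camera: the hardware distorts whatever is displayed."""
    return gain * drive

drive = target.copy()
for _ in range(20):
    cap = capture(drive)
    drive *= (target / np.maximum(cap, 1e-9)) ** 0.5   # damped multiplicative feedback

residual = np.max(np.abs(capture(drive) - target))     # shrinks toward 0
```

The fixed point is drive = target / gain, i.e., the loop learns to pre-compensate the hardware; that pre-compensation is exactly the role the captured feedback plays in tilt-CITL.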
The SLM modulates and diffracts the incident collimated light, projecting the diffracted image onto the HOE plane. Since the HOE plane is not perpendicular to the optical axis, the images captured by the camera inevitably suffer from defocus and distortion. To mitigate this issue, the tilt-CITL process employs a matrix of circular markers for calibration, effectively compensating for the defocus and distortion caused by the off-axis HOE or other hardware factors. Following calibration, the corrected region provides a significantly clearer and more accurate representation.
4. DISPLAY SYSTEM IMPLEMENTATION
To validate the proposed method, we fabricated the HOE and implemented an off-axis holographic AR display prototype.
A. HOE Design and Fabrication
For the off-axis design proposed in Section 2, an off-axis reflective HOE needs to be fabricated. The fabrication process, illustrated in Fig. 4(a), employs a point-to-parallel light exposure method. In this method, a collimated reference beam and a divergent signal beam illuminate opposite sides of the HOE, interfering to form fringes in the photopolymer adhered to a glass substrate. The amplitude transmission coefficient of the HOE is linearly related to the intensity distribution produced by the interference of the signal and reference beams. Assuming the complex amplitude of the reference beam is $R(x, y)$ and that of the signal beam is $S(x, y)$ on the plane $z = 0$, i.e., the HOE plane, the amplitude transmission coefficient $t(x, y)$ can be expressed as
$$t(x, y) \propto |R + S|^2 = |R|^2 + |S|^2 + R^* S + R S^*,\tag{6}$$
where $R(x, y) = A_r \exp[i k (\alpha x + \beta y)]$ represents the complex amplitude of the reference beam on the HOE plane with $k = 2\pi/\lambda$, $A_r$ is the amplitude of the reference beam, $\alpha$ and $\beta$ are direction cosines of the wave vector of the reference beam, $S(x, y)$ represents the complex amplitude of the signal beam diverging from the viewpoint, and $A_s$ is its amplitude at a unit distance from the viewpoint. Note that only one diffraction order is kept for reconstruction.
Figure 4. Schematic diagram of the proposed HOE (a) fabrication and (b) reconstruction procedure. The HOE coordinate system is established to better analyze the fabrication and reconstruction process. Suppose the coordinates of the viewpoint are . (c) Fabricated HOE by a 532 nm laser. The HOE consists of a layer of photopolymer and a glass substrate. The glass substrate size is about , and the thickness is 1 to 2 mm. (d) The viewpoint formed by illuminating the HOE with a 532 nm laser.
The distance between the HOE plane and the viewpoint is set to 30 mm, providing adequate eye relief. The angle between the reference beam and the HOE is set to 45°, ensuring that the HOE functions properly when the probe beam is also at a 45° angle to the HOE during reconstruction. After fabrication, the HOE exhibits reflective and focusing properties, functioning similarly to a combination of a mirror and a lens. The HOE fabrication setup is constructed based on this design shown in Fig. 4(a), with additional details provided in the appendix.
To fabricate a full-color HOE, three lasers with wavelengths of 639, 532, and 457 nm are used for exposure, following the point-to-parallel light exposure method on photopolymers corresponding to each wavelength. Since the arrangement of the three photopolymers impacts the diffraction efficiency of the full-color HOE, we employ a laminated structure with two layers of glass substrates to maintain high diffraction efficiency. One substrate is coated with photopolymer on both sides, accommodating 639 and 532 nm wavelengths, while the other substrate is coated on a single side and operates at 457 nm.
In the reconstruction process, illustrated in Fig. 4(b), a probe beam from the SLM illuminates the HOE. The HOE reflects and focuses the incident probe beam toward the viewpoint, forming the reconstruction beam. Let the complex amplitude of the reconstruction beam be denoted by $u_c(x, y)$. Since the probe beam and the reference beam are a pair of conjugate waves, the complex amplitude of the probe beam can be expressed as $P(x, y) = R^*(x, y)$. Therefore, the complex amplitude of the reconstruction beam is given by $u_c = t \cdot P$; keeping only the retained diffraction order, it follows that
$$u_c(x, y) \propto R S^* \cdot R^* = |A_r|^2 S^*(x, y),\tag{7}$$
i.e., a conjugate of the signal beam that converges toward the viewpoint.
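The key algebraic step, that probing the hologram with the conjugate of the reference beam isolates a term proportional to S* (a wave converging back to the viewpoint), can be checked numerically. The sampling, wavelength, and amplitudes below are illustrative:

```python
import numpy as np

lam = 532e-9
k = 2 * np.pi / lam
x = np.linspace(-1e-3, 1e-3, 512)      # 1D slice of the HOE plane
z0 = 30e-3                             # viewpoint distance (eye relief)

A_r = 1.0
R = A_r * np.exp(1j * k * np.sin(np.radians(45)) * x)   # collimated reference at 45 deg
r = np.sqrt(x**2 + z0**2)
S = np.exp(1j * k * r) / r                              # signal diverging from the viewpoint

t = np.abs(R + S) ** 2                 # |R|^2 + |S|^2 + R* S + R S*
P = np.conj(R)                         # probe beam = conjugate of the reference
retained = (R * np.conj(S)) * P        # the single retained diffraction order
```

The retained order equals |A_r|² S*, a phase-conjugate copy of the signal that converges to the viewpoint; the other three terms of t·P propagate in different directions and are discarded.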
With the utilized SLM specifications, the horizontal FOV of the fabricated HOE is measured at 63°, enabling a wide and immersive viewing experience. Figure 4(c) shows a monochromatic HOE prepared using a 532 nm laser. In this setup, a collimated beam illuminates the monochromatic HOE and converges to form the viewpoint, as illustrated in Fig. 4(d), confirming the HOE's functionality.
To achieve optimal HOE diffraction efficiency, we conducted simulation and experimental analysis. Diffraction efficiency is a key parameter for evaluating HOE performance, as it characterizes the intensity distribution of incident light after interacting with the HOE [28]. Higher diffraction efficiency indicates lower energy loss, which is crucial for enhancing HOE-empowered display quality. It is typically defined as the ratio of the intensity of diffracted light at a specific diffraction order to the intensity of the incident light. For the HOE fabricated using a 532 nm laser, the relationship between diffraction efficiency and exposure time is illustrated in Fig. 5(a). The maximum diffraction efficiency reaches approximately 32% (at the +1st or −1st diffraction order) with an exposure time of 120 s.
Figure 5. Analysis of HOE diffraction efficiency. (a) Diffraction efficiency distribution according to exposure time. The exposure intensity is fixed at . (b) Effects of incident light angle deviation on the measured diffraction efficiency. Note that the diffraction efficiency is normalized.
Additionally, we analyze the impact on diffraction efficiency when the incident angle of the probe beam deviates from 45° during reconstruction. In our system, the reference beam and probe beam are emitted from the same laser source, ensuring they have the same wavelength. Therefore, the primary factor influencing diffraction efficiency is the incident angle deviation of the probe beam. Interestingly, as shown in Fig. 5(b), when the wavelength of the probe beam matches that of the reference beam, the HOE maintains a high diffraction efficiency even under a moderate deviation of the incident angle.
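The efficiency trends in Fig. 5 are qualitatively consistent with Kogelnik's coupled-wave theory for lossless volume gratings: on Bragg, a reflection grating has efficiency tanh²(π n₁ T / (λ √(c_R c_S))), where n₁ is the index modulation (which grows with exposure dose until the photopolymer saturates), T the grating thickness, and c_R, c_S obliquity factors. The helper below is a hedged sketch of that on-Bragg formula with illustrative parameters, not a model fitted to our HOE:

```python
import numpy as np

def kogelnik_reflection_eta(n1, thickness, wavelength,
                            c_r=np.cos(np.radians(45)), c_s=1.0):
    """On-Bragg diffraction efficiency of a lossless reflection volume grating
    (Kogelnik coupled-wave theory): eta = tanh^2(pi*n1*T / (lam*sqrt(c_r*c_s)))."""
    nu = np.pi * n1 * thickness / (wavelength * np.sqrt(c_r * c_s))
    return np.tanh(nu) ** 2
```

In this idealized model efficiency rises monotonically with n₁ and saturates below 1; in practice, overmodulation and scattering cause the roll-off observed beyond the optimal exposure time.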
B. Prototype Configuration and Implementation
Figure 6(a) shows the optical schematic of our off-axis projection layout design. Our design consists of a fiber-coupled laser, a collimating lens (CL), a linear polarizer (LP), a BS, a phase-only SLM, a relay system comprising two lenses (Lens 1 and Lens 2) in a 4f arrangement, and an HOE. The relay system is employed to magnify the image from the SLM. After SLM modulation, the wavefront on the SLM plane propagates over a certain distance to reach the front focal plane of Lens 1. The system then magnifies and relays the wavefront to the rear focal plane of Lens 2, after which it propagates a further distance to reach the reference plane.
Figure 6. System design and setup of the implemented prototype. (a) Optical schematic: CL, collimating lens; LP, linear polarizer; BS, beam splitter; SLM, spatial light modulator; L1, lens 1; L2, lens 2; HOE, holographic optical element. L1 and L2 form a relay system for magnifying the image. (b) Bench-top holographic AR display prototype. The light emitted from the fiber-coupled laser passes through the CL, LP, SLM, and the relay system, and constructs an image at the HOE plane observed through the eyepiece. A physical cube is placed behind the HOE as the reference. (c) Zoom-in details of the eyepiece and the full-color HOE. The distance between the HOE and the eyepiece is 30 mm, which is also the eye relief.
Our design sets the angle $\theta$ between the HOE and the reference plane to 45°, seeking to balance multiple factors. If $\theta$ is less than 45°, the observer or camera lens may block the image light from the SLM, causing occlusion. If $\theta$ exceeds 45°, the projected image on the HOE becomes more stretched, introducing distortion that is difficult to correct. Setting $\theta$ at 45° minimizes occlusion and distortion, ensuring a clearer, more accurate display. Additionally, this angle is chosen with future applications in wearable AR glasses in mind: it meets the ergonomic and optical requirements for devices worn on the face while maintaining display quality and user comfort.
As shown in Fig. 6(b), a benchtop system is built based on the optical schematic, with specifications detailed in Table 2. The laser is a fiber-coupled laser with wavelengths of 639, 532, and 457 nm, and the utilized SLM is a phase-only LCoS (UPOLABS HDSLM45R) with a pixel pitch of 4.5 μm. Compared with the optical schematic, a camera replaces the human eye to capture and analyze the optical output more precisely and consistently. The coherent light from the fiber-coupled laser is collimated by the CL and then illuminates the SLM via the reflective path of the BS. After SLM modulation, the light is magnified by the relay system, illuminates the HOE, and is diffracted by the HOE, with the resulting image captured by a camera. The LP adjusts the polarization state of the incident beam to meet the requirements of the phase-only SLM. The relay system, with a 3× magnification (L1 focal length of 50 mm and L2 focal length of 150 mm), scales the SLM display area to make full use of the larger HOE effective area.
Table 2. Specifications of the Off-Axis HOE Display Prototype

Experimental Devices     Parameters
Fiber-coupled laser      Wavelengths 639, 532, 457 nm
Collimating lens         Focal length 150 mm
Phase-only SLM           UPOLABS HDSLM45R; pixel pitch 4.5 μm
Lens 1                   Focal length 50 mm
Lens 2                   Focal length 150 mm
Eyepiece                 Focal length 9.52 mm; focusing range 0.3 m–infinity
Industrial camera        FLIR GS3-U3-23S6C-C
We employ an optical layout that differs from a traditional eyepiece to better simulate the size, position, and FOV of the human eye. Unlike traditional eyepieces, which have internal apertures and often suffer from light occlusion, our eyepiece places the aperture in front of the lens. This design aligns the eyepiece aperture with the pupil size, allowing for a more accurate evaluation of the near-eye display under conditions that closely replicate human vision. The front-aperture design also enables an unobstructed FOV comparable with that of the human eye, ensuring a wide field of view without interference. The relative position of eyepiece and full-color HOE is shown in Fig. 6(c). The eyepiece in our system has a focusing range from 0.3 m to infinity, accommodating various viewing distances.
To validate the effectiveness of tilt-SGD, experiments are carried out with SGD and tilt-SGD. As shown in Fig. 7, the comparison of reconstruction results reveals clear stretching distortion in the SGD results. This distortion occurs because a hologram computed for propagation between two parallel planes is projected directly onto the tilted plane. The tilt-SGD algorithm, designed for phase retrieval between nonparallel planes, successfully removes this stretching distortion.
Figure 7. Comparison of SGD and tilt-SGD results. (Left to right) Captured results and optimized holograms of SGD and tilt-SGD when the target image is located at 10 cm. The main purpose is to illustrate the effect of the proposed algorithm, so these results are not camera-calibrated, and distortion issues other than stretching remain.
Additionally, the proposed tilt-SGD method helps alleviate the issue of depth mismatch. In the reconstruction results of the grid pattern, the SGD reconstruction appears out of focus on the left and right edges, whereas the tilt-SGD reconstruction remains consistently in focus. However, the tilt-SGD results still exhibit other distortions, which will be further corrected through calibration in tilt-CITL.
5. RESULTS
This section presents the reconstruction results of holograms captured with the off-axis HOE display prototype, which supports a flexible switch between virtual reality (VR) and AR modes.
A. VR-Mode Holographic Display
The comparison between the VR results of tilt-SGD and tilt-CITL is illustrated in Fig. 8. Due to the uneven diffraction efficiency of the customized HOE or nonuniform illumination, the reconstruction results of tilt-SGD exhibit brightness nonuniformity, which is particularly evident in the OPTICA example with tilt-SGD (first row of Fig. 8). In contrast, the tilt-CITL results compensate for the brightness nonuniformity, leading to more consistent brightness across the entire image. Further, tilt-CITL noticeably reduces artifacts and speckle noise while preserving image contrast and details. This improvement in image quality is quantitatively confirmed through the PSNR measurements, indicating the advantage of tilt-CITL in enhancing the visual quality of HOE-based displays.
Figure 8. Comparison of results using tilt-SGD and tilt-CITL in the VR-mode holographic display. The 2D resolution of captured images is . For each set (left to right): target images, phase patterns by tilt-SGD, captured results of tilt-SGD, phase patterns by tilt-CITL, and captured results of tilt-CITL. The target image is set 10 cm away from the SLM plane. The PSNR metrics are reported. The phase patterns of the full-color target image are created by superimposing holograms from three separate color channels. Note that these captured images are normalized for visualization purposes.
In addition to the single-color VR results, full-color VR results are obtained using a full-color HOE and three lasers with wavelengths of 639, 532, and 457 nm. To obtain comprehensive full-color VR results, the three monochromatic results are combined during postprocessing. However, the introduction of full-color HOE with three photopolymer layers leads to problems such as brightness nonuniformity and speckle noise in each single channel, which are further amplified in the full-color results. Similarly, as shown in Fig. 8, these problems are mitigated by the tilt-CITL method, indicating its importance.
B. AR-Mode Holographic Display
The AR results are captured by adjusting the focus of the eyepiece to selected depths: 30 cm (marked by a physical cube) and 150 cm (marked by a head model). Specifically, 30 cm is a common distance at which users interact with objects or manipulate virtual tools at arm's length; the physical cube serves as an indicator for assessing the alignment and clarity of the AR system when virtual and real-world objects are close. Conversely, virtual signposts and pieces of information are usually projected at a distance of around 150 cm; the head model acts as a reference object to evaluate the system's effectiveness in rendering and maintaining the visibility of virtual content as the distance increases. The capability of the AR system to handle different depth cues is demonstrated by varying the focus distance, providing insights into its practical applications.
The target images of the AR results shown in Fig. 9 are consistent with those in VR mode, demonstrating the versatility of our system. Further, these monochromatic and full-color AR results indicate the adaptability of the system in handling different color schemes under various conditions. The AR system delivers clear, sharp imaging with uniform visual clarity and high-quality rendering whether the focus is set on distant or close objects, suggesting that the framework effectively accommodates a range of focal depths.
Figure 9. Acquired AR results at two focusing distances. (a) Near focus, with the camera focused on the real object “Physical Cube.” (b) Far focus, with the camera focused on the real object “Head Model.”
The proposed HOE-empowered off-axis holographic AR display demonstrates significant advancements in VR and AR applications through the integration of tilt-CITL calibration. By addressing the limitations of existing hologram retrieval algorithms such as SGD-ASM, the proposed tilt-SGD approach, which applies Fourier spectrum rotation, effectively generates holograms for nonparallel plane propagation.
The experimental results highlight the superior performance of tilt-CITL in addressing brightness nonuniformity, reducing artifacts, and mitigating speckle noise, particularly when compared with the vanilla tilt-SGD method. The quantitative analysis using PSNR metrics confirms the improvements in visual quality, with more consistent brightness and enhanced image contrast and detail retention. Further, the full-color VR and AR results underscore the adaptability of the system in handling the challenges posed by multilayer HOE structures and nonuniform illumination across different color channels. The ability of the AR system to maintain clarity and sharpness at varying focal depths, as demonstrated by near and far target imaging, suggests that the proposed framework offers a robust solution for immersive holographic displays with practical applications across various environments and depth ranges.
Due to the constraints of the Maxwellian-view display mode, the eyebox of our setup is not large enough for comfortable viewing. When the user’s head or eyes move slightly beyond this range, the image becomes blurry, distorted, or even disappears, negatively impacting the viewing experience. Therefore, expanding the eyebox is crucial for wearable applications. This issue has been discussed in previous work [29], which enlarged the eyebox by using a 2D steering mirror to generate an illumination beam with variable angles for the SLM. In future work, we plan to adjust the angle of the SLM illumination light to create multiple viewpoints, further expanding the eyebox. Alternatively, multiple viewpoints can also be created by using an HOE based on a lenslet array, as reported in Ref. [20].
Another challenge associated with the Maxwellian-view display mode is that the image displayed by the HOE remains in focus at all depths [30]. Although this feature helps mitigate the VAC problem, it also means that important depth cues for holographic AR displays are missing. Specifically, in the holographic near-eye display system, the spatial bandwidth product (SBP) is determined by the number of pixels and the laser wavelength. When both are fixed, the SBP remains constant. Since it is roughly equal to the product of the FOV and the eyebox, a trade-off exists between these two factors. In our work, the off-axis HOE configuration aims to achieve a larger FOV; however, the eyebox is limited to 0.93 mm at a 63° horizontal FOV. A small eyebox constrains the system’s numerical aperture (NA), resulting in an increased depth of field (DOF), which weakens depth perception. To address this issue, we intend to refine the exposure method used in HOE preparation to preserve depth cues. Specifically, we aim to design and manufacture a multifocal HOE to compensate for the lack of depth cues; by incorporating multiple foci in one HOE, we can provide varying depths to support a 3D viewing experience. Further, our system is restricted to static displays at this stage due to current hardware limitations. In the future, we plan to implement a full-color dynamic display, which requires synchronizing hologram updates on the SLM with the switching of the RGB channels. Last but not least, the generation of off-axis holograms still relies on iterative algorithms, which may require considerable preparation time. For instance, tilt-SGD hologram optimization takes about 250 s for 500 iterations per color channel, and tilt-CITL hologram optimization takes about 40 min for 500 iterations per color channel; both optimizations are run on an NVIDIA GeForce RTX 3090 graphics processing unit.
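The FOV–eyebox trade-off above can be made concrete with a back-of-the-envelope étendue calculation. This is only a sketch: the pixel count $N \approx 1920$ and wavelength $\lambda = 532\,\mathrm{nm}$ below are assumed for illustration and are not stated in this section.

```latex
% SBP (etendue) of a holographic display with N pixels at wavelength \lambda:
%   \mathrm{SBP} \approx N\lambda \approx \mathrm{eyebox} \times \mathrm{FOV}\ (\text{in rad})
\mathrm{eyebox} \;\approx\; \frac{N\lambda}{\mathrm{FOV}}
= \frac{1920 \times 532\,\mathrm{nm}}{63^{\circ} \times \pi/180^{\circ}}
\approx \frac{1.02\,\mathrm{mm}}{1.10\,\mathrm{rad}}
\approx 0.93\,\mathrm{mm}
```

Under these assumed values the estimate is consistent with the 0.93 mm eyebox reported above, illustrating why enlarging the FOV at fixed SBP necessarily shrinks the eyebox.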
Looking ahead, integrating neural networks [25,31] to facilitate real-time off-axis hologram generation can significantly speed up the process.
APPENDIX A: PSEUDOCODE OF PROPOSED ALGORITHMS
This section provides pseudocode for the proposed algorithms discussed in the paper. Specifically, Algorithm 1 outlines the tilted propagation model introduced in Section 3. The inputs are the SLM field, the propagation distance, and the tilted angle described in the main paper; the output is the tilted plane field. The tilted propagation model combines ASM with Fourier spectrum rotation: Algorithm 1 rotates the ASM-propagated field in the Fourier domain to obtain the field on the tilted plane.
Tilted Propagation
: SLM field
: propagation distance
: tilted angle described in the main text
: wavelength
: fast Fourier transform
: inverse fast Fourier transform
: circ function in spectral domain
: an interpolation function
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12: return
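As a concrete (and simplified) illustration of Algorithm 1, the NumPy sketch below performs ASM propagation followed by a rotation of the angular spectrum. A nearest-neighbour frequency remap stands in for the interpolation function, and all parameter values are illustrative; this is a sketch of the idea, not the paper's implementation.

```python
import numpy as np

def tilted_propagate(u_slm, pitch, z, theta, wl):
    """Sketch of tilted propagation: ASM to a parallel plane, then a rotation
    of the angular spectrum about the y-axis by `theta` (nearest-neighbour
    remap in place of a proper interpolation)."""
    ny, nx = u_slm.shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    sq = 1.0 / wl**2 - FX**2 - FY**2
    prop = sq > 0                          # circ(): keep propagating waves only
    FZ = np.sqrt(np.maximum(sq, 0.0))
    H = np.where(prop, np.exp(1j * 2 * np.pi * z * FZ), 0.0)  # ASM transfer function
    U = np.fft.fft2(u_slm) * H             # spectrum on the parallel plane
    # Rotating the plane by theta rotates the (fx, fz) frequency pair; sample
    # the parallel-plane spectrum at the rotated frequencies.
    fx_rot = FX * np.cos(theta) + FZ * np.sin(theta)
    ix_f = fx_rot * nx * pitch             # fractional frequency index
    valid = prop & (np.abs(ix_f) <= nx // 2 - 1)   # drop frequencies leaving the grid
    ix = np.rint(ix_f).astype(int) % nx
    iy = np.arange(ny)[:, None]            # rows unchanged by a rotation about y
    U_tilt = np.where(valid, U[iy, ix], 0.0)
    return np.fft.ifft2(U_tilt)            # field on the tilted plane
```

In practice, a proper interpolation (as in Matsushima's rotational transform, Ref. [23]) and explicit handling of the off-axis carrier are needed for large tilt angles; this sketch simply discards frequencies that fall off the sampled grid.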
Algorithm 2 outlines the tilt-SGD optimization procedure discussed in Section 3.B. The inputs comprise the number of phase pattern update iters, the initial learnable scale factor, the initial phase on the SLM, the target amplitude, the propagation distance, and the tilted angle described in the main text; the output is the optimized phase on the SLM. Algorithm 2 feeds the tilted plane amplitude and the target amplitude into the loss function to iteratively optimize the phase on the SLM.
Tilt-SGD Hologram Optimization
: phase pattern update iters
: initial learnable scale factor
: initial phase
: target amplitude
: propagation distance
: tilted angle described in the main text
LossBackpropagation():
backpropagation through loss function
IdealBackpropagation():
backpropagation through idealized model
1:
2:
3: fordo
4:
Algorithm 1
5:
6:
7: return
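The loop in Algorithm 2 can be sketched in NumPy as follows. For brevity, a unitary Fourier transform stands in for the tilted propagation of Algorithm 1, and the Wirtinger gradient of the amplitude loss is written out by hand (the paper's implementation presumably uses an autodiff framework); all names and parameter values are illustrative assumptions.

```python
import numpy as np

def propagate(u):
    # stand-in for Algorithm 1's tilted propagation (any unitary linear model works here)
    return np.fft.fft2(u, norm="ortho")

def adjoint_propagate(U):
    # adjoint of the unitary forward model, used for backpropagation
    return np.fft.ifft2(U, norm="ortho")

def tilt_sgd(target_amp, iters=200, lr=0.1, seed=0):
    """Sketch of tilt-SGD: gradient descent on the SLM phase (plus a learnable
    scale s) so that the propagated amplitude matches the target amplitude."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0.0, 2.0 * np.pi, target_amp.shape)  # initial SLM phase
    s = 1.0                                                # learnable scale factor
    for _ in range(iters):
        u = np.exp(1j * phi)
        U = propagate(u)
        a = np.abs(U) + 1e-12
        r = s * a - target_amp            # amplitude residual; loss = sum(r**2)
        gU = s * r * U / a                # dL/dU* for the amplitude loss
        gu = adjoint_propagate(gU)        # chain rule back through propagation
        phi -= lr * 2.0 * np.imag(gu * np.conj(u))   # dL/dphi via u = exp(i*phi)
        s -= (lr / target_amp.size) * np.sum(2.0 * r * a)  # small step on the scale
    return phi, s
```

The phase update uses the identity dL/dφ = 2 Im(gu · u*) for a real loss of a phase-only field, which is what autodiff would compute through the same graph.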
Algorithm 3 outlines the tilt-CITL optimization procedure introduced in Section 3.C. The inputs and output are the same as in Algorithm 2. Algorithm 3 feeds the captured amplitude on the tilted plane and the target amplitude into the loss function to iteratively optimize the phase on the SLM.
Tilt-CITL Hologram Optimization
: phase pattern update iters
: initial learnable scale factor
: initial phase
: target amplitude
: propagation distance
: tilted angle described in the main text
CameraPropagation(): camera capture + homography
Replace(m, n):
replace values of m with n, retain gradients from m
LossBackpropagation():
backpropagation through loss function
IdealBackpropagation():
backpropagation through idealized model
1:
2:
3: fordo
4:
5:
Algorithm 1
6:
7:
8: return
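To make the role of Replace(m, n) concrete, the NumPy sketch below performs one tilt-CITL update: the residual is computed from the captured amplitude, while gradients are backpropagated through the idealized model only, which is exactly the effect of replacing the simulated values with the captured ones while retaining the simulated gradients. A unitary Fourier transform again stands in for Algorithm 1, and `camera` is a hypothetical callable modeling capture plus homography alignment; none of these names come from the paper.

```python
import numpy as np

def citl_step(phi, s, target_amp, camera, lr=0.1):
    """One tilt-CITL update (sketch). Loss uses the captured amplitude;
    gradients flow through the idealized propagation model."""
    u = np.exp(1j * phi)
    U = np.fft.fft2(u, norm="ortho")   # idealized forward model
    a_sim = np.abs(U) + 1e-12          # simulated amplitude (keeps the gradient path)
    a_cap = camera(phi)                # amplitude actually measured on the tilted plane
    r = s * a_cap - target_amp         # residual from the real capture (Replace)
    gU = s * r * U / a_sim             # backpropagate through the ideal model only
    gu = np.fft.ifft2(gU, norm="ortho")
    phi_new = phi - lr * 2.0 * np.imag(gu * np.conj(u))
    s_new = s - (lr / target_amp.size) * np.sum(2.0 * r * a_cap)
    return phi_new, s_new
```

Because the hardware's deviations (e.g., gain nonuniformity) enter only through `a_cap`, repeated updates steer the hologram toward what the camera actually sees, which is how the method compensates for HOE and light-engine imperfections.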
APPENDIX B: IMPLEMENTATION DETAILS OF HOE FABRICATION
This section describes the hardware implementation of our HOE fabrication process. Figure 10 illustrates the bench-top experimental setup used for HOE fabrication. The system utilizes three lasers with wavelengths of 639, 532, and 457 nm, each with a maximum output power of 100 mW. Our fabrication procedure is built upon a dual-path interference method with a conjugate design. The laser beam is first split into two beams by a beam splitter. One beam passes through microscope objective 1, a collimating lens, and mirror 2 to form the reference beam, which illuminates the surface of the HOE. The other beam travels through the beam expander, mirror 3, mirror 4, and microscope objective 2 to form the signal beam. The interference between the reference and signal beams exposes the photopolymer, creating a holographic grating that combines the functions of a mirror and a lens.
Figure 10. Diagram of the experimental setup used for HOE fabrication. M1, M2, M3, M4: mirrors; DM1 and DM2: dichroic mirrors; BS: beam splitter; CL: collimating lens; HOE: holographic optical element.
[20] X. Xia, Y. Guan, A. State. Towards eyeglass-style holographic near-eye displays with statically expanded eyebox. IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 312-319 (2020).
[23] K. Matsushima. Rotational transformation for reconstruction of digital holography and CGH creation. Digital Holography and Three-Dimensional Imaging, DWB4(2007).
[27] E. Agustsson, R. Timofte. NTIRE 2017 challenge on single image super-resolution: dataset and study. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 126-135(2017).