Real-time electroholography based on computer-generated holograms (CGHs) is expected to become the ultimate three-dimensional (3D) television[1,2]. However, the CGH calculation rapidly becomes prohibitively expensive because real-time electroholography requires an extremely large amount of floating-point arithmetic. Moreover, the image quality of a holographic video deteriorates when it is reconstructed from a point-cloud model comprising a huge number of object points. Two approaches have been proposed to suppress this deterioration: time multiplexing for two-dimensional reconstruction[3] and spatiotemporal division multiplexing for clear 3D holographic video playback[4]. Large-scale electroholography using the spatiotemporal division multiplexing approach[4] implemented on the Horn-8 system has been reported[5].
A modern graphics processing unit (GPU) is a cost-effective processor capable of fast floating-point arithmetic and fast computer-graphics processing. Thus, GPUs can accelerate CGH calculations and directly display the calculated CGHs on a spatial light modulator (SLM)[6–14]. In contrast, spatiotemporal division multiplexing exploits moving image features[15]; this approach accelerates the CGH calculation several-fold.
A PC cluster consisting of multiple PCs with multiple GPUs is called a multi-GPU cluster and can significantly accelerate large-pixel-count CGH calculations[16–20]. Reference [16] directly connected the GPUs of a multi-GPU cluster to multiple SLMs and showed that a multi-GPU cluster is suitable for real-time electroholography with a large-pixel-count CGH. However, such a multi-GPU cluster system with multiple SLMs is very expensive. Real-time electroholography using a multi-GPU cluster with a single SLM is low-cost but requires CGH data transfer between the nodes, which hinders real-time operation. To address this problem, we used a high-speed InfiniBand network in a multi-GPU cluster system and applied this system to real-time electroholography[21] and fast time-division color electroholography[22]. We also realized real-time color electroholography by using a multi-GPU cluster system with three SLMs combined with an InfiniBand network[23]. Furthermore, we proposed a packing and unpacking method to reduce the CGH data transfer between the nodes of the multi-GPU cluster[24], and demonstrated real-time electroholography using a multi-GPU cluster with 13 GPUs (NVIDIA GeForce GTX 1080 Ti) connected by gigabit Ethernet and a single SLM.
In this Letter, we propose clear real-time electroholography based on spatiotemporal division multiplexing using moving image features and a multi-GPU cluster system connected by a gigabit Ethernet network. The proposed method does not rely on cache memory when reading the object-point data.
In previous work[4,15], we proposed two types of spatiotemporal division multiplexing. The first method suppresses the deterioration of a 3D holographic video reconstructed from a point-cloud model comprising a huge number of object points (Fig. 1)[4]. The second method uses moving image features to accelerate the CGH calculation (Fig. 2)[15]. In both methods, the 3D object is divided into several objects in each frame of the original 3D video. Figures 1 and 2 show examples in which the original 3D object is divided into three objects in each frame, with the divided objects of frame i labeled Div 1, Div 2, and Div 3.

Figure 1. Spatiotemporal division multiplexing approach for suppressing the deterioration of a 3D holographic video reconstructed from a point-cloud model comprising a huge number of object points.

Figure 2. Spatiotemporal division multiplexing approach using moving image features.
In the spatiotemporal division multiplexing method for suppressing the deterioration of the 3D holographic video, all the divided objects are used in each frame. At frame i in Fig. 1, CGHs are generated from the divided objects Div 1, Div 2, and Div 3, and these CGHs are displayed sequentially on an SLM. The reconstructed 3D holographic video therefore has three times as many frames as the original 3D video, so this approach requires three times as much computation time as reconstructing the original 3D video.
As shown in Fig. 2, the spatiotemporal division multiplexing approach using moving image features uses only one of the divided objects in each frame, with a different divided object selected over every three frames. In each frame, the number of object points contributing to the selected divided object is one-third of the number in the original 3D object, so the CGH calculation is three times faster than that using the original 3D video. However, a long CGH calculation time per frame still prevents smooth real-time reconstruction of moving 3D images. Consequently, we had not previously applied the spatiotemporal division multiplexing approach using moving image features to a point-cloud model comprising a huge number of object points.
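The selection scheme of Fig. 2 can be illustrated with the short host-side C++ sketch below, which divides the point cloud into D objects and picks one divided object per frame. The round-robin (interleaved) assignment of object points to divisions and the function names are assumptions made only for illustration; this Letter does not specify how the original 3D object is divided.

// Sketch: spatial division of the point cloud and per-frame selection (Fig. 2).
// The interleaved division below is an assumption made for illustration.
#include <cstddef>
#include <vector>

struct Point { float x, y, z; };

// Split the original point cloud into D divided objects (round-robin interleaving).
std::vector<std::vector<Point>> divideObject(const std::vector<Point>& object, int D) {
    std::vector<std::vector<Point>> divisions(D);
    for (std::size_t j = 0; j < object.size(); ++j)
        divisions[j % D].push_back(object[j]);
    return divisions;
}

// For each frame of the original 3D video, only one divided object contributes to
// the CGH; a different divided object is selected as the frames advance.
const std::vector<Point>& selectDivisionForFrame(
        const std::vector<std::vector<Point>>& divisions, int frameIndex) {
    return divisions[frameIndex % static_cast<int>(divisions.size())];
}

With D = 6 divisions (the setting used later for the "fountain" model), each frame's CGH is computed from roughly one-sixth of the original object points.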
In the spatiotemporal multiplexing approach using moving image features, the equation used for the CGH calculation[6] is

$$I(x_h, y_h) = \sum_{j=1}^{N} \cos\left\{ \frac{\pi}{\lambda z_j} \left[ (x_h - x_j)^2 + (y_h - y_j)^2 \right] \right\}, \qquad (1)$$

where $(x_h, y_h)$ are the coordinates of a point on the CGH, $(x_j, y_j)$ and $z_j$ are the coordinates of the $j$-th object point in the point-cloud model of the 3D object, $N$ is the total number of object points in the 3D object, and $\lambda$ is the wavelength of the reconstructing light. Note that the Fresnel approximation is used in Eq. (1).
The value calculated from Eq. (1) for each point on the CGH is binarized by using a threshold value of zero[25], and the binary CGH is generated from the binarized values. The CGH calculation time increases in proportion to the number of object points. Because real-time electroholography is difficult to realize for a point-cloud model comprising a huge number of object points, we restrict ourselves to binary CGHs.
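As a rough illustration of Eq. (1) and the zero-threshold binarization, a minimal CUDA kernel sketch is given below. The kernel and parameter names are hypothetical, and the optimized memory-access scheme of Ref. [16] used for the measurements reported below is not included, so this sketch is not the implementation used in the experiments.

// Minimal CUDA sketch of Eq. (1) with zero-threshold binarization[25].
// Names (cghBinaryKernel, pitch, ...) are illustrative only.
#include <cuda_runtime.h>

struct ObjPoint { float x, y, z; };   // object-point coordinates (xj, yj, zj)

__global__ void cghBinaryKernel(const ObjPoint* pts, int N,
                                unsigned char* cgh, int width, int height,
                                float pitch, float lambda)
{
    const float PI = 3.14159265358979f;
    int xh = blockIdx.x * blockDim.x + threadIdx.x;   // CGH pixel column
    int yh = blockIdx.y * blockDim.y + threadIdx.y;   // CGH pixel row
    if (xh >= width || yh >= height) return;

    float px = xh * pitch;                            // physical coordinates of the CGH pixel
    float py = yh * pitch;
    float I = 0.0f;
    for (int j = 0; j < N; ++j) {                     // superposition over all object points, Eq. (1)
        float dx = px - pts[j].x;
        float dy = py - pts[j].y;
        I += cosf(PI / (lambda * pts[j].z) * (dx * dx + dy * dy));
    }
    cgh[yh * width + xh] = (I >= 0.0f) ? 1 : 0;       // binarize with a threshold of zero
}

// Typical launch (illustrative):
// cghBinaryKernel<<<dim3((W + 15) / 16, (H + 15) / 16), dim3(16, 16)>>>(d_pts, N, d_cgh, W, H, pitch, lambda);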
Using the 3D model "fountain," comprising 1,064,462 object points, we investigated the appropriate number of space divisions for spatiotemporal division multiplexing using moving image features. The 3D model was located 1.5 m away from the CGH. The size of the 3D model is approximately . The 3D model was divided into several objects, CGHs were generated from the divided objects, and all the CGHs were displayed repeatedly on an SLM. For the SLM, we used a liquid-crystal display panel extracted from a projector (EMP-TW1000, Epson, Inc.), and a green (532 nm) semiconductor laser was used for reconstruction. Figure 3 shows the reconstructed 3D images; the clearest images were obtained with six space divisions.

Figure 3. Reconstructed 3D image from the 3D object "fountain" comprising 1,064,462 object points.
We used the multi-GPU cluster system shown in Fig. 4. The system, connected by a gigabit Ethernet network, consisted of a CGH display node and four CGH calculation nodes. The CGH display node had one GPU and each CGH calculation node had three GPUs, for a total of 13 GPUs, each an NVIDIA GeForce GTX 1080 Ti (see Table 1 for the specifications of each node). The CGH display node also served as the server for the network file system (NFS). Figure 5 shows the pipeline processing executed on the multi-GPU cluster system. The frames from Frame 1′ to Frame 12′ shown in Fig. 2 are assigned to GPUs 1 to 12, respectively, on the four CGH calculation nodes. In Fig. 5, the CGH calculation time available for each frame is twelve times the display-time interval because the CGH calculation nodes contain twelve GPUs in total. The CGH calculation time is proportional to the number of 3D-object points. The GPUs use Eq. (1) to generate the CGH data from the divided 3D objects of their assigned frames. Although the computational complexity of Eq. (1) is enormous, the actual computational performance of a GPU depends not only on the computational complexity but also on the number of accesses to the off-chip memory of the GPU[26]; the performance is markedly reduced when the number of data accesses is large relative to the amount of computation. The optimized method of Ref. [16] reduces the number of accesses to off-chip memory and thereby provides high-speed CGH computation, so we used it in the CGH calculation. The packed CGH data are generated by the packing process and sent to the CGH display node, where a GPU unpacks the packed CGH data and generates the binary CGHs. The binary CGHs are displayed sequentially on the liquid-crystal display panel connected to the CGH display node. The packing and unpacking serve to reduce the amount of CGH data transferred between the nodes[24]. These processes are repeated until the last frame of the 3D video is reached.
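The packing and unpacking of Ref. [24] can be sketched as follows: eight binary CGH pixels are packed into one byte on a calculation node before transfer, reducing the transferred data volume to one-eighth, and the byte stream is expanded back into pixel values on the display node. The host-side C++ functions below are a minimal illustration with hypothetical names; the GPU-side implementation and the MPI transfer itself are omitted.

// Minimal host-side sketch of the packing/unpacking idea of Ref. [24].
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a binary CGH (one 0/1 value per pixel) into a bit stream on a CGH calculation node.
std::vector<uint8_t> packBinaryCGH(const std::vector<uint8_t>& binaryCGH) {
    std::vector<uint8_t> packed((binaryCGH.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < binaryCGH.size(); ++i)
        if (binaryCGH[i]) packed[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    return packed;
}

// Unpack the received bit stream back into pixel values on the CGH display node.
std::vector<uint8_t> unpackBinaryCGH(const std::vector<uint8_t>& packed, std::size_t numPixels) {
    std::vector<uint8_t> binaryCGH(numPixels, 0);
    for (std::size_t i = 0; i < numPixels; ++i)
        binaryCGH[i] = (packed[i / 8] >> (i % 8)) & 1u;
    return binaryCGH;
}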

Figure 4. Multi-GPU cluster system with multiple GPUs connected by a gigabit Ethernet network and a single SLM.

Table 1. Specifications of Each Node in the Multi-GPU Cluster System
CPU | Intel Core i7 7800X (clock speed: 3.5 GHz)
Main memory | DDR4-2666, 16 GB
OS | Linux (CentOS 7.6 x86_64)
Software | NVIDIA CUDA 10.1 SDK, OpenGL, MPICH 3.2
GPU | NVIDIA GeForce GTX 1080 Ti

Figure 5. Pipeline processing for the spatiotemporal electroholography system shown in Fig. 2.
The time required to read the coordinate data of the object points from auxiliary storage becomes non-negligible when the number of 3D-object points is huge. We therefore investigated the total time required to display sets of twelve frames because, with pipeline processing, the GPUs of the CGH calculation nodes generate twelve CGHs in each cycle. On each CGH calculation node, we compared two codes: one for serial computing [Fig. 6(a)] and one for parallel computing [Fig. 6(b)]. In the process "read object data," the coordinates of the object points, stored as binary data, are read from the NFS server. Figure 7 shows the total display time for sets of twelve frames when using the serial computing scheme of Fig. 6(a) and when using the parallel computing scheme of Fig. 6(b) for 1,200,000 object points; no cache memory was used when reading the coordinate data. Twelve CGHs for twelve frames were calculated by the twelve GPUs of the CGH calculation nodes. In Fig. 7, "SSD" and "HDD" refer to the solid-state drive and the hard disk drive, respectively, used on the NFS server to store the coordinates of the object points: an Intel Optane 900P (280 GB) SSD and a Western Digital WD20EZAZ-RT (2 TB) HDD. The results in Fig. 7 indicate that the serial computing scheme of Fig. 6(a) is substantially affected by the HDD access time when the HDD serves as the storage for the NFS server. With parallel computing [Fig. 6(b)], the time required to read the object-point coordinates is completely hidden within the CGH calculation time of each GPU on the CGH calculation nodes, regardless of whether the HDD or the SSD is used.
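A minimal sketch of the parallel scheme of Fig. 6(b) is given below, assuming that a separate host thread prefetches the object data of the next frame from the NFS server while the GPU computes the CGH of the current frame, so that the read time is hidden behind the CGH calculation. The function names readObjectData and computeCGHOnGPU are placeholder stubs, not the code used in the experiments.

// Sketch of overlapping object-data reads with the GPU CGH calculation [Fig. 6(b)].
#include <thread>
#include <utility>
#include <vector>

struct Point { float x, y, z; };

std::vector<Point> readObjectData(int frame) {
    // Stub: in practice, read the binary coordinate file of the given frame from the NFS server.
    return std::vector<Point>(1000);
}

void computeCGHOnGPU(const std::vector<Point>& pts) {
    // Stub: in practice, launch the CGH kernel implementing Eq. (1) on the GPU.
    (void)pts;
}

void processFrames(int numFrames) {
    std::vector<Point> current = readObjectData(0);
    for (int f = 0; f < numFrames; ++f) {
        std::vector<Point> next;
        // Prefetch the next frame's object data concurrently with the GPU work.
        std::thread reader([&next, f, numFrames]() {
            if (f + 1 < numFrames) next = readObjectData(f + 1);
        });
        computeCGHOnGPU(current);     // long-running GPU computation for the current frame
        reader.join();
        current = std::move(next);
    }
}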

Figure 6. Read-data processing and CGH calculation on each CGH calculation node of the multi-GPU cluster system shown in Fig. 4. (a) Serial computing. (b) Parallel computing.

Figure 7. Comparison of the total display time for every 12 frames using the serial computing scheme of Fig. 6(a) with that using the parallel computing scheme of Fig. 6(b) for 1,200,000 object points.
Figure 8 plots the display-time interval shown in Fig. 5 versus the number of object points when spatiotemporal division multiplexing using moving image features is implemented on the multi-GPU cluster system shown in Fig. 4; six space divisions are used. The display-time interval increases in proportion to the number of object points. For 1,200,000 object points, the display-time interval corresponds to a frame rate of 28.9 frames per second (fps). Figure 8 thus shows that the proposed method provides clear real-time 3D holographic video for a 3D model comprising a huge number of object points. Note that Fig. 8 does not imply that the proposed method requires an SLM with a high refresh rate.
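For reference, the relations below follow from the pipeline of Fig. 5 with twelve calculation GPUs; the 34.6 ms and 415 ms values are inferred from the reported 28.9 fps and are not read directly from Fig. 8.

\begin{align*}
  \text{frame rate} &= \frac{1}{\text{display-time interval}},\\
  \text{display-time interval (1{,}200{,}000 points)} &\approx \frac{1}{28.9\ \text{fps}} \approx 34.6\ \text{ms},\\
  \text{CGH calculation time available per frame} &\approx 12 \times 34.6\ \text{ms} \approx 415\ \text{ms}.
\end{align*}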

Figure 8. Display-time interval shown in Fig. 5 versus the number of object points for the spatiotemporal division multiplexing approach using moving image features implemented on the multi-GPU cluster system shown in Fig. 4.
Figure 9 shows snapshots of the 3D video (Video 1) reconstructed from the original 3D video "fountain" comprising 1,064,462 object points with six space divisions. Table 2 lists the frame rate of the reconstructed 3D video against the number of space divisions. We obtained a clear holographic 3D video, reconstructed from a 3D object comprising 1,064,462 object points, at 32.7 fps with six space divisions.

Figure 9. Snapshot of a reconstructed 3D video (Video 1).

Table 2. Frame Rate of the Reconstructed 3D Video from the Original 3D Video “Fountain” Comprising 1,064,462 Object Points Against the Number of Space Divisions
Number of Space Divisions | Object Points per Divided Object | Frame Rate (fps)
No division | 1,064,462 | 5.43
Two divisions | 532,231 | 10.86
Four divisions | 266,116 | 21.70
Six divisions | 177,411 | 32.70
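As a consistency check (our arithmetic, not values reported in the Letter), the frame rate in Table 2 scales almost linearly with the number of space divisions because each divided object carries roughly 1/D of the object points:

\begin{align*}
  1{,}064{,}462 / 6 &\approx 177{,}410 \approx 177{,}411\ \text{points per divided object},\\
  5.43\ \text{fps} \times 6 &= 32.58\ \text{fps} \approx 32.70\ \text{fps}.
\end{align*}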
In conclusion, we implemented the spatiotemporal division multiplexing approach using moving image features on a multi-GPU cluster system with 13 GPUs. A performance evaluation indicates that the proposed method can realize real-time holographic video of a 3D object comprising approximately 1,200,000 object points, and we obtained a clear real-time holographic 3D video of a 3D object comprising 1,064,462 object points. The proposed method facilitates the handling of clear real-time 3D holographic video, is applicable to various CGH calculation algorithms, and thereby contributes significantly to the development of the ultimate holographic 3D television.