Advanced Photonics, Volume 6, Issue 4, 046004 (2024)
Generative deep-learning-embedded asynchronous structured light for three-dimensional imaging
Three-dimensional (3D) imaging with structured light is crucial in diverse scenarios, ranging from intelligent manufacturing and medicine to entertainment. However, current structured light methods rely on projector–camera synchronization, limiting the use of affordable imaging devices and their consumer applications. In this work, we introduce an asynchronous structured light imaging approach based on generative deep neural networks that relaxes the synchronization constraint and tackles the resulting fringe pattern aliasing without relying on any a priori constraint of the projection system. Specifically, we propose a generative deep neural network with a U-Net-like encoder–decoder architecture that learns the underlying fringe features directly by exploiting the intrinsic prior principles of fringe pattern aliasing. The network is trained within an adversarial learning framework and supervised by a statistics-informed loss function. We evaluate its performance in the intensity, phase, and 3D reconstruction domains and show that the trained network can separate aliased fringe patterns to produce results comparable with those of a synchronous system: the absolute error is no greater than 8 μm, and the standard deviation does not exceed 3 μm. Evaluation results on multiple objects and pattern types show that the method generalizes across asynchronous structured light scenes.
1 Introduction
Structured light imaging is a key technology for acquiring three-dimensional (3D) information because of its advantages, such as nondestructiveness and high efficiency.1
Figure 1.Diagrams for (a) synchronous FPP and (b) asynchronous FPP systems.
When the synchronization is violated, the projector switches the fringe pattern unexpectedly within the camera exposure period, leading to fringe pattern aliasing, as shown in Fig. 1(b), and hence to reconstruction errors. Several issues relevant to the synchronization of FPP systems therefore deserve attention. The most notable is high system complexity, since synchronization relies on additional hardware or software coding.11 In addition, it is challenging to synchronize the whole system with high accuracy, not least for structured light arrays composed of multiple cameras and projectors.12 Applicability is also restricted in specific scenarios, notably when the distance between the camera and projector exceeds certain thresholds, which hinders the proper functioning of the synchronization signal wire. The system is furthermore susceptible to electromagnetic interference, undermining its anti-interference capacity. Equipment selection is constrained as well, because the camera and projector must carry supplementary synchronization circuitry to accommodate the synchronization signal, which poses a challenge since many widely used consumer products lack this functionality.11 In light of advancements in opto-electronic devices and imaging techniques, both the cost and the dimensions of digital cameras have been reduced significantly, and imaging devices have become ubiquitous in daily life. Notably, consumer products, including smartphones, tablets, and laptops, have attained remarkable levels of image resolution and frame rate, leading to a heightened potential demand for convenient consumer-grade 3D reconstruction techniques, particularly those employing asynchronous FPP (async-FPP for short).
Almost all existing research employing FPP techniques is built upon the assumption of synchronization, resulting in the synchronization constraints being overlooked in the literature.13
Although async-FPP-based 3D imaging has long been a challenge, only a few works have focused on this problem. Fujiyoshi et al. built a stereo vision system with a pair of asynchronous cameras,20 in which the two cameras captured images independently, to track the 3D positions of an object based on a Kalman filter and the most recent image from either camera. Hasler et al. utilized multiple unsynchronized moving cameras to capture the motion of articulated objects.21 The static background and the positions of each camera are first reconstructed via structure-from-motion; the cameras are then registered to each other using the background geometry, and the audio signal is employed to achieve camera synchronization. Bradley et al.12 proposed two approaches for solving the synchronization problem between camera arrays. The first is based on strobe illumination: it performs the exposure in each camera by first starting the cameras and then triggering the strobe lighting, thereby identifying the first synchronized frame. The second is based on optical flow vectors but is less accurate than the former. Moreno et al.11 reported that object shape can be reconstructed with an asynchronous structured light system based on FPP. In that work, binary fringe patterns were projected and captured independently at a constant speed, and an asynchronous decoding algorithm was introduced to generate a new image sequence equivalent to that obtained by a synchronized system, allowing the 3D model to be reconstructed with the existing binary code reconstruction algorithm. El Asmi et al.22 proposed an asynchronous scanning method based on structured light by considering the captured image as a new reference pattern aliased from two consecutive projected patterns rather than as a partial exposure of them; with a locality-sensitive hashing algorithm, the first image of the sequence can be found to build the matching correspondence by utilizing quadratic codes. In 2019, the same group proposed a subpixel asynchronous unstructured light algorithm to increase the reconstruction accuracy.23 It is evident that all existing methods address other aspects of asynchrony or use simple binary patterns containing only two intensity values, 0 and 1.
Therefore, to the best of our knowledge, a notable gap in the field of structured light imaging is the lack of studies on async-FPP using the widely adopted sinusoidal fringe patterns, owing to the challenges posed by complex fringe aliasing. This is a significant practical limitation, as it makes constructing a flexible structured light system for low-cost consumer-grade applications difficult. Consequently, the development of flexible, accurate, and compatible methods for async-FPP measurement represents an important direction for further research and advancement in structured light imaging. To this end, we model the asynchronous fringe imaging formation and propose a novel aliased fringe pattern (AFP) separation network, referred to as APSNet for short, based on a U-Net-like architecture to build an easy-to-use async-FPP imaging system. The a priori principles embedded in async-FPP imaging enable APSNet to learn the intrinsic inductive bias of AFPs within a generative adversarial framework, supervised by a global similarity-informed loss function and guided by fringe pattern information. With the trained network, the synchronization constraint of FPP is relaxed, allowing the camera and projector to work independently at constant speeds. The well-established reconstruction model thus remains valid for an async-FPP system and produces results comparable with the synchronous one. We experimentally show that our method can reconstruct 3D shapes and geometries from sinusoidal and/or binary fringe patterns without relying on projector–camera synchronization.
2 Materials and Methods
2.1 Asynchronous Fringe Pattern Imaging
In the async-FPP system, the projector is allowed to switch the fringe pattern during the exposure time of the camera, leading to AFPs. To describe the AFPs accurately, we first analyze the image formation of the camera component, which converts light into the intensity responses of image pixels. Assume
Equation (1) formulates the intensity response of the pixel
For the synchronous FPP, one fringe pattern is projected and captured by the camera in the exposure time, as shown in Fig. 2(a), extending Eq. (1) to the whole image to capture a fringe pattern expressed as
Figure 2.Illumination and response relation between the projector and camera in (a) synchronous FPP and (b) async-FPP systems, respectively.
However, for the async-FPP, as shown in Fig. 2(b), the projected fringe pattern can be switched during the exposure time and multiple fringe patterns are captured by the camera, leading to aliasing of the neighboring fringe patterns. To identify the projected patterns from the aliased observations, it is necessary to explore the aliasing formation in asynchronous projection mathematically.
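For concreteness, a minimal form of the two-pattern aliasing relation implied here can be sketched as follows; this is stated as an assumption consistent with the description above, since the exact notation is defined by the paper's Eqs. (1)–(5):

$$
I_a(x, y) \;=\; \frac{\Delta t}{T}\, I_n(x, y) \;+\; \Bigl(1 - \frac{\Delta t}{T}\Bigr) I_{n+1}(x, y), \qquad 0 \le \Delta t \le T,
$$

where $T$ is the camera exposure time, $\Delta t$ is the portion of the exposure during which pattern $I_n$ remains displayed before the projector switches to $I_{n+1}$, and $I_a$ is the resulting AFP. Setting $\Delta t = T$ recovers the synchronous case of a single captured pattern.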
Assume
Equation (5) shows that, in async-FPP, the captured AFP
2.2 Fringe Pattern Separation with Generative Deep Networks
Although the AFP
Consider the two projected adjacent fringe images
To respond to the underlying inductive biases and the operations associated with both global and local geometric invariances, we propose implementing APSNet as a convolutional encoder–decoder architecture based on the existing U-Net model,27 as shown in Fig. 3. This architecture serves as the backbone of the generator in our GAN framework in Fig. 4. The encoder of
Figure 3.APSNet with U-Net architecture and latent representation for AFP separation.
Figure 4.Schematic description of training APSNet within the conditional GAN framework.
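As a concrete illustration of this backbone, the following is a minimal PyTorch sketch of a U-Net-like generator with skip connections that maps one aliased pattern to a pair of separated patterns. The class name, number of levels, channel widths, and activations are illustrative assumptions rather than the reported APSNet configuration.

```python
# Illustrative U-Net-like generator for aliased fringe pattern separation.
# Layer widths and depth are assumptions; the real APSNet configuration may differ.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions, each followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class UNetSeparator(nn.Module):
    def __init__(self, in_ch=1, out_ch=2, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.bottleneck = conv_block(base * 4, base * 8)
        self.pool = nn.MaxPool2d(2)
        self.up3 = nn.ConvTranspose2d(base * 8, base * 4, 2, stride=2)
        self.dec3 = conv_block(base * 8, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, out_ch, 1)  # two channels: the two separated patterns

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottleneck(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))   # skip connection
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return torch.sigmoid(self.head(d1))  # intensities normalized to [0, 1]

# Example: one grayscale AFP in, two separated fringe patterns out.
afp = torch.rand(1, 1, 256, 256)
separated = UNetSeparator()(afp)  # shape: (1, 2, 256, 256)
```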
The convolution and pooling operations in the architecture embed the geometric prior implicitly shown in Eq. (5), imposing an intrinsic inductive bias to APSNet. As a result, it presents a powerful generative capacity for any similar classes by encoding the aliased pattern
To train the APSNet
The loss function above frames the fringe pattern separation as an adversarial learning problem with two submodels: the generator
In addition, the principles across the global geometric prior in Eq. (5) also shed light on the statistical property of the fringe pattern aliasing; that is, the intensity distributions of the projected source patterns hold globally in the aliased pattern range. We found that it provides a strong statistical prior for model training: the separated fringe patterns
We, finally, propose a loss function in Eq. (9) by summing the loss terms in Eqs. (6) and (7) together to train the generator
As the loss terms
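The following sketch illustrates how such a composite objective could be assembled in PyTorch. The adversarial, reconstruction, and statistics terms and their weights are assumptions standing in for the exact definitions of the paper's loss equations; the statistics term here simply matches global means and standard deviations between generated and source patterns.

```python
# Illustrative composite loss for training the generator G against a discriminator D.
# The specific terms and weights of the paper's combined loss are assumptions here.
import torch
import torch.nn.functional as F

def statistics_loss(fake, real):
    """Match global first- and second-order intensity statistics (assumed form)."""
    return (fake.mean() - real.mean()).abs() + (fake.std() - real.std()).abs()

def generator_loss(D, afp, fake_pair, real_pair, w_adv=1.0, w_rec=100.0, w_stat=10.0):
    """Adversarial + reconstruction + statistical prior terms (weights are placeholders)."""
    # Conditional adversarial term: the discriminator sees the AFP together with the pair.
    pred_fake = D(torch.cat([afp, fake_pair], dim=1))
    adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
    rec = F.l1_loss(fake_pair, real_pair)         # global intensity similarity
    stat = statistics_loss(fake_pair, real_pair)  # statistics-informed prior
    return w_adv * adv + w_rec * rec + w_stat * stat

def discriminator_loss(D, afp, fake_pair, real_pair):
    """Standard conditional GAN discriminator objective."""
    pred_real = D(torch.cat([afp, real_pair], dim=1))
    pred_fake = D(torch.cat([afp, fake_pair.detach()], dim=1))
    loss_real = F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
    loss_fake = F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake))
    return 0.5 * (loss_real + loss_fake)
```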
2.3 3D Imaging Pipeline with the Trained APSNet
With our trained APSNet, we can access the wrapped phase information in the generated nonaliased fringe patterns and then reconstruct the final 3D geometries of the objects being measured. Figure 5 shows the overall pipeline, which accepts three AFPs observed from an object surface and outputs the 3D shape of the object.
Figure 5.Pipeline of 3D imaging with the trained APSNet.
The pipeline consists of three independent phases. The first is AFP separation, which is performed by the APSNet trained in Sec. 2.2. After this phase, we have four successive non-AFPs with different phases. The second phase estimates the phase information via the well-established four-step phase-shifting algorithm in Ref. 3. Consider that the four phase-shifting patterns generated by our trained APSNet are
Finally, with the phase estimate
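For reference, the standard four-step phase-shifting relation (a textbook formula rather than a contribution of this paper) computes the wrapped phase from four patterns shifted by π/2; the subsequent unwrapping and phase-to-height mapping depend on the calibration of the specific FPP system and are omitted here.

```python
# Standard four-step phase-shifting: wrapped phase from four patterns with pi/2 shifts.
# Assumed convention: I_k = A + B*cos(phi + (k-1)*pi/2), k = 1..4.
import numpy as np

def wrapped_phase(i1, i2, i3, i4):
    """Return the wrapped phase in (-pi, pi] from four phase-shifted fringe images."""
    return np.arctan2(i4 - i2, i1 - i3)

# Synthetic check: recover a known phase ramp.
x = np.linspace(0, 4 * np.pi, 512)
patterns = [128 + 100 * np.cos(x + k * np.pi / 2) for k in range(4)]
phi = wrapped_phase(*patterns)  # wrapped version of the ramp x
assert np.allclose(np.cos(phi), np.cos(x), atol=1e-6)
```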
3 Results
3.1 Data Set and Training Details
To train APSNet with high generalization capability, the data set is built with a real FPP system consisting of an industrial camera (S2 Camera TJ1300UM) with a resolution of
Figure 6.Experimental setup of our async-FPP system for generating the data set.
With the async-FPP imaging system, the AFPs are obtained as follows. The projector projects a fringe first with an identical frequency and then sends a high-level signal to the MC. The built-in timer of the MC executes timing according to the predefined delay
Figure 7.(a) and (b) Two successively projected fringe patterns and (c) an observed AFP with 10 ms delay.
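For readers who want to emulate this acquisition in software, e.g., to sanity-check or augment the data set, an AFP can be approximated by blending two consecutive patterns according to the fraction of the exposure each occupies. The snippet below is a simulation sketch under that assumption and does not reproduce the microcontroller-based hardware procedure; sensor noise and nonlinearity are ignored, and the exposure value is illustrative.

```python
# Simulated AFP: blend two consecutive fringe patterns by their exposure fractions.
import numpy as np

def simulate_afp(pattern_a, pattern_b, delay, exposure):
    """delay: time pattern_a is shown within the exposure window (same unit as exposure)."""
    alpha = np.clip(delay / exposure, 0.0, 1.0)
    afp = alpha * pattern_a.astype(np.float64) + (1.0 - alpha) * pattern_b.astype(np.float64)
    return np.clip(afp, 0, 255).astype(np.uint8)

# Example: a 10 ms delay within an assumed 40 ms exposure gives a 25/75 blend.
h, w = 480, 640
x = np.arange(w)
pattern_a = (127.5 + 127.5 * np.cos(2 * np.pi * x / 32)) * np.ones((h, 1))
pattern_b = (127.5 + 127.5 * np.cos(2 * np.pi * x / 32 + np.pi / 2)) * np.ones((h, 1))
afp = simulate_afp(pattern_a, pattern_b, delay=10e-3, exposure=40e-3)
```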
For the source patterns to be projected, five sets of
In the training stage, the AdamW optimizer is adopted to update the parameters of APSNet in batches, following the training loop in Fig. 4 for 500 epochs, during which the learning rate is controlled by a dynamic decay strategy. The strategy is that, given an initial value of
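A hedged PyTorch sketch of such an optimizer and learning-rate setup is given below; the initial rate, decay factor, weight decay, and step interval are placeholders rather than the values used in the paper, and a simple convolution stands in for the APSNet generator.

```python
# Illustrative optimizer and learning-rate decay setup (hyperparameters are placeholders).
import torch
import torch.nn as nn

model = nn.Conv2d(1, 2, 3, padding=1)  # stand-in for the APSNet generator
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-2)
# Dynamic decay: multiply the learning rate by a fixed factor every few epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for epoch in range(500):
    # ... iterate over AFP batches: forward pass, compute the generator/discriminator
    #     losses, then optimizer.zero_grad(); loss.backward(); optimizer.step() ...
    scheduler.step()  # apply the decay strategy once per epoch
```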
To find an optimal configuration of the loss weighting hyperparameters in Eq. (9), we investigate the convergence behavior of the loss function for different combinations of
Figure 8.(a)–(d) Convergence behavior of our APSNet model when training with different loss term contributions.
For the case of equal contributions (
3.2 Performance Validation and Results
In this section, we evaluate the performance of our trained APSNet on the tasks of separating sinusoidal and binary AFPs for different objects, such as statues, a car model, and a sphere, which were not seen by the network during either training or validation.
First, we investigate the inference capacity of our network at the fringe pattern separation, phase evaluation, and shape reconstruction levels on a set of new AFPs, assuming that the aliasing level varies along the time dimension. Figure 9 shows the pipeline of AFP generation [Figs. 9(a)–9(c)] and separation [Figs. 9(c)–9(e)] via the async-FPP system and the trained APSNet, respectively, with the evaluated discrepancy maps between the real and generated patterns attached in Figs. 9(f) and 9(g). By inspecting the inferred pattern pair, we can see only a very small difference between the inferred results of APSNet and the corresponding source fringe patterns. To quantitatively evaluate the fringe separation quality, the average errors and associated standard deviations of both discrepancy maps are computed, with values of 1.69 and 2.13 for Fig. 9(f), and 1.94 and 2.88 for Fig. 9(g), respectively.
Figure 9.Pipeline of AFP generation and separation with APSNet: (a) and (b) the source fringe patterns recorded by synchronized FPP, (c) the corresponding AFP recorded by async-FPP system, (d) and (e) are the separated results of (c), and (f) and (g) show the absolute discrepancy between the separated and source patterns.
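These statistics can be reproduced from a discrepancy map with a few lines of array arithmetic; the sketch below assumes 8-bit grayscale patterns and reports the mean and standard deviation of the absolute per-pixel difference.

```python
# Mean absolute error and standard deviation of a discrepancy map between
# a separated pattern and its synchronously captured source pattern.
import numpy as np

def discrepancy_stats(separated, source):
    """Both inputs are 2D grayscale arrays of the same shape."""
    diff = np.abs(separated.astype(np.float64) - source.astype(np.float64))
    return diff.mean(), diff.std()

# Example with random stand-in images (replace with real pattern pairs).
rng = np.random.default_rng(0)
separated = rng.integers(0, 256, (480, 640), dtype=np.uint8)
source = rng.integers(0, 256, (480, 640), dtype=np.uint8)
mae, std = discrepancy_stats(separated, source)
```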
Furthermore, adjusting the time delay
Figure 10.Inference performance of APSNet on AFPs with different aliasing levels.
Beyond the validation in the intensity domain, we then check the performance of our model by carrying out phase computation and shape reconstruction. To this end, a set of AFPs is separated by the trained APSNet to generate four successive fringe patterns without aliasing; meanwhile, four-step synchronous fringe patterns are also captured for comparison. By applying the four-step phase-shifting algorithm, we obtain the wrapped phase maps for both synchronous and asynchronous cases, as shown in Figs. 11(a) and 11(b), respectively. One can see that these two phase maps are visually almost identical. To compare them quantitatively, we sample the wrapped phase values along the line
Figure 11.Comparison of wrapped phase maps for synchronous and asynchronous cases.
Following the phase-domain evaluation, we compare the 3D shape reconstruction of the tested statue for the cases of synchronous, asynchronous, and APSNet-generated fringe patterns. Results are shown in Fig. 12. In this test, we treat the reconstruction result of the synchronous fringe patterns in Fig. 12(a) as the benchmark. It is observed that the 3D shape reconstructed from the fringe patterns separated by APSNet in Fig. 12(c) is remarkably close to the benchmark, whereas that reconstructed directly from the AFPs in Fig. 12(b) suffers a significant loss of shape information, which is undoubtedly caused by the fringe pattern aliasing. We further compare the reconstructions quantitatively. To do so, we sampled a depth curve from each of the reconstructed models along the dashed lines shown in Fig. 12(a); the results are plotted in Figs. 13(a)–13(c), respectively. Let the reconstruction error be
Figure 12.Reconstructed results from (a) synchronous, (b) asynchronous, and (c) APSNet-generated fringe patterns, respectively.
Figure 13.(a)–(c) Depth curves sampled from
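The per-point errors summarized in Fig. 13 can be obtained by differencing depth curves sampled at the same pixel locations in the benchmark and test reconstructions; the sketch below illustrates this under that assumption, using synthetic stand-in depth maps.

```python
# Absolute error statistics between two depth curves sampled along the same image row.
import numpy as np

def depth_curve_errors(depth_ref, depth_test, row):
    """Sample one row from two depth maps and compare them point by point."""
    ref = depth_ref[row, :].astype(np.float64)
    test = depth_test[row, :].astype(np.float64)
    err = np.abs(test - ref)
    return err, err.mean(), err.std()

# Example with synthetic depth maps in millimeters (stand-ins for real reconstructions).
rng = np.random.default_rng(1)
depth_sync = rng.uniform(0, 50, (480, 640))
depth_apsnet = depth_sync + rng.normal(0, 0.003, (480, 640))
err, mean_err, std_err = depth_curve_errors(depth_sync, depth_apsnet, row=240)
```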
To show the ability of APSNet to generalize to asynchronous 3D imaging of different objects, we perform a reconstruction task in which our trained APSNet is used as a forward model to separate five sets of AFPs captured from three statues, a standard sphere, and a car model. Results are shown in Fig. 14, where the leading row shows the test objects together with the AFPs and the APSNet-inferred pattern pairs. It is worth noting that in these cases we project sinusoidal fringe patterns onto two statues (first two columns) and binary fringe patterns onto the sphere, the car, and the remaining statue (third to fifth columns, respectively). By comparing the reconstruction results from the asynchronous (third row) and APSNet-generated (fourth row) fringe patterns with the benchmark (second row), we find that our APSNet, trained using the proposed method on the data set of Sec. 3.1, successfully extrapolates to cases involving different objects and various fringe pattern types. Since these test objects and (binary) fringe patterns were not seen by the model during training, these reconstruction examples demonstrate the expected generalization capability of our method. We attribute this to the enhanced image generation ability of the trained generative model, achieved through our architectural choice and supervision with the two statistical prior-informed loss terms in Eqs. (7) and (8).
Figure 14.Reconstruction results for different objects with synchronous, asynchronous, and APSNet-generated sinusoidal and binary fringe patterns, respectively.
Importantly, our approach works by training on a single scenario with few samples (1500) and successfully predicting various other scenarios, showing that the generative APSNet has the capacity to deal with the problem of asynchronous structured light imaging. To further demonstrate the generalization capacity of the model trained on this small data set, Fig. 15 shows a demonstration of multi-object reconstruction, where the test objects include a propeller prototype, a 3D-printed bolt, and a statue. As shown in Fig. 15(a), despite the scene containing objects with complex surfaces, different colors (dark blue, red, and gray), and varying reflective materials, our method can still infer the source patterns well (
Figure 15.Demonstration of generalization capability on multi-object asynchronous 3D imaging: (a) measured objects, with its fringe pattern examples before (
To investigate the reconstruction quality of our method, we compare our reconstruction results with those of the well-established synchronous method by computing the absolute errors of the data points sampled along the dashed line in Fig. 15(b). Note that the comparison here focuses on the propeller and bolt prototypes because they have large-curvature surfaces that were not present in the previous validations. The resulting errors can be found in Figs. 16(a) and 16(b), respectively. The error range for both reconstructed models is consistent with the results shown in Fig. 13(e): the absolute errors for the reconstructed propeller and bolt models are both less than 0.006 mm, with STDs of 0.002 and 0.001 mm, respectively. This demonstration and error investigation provide further evidence that our method can generalize to more complex scenes. We attribute this to the network’s intrinsic ability to learn the inductive bias corresponding to the underlying global translation equivariance in the aliasing pattern formation through appropriate loss terms, as explained in Sec. 2.2.
Figure 16.Reconstruction errors of our method for (a) the propeller and (b) the 3D printed bolt models in
4 Discussion and Conclusion
In summary, we proposed a statistical prior-informed APSNet that separates the AFPs captured by an async-FPP structured light system, enabling 3D reconstruction as accurate as that of synchronous systems. Our network learns to respond to the geometric prior embedded in the fringe aliasing formation and is trained in a generative adversarial framework by minimizing a loss function supervised by global intensity and structural similarity. As a result, training can be performed directly on a small amount of experimental data without any constraints on the considered async-FPP imaging system. Since the trained network learns the underlying features of fringe patterns from both geometry and statistics, it can generate correct non-AFPs from previously unseen AFPs with sufficient accuracy for phase and 3D shape reconstruction. We showed that our learning approach can effectively extrapolate to imaging cases of objects different from the one used in training. In particular, we found that our network, even though trained only on a few sinusoidal patterns, could generalize to aliased binary fringe patterns that are statistically different from the training examples without any further tuning. The results of the 3D demonstrations show that our method achieves imaging results consistent with the synchronous methods and improves the reconstruction accuracy by 3 orders of magnitude relative to reconstruction directly from aliased fringes. The validation results on generalization performance highlight that our approach overcomes the fundamental challenge of the async-FPP problem, offering a pathway for exploiting asynchronous structured light in a wider range of consumer-grade applications, and can potentially be extended to other fields, e.g., 3D array imaging.
While our trained APSNet is capable of generating nonaliased source fringe patterns from the aliased images captured by the camera, the use of the proposed method may be questionable when there is a significant difference between the frame rates of the projector and the camera. As shown in Fig. 2, this work primarily addresses the aliasing problem involving two fringe patterns. If the projection rate is significantly higher than the camera’s frame rate, the camera will capture more than two fringe patterns within the exposure time, leading to a more complex multiple-source aliasing problem, which can become a bottleneck for the direct use of the proposed method. Our model is therefore somewhat limited in this respect. Despite this limitation, our method provides a baseline that can inspire further development of generative deep-learning methods to address asynchronous structured light imaging with more complex aliasing.
Lei Lu received his BEng degree from Henan University of Science and Technology, China, in 2007; his MEng from Zhengzhou University, China, in 2011; and his PhD from the University of Wollongong, Australia, in 2015. He was a postdoctoral fellow in Singapore University of Technology and Design, Singapore, from 2015 to 2016. Now, he is an associate professor in Henan University of Technology, China. His research interests include 3D shape measurement, 3D printing, and 3D data processing.
Zhilong Su is currently an associate professor at Shanghai Institute of Applied Mathematics and Mechanics, Shanghai University, and the deputy director of the Department of Mechanics, School of Mechanics and Engineering Science, Shanghai University. He received his PhD from the Department of Engineering Mechanics, Southeast University, in 2019. His main research activities are devoted to geometric optical sensing, visual intelligence, generative and geometric deep learning, and photomechanics.
Banglei Guan is currently an associate professor at the National University of Defense Technology. His research interests include photomechanics and videometrics. He has published research papers in top-tier journals and conferences, including IJCV, IEEE TIP, IEEE TCYB, CVPR, ECCV, and ICCV.
Qifeng Yu is currently a professor at the National University of Defense Technology. He is an academician of the Chinese Academy of Sciences. He has authored three books and published over 200 papers. His current research fields are image measurement and vision navigation.
Wei Pan is currently working in OPT Machine Vision Corp. as a research leader in 3D algorithm development. Prior to that, he worked as a research fellow at Shenzhen University and South China University of Technology after he got his PhD from Singapore University of Technology and Design. His research interests include 3D imaging, 3D data processing, computer vision, machine learning, and computer graphics.
Biographies of the other authors are not available.
[5] J. Geng. Structured-light 3D surface imaging: a tutorial. Adv. Opt. Photonics, 3, 128–160 (2011).
[16] T. Petković et al. Software synchronization of projector and camera for structured light 3D body scanning, 286–295 (2016).
[23] C. E. Asmi, S. Roy. Subpixel unsynchronized unstructured light, 865–875 (2019).
[25] A. Van Den Oord, N. Kalchbrenner, K. Kavukcuoglu. Pixel recurrent neural networks, 1747–1756 (2016).
[28] S. Ioffe, C. Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift, 448–456 (2015).
[29] A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet classification with deep convolutional neural networks (2012).
[33] I. Loshchilov, F. Hutter. Decoupled weight decay regularization (2017).
Lei Lu, Chenhao Bu, Zhilong Su, Banglei Guan, Qifeng Yu, Wei Pan, Qinghui Zhang, "Generative deep-learning-embedded asynchronous structured light for three-dimensional imaging," Adv. Photon. 6, 046004 (2024)
Category: Research Articles
Received: Jan. 19, 2024
Accepted: Jul. 22, 2024
Published Online: Aug. 14, 2024
The Author Email: Su Zhilong (szloong@shu.edu.cn)