3D human face reconstruction based on band-limited binary patterns

Xin Fan; Changhe Zhou; Shaoqing Wang; Chao Li; Boquan Yang

doi:10.3788/COL201614.081101

Face recognition technology has great prospects in public security, finance, homeland security, and other fields^[1,2]. However, its widespread use is hindered by the low recognition accuracy of essentially two-dimensional (2D) representations. Traditional recognition methods are mostly based on 2D photographs, which are easily confused by variations in head pose, lighting, facial expression, and other characteristics. To overcome these difficulties, increasing attention has been paid to three-dimensional (3D) human face acquisition techniques, because a 3D color face model offers more information, much as human eyes do^[3,4]. However, 3D face reconstruction brings new challenges: human faces have little texture, so it is hard to obtain an accurate 3D model. Some commercial laser scanners have been employed to capture 3D face data directly, but the high cost of this equipment and its low reconstruction speed make it difficult to popularize^[5]. Methods based on prior face models have also been proposed. Kemelmacher and Basri proposed a single-image reconstruction method using a shape-from-shading approach that requires only a single template face as a prior^[6]. This approach can yield good results, but the geometry varies significantly depending on which image and template are used. Several methods based on passive stereo vision have also been proposed^[4]. However, these methods are sensitive to lighting conditions.

In this Letter, a non-contact active scanner^[7,8] for acquiring a 3D color human face is proposed. A color camera is added to a standard binocular camera pair to obtain the color texture information of the human face. An optimal temporal correlation technology (OTCT) is also proposed to improve the accuracy of the corresponding points.

Figure 1 shows a schematic view of the scanner. The upper and lower devices are two FireWire cameras with 2.0 MP resolution ($1600 \times 1200$ pixels, pixel size $4.5\,\mu\mathrm{m}$). The focal length of the camera lenses is 16 mm, and the baseline between the two cameras is about 30 cm. A digital light processing (DLP) projector with a focal length of 80 cm is placed between the two cameras to project special patterns onto a human face, and a 22 MP color camera ($5760 \times 3840$ pixels, pixel size $6.25\,\mu\mathrm{m}$) is mounted near the DLP projector. A series of binarized band-limited patterns with a resolution of $912 \times 1140$ is projected onto the human face via the DLP projector during the measurement. All the cameras were calibrated with a MATLAB toolbox before the experiments^[9].
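As a rough check of what this geometry implies, the standard rectified-stereo relation $Z = f B / (p\,d)$ can be evaluated with the lens, pixel, and baseline values quoted above. This is only a sketch: the working distance of 1 m is an assumption (the Letter does not state it), and the function names are illustrative.

```python
# Rectified-stereo depth from disparity, using values quoted in the text.
FOCAL_LENGTH_MM = 16.0   # camera lens focal length (from the text)
PIXEL_SIZE_MM = 4.5e-3   # 4.5 um pixel size (from the text)
BASELINE_MM = 300.0      # "about 30 cm" baseline (from the text)

# Focal length expressed in pixels.
FOCAL_PX = FOCAL_LENGTH_MM / PIXEL_SIZE_MM


def depth_from_disparity(disparity_px: float) -> float:
    """Return the depth in mm for a disparity given in pixels (pinhole model)."""
    return FOCAL_PX * BASELINE_MM / disparity_px


def depth_step_per_pixel(depth_mm: float) -> float:
    """Approximate depth change caused by a one-pixel disparity change."""
    return depth_mm ** 2 / (FOCAL_PX * BASELINE_MM)


if __name__ == "__main__":
    # ASSUMPTION: the working distance is not given in the text.
    assumed_distance_mm = 1000.0
    step = depth_step_per_pixel(assumed_distance_mm)
    print(f"depth step per pixel at 1 m: {step:.2f} mm")
```

Under this assumed distance, one pixel of disparity corresponds to a depth change on the order of a millimetre, which is why the accuracy of the homologous-point matching matters for the final model.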

Figure 1. Schematic view of the active binocular 3D setup.

Finding homologous points is a key step in any stereo-vision 3D reconstruction. Traditional matching methods, such as the sum of squared differences, the sum of absolute intensity differences, and normalized cross correlation, simply perform template matching in the spatial domain. All of these methods rely on the texture features of the object, and the results depend strongly on image quality, which is easily affected by ambient light and camera angle. They perform poorly when reconstructing a smooth surface. Davis et al. proposed a method named temporal correlation technology (TCT), which extends the correlation window of each pixel along the temporal domain^[10]. For the TCT-based stereo-matching problem, the result is given by

$TCT (x, y, d) = \frac{\sum_{t = 1}^{N} [I_{U} (x, y, t) - M_{U}] \cdot [I_{D} (x + d, y, t) - M_{D}]}{S_{U} (x, y, t) \cdot S_{D} (x + d, y, t)},$ (1)

where the numerator of Eq. (1) represents the cross correlation between the two temporal intensity vectors, $I_{U}$ and $I_{D}$ represent the intensities of the binarized images, and $d$ represents the disparity between the up and down images. The search window runs only along the horizontal direction because of epipolar rectification. $M_{i}$ and $S_{i}$ ($i = U, D$) denote the mean intensity value and the standard deviation of the temporal intensity vector in the up and down images, i.e.,

$M = \frac{1}{N} \sum_{t = 1}^{N} I (u, v, t),$ (2)

$S = \sqrt{\frac{1}{N} \sum_{t = 1}^{N} {(I (u, v, t) - M)}^{2}},$ (3)

and two pixels are regarded as homologous when the TCT value exceeds a threshold and reaches its maximum.
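Equations (1)-(3) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' MATLAB program: the function names `tct_score` and `best_disparity`, the `(N, height, width)` stack layout, and the threshold value 0.8 are assumptions. The score is additionally divided by $N$ so that, with the standard deviation of Eq. (3), it lies in $[-1, 1]$; Eq. (1) as printed omits this constant factor, which does not move the location of the maximum.

```python
import numpy as np


def tct_score(up_stack, down_stack, x, y, d):
    """Temporal correlation between pixel (x, y) of the up stack and
    pixel (x + d, y) of the down stack.

    Both stacks have shape (N, height, width): one image per pattern.
    """
    i_u = up_stack[:, y, x].astype(float)
    i_d = down_stack[:, y, x + d].astype(float)
    m_u, m_d = i_u.mean(), i_d.mean()   # Eq. (2)
    s_u, s_d = i_u.std(), i_d.std()     # Eq. (3), population standard deviation
    if s_u == 0.0 or s_d == 0.0:
        return 0.0                      # a constant temporal signal cannot be matched
    # ASSUMPTION: extra 1/N so that the score is normalized to [-1, 1].
    return float(np.sum((i_u - m_u) * (i_d - m_d)) / (len(i_u) * s_u * s_d))


def best_disparity(up_stack, down_stack, x, y, d_range, threshold=0.8):
    """Return the disparity whose score is maximal and above the threshold,
    or None when no candidate exceeds the threshold."""
    width = up_stack.shape[2]
    best_d, best_score = None, threshold
    for d in d_range:
        if 0 <= x + d < width:
            score = tct_score(up_stack, down_stack, x, y, d)
            if score > best_score:
                best_d, best_score = d, score
    return best_d
```

Because the images are rectified, the search in `best_disparity` only varies `d` along one image axis, as described above.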

Liu et al.^[11] proposed a setup based on the TCT method to measure the ground surface of an optical element, and it performs well on low-textured glass, which resembles a human face in this respect. However, because the edge pixels of the patterns may be lost when the images are binarized, traditional TCT easily leads to wrong matches. We expand the matching window to the neighborhood of the matching pixels, combining the temporal and spatial domains, and name the resulting method OTCT. The OTCT is given by

$OTCT (x, y, d) = \frac{\sum_{t = 1}^{N} \sum_{d x = - m}^{m} \sum_{d y = - n}^{n} [I_{U} (x + d x, y + d y, t) - M_{U}] \cdot [I_{D} (x + d x + d, y + d y, t) - M_{D}]}{S_{U} (x, y, t) \cdot S_{D} (x + d, y, t)},$ (4)

where $d x$ and $d y$ index the neighboring pixels of the matching point in the two directions, and $m$ and $n$ set the size of the matching window, which is $(2 m + 1) \times (2 n + 1)$, as shown in Fig. 2.
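A possible NumPy sketch of the spatio-temporal window of Eq. (4) is given below. It is an illustration under stated assumptions rather than the authors' code: the name `otct_score` and the `(N, height, width)` stack layout are hypothetical, and the mean and normalization are computed over the whole window so that the score is a normalized cross correlation in $[-1, 1]$, whereas Eq. (4) as printed writes $S$ at the centre pixel.

```python
import numpy as np


def otct_score(up_stack, down_stack, x, y, d, m=1, n=1):
    """Correlation over a (2m+1) x (2n+1) spatial window and all N patterns.

    Stacks have shape (N, height, width). With m = n = 0 the window shrinks
    to a single pixel and only the temporal vector is compared.
    """
    height, width = up_stack.shape[1:]
    inside = (n <= y < height - n
              and m <= x < width - m
              and m <= x + d < width - m)
    if not inside:
        raise ValueError("matching window falls outside the image")

    win_u = up_stack[:, y - n:y + n + 1, x - m:x + m + 1].astype(float)
    win_d = down_stack[:, y - n:y + n + 1, x + d - m:x + d + m + 1].astype(float)

    # ASSUMPTION: mean and normalization are taken over the full window.
    a = win_u - win_u.mean()
    b = win_d - win_d.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0.0:
        return 0.0  # flat window, no information to correlate
    return float(np.sum(a * b) / denom)
```

The default `m = n = 1` corresponds to a $3 \times 3$ window. Each score now touches $(2m+1)(2n+1)$ times as many samples as the single-pixel case, so the cost per candidate disparity grows with the window area.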

Figure 2. Homologous points matching window using OTCT.

During the measurement, a series of $N$ binary band-limited random patterns is projected onto the human face, and the two grayscale cameras capture images synchronously. After the pattern sequence, one color image is captured by the color camera, as shown in Fig. 3(d). The rectified images are obtained using the precalibration results. After extraction of the region of interest and self-adapting binarization, the dense 3D point cloud of the human face is reconstructed using the triangulation principle^[12]. By re-projecting the point cloud back onto the color image, the texture information of the human face can be obtained easily.
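The re-projection step can be illustrated with a pinhole camera model. In this sketch the intrinsic matrix `K` and the pose `R`, `t` of the color camera are assumed to come from the calibration; lens distortion is ignored and colors are picked by nearest-pixel lookup, both simplifications that the Letter does not specify. The function names are hypothetical.

```python
import numpy as np


def project_points(points_xyz, K, R, t):
    """Project Nx3 points into pixel coordinates of a pinhole camera.

    K is the 3x3 intrinsic matrix; R (3x3) and t (3,) map the point-cloud
    frame into the camera frame. Returns (uv, z): Nx2 pixels and depths.
    """
    cam = np.asarray(points_xyz, dtype=float) @ R.T + t
    uvw = cam @ K.T
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = uvw[:, :2] / uvw[:, 2:3]
    return uv, cam[:, 2]


def sample_colors(points_xyz, color_image, K, R, t):
    """Look up a color for every 3D point; also return a visibility mask."""
    uv, z = project_points(points_xyz, K, R, t)
    height, width = color_image.shape[:2]
    visible = z > 0                      # points behind the camera are skipped
    cols = np.zeros(len(z), dtype=int)
    rows = np.zeros(len(z), dtype=int)
    cols[visible] = np.rint(uv[visible, 0]).astype(int)
    rows[visible] = np.rint(uv[visible, 1]).astype(int)
    visible &= (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    colors = np.zeros((len(z),) + color_image.shape[2:], dtype=color_image.dtype)
    colors[visible] = color_image[rows[visible], cols[visible]]
    return colors, visible
```

Points that project outside the color image, or that lie behind the camera, keep a zero color and are flagged as not visible, so they can be dropped or filled in a later step.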

Figure 3. (a) and (b) One rectified image from the experiment and the result of self-adapting binarization. (c) Extraction of the region of interest.
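The self-adapting binarization is not described in detail in the text. One plausible stand-in, shown here purely as an assumption, thresholds every pixel at its own mean over the pattern sequence, which makes the result insensitive to local brightness differences across the face. The function name `binarize_stack` and the `(N, height, width)` layout are hypothetical.

```python
import numpy as np


def binarize_stack(stack):
    """Binarize an image stack of shape (N, height, width).

    ASSUMPTION: each pixel is compared with its own temporal mean; the
    binarization rule actually used in the experiments is not specified.
    """
    stack = np.asarray(stack, dtype=float)
    threshold = stack.mean(axis=0, keepdims=True)  # one threshold per pixel
    return (stack > threshold).astype(np.uint8)
```

A dark pixel with values 10, 30, 20 and a bright pixel with values 200, 100, 180 are both split around their own means, giving the codes 0, 1, 0 and 1, 0, 1.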

Table 1 compares the accuracy of OTCT with a $3 \times 3$ matching window for different numbers of projected patterns. Our program runs under MATLAB 2012 on a personal computer with a 2.6 GHz CPU. The number of accurate points increases with the number of patterns projected onto the human face. Table 2 compares the reconstruction accuracy and computation time of TCT and OTCT when the pattern number is 20. The computation time increases with the size of the matching window. Based on this table, the suggested matching window is $3 \times 3$, which yields nearly as many accurate points as $5 \times 5$ in about one third of the computation time. Figures 4(a) and 4(b) compare TCT and OTCT; our method gives better results for the same pattern number. Figure 4(b) shows the reconstruction result of the human face with OTCT. The corresponding point cloud consists of more than $2.0 \times 10^{5}$ points, and the absolute error over the full field is less than 0.4 mm.

Table 1. Comparison Between Different Pattern Numbers Using OTCT with the Matching Window of 3×3

| Pattern number ($N$) | Accurate points ($\times 10^{4}$) |
| --- | --- |
| 10 | 13.7 |
| 15 | 19.2 |
| 20 | 20.0 |

Table 2. Comparison Between Different Matching Windows with Pattern Number of 20

| Matching method | Matching window | Accurate points ($\times 10^{4}$) | Time (s) |
| --- | --- | --- | --- |
| TCT | 1×1 | 11.6 | 401 |
| OTCT | 3×3 | 20.0 | 1253 |
| OTCT | 5×5 | 20.1 | 3705 |
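The trade-off behind the choice of window can be read directly from Table 2. The short script below only re-uses the tabulated values; the dictionary keys and function names are illustrative.

```python
# Values copied from Table 2 (pattern number N = 20).
RESULTS = {
    "TCT 1x1": {"accurate_points": 11.6e4, "time_s": 401.0},
    "OTCT 3x3": {"accurate_points": 20.0e4, "time_s": 1253.0},
    "OTCT 5x5": {"accurate_points": 20.1e4, "time_s": 3705.0},
}


def compare(baseline, candidate):
    """Return the relative gain in accurate points and the run-time ratio."""
    base, cand = RESULTS[baseline], RESULTS[candidate]
    return {
        "point_gain": (cand["accurate_points"] - base["accurate_points"])
        / base["accurate_points"],
        "time_ratio": cand["time_s"] / base["time_s"],
    }


if __name__ == "__main__":
    for pair in (("TCT 1x1", "OTCT 3x3"), ("OTCT 3x3", "OTCT 5x5")):
        result = compare(*pair)
        print(f"{pair[0]} -> {pair[1]}: "
              f"{result['point_gain']:+.1%} points, "
              f"x{result['time_ratio']:.2f} time")
```

Going from the $1 \times 1$ to the $3 \times 3$ window adds about 72% more accurate points for roughly 3.1 times the run time, whereas going from $3 \times 3$ to $5 \times 5$ adds about 0.5% for roughly another factor of 3 in run time.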

Figure 4. Reconstruction results of human face with (a) TCT and (b) OTCT methods.

In conclusion, we have developed an active binocular 3D setup based on band-limited binary patterns to obtain a 3D color human face. We extend TCT to the spatial domain to improve the accuracy of the corresponding points. The experiments show that our OTCT method outperforms TCT for the same pattern number. The suggested matching window is $3 \times 3$, which gives better results at a low computational cost. With the color camera, the texture information of the human face can be obtained easily. The experimental results verify the robustness, ease of operation, and high speed of this method.

Category: Imaging Systems

Received: Apr. 13, 2016

Accepted: May 24, 2016

Published Online: Aug. 3, 2018

The Author Email: Changhe Zhou (chazhou@mail.shcnc.ac.cn)
