Acta Optica Sinica, Volume. 43, Issue 21, 2120001(2023)

Inverse Reflectance Model Based on Deep Learning

Xi Wang, Zhenxiong Jian, and Mingjun Ren*
Author Affiliations
  • State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

    Objective

    To enhance the capability of photometric stereo to handle isotropic non-Lambertian reflectance, an inverse reflectance model based on deep learning is proposed in this paper to achieve highly accurate surface normal estimation. Non-Lambertian reflectance is an important factor affecting the performance of optical measurement techniques such as fringe projection. To the best of our knowledge, photometric stereo is the only technology that can, in theory, resolve the effect of non-Lambertian reflectance. Traditional non-Lambertian photometric stereo methods employ robust estimation, parameterized reflectance models, and general reflectance properties to handle non-Lambertian reflectance; in essence, they adopt different mathematical techniques to handle the reflectance model. With the introduction of deep learning, it becomes possible to establish the inverse reflectance model directly, and the capability of photometric stereo to handle non-Lambertian reflectance increases significantly. Representative supervised deep learning methods include CNN-PS and PS-FCN. CNN-PS directly maps the observation map, which records the intensities under different lightings, to the surface normal according to the orientation consistency cue; the performance of this network decreases significantly when only a small number of lights is available. PS-FCN simulates the normal estimation process of the pixel-wise inverse reflectance model and employs neighborhood information to give a robust surface normal estimation for scenes with sparse lighting. The pixel-wise inverse reflectance model cannot globally describe non-Lambertian reflectance, a shortcoming that has recently been mitigated by introducing a collocated light. However, theoretical limitations still exist in the collocated-light-based inverse reflectance model.
Therefore, this paper attempts to remedy the theoretical defect of the collocated-light-based inverse reflectance model by effectively extracting the image feature related to the azimuth difference and designing a deep-learning-based inverse reflectance model.
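    For readers unfamiliar with the baseline that these methods generalize, classical photometric stereo under the Lambertian assumption recovers the surface normal from intensities observed under at least three known lighting directions. The following is a minimal sketch of that classical baseline only (not the authors' network); the lighting directions and ground-truth normal are illustrative choices.

```python
import math

def lambertian_intensity(n, l, albedo=1.0):
    # Lambertian model: I = albedo * max(0, n . l)
    return albedo * max(0.0, sum(a * b for a, b in zip(n, l)))

def solve_3x3(L, I):
    # Solve L @ g = I for g = albedo * n by Cramer's rule (3 lights, exact case)
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(L)
    g = []
    for k in range(3):
        Lk = [row[:] for row in L]
        for r in range(3):
            Lk[r][k] = I[r]
        g.append(det(Lk) / d)
    return g

# three non-coplanar unit lighting directions (illustrative choice)
lights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
true_n = (0.0, 0.6, 0.8)                     # ground-truth unit normal
intensities = [lambertian_intensity(true_n, l, albedo=0.5) for l in lights]
g = solve_3x3(lights, intensities)
rho = math.sqrt(sum(x * x for x in g))       # recovered albedo = |g|
n_est = [x / rho for x in g]                 # recovered normal = g / |g|
```

    With more than three lights the same system is solved by least squares; non-Lambertian reflectance breaks the linear model above, which is precisely what the methods discussed here address.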

    Methods

    We first analyze the theoretical limitation of the collocated-light-based inverse reflectance model, then design the three-stage subnetworks of the proposed deep-learning-based inverse reflectance model, and finally train the model with new training strategies. The theoretical defect mainly comes from the assumption of Eq. (4), i.e., that the main direction α lies on the plane spanned by l and v, so that the BRDF input value Δφ is simplified to the value lᵀv. However, lᵀv is not identical to Δφ in most circumstances, and Δφ is highly related to the unknown surface normal. The proposed inverse reflectance model based on deep learning is designed as shown in Fig. 1 and consists of three subnetworks, i.e., the azimuth difference subnetwork, the inverse reflectance model subnetwork, and the surface normal estimation subnetwork. The first-stage subnetwork maps the image o under arbitrary lighting, the collocated image o0, and the lighting map l to the Δφ map, and the max-pooling fused feature is introduced to represent the surface normal. The second-stage subnetwork realizes the ideal inverse reflectance model in an image-feature manner. The output of this subnetwork could be used directly to calculate the surface normal by the least-squares algorithm, but the shadow thresholding value would then directly and dramatically influence the estimation accuracy. Thus, the third-stage subnetwork is designed to avoid error accumulation and achieve accurate surface normal estimation. To train the proposed network, a new supplementary training dataset is designed to retain low-reflectance data and provide SVBRDF scenes. The three subnetworks are first trained separately to obtain the initial parameters of each subnetwork and are then combined to fine-tune the parameters.
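    The dependence of Δφ on the unknown normal can be illustrated numerically. Measuring the azimuths of l and v in the tangent plane of normal n, the standard spherical identity cos Δφ = (lᵀv − (nᵀl)(nᵀv)) / (sin θl · sin θv) holds, where θl and θv are the polar angles of l and v with respect to n. The sketch below (illustrative vectors, not the paper's code) shows that the same lᵀv yields different Δφ for different normals, which is why lᵀv alone cannot stand in for Δφ.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def azimuth_difference(n, l, v):
    # Spherical identity: cos(dphi) = (l.v - (n.l)(n.v)) / (sin th_l * sin th_v)
    nl, nv, lv = dot(n, l), dot(n, v), dot(l, v)
    sin_l = math.sqrt(max(0.0, 1.0 - nl * nl))
    sin_v = math.sqrt(max(0.0, 1.0 - nv * nv))
    c = (lv - nl * nv) / (sin_l * sin_v)
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding

l = (0.6, 0.0, 0.8)          # light direction (unit vector, illustrative)
v = (-0.6, 0.0, 0.8)         # view direction (unit vector, illustrative)
# l.v is fixed, yet dphi changes with the (unknown) surface normal:
dphi_1 = azimuth_difference((0.0, 0.0, 1.0), l, v)   # l, v on opposite azimuths: pi
dphi_2 = azimuth_difference((0.0, 0.6, 0.8), l, v)   # different normal, different dphi
```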

    Results and Discussions

    In this paper, an ablation experiment is used to prove the effectiveness of the network design, and synthetic and real experiments are adopted to analyze the performance of the proposed method. PS-FCN, CNN-PS, and the network proposed by Wang et al., denoted by CH20, IK18, and WJ20, respectively, are adopted as comparison methods. As shown in Table 2, the ablation experiment illustrates that the introduction of the max-pooling fusion feature benefits the extraction of the image features related to Δφ and the shading, and that the azimuth difference subnetwork can effectively remedy the defect of the collocated-light-based inverse reflectance model to better handle isotropic reflectance. The synthetic experiments validate that the proposed method achieves the best performance on scenes with dense lights, sparse lights, and SVBRDFs. Figure 5 exhibits the superior performance of the proposed method over WJ20 on sparse-light scenes, which shows the necessity of breaking the theoretical limitation of the collocated-light-based inverse reflectance model. The real experiment based on the benchmark DiLiGenT dataset proves the state-of-the-art performance of the proposed method. Table 6 and Table 7 demonstrate that our method achieves an average surface normal estimation accuracy of 5.90° on the real scene, and that the advantage of the proposed method becomes more significant under sparse lighting.
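    The accuracy figures above are mean angular errors between estimated and ground-truth unit normals, the standard photometric stereo metric. A minimal sketch of the metric itself (toy two-pixel data, not the DiLiGenT evaluation code):

```python
import math

def mean_angular_error_deg(est_normals, gt_normals):
    # Mean angle (in degrees) between corresponding unit normals
    total = 0.0
    for e, g in zip(est_normals, gt_normals):
        c = sum(a * b for a, b in zip(e, g))
        total += math.degrees(math.acos(max(-1.0, min(1.0, c))))
    return total / len(est_normals)

# toy example: one exact estimate, one off by acos(0.8) ~ 36.87 deg
est = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8)]
gt  = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
mae = mean_angular_error_deg(est, gt)       # ~ 18.43 deg
```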

    Conclusions

    We design an inverse reflectance model based on deep learning to handle isotropic non-Lambertian reflectance, which remedies the theoretical defect of the collocated-light-based inverse reflectance model by effectively extracting the image feature related to the azimuth difference. The proposed model contains three subnetworks: the azimuth difference subnetwork, the inverse reflectance model subnetwork, and the surface normal estimation subnetwork. The first two subnetworks achieve the inverse mapping between the intensity and the dot product of the surface normal and the lighting direction, and the third subnetwork fully employs the image features extracted by the first two to accurately estimate the surface normal. The proposed method has three key characteristics, i.e., the introduction of the max-pooling fusion feature to extract the feature related to Δφ, the inverse reflectance model based on image features, and the stage-wise training strategy. The ablation experiment proves the rationality of the network design, and the synthetic experiments validate that the proposed method can simultaneously handle 100 classical isotropic reflectances. The real experiments based on the benchmark DiLiGenT dataset illustrate that the proposed method achieves accurate surface normal estimation with a mean error of 5.90°. The synthetic and real experiments together validate the state-of-the-art performance of the proposed method. In future work, we would like to inversely model the more challenging anisotropic reflectance and break the limitation of parallel lighting and orthographic cameras in photometric stereo.


    Xi Wang, Zhenxiong Jian, Mingjun Ren. Inverse Reflectance Model Based on Deep Learning[J]. Acta Optica Sinica, 2023, 43(21): 2120001

    Paper Information

    Category: Optics in Computing

    Received: Mar. 2, 2023

    Accepted: Jun. 13, 2023

    Published Online: Nov. 16, 2023

    Corresponding author: Mingjun Ren (renmj@sjtu.edu.cn)

    DOI: 10.3788/AOS230615
