Objective Image alignment technology can combine the advantages of infrared and visible light images, providing a strong basis for condition assessment and fault location of low-voltage electrical equipment. A homography mapping relationship exists between the infrared and visible light images of such equipment, but their spectral characteristics differ greatly, and the infrared image has weak contours, low texture, and poor resolution, which leads to poor alignment and low matching correctness. To address this, a homography estimation model based on a dual backbone network is proposed.
Methods Firstly, a binocular camera composed of an infrared thermal imager and a visible light camera captures infrared and visible light images of low-voltage electrical equipment in the operating state, and a separate backbone network extracts the feature information presented in each type of image. Secondly, self-attention and cross-attention structures built from efficiently aggregated linear transformers enhance the expressiveness of the feature descriptors extracted by the backbones: self-attention lets the descriptors within a feature map attend to global feature information, and cross-attention lets the descriptors in the infrared feature map attend to the feature information in the visible light feature map. Thirdly, fully connected layers form the homography estimation model, which estimates the 8-degree-of-freedom information of the homography matrix and the 9-degree-of-freedom information of the inverse homography matrix. Fourthly, a supervised learning model is constructed from the estimated homography matrices and a reversibility constraint. Finally, the image matching information is calculated from the estimated homography matrix to complete the multi-source image alignment.
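The reversibility constraint mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the form below is an assumption: the product of the estimated homography and the estimated inverse is normalised by its bottom-right entry and compared with the identity matrix. The function names (`homography_from_8`, `matrix_from_9`, `reversibility_penalty`) are hypothetical and chosen only for illustration.

```python
def homography_from_8(p8):
    """Build a 3x3 homography from 8 parameters, fixing the last entry to 1."""
    if len(p8) != 8:
        raise ValueError("expected 8 parameters")
    h = list(p8) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def matrix_from_9(p9):
    """Reshape 9 regressed values into a 3x3 matrix (assumed layout: row-major)."""
    if len(p9) != 9:
        raise ValueError("expected 9 parameters")
    return [list(p9[0:3]), list(p9[3:6]), list(p9[6:9])]


def matmul3(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def reversibility_penalty(h, h_inv):
    """Squared deviation of H @ H_inv from the identity.

    Homographies are defined up to scale, so the product is divided by its
    bottom-right entry before the comparison (an assumption of this sketch).
    """
    prod = matmul3(h, h_inv)
    scale = prod[2][2]
    if abs(scale) < 1e-12:
        raise ValueError("degenerate product; cannot normalise")
    return sum((prod[i][j] / scale - (1.0 if i == j else 0.0)) ** 2
               for i in range(3) for j in range(3))


# A pure translation by (2, 3) and its exact inverse.
H = homography_from_8([1, 0, 2, 0, 1, 3, 0, 0])
H_INV_GOOD = matrix_from_9([1, 0, -2, 0, 1, -3, 0, 0, 1])
H_INV_BAD = matrix_from_9([1, 0, 0, 0, 1, 0, 0, 0, 1])  # identity, not the inverse

good_penalty = reversibility_penalty(H, H_INV_GOOD)
bad_penalty = reversibility_penalty(H, H_INV_BAD)
```

In this sketch a consistent pair gives a penalty of zero, while a mismatched inverse is penalised in proportion to how far the product is from the identity. In a supervised setting such a term would typically be added to the regression loss on the homography parameters; how the two are weighted in the paper is not stated here.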
Results and Discussion To verify the effectiveness of the designed supervised homography estimation model, comparative experiments were conducted on the dual backbone network, the feature aggregation module, and the loss function of the estimation model. The experiments show that, during feature extraction in the backbone network, weighting the feature channels with a global average pooling layer alone does not adequately reflect the feature distribution among the feature layers. This paper therefore introduces a standard deviation pooling layer alongside global average pooling and uses the combination of the two to weight the channel information for aggregation. The experimental data show that this combined channel aggregation method increases the number of parameters and the inference time, but it changes the two indicators, RMSE and correct matching rate (CMR), to different degrees and makes the image alignment more accurate, with the correct matching rate improved by 16.4% and 17.5%. The designed VSNet and IRNet backbones effectively extract the feature information of low-voltage electrical equipment from visible light images and infrared images, respectively. Compared with self-attention and cross-attention structures composed of transformers and of linear transformers, the structure composed of efficiently aggregated linear transformers reduces the RMSE after matching by a maximum of 0.697 and a minimum of 0.357, increases the CMR by a maximum of 2.4% and a minimum of 0.4%, and shortens the inference time by 12 ms.
Compared with the loss function constructed without the reversibility constraint, the loss function with the constraint reduces the RMSE after matching by a maximum of 1.107 and a minimum of 0.734 and increases the CMR by a maximum of 3.2% and a minimum of 2.3%. The dual backbone homography estimation model achieves a relatively low RMSE of 7.895 after matching and a correct matching rate of 91.8%.
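The reason for combining global average pooling with standard deviation pooling can be seen in a small example. The sketch below is an illustration under stated assumptions, not the paper's module: the per-channel descriptor is taken as mean plus standard deviation, and the channel weight is a plain sigmoid of that descriptor with no learned layers. The function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def channel_descriptors(feature_map):
    """Per-channel descriptor: global average plus (population) standard deviation.

    feature_map is a list of channels, each channel a 2D list of floats.
    """
    descriptors = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        descriptors.append(fmean(values) + pstdev(values))
    return descriptors


def channel_weights(feature_map):
    """Squash each descriptor to (0, 1) with a sigmoid (assumed gating function)."""
    return [1.0 / (1.0 + math.exp(-d)) for d in channel_descriptors(feature_map)]


def reweight(feature_map):
    """Scale every value in a channel by that channel's weight."""
    weights = channel_weights(feature_map)
    return [[[w * v for v in row] for row in channel]
            for w, channel in zip(weights, feature_map)]


# Two channels with the same mean but different spread.
flat = [[1.0, 1.0], [1.0, 1.0]]
textured = [[0.0, 2.0], [0.0, 2.0]]
features = [flat, textured]

descriptors = channel_descriptors(features)
weights = channel_weights(features)
reweighted = reweight(features)
```

Both channels have a mean of 1.0, so global average pooling alone would weight them identically. The standard deviation term separates the flat channel from the textured one, which is the kind of distribution information the combined pooling is intended to capture.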
Conclusions The supervised dual backbone homography estimation model designed in this paper can effectively align infrared and visible light images of low-voltage electrical equipment. The experiments support the following conclusions: 1) The dual backbone feature extraction network, designed for the image features and information content of infrared and visible light images, effectively extracts feature information from heterogeneous images; compared with the single backbone homography estimation model, the proposed model matches well and achieves the best RMSE and CMR values. 2) Introducing the reversibility constraint into the dual backbone homography estimation model improves the accuracy of the estimation model, with a decrease of 0.734 in RMSE and an improvement of 3.2% in CMR. 3) Compared with the eight matching methods SIFT, SURF, ORB, LightGlue, SuperGlue, SuperPoint, LoFTR, and DeepHomography, the designed dual backbone homography estimation model handles the weak contours and low texture typical of infrared images of low-voltage electrical equipment, reducing the RMSE by a maximum of 10.557 and a minimum of 1.848 and increasing the CMR by 8.8%. 4) The model takes 0.14 seconds for inference on a pair of multi-source images, which meets the requirements of real-time estimation and matching and provides favourable support for subsequent fault diagnosis and localisation of low-voltage electrical equipment.