For better night-vision applications using low-light-level visible and infrared imaging, a fusion framework for night-vision context enhancement (FNCE) is proposed. An adaptive brightness stretching method is first proposed to enhance the visible image. Then, a hybrid multi-scale decomposition with edge-preserving filtering is proposed to decompose the source images. Finally, the fused result is obtained by combining the decomposed images with three different rules. Experimental results demonstrate that the FNCE method performs better in terms of details (edges), contrast, sharpness, and human visual perception. Therefore, better results for night-vision context enhancement can be achieved.
The low-light-level visible images always provide the details and background scenery, while the target is often detected/recognized via the infrared imaging[1,2] in night vision. As the visible and infrared image fusion technology can improve the perception of the scene in addition to the ability to detect/recognize the target[3], the fusion technology of the low-light-level visible and infrared images plays a significant role in night vision and has been successfully applied in the areas of defense and security[4].
However, night-vision images usually have relatively strong noise, low contrast, and unclear details (including edges). Moreover, human eyes are very sensitive to the details and noise. As these factors have not been considered in most proposed fusion methods, it is difficult to achieve good results in night vision. Thus, an appropriate fusion technology is required for night vision to obtain better results for the night-vision context enhancement.
Liu et al. proposed a modified method to fuse the visible and infrared images for night vision[5]. In that method, the visible image is enhanced via the corresponding infrared image, and the fused result is obtained using a conventional multi-scale fusion method. However, the details of the visible image are not fully enhanced, and salient targets in the infrared image are displayed in dark pixels, which is not good for visual perception. A fusion method for low-light visible and infrared images based on the contourlet transform was proposed in Ref. [6], in which different rules are used to combine the low-frequency and high-frequency information; the details of the visible image are not fully enhanced there either. Zhou et al. proposed a guided-filter-based context enhancement (GFCE) fusion method for night vision[7]. In the result of the GFCE method, the noise is amplified along with the detail enhancement, and some distortions may emerge in the bright regions due to over-enhancement. In all of these methods, neither a denoising method nor a detail-enhancing method is used. Furthermore, the details (including edges) cannot be preserved well enough during the fusion process. Thus, further research is needed to obtain better context-enhancement results for low-light-level visible and infrared imaging.
In order to address the above problems for better night-vision applications, a fusion framework for low-light-level visible and infrared images for night-vision context enhancement (FNCE) is proposed in this Letter, as shown in Fig. 1. The FNCE method can be divided into two parts: the initial enhancement and the fusion process. In the initial enhancement, an adaptive brightness stretching method is first proposed to enhance the visibility of the low-light-level visible image; at the same time, denoising and detail enhancement are applied to the source images. A multi-scale decomposition (MSD) method based on edge-preserving filtering can accurately extract the details at different scales[8], and the gradient domain guided image filtering (GDGF)[9] has better edge performance than the guided image filtering (GF)[10]. Therefore, in the fusion process, a hybrid MSD structure with the GF and the GDGF is proposed to fully decompose the enhanced source images. In addition, multi-scale weight maps are obtained using a perception-based saliency detection technology at each scale. Finally, the fused result is obtained by combining the decomposed images with the multi-scale weight maps using three different combination rules according to the scales.
Figure 1. The proposed infrared and visible image FNCE.
The “Queen’s Road” source images, as shown in Fig. 2(a), were collected from the website http://www.imagefusion.org/. The “Buildings” source images, as shown in Fig. 2(b), were captured by a low-light-sensitive CMOS camera and a mid-wave infrared camera. The two test pairs are typical scenes of urban surveillance applications. As shown in Fig. 2, the source images are usually displayed with unclear details, as well as some noise. Moreover, the contrast of the low-light-level visible image is always low. Thus, as shown in Fig. 1, some enhancement methods must be applied to the source images before the fusion process.
Figure 2. Test pairs of visible and infrared images captured under low-light-level conditions. (a) “Queen’s Road,” (b) “Buildings.”
Firstly, a denoising method with the edge-preserving GDGF[9] is applied to both source images to reduce the noise, with different GDGF filtering parameters used for the low-light-level visible image and for the infrared image.
Following this, an adaptive brightness stretching method is proposed to enhance the visibility of the low-light-level visible image. The stretching is a piecewise linear mapping with two inflection points (x1, y1) and (x2, y2): as shown in Fig. 3, input values smaller than x1 are linearly stretched to the range [0, y1], values larger than x2 are retained, and the remaining values are linearly mapped between y1 and x2. In our work, the mean brightness of the input image is stretched to three times its original value, while the stretched mean should lie between 90 and 150, where the mean of an image with normal illumination lies. Therefore, the values of the low-light-level visible image can be effectively enhanced to an appropriate range, without insufficient or over-enhancement, via the proposed adaptive brightness stretching method.
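As an illustration, a minimal sketch of such a piecewise-linear stretch is given below (Python/NumPy). It follows the description above by taking x1 as the image mean and y1 as three times the mean clipped to [90, 150]; the value of x2 and the function name are placeholders for demonstration, not the exact settings of this Letter.

```python
import numpy as np

def adaptive_brightness_stretch(img, x2=200.0):
    """Sketch of the piecewise-linear brightness stretch described in the text.

    Pixels below x1 are linearly stretched up to y1, pixels above x2 are kept
    unchanged, and the middle range is mapped linearly from (x1, y1) to (x2, x2).
    x1 is taken as the image mean and y1 = 3 * x1 clipped to [90, 150];
    the default x2 is an assumed placeholder, not the Letter's value.
    """
    img = img.astype(np.float32)
    x1 = float(img.mean())
    y1 = float(np.clip(3.0 * x1, 90.0, 150.0))

    low = img < x1
    high = img > x2
    mid = ~(low | high)

    out = np.empty_like(img)
    out[low] = img[low] * (y1 / max(x1, 1e-6))                         # stretch dark pixels
    out[mid] = y1 + (img[mid] - x1) * (x2 - y1) / max(x2 - x1, 1e-6)   # middle ramp
    out[high] = img[high]                                              # keep bright pixels
    return np.clip(out, 0, 255).astype(np.uint8)
```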
Finally, a detail enhancement method with the GF[10] is applied to both source images: the detail layer extracted by the GF is boosted 2.5 times for the low-light-level visible image and 3 times for the infrared image, with different GF filtering parameters used for the two images.
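A minimal sketch of this kind of guided-filter detail boosting is given below, assuming a basic gray-scale guided filter built from box filtering. The boost factors (2.5 for the visible image, 3 for the infrared image) follow the text, whereas the radius and regularization values are placeholders, and the plain GF stands in for both the GF and the GDGF used in the Letter.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius, eps):
    """Basic gray-scale guided filter (He et al.) used as an edge-preserving
    smoother; the GDGF variant used in the Letter is not reproduced here."""
    I, p = guide.astype(np.float64), src.astype(np.float64)
    size = 2 * radius + 1
    mean_I = uniform_filter(I, size)
    mean_p = uniform_filter(p, size)
    cov_Ip = uniform_filter(I * p, size) - mean_I * mean_p
    var_I = uniform_filter(I * I, size) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * I + uniform_filter(b, size)

def enhance_details(img, radius=8, eps=0.04 * 255 ** 2, boost=2.5):
    """Boost the detail layer: base = GF(img), detail = img - base,
    enhanced = base + boost * detail.  radius/eps are placeholder values."""
    base = guided_filter(img, img, radius, eps)
    detail = img.astype(np.float64) - base
    return np.clip(base + boost * detail, 0, 255).astype(np.uint8)
```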
The initial enhancement results for the “Queen’s Road” visible image are shown in Fig. 4. Close-up views for the labeled regions are presented. It can be seen that more and clearer details are presented with less noise using our enhancing method. Thus, the proposed enhancing method for the low-light-level visible image is more effective.
Figure 4. Initial enhancement results for the “Queen’s Road” visible image. (a) The original, (b) result with Zhou’s method[7], and (c) result with the proposed method.
In order to make full use of the information at different scales, a hybrid MSD with the GF and the GDGF is proposed to decompose both source images. The structure of the proposed hybrid MSD is shown in Fig. 5. The GDGF is used to obtain the details of the image (including edges). As an adequate amount of low-frequency information is difficult to obtain via the GDGF, the low-frequency information is obtained by using the strong GF. As shown in Fig. 5, there are three different levels in the decomposition: the small-scale detail level, the large-scale detail level, and the base level. As the finest details of an image are at the first scale of the hybrid MSD, the detail images from the first scale are regarded as the images of the small-scale detail level. The detail images from the second to the Nth scale, where N is the number of decomposition scales, are regarded as the images of the large-scale detail level. The coarsest-scale information is obtained for the base level and can roughly represent the energy distribution.
Figure 5. Structure of the hybrid MSD with the GF and the GDGF.
In the proposed hybrid MSD structure, the texture information and the edge information at each scale are computed from two filtered sequences, one obtained with the GF and one obtained with the GDGF, with both sequences initialized by the enhanced input image. The filtering parameters of the GF and the GDGF grow from scale to scale by a decomposition factor between adjacent scales, and the regularization parameter of the GF is set to be very large so that the GF acquires the low-frequency information. Furthermore, as shown in Fig. 5, the filtered image of the GF at the coarsest (Nth) scale is taken as the base image.
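A rough sketch of such a two-filter multi-scale decomposition is given below. It follows the general idea of Fig. 5 (a strongly smoothed GF sequence for low-frequency/texture information and an edge-preserving sequence for edge information), but the exact recursion, detail definitions, and parameter growth of the Letter are not reproduced; the plain guided filter again stands in for the GDGF, and all numerical values are assumptions.

```python
import numpy as np
import cv2  # cv2.ximgproc.guidedFilter requires opencv-contrib-python

def hybrid_msd(img, n_scales=4, r0=2, eps_smooth=1e4, eps_edge=10.0, factor=2):
    """Illustrative hybrid multi-scale decomposition with two filters.

    At each scale the current approximation is filtered twice: with a
    strongly regularized guided filter (standing in for the "strong GF",
    large eps -> low-frequency information) and with a weakly regularized
    guided filter (standing in for the GDGF, preserving edges).  Texture
    details come from the smooth sequence and edge details from the
    edge-preserving sequence; the coarsest smooth image is the base image.
    """
    u = img.astype(np.float32)
    texture, edge = [], []
    r = r0
    for _ in range(n_scales):
        u_smooth = cv2.ximgproc.guidedFilter(u, u, r, eps_smooth)
        u_edge = cv2.ximgproc.guidedFilter(u, u, r, eps_edge)
        texture.append(u - u_smooth)   # texture detail at this scale
        edge.append(u - u_edge)        # edge detail at this scale
        u = u_smooth                   # continue from the low-frequency image
        r *= factor                    # grow the filter size between scales
    return texture, edge, u            # u is the base image
```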
In the fusion process, three different combination rules are respectively designed for the three different levels. The frequency-tuned filtering computes bottom-up saliency[11]. The output of the saliency model strongly correlates with human visual perception[12]. For better weight maps, the frequency-tuned filtering is used to extract the saliency information from the background at each scale. As the targets are always more significant in the infrared images, the weight maps are mainly based on the infrared information, and the infrared characteristic information of the target is maximally highlighted at each scale.
The fused image Fuse is obtained via weighted combinations of the decomposed images: the base fused image and the detail fused images at the N scales, each computed with its own combination rule described below, are summed to form the final result.
For the small-scale detail level, the fused image is obtained as a weighted combination of the infrared and visible detail images at the first scale, using the weight maps of the infrared images for the small-scale detail level.
The saliency maps of the infrared and visible images for the small-scale detail level are obtained by applying the frequency-tuned filtering to the corresponding detail images at the first scale. Following this, the binary weight maps of the infrared images are computed from a pixel-wise comparison of the infrared and visible saliency maps.
The resulting binary weight maps are noisy and are typically not well aligned with object boundaries. Therefore, spatial consistency is restored through the GDGF, with the corresponding detail images used as guidance images. Finally, the weight maps of the infrared images for the small-scale detail level are obtained as the filtered binary maps.
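The sketch below illustrates this detail-level rule: frequency-tuned saliency (Achanta et al.) on each detail image, a binary map where the infrared saliency dominates, guided-filter refinement of that map with the infrared detail image as guidance, and a weighted blend. The plain guided filter again stands in for the GDGF, and the radius/eps values are placeholders.

```python
import numpy as np
import cv2  # cv2.ximgproc.guidedFilter requires opencv-contrib-python

def ft_saliency(img):
    """Frequency-tuned saliency for a single-channel image: absolute
    difference between the image mean and a Gaussian-blurred copy."""
    blur = cv2.GaussianBlur(img.astype(np.float32), (5, 5), 0)
    return np.abs(blur - float(img.mean()))

def fuse_detail_level(d_ir, d_vis, guide_ir, radius=8, eps=0.1):
    """Sketch of the detail-level combination rule described in the text.

    A binary map selects the pixels where the infrared detail is more
    salient, the map is smoothed with a guided filter (guided by the
    corresponding infrared detail image), and the two detail images are
    blended with the resulting weights.  Parameter values are placeholders.
    """
    s_ir, s_vis = ft_saliency(d_ir), ft_saliency(d_vis)
    binary = (s_ir >= s_vis).astype(np.float32)          # 1 where IR is more salient
    w_ir = cv2.ximgproc.guidedFilter(guide_ir.astype(np.float32), binary, radius, eps)
    w_ir = np.clip(w_ir, 0.0, 1.0)
    return w_ir * d_ir + (1.0 - w_ir) * d_vis
```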
For the large-scale detail level, the combination rule is similar to that for the small-scale detail level: the fused image at each scale is a weighted combination of the infrared and visible detail images at that scale, using the weight maps of the infrared images at the corresponding scale.
It should be noted that there is only one weight map for the images at each scale of the large-scale detail level. This is because the two kinds of detail images, the texture detail image and the edge detail image, always have similar structures; however, the edge detail image has better edge performance than the texture detail image. Thus, only the edge detail images are used for the large-scale detail level.
The saliency maps of the infrared and visible images at each scale of the large-scale detail level are also obtained by applying the frequency-tuned filtering to the corresponding edge detail images. Following this, the binary weight maps of the infrared images are computed from a pixel-wise comparison of the two saliency maps.
The binary weight maps are also filtered using the GDGF, with the corresponding detail images as guidance images. Finally, the weight maps of the infrared images for the large-scale detail level are obtained as the filtered binary maps.
For the base level, the fused image is computed as a weighted combination of the base images of the infrared and visible images, using the weight map of the infrared image for the base level.
The saliency maps of the infrared and visible images for the base level are obtained by applying the frequency-tuned filtering to the corresponding base images. Then, the binary weight map of the infrared image is computed from a pixel-wise comparison of the two saliency maps.
The binary weight map is smoothed using a Gaussian filter to fit the combination of the extremely coarse-scale information. Finally, the weight map for the base level is obtained as the smoothed binary map.
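A corresponding sketch of the base-level rule is given below: frequency-tuned saliency on the two base images, a binary map where the infrared base image is more salient, Gaussian smoothing of that map, and a weighted blend. The sigma values are placeholders, not the Letter's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse_base_level(base_ir, base_vis, sigma=2.0):
    """Sketch of the base-level combination rule described in the text.

    The binary map marking the pixels where the infrared base image is more
    salient (frequency-tuned saliency) is softened with a Gaussian filter and
    used to blend the two base images.  sigma values are placeholders.
    """
    b_ir = base_ir.astype(np.float32)
    b_vis = base_vis.astype(np.float32)
    s_ir = np.abs(gaussian_filter(b_ir, 1.0) - b_ir.mean())
    s_vis = np.abs(gaussian_filter(b_vis, 1.0) - b_vis.mean())
    binary = (s_ir >= s_vis).astype(np.float32)
    w_ir = np.clip(gaussian_filter(binary, sigma), 0.0, 1.0)
    return w_ir * b_ir + (1.0 - w_ir) * b_vis
```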
In order to test the proposed FNCE method, three state-of-the-art fusion methods are selected for comparison: the guided filtering fusion (GFF) method[13], the gradient transfer fusion (GTF) method[14], and the GFCE method[7]. All of the compared methods are implemented using the publicly available codes, with the parameters set according to the corresponding papers. In the FNCE method, the number of decomposition scales, the decomposition factor between adjacent scales, and the initial filtering parameters of the GF and the GDGF are fixed for all experiments.
It can be seen from the fusion results of the test images in Fig. 6 that the results of the FNCE method have clearer details (including edges), more salient targets, better contrast, and less noise than those of the other methods. Close-up views of the labeled regions are presented below the images. The results of the GFF method contain little detail information from the visible image and are similar to the infrared image, with unclear details, as shown in Fig. 6(a); moreover, the clouds from the infrared image are nearly lost in the “Buildings” result. For the GTF method, as shown in Fig. 6(b), although the results have the least noise, the details are not clear enough, and much information from the visible image is lost, for example, the lights. For the GFCE method, as shown in Fig. 6(c), the bright parts (for example, the labeled building with lights) are obviously over-enhanced, and the noise in the sky is obvious in the “Buildings” result; overall, the results of the GFCE method have obvious noise, insufficiently clear details (edges), and possible distortions caused by the over-enhancement. For the FNCE method, as shown in Fig. 6(d), the road sign indicated by the red arrow is the clearest, without distortions, in the “Queen’s Road” result. Therefore, the proposed FNCE method is able to acquire better results for the human visual perception in night vision.
Figure 6.Fusion results of different methods for the test images.
Information entropy (IE), average gradient (AG), the gradient-based fusion metric (QG)[15], the metric based on perceptual saliency (PS)[7], and the fusion metric based on visual information fidelity (VIFF)[16] are selected for the objective assessment. IE evaluates the amount of information contained in an image. AG indicates the degree of sharpness. QG is recommended for night-vision applications[17] to evaluate the amount of edge information transferred from the source images. PS measures the saliency of the perceptual information contained in an image. VIFF evaluates the quality of the fused image in terms of human visual perception. Table 1 gives the quantitative assessments of the different fusion methods, with the values averaged over the four test image pairs. It can be seen from Table 1 that IE, AG, PS, and VIFF all achieve their best values with the FNCE method, which means the proposed FNCE method can extract more information and achieve better sharpness, more saliency information, and better human visual perception. In addition, the QG value of the FNCE method ranks second, which means that edges can also be relatively well preserved by the FNCE method.
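For reference, minimal implementations of the two simplest metrics, IE and AG, are sketched below under their common definitions; the exact normalizations used in the Letter, as well as the QG, PS, and VIFF metrics, are not reproduced here.

```python
import numpy as np

def information_entropy(img):
    """Information entropy (IE) of an 8-bit image from its gray-level histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def average_gradient(img):
    """Average gradient (AG): mean magnitude of horizontal/vertical
    differences, a common sharpness indicator."""
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]
    gy = np.diff(img, axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))
```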
Table 1. Quantitative Assessments of Different Methods

Method   IE       AG       QG       PS        VIFF
GFF      6.6093   0.0100   0.6132   16.4800   0.4278
GTF      6.3275   0.0061   0.2847   13.7586   0.2205
GFCE     6.8106   0.0152   0.3612   19.5798   0.5648
FNCE     6.9786   0.0187   0.6029   20.6976   0.6580
The average running time of the different methods on the source images is shown in Table 2. All of the compared methods are implemented in MATLAB on a computer with an Intel i5 3.40 GHz CPU and 4 GB RAM.
From the experimental comparisons, it can be seen that better human visual perception is achieved with more salient targets, better detail (edge) performance, better contrast, better sharpness, and less noise in the FNCE method. Obviously, the proposed FNCE method is more effective, which will help to obtain better context enhancement for night-vision imaging. Although the FNCE method is somewhat time-consuming, this is acceptable considering the better fused result.
In conclusion, an FNCE method is proposed. First, an adaptive brightness stretching method is proposed to enhance the visibility of the low-light-level visible image. Following this, a structure of the hybrid MSD with the GF and the GDGF is proposed for fully decomposing the enhanced source images. In addition, weight maps are obtained via a perception-based saliency detection technology at each scale.
Experimental results show that better results for night-vision context enhancement can be acquired via the proposed FNCE method. In the future, the idea of the fast GF[18] may be introduced into the simplifications of the FNCE method for practical applications. Moreover, the previous frame video image may be used as the guidance image of the current frame to reduce the delay.
[11] R. Achanta, S. Hemami, F. Estrada, and S. Süsstrunk, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009), p. 1597.
Jin Zhu, Weiqi Jin, Li Li, Zhenghao Han, Xia Wang, "Fusion of the low-light-level visible and infrared images for night-vision context enhancement," Chin. Opt. Lett. 16, 013501 (2018)