Introduction
Visible, infrared, and infrared polarization images captured by different sensors present complementary information about the same scene, and they can be combined by image fusion technology to obtain a new, more accurate, comprehensive, and reliable description of the scene [1]. Fusion methods vary with image sources, fusion requirements, and purposes [2-4]. In general, fusion methods can be classified into pixel-, feature-, and decision-level methods. Compared with the latter two levels, pixel-level fusion preserves the source image data as much as possible, so it plays an important role in most image processing tasks. Major pixel-level image fusion methods can be put into four groups according to their adopted theories [5], namely multi-scale decomposition-based methods, sparse representation-based methods, methods in other domains, and methods combining different transforms. For the multi-scale decomposition-based methods, the decomposition scheme and the fusion rules are the two aspects that determine fusion quality and efficiency.
For the decomposition schemes, various methods have been proposed, such as the discrete wavelet transform (DWT) [6], dual-tree complex wavelet transform (DTCWT) [7], stationary wavelet transform (SWT) [8], wavelet packet transform (WPT) [9], non-subsampled contourlet transform (NSCT) [10-11], and non-subsampled shearlet transform (NSST) [12]. Much practice has shown that NSCT and NSST usually outperform other multi-scale decompositions in representing the 2-D singular signals contained in digital images [13]. However, the design of the multi-directional filter banks for NSCT and NSST is relatively complex and computationally expensive, which greatly reduces the efficiency of image fusion.
Fusion rules generally include rules for the low- and high-frequency coefficients. The AVG-ABS rule is a simple fusion rule that combines the low-frequency coefficients by averaging and the high-frequency coefficients by choosing the absolute maximum. The AVG-ABS rule is easy to compute and simple to implement; however, it often causes distortions and artifacts [14-15]. To overcome these shortcomings and improve fusion quality, a large number of rules have been proposed [15-20]. The rules in Refs. 15-20 achieve satisfactory results, but at the cost of high computational complexity.
To ensure both fusion quality and computational efficiency, a novel multi-scale decomposition-based fusion method with dual decomposition structures is proposed. Our method improves fusion quality and efficiency through the image decomposition scheme, while for the fusion rules it only uses the simple AVG-ABS rule. Firstly, inspired by the construction of octaves in the SIFT [21] and SURF [22] algorithms, the source images are decomposed into a series of detail and approximation images by multi-scale Gaussian filters with increasing standard deviation and up-scaling size, forming undecimated pyramid structures. Secondly, for the approximation images, i.e., the top layers of the undecimated pyramid structures, multi-scale morphological top- and bottom-hat decompositions [23-24] are applied to fully extract bright and dark details of different scales from the background, and the contrast of the fused layer is improved by the absolute maximum rule. Thirdly, multi-scale morphological inner- and outer-boundary decompositions are constructed based on the idea behind the multi-scale top- and bottom-hat decompositions. Each detail image is decomposed by these two morphological decompositions to extract boundary information, and the decomposed coefficients are combined by choosing the absolute maximum. Finally, the fused image is reconstructed by taking the inverse transforms corresponding to the decompositions above.
1 Related theories and work
1.1 The pyramid transforms
The theory and mathematical representation for constructing a multiresolution pyramid transform scheme were presented in Ref. 25 and extended in Ref. 26. A domain of signals $V_j$ is assigned to each level $j$: the analysis operator $\psi_j^{\uparrow}: V_j \rightarrow V_{j+1}$ maps an image to a higher level in the pyramid, while the synthesis operator $\psi_j^{\downarrow}: V_{j+1} \rightarrow V_j$ maps an image to a lower level. The detail signal $y_j$ contains the information of $x_j$ which does not exist in $\psi_j^{\downarrow}(\psi_j^{\uparrow}(x_j))$, i.e., $y_j = x_j \,\dot{-}\, \psi_j^{\downarrow}(\psi_j^{\uparrow}(x_j))$, where $\dot{-}$ is a subtraction operator mapping $V_j \times V_j$ into the set $Y_j$ of detail signals. The decomposition process of an input image $f$ is expressed as Eq. (1):

$$ f = x_0 \mapsto \{x_1, y_0\} \mapsto \{x_2, y_1, y_0\} \mapsto \cdots \mapsto \{x_K, y_{K-1}, \ldots, y_0\} \tag{1} $$

where

$$ x_{j+1} = \psi_j^{\uparrow}(x_j), \qquad y_j = x_j \,\dot{-}\, \psi_j^{\downarrow}(x_{j+1}), \qquad j = 0, 1, \ldots, K-1 \tag{2} $$

The reconstruction process through the backward recursion is expressed as Eq. (3):

$$ x_j = \psi_j^{\downarrow}(x_{j+1}) \,\dot{+}\, y_j, \qquad j = K-1, \ldots, 1, 0 \tag{3} $$

where $\dot{+}$ is the addition operator corresponding to $\dot{-}$. Eq. (1) and Eq. (3) are called the pyramid transform and the inverse pyramid transform, respectively.
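As an illustration of Eqs. (1)-(3), the following Python sketch (ours, not code from the paper) implements the generic recursions with pluggable analysis and synthesis operators, using ordinary image subtraction and addition for $\dot{-}$ and $\dot{+}$:

```python
import numpy as np

def pyramid_decompose(f, analysis, synthesis, levels):
    """Pyramid transform of Eqs. (1)-(2):
    x_{j+1} = analysis(x_j),  y_j = x_j - synthesis(x_{j+1})."""
    x, details = f.astype(np.float64), []
    for _ in range(levels):
        x_next = analysis(x)
        details.append(x - synthesis(x_next))  # detail signal y_j
        x = x_next
    return details, x                          # {y_0, ..., y_{K-1}} and x_K

def pyramid_reconstruct(details, approx, synthesis):
    """Inverse pyramid transform of Eq. (3): x_j = synthesis(x_{j+1}) + y_j."""
    x = approx
    for y in reversed(details):
        x = synthesis(x) + y
    return x
```

The schemes in the following subsections can be read as instances of this skeleton with particular choices of the analysis and synthesis operators.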
1.2 Scale space representation and multi-scale Gaussian filtering
The scale space of an image can be generated by convolving the image with Gaussian filters, and it has been successfully applied in SIFT [21] to detect key points that are invariant to scale. In SIFT, the scale space is divided into octaves. For each octave, the initial image is iteratively convolved with Gaussians of increasing standard deviation to generate a set of scale-space images (Gaussian images), and one of the Gaussian images is downsampled to obtain the initial image of the next octave. The Difference-of-Gaussians (DoG) images are then obtained by subtracting adjacent Gaussian images. In SURF [22], the down-sampling step is omitted by increasing the size of the filter instead.
Inspired by the above algorithms, the source image is repeatedly convolved with Gaussian filters whose standard deviation and size increase simultaneously, constructing an undecimated pyramid structure; the DoG images are then produced by subtracting adjacent Gaussian images. Accordingly, the transform scheme of such a pyramid is given by Eq. (4):

$$ x_{j+1} = G_{\sigma_j} * x_j, \qquad y_j = x_j - x_{j+1}, \qquad j = 0, 1, \ldots, K-1 \tag{4} $$

where

$$ G_{\sigma_j}(m, n) = \frac{1}{2\pi\sigma_j^2}\exp\left(-\frac{m^2 + n^2}{2\sigma_j^2}\right) \tag{5} $$

is the Gaussian kernel (filter), whose support size grows with $\sigma_j$, and $*$ denotes the convolution operation. The parameter $\sigma_j$ is the standard deviation, which increases with $j$; in this paper $\sigma_j = \sigma_0 k^j$. The source image $f$ can thus be decomposed into an approximation image $x_K$ and a set of detail images $\{y_0, y_1, \ldots, y_{K-1}\}$ as in scheme (1), and it can be exactly reconstructed through the following recursion:

$$ x_j = x_{j+1} + y_j, \qquad j = K-1, \ldots, 1, 0 \tag{6} $$
The four-level decomposition scheme is illustrated in Fig. 1.

Figure 1. Example of four-level decomposition by multi-scale Gaussian filtering
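A minimal sketch of this undecimated Gaussian decomposition, assuming SciPy's `gaussian_filter` and the relation $\sigma_j = \sigma_0 k^j$ above (since the pyramid is undecimated, analysis is Gaussian smoothing and synthesis is the identity); the defaults follow the infrared-visible setting of Table 2:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_decompose(f, sigma0=0.6, k=1.4, levels=3):
    """Multi-scale Gaussian decomposition of Eq. (4):
    x_{j+1} = G_{sigma_j} * x_j, y_j = x_j - x_{j+1}, sigma_j = sigma0 * k**j."""
    x, details = f.astype(np.float64), []
    for j in range(levels):
        x_next = gaussian_filter(x, sigma=sigma0 * k ** j)  # kernel support grows with sigma_j
        details.append(x - x_next)                          # DoG detail image y_j
        x = x_next
    return details, x                                       # {y_j} and approximation x_K

def gaussian_reconstruct(details, approx):
    """Exact inverse of Eq. (6): x_j = x_{j+1} + y_j."""
    return approx + sum(details)
```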
1.3 Multi-scale morphological transforms
The multi-scale top-hat transform using structuring elements of up-scaling size can extract the bright and dark details at different image scales in image fusion [24]. Based on the idea of constructing the multi-scale top-hat transform, the multi-scale morphological inner-boundary transform is constructed. These two kinds of morphological transforms can be expressed as Eq. (4), with the Gaussian kernel replaced by the morphological opening operation $\circ$ and the erosion operation $\ominus$, respectively, i.e., $x_{l+1} = x_l \circ B_l$ or $x_{l+1} = x_l \ominus B_l$, with detail images $y_l = x_l - x_{l+1}$. For the purpose of extracting details of different scales, the scale of the structuring element $B_l$ increases with $l$. The inverse transforms can be expressed as Eq. (6).

The multi-scale morphological bottom-hat transform and its inverse are shown as follows:

$$ x_{l+1} = x_l \bullet B_l, \qquad y_l = x_{l+1} - x_l \tag{7} $$

$$ x_l = x_{l+1} - y_l \tag{8} $$

where the analysis operator is the morphological closing operation $\bullet$, with the scale of $B_l$ also increasing with $l$. The multi-scale morphological outer-boundary transform and its inverse are similar to the bottom-hat transform and its inverse, with $\bullet$ replaced by the dilation operation $\oplus$.
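A sketch of the four multi-scale morphological transforms and their inverses, assuming SciPy's grayscale morphology; the square structuring elements of size $2l+3$ are an illustrative choice standing in for the paper's disk- and square-shaped elements:

```python
import numpy as np
from scipy.ndimage import grey_opening, grey_closing, grey_erosion, grey_dilation

_ANALYSIS = {"tophat": grey_opening, "inner": grey_erosion,
             "bottomhat": grey_closing, "outer": grey_dilation}

def morph_decompose(x, levels, mode):
    """Top-hat/inner-boundary: x_{l+1} = opening/erosion of x_l, y_l = x_l - x_{l+1}.
    Bottom-hat/outer-boundary (Eq. 7): x_{l+1} = closing/dilation, y_l = x_{l+1} - x_l."""
    x, details = x.astype(np.float64), []
    for l in range(levels):
        size = 2 * l + 3                                  # SE scale increases with l
        x_next = _ANALYSIS[mode](x, size=(size, size))
        if mode in ("tophat", "inner"):
            details.append(x - x_next)                    # bright details / inner boundaries
        else:
            details.append(x_next - x)                    # dark details / outer boundaries
        x = x_next
    return details, x

def morph_reconstruct(details, approx, mode):
    """Inverses (Eqs. 6 and 8): add back top-hat/inner details, subtract
    bottom-hat/outer ones."""
    sign = 1.0 if mode in ("tophat", "inner") else -1.0
    return approx + sign * sum(details)
```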
2 Proposed method framework
The proposed fusion method comprises three processes: multi-scale decomposition, fusion, and reconstruction.
2.1 Multi-scale decomposition process
The $K$-level decomposition of a given source image $f$ by scheme (4) has the form

$$ f \mapsto \{y_0, y_1, \ldots, y_{K-1}, x_K\} \tag{9} $$

where $y_j$ ($j = 0, 1, \ldots, K-1$) represents the detail image at level $j$ and $x_K$ denotes the approximation image of this multi-scale structure.

$x_K$ is a coarse representation of $f$ and usually still inherits a few bright and dark details, thus the multi-scale top- and bottom-hat decompositions are used to extract bright objects on a dark background and dark objects on a bright background at different scales, respectively. Hence, $x_K$ can be decomposed by the schemes mentioned in subsection 1.3 as

$$ x_K \mapsto \{d_0^{t}, \ldots, d_{M-1}^{t}, a_M^{t}\}, \qquad x_K \mapsto \{d_0^{b}, \ldots, d_{M-1}^{b}, a_M^{b}\} \tag{10} $$

where $d_l^{t}$ and $d_l^{b}$ represent the detail images at level $l$ obtained by the top- and bottom-hat decomposition processes, respectively, and $a_M^{t}$ and $a_M^{b}$ denote the approximation images of the multi-scale top- and bottom-hat structures, respectively. Figure 2 gives an example of three-level top- and bottom-hat decompositions.

Figure 2. Example of three-level top- and bottom-hat decompositions
The detail images $y_j$ in scheme (9) comprise various details such as edges and lines, thus the multi-scale inner- and outer-boundary transforms mentioned in subsection 1.3 are used to extract inner- and outer-boundary information of different scales. Hence, each $y_j$ can be decomposed as

$$ y_j \mapsto \{d_0^{in,j}, \ldots, d_{N_{j+1}-1}^{in,j}, a_{N_{j+1}}^{in,j}\}, \qquad y_j \mapsto \{d_0^{out,j}, \ldots, d_{N_{j+1}-1}^{out,j}, a_{N_{j+1}}^{out,j}\} \tag{11} $$

where $d_l^{in,j}$ and $d_l^{out,j}$ represent the detail images at level $l$ of $y_j$ obtained by the inner- and outer-boundary decomposition processes, respectively, and $a_{N_{j+1}}^{in,j}$ and $a_{N_{j+1}}^{out,j}$ are the approximation images of $y_j$ at the highest level $N_{j+1}$ of the multi-scale inner- and outer-boundary structures, respectively. Figure 3 gives an example of three-level inner- and outer-boundary decompositions.

Figure 3. Example of three-level inner- and outer-boundary decompositions
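Putting the pieces together, the decomposition process of schemes (9)-(11) can be sketched as follows, reusing `gaussian_decompose` and `morph_decompose` from the sketches above; the parameter defaults mirror the infrared-visible setting of Table 2:

```python
def decompose_source(f, K=3, M=2, N=(0, 1, 2), sigma0=0.6, k=1.4):
    """Dual decomposition of one source image: scheme (9) for the Gaussian
    pyramid, scheme (10) for x_K, scheme (11) for each detail image y_j."""
    details, x_K = gaussian_decompose(f, sigma0, k, K)            # scheme (9)
    hat = (morph_decompose(x_K, M, "tophat"),
           morph_decompose(x_K, M, "bottomhat"))                  # scheme (10)
    boundary = [(morph_decompose(y, n, "inner"),
                 morph_decompose(y, n, "outer"))
                for y, n in zip(details, N)]                      # scheme (11)
    return hat, boundary
```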
2.2 Fusion process
In this paper, the composite approximation coefficients of the approximation images in the multi-scale top- and bottom-hat structures are taken as the average of the source approximations. For the composite detail coefficients of the detail images, the absolute maximum selection rule is used.
2.2.1 Fusion rules for the multi-scale top- and bottom-hat structures
The vector coordinate $\vec{r}$ is used here to denote a location in an image. For instance, $d_l^{t,A}(\vec{r})$ represents the detail coefficient of the multi-scale top-hat structure at location $\vec{r}$ within level $l$ of source image $A$, and the notation without the coordinate denotes an image, e.g., $d_l^{t,A}$ refers to the detail image itself.

An arbitrary fused detail coefficient $d_l^{t,F}(\vec{r})$ and the fused approximation coefficient $a_M^{t,F}(\vec{r})$ of the multi-scale top-hat structure are obtained through

$$ d_l^{t,F}(\vec{r}) = \max\left(d_l^{t,A}(\vec{r}),\, d_l^{t,B}(\vec{r})\right), \qquad a_M^{t,F}(\vec{r}) = w_A\, a_M^{t,A}(\vec{r}) + w_B\, a_M^{t,B}(\vec{r}) \tag{12} $$

The weights $w_A$ and $w_B$ take 0.5, which preserves the mean intensity of the two source images. Likewise, $d_l^{b,F}(\vec{r})$ and $a_M^{b,F}(\vec{r})$ of the multi-scale bottom-hat structure are obtained through

$$ d_l^{b,F}(\vec{r}) = \max\left(d_l^{b,A}(\vec{r}),\, d_l^{b,B}(\vec{r})\right), \qquad a_M^{b,F}(\vec{r}) = w_A\, a_M^{b,A}(\vec{r}) + w_B\, a_M^{b,B}(\vec{r}) \tag{13} $$

with $w_A = w_B = 0.5$. Since the hat details are nonnegative, the maximum in Eqs. (12) and (13) coincides with the absolute maximum rule.
The selection rule in Eq. (12) means that we choose the brighter ones among the bright details, and the selection rule in Eq. (13) means that we choose the darker ones among the dark details. In this way, the bright and dark details of different scales can be fully extracted, and hence the contrast at each layer can be improved.
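In code, Eqs. (12) and (13) amount to an element-wise maximum on the hat details and a 0.5/0.5 average on the approximations; a sketch under these assumptions:

```python
import numpy as np

def fuse_hat(structA, structB, wA=0.5, wB=0.5):
    """Fuse a top-hat (or bottom-hat) structure of two sources, Eqs. (12)-(13).
    Each structure is (list_of_detail_images, approximation_image)."""
    (dA, aA), (dB, aB) = structA, structB
    fused_details = [np.maximum(x, y) for x, y in zip(dA, dB)]  # brighter bright / darker dark details
    fused_approx = wA * aA + wB * aB                            # preserves mean intensity
    return fused_details, fused_approx
```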
2.2.2 Fusion rules for the multi-scale inner- and outer-boundary structures
For an arbitrary fused detail coefficient $d_l^{in,F}(\vec{r})$ of the multi-scale inner-boundary structures (the index $j$ is omitted for brevity, as the structure of each detail image $y_j$ is fused independently), only the absolute maximum selection rule is used:

$$ d_l^{in,F}(\vec{r}) = \begin{cases} d_l^{in,A}(\vec{r}), & \left|d_l^{in,A}(\vec{r})\right| \ge \left|d_l^{in,B}(\vec{r})\right| \\ d_l^{in,B}(\vec{r}), & \text{otherwise} \end{cases} \tag{14} $$

The fused approximation coefficient $a_{N}^{in,F}(\vec{r})$ is obtained in the same way. In this way, boundary information such as edges and lines of different scales can be well preserved. Likewise, arbitrary $d_l^{out,F}(\vec{r})$ and $a_{N}^{out,F}(\vec{r})$ of the multi-scale outer-boundary structures are also obtained by the absolute maximum selection rule.
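Applied to whole images, the absolute maximum selection of Eq. (14) is a one-liner in NumPy:

```python
import numpy as np

def abs_max(cA, cB):
    """Eq. (14): keep, at every pixel, the coefficient with the larger magnitude."""
    return np.where(np.abs(cA) >= np.abs(cB), cA, cB)
```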
2.3 Reconstruction process
According to Eqs. (6) and (8), the reconstruction of the fused approximation image $x_K^F$ is obtained through the multi-scale top- and bottom-hat inverse transforms as

$$ x_K^F = \frac{1}{2}\left(a_M^{t,F} + \sum_{l=0}^{M-1} d_l^{t,F}\right) + \frac{1}{2}\left(a_M^{b,F} - \sum_{l=0}^{M-1} d_l^{b,F}\right) \tag{15} $$

where the two halves are weighted equally, which means that the bright and dark information are of equal importance to the source image. In addition, we attach equal importance to the features of different scale levels, thus the level weights in Eq. (15) are all set to 1.

Similarly, the inner- and outer-boundary information are considered equally important to the source image, and so are the features of different scale levels. Thus, according to Eqs. (6) and (8), the reconstruction of an arbitrary fused detail image $y_j^F$ through the multi-scale inner- and outer-boundary inverse transforms is obtained as

$$ y_j^F = \frac{1}{2}\left(a_{N_{j+1}}^{in,F} + \sum_{l=0}^{N_{j+1}-1} d_l^{in,F}\right) + \frac{1}{2}\left(a_{N_{j+1}}^{out,F} - \sum_{l=0}^{N_{j+1}-1} d_l^{out,F}\right) \tag{16} $$

At last, the fused image $F$ is reconstructed by

$$ F = x_K^F + \sum_{j=0}^{K-1} y_j^F \tag{17} $$
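A sketch of the whole reconstruction, consuming fused structures shaped like the output of `decompose_source` above (Python's built-in `sum` adds the detail images element-wise and returns 0 for an empty level list):

```python
def reconstruct_fused(hat, boundary):
    """Eqs. (15)-(17): invert the hat structures to get x_K^F, the boundary
    structures to get each y_j^F, then invert the Gaussian pyramid."""
    (dT, aT), (dB, aB) = hat
    x_K = 0.5 * (aT + sum(dT)) + 0.5 * (aB - sum(dB))           # Eq. (15)
    y = [0.5 * (aI + sum(dI)) + 0.5 * (aO - sum(dO))            # Eq. (16)
         for (dI, aI), (dO, aO) in boundary]
    return x_K + sum(y)                                         # Eq. (17)
```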
3 Experiments
3.1 Experimental setups
To validate the performance of the proposed method, experiments are conducted on two categories of source images: ten pairs of infrared-visible images (Fig. 4(a)) and eight pairs of infrared intensity-polarization images (Fig. 4(b)). The two source images in each pair are pre-registered, and the size of each image is 256×256 pixels. The experiments are implemented in Matlab 2016b and run on a desktop with an Intel(R) Core(TM) i5-6500 CPU @ 3.20 GHz and 16.0 GB RAM.

Figure 4. The two kinds of source images: (a) infrared-visible images, (b) infrared intensity-polarization images
Various pixel-level multi-scale decomposition-based methods, including DWT, DTCWT, SWT, WPT, NSCT, and NSST, are compared with the proposed method. All the compared methods adopt the simple AVG-ABS rule. According to Ref. 13, most of the above methods perform well when their decomposition levels are set to 3; thus, to make the comparisons reliable and persuasive, the decomposition levels of all these methods are set to 3. To let each method achieve good performance, the other parameters are also set as suggested by Ref. 13, some of which are listed in Table 1.

Table 1. The parameters set in the compared methods. 'Filter' represents the orientation filter; 'Levels' denotes the decomposition levels and the corresponding number of orientations for each level.

| Methods | Pyramid filter | Filter | Levels |
|---|---|---|---|
| DWT | rbio1.3 | | 3 |
| DTCWT | 5-7 | q-6 | 3 |
| SWT | bior1.3 | | 3 |
| WPT | bior1.3 | | 3 |
| NSCT | maxflat | dmaxflat5 | 4,8,16 |
| NSST | maxflat | | 4,8,16 |
For NSST, the sizes of the local support of the shear filters at each level are selected as 8, 16, and 32. For the proposed method, the parameters $\sigma_0$ and $k$ of the multi-scale Gaussian filtering in Eq. (5) are selected experimentally. In this experiment, the source images are decomposed by a 3-layer multi-scale Gaussian decomposition, and different fused images are obtained by varying $\sigma_0$ and $k$; the AVG-ABS rule is again adopted during fusion. For each pair of $\sigma_0$ and $k$ values, every fused image is evaluated by the seven objective assessment metrics described in subsection 3.2. For each metric, its mean value is obtained by averaging the evaluation results over the fused images, and the seven mean values are summed to give an overall score. Figure 5 gives surface plots showing the variation of this sum with $\sigma_0$ and $k$, from which the optimal values of $\sigma_0$ and $k$ for each category of images are obtained. The structuring elements in the multi-scale inner- and outer-boundary decompositions are squares, and those in the multi-scale top- and bottom-hat decompositions are disks. $\sigma_0$ and $k$ in Eq. (5) and the parameters $K$, $M$, $N_1$, $N_2$, and $N_3$ in schemes (9), (10), and (11) are set as shown in Table 2 to make the proposed method achieve good performance.
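The selection of $\sigma_0$ and $k$ can be sketched as a simple grid search; the grid ranges below are hypothetical, and `fuse` and `metrics` stand in for the proposed fusion pipeline and the seven metrics of subsection 3.2:

```python
import numpy as np

def select_parameters(image_pairs, fuse, metrics,
                      sigma0_grid=np.arange(0.2, 1.21, 0.1),  # hypothetical search ranges
                      k_grid=np.arange(1.1, 2.01, 0.1)):
    """Pick (sigma0, k) maximizing the sum of the mean values of all metrics."""
    best, best_score = None, -np.inf
    for s0 in sigma0_grid:
        for k in k_grid:
            fused = [fuse(a, b, sigma0=s0, k=k) for a, b in image_pairs]
            score = sum(np.mean([m(f) for f in fused]) for m in metrics)
            if score > best_score:
                best, best_score = (s0, k), score
    return best
```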

Figure 5. Variation of the sum of the seven objective metrics with $\sigma_0$ and $k$

Table 2. The parameters of the proposed method for the two categories of source images.

| Source images | $\sigma_0$ | $k$ | $[K, M, N_1, N_2, N_3]$ |
|---|---|---|---|
| Infrared-visible | 0.6 | 1.4 | [3, 2, 0, 1, 2] |
| Infrared intensity-polarization | 0.6 | 1.1 | [3, 2, 1, 1, 2] |
3.2 Objective assessment metrics
Seven representative metrics, i.e., Q0 [27], QE [28], QAB/F [29], information entropy (IE) [30], mutual information (MI) [31], Tamura contrast (TC) [32], and visual information fidelity (VIF) [33], are employed to evaluate the proposed method comprehensively. The parameter in TC is chosen to be 4.
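As one concrete example of these metrics, information entropy is computed from the normalized gray-level histogram; a sketch for 8-bit images:

```python
import numpy as np

def information_entropy(img, bins=256):
    """IE = -sum_i p_i * log2(p_i) over the gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                 # skip empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())
```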
3.3 Experimental results
3.3.1 Subjective assessment
In this section, the fusion methods are assessed subjectively by comparing the visual results of the compared methods and the proposed method. One sample pair from each type of source images is selected for visual comparison, as shown in Figs. 6 and 7.

Figure 6. Fusion results of one pair of the infrared-visible images: (a) infrared image, (b) visible image, (c)-(i) the fusion results of the DWT, DTCWT, SWT, WPT, NSCT, NSST, and proposed methods.

Figure 7. Fusion results of one pair of the infrared intensity-polarization images: (a) infrared intensity image, (b) infrared polarization image, (c)-(i) the fusion results of the DWT, DTCWT, SWT, WPT, NSCT, NSST, and proposed methods.
In Fig. 6, both the DWT and WPT methods distort the edges of the roof, as shown clearly in the magnified squares. The DTCWT, SWT, NSCT, and NSST methods produce artificial edges in the sky around the roof, while the result of the proposed method is free from such artifacts and brightness distortions. In addition, the walls and the clouds in the sky in Fig. 6(i) are brighter than those in Figs. 6(g) and (h), which means that the fused image of the proposed method has better contrast.
The edges of the car are distorted heavily in Fig. 7(f) and slightly in Figs. 7(c)-(e), as shown more clearly in the corresponding magnified regions. Figures 7(c)-(h) also show some artifacts around the edges of the car, whereas Fig. 7(i) is free of such distortions and artifacts. In addition, the car in the magnified square of Fig. 7(i) is darker than those in Figs. 7(g) and (h), which demonstrates that the proposed method achieves better contrast.
The above experiments confirm that the proposed method achieves better visual effects on both categories of source images. Although it adopts the simple AVG-ABS rule, the proposed method does not generate noticeable artifacts or distortions, while preserving the detail information of the source images as much as possible.
3.3.2 Objective assessment
The objective assessment of the seven multi-scale decomposition-based methods is shown in Table 3. For the infrared-visible images, the proposed method performs the best on all seven metrics. For the infrared intensity-polarization images, the proposed method performs the best on five metrics and the second best on Q0 and QE. Table 3 also shows that, compared with the other methods, the proposed method always has the best scores on QAB/F, IE, MI, TC, and VIF. This means that the proposed method sufficiently transfers the original information of the source images, including edge and brightness details, to the fused image, and improves the contrast of the fused image.

Table 3. Objective assessment of all methods (the best result of each metric is highlighted in bold).

| Images | Methods | Q0 | QAB/F | QE | IE | MI | TC | VIF |
|---|---|---|---|---|---|---|---|---|
| Infrared-visible | DWT | 0.4391 | 0.4858 | 0.2268 | 6.6601 | 2.1658 | 0.2588 | 0.2936 |
| | DTCWT | 0.4446 | 0.5173 | 0.2579 | 6.6830 | 2.2235 | 0.2937 | 0.2949 |
| | SWT | 0.4452 | 0.5097 | 0.2457 | 6.6155 | 2.1872 | 0.2203 | 0.2784 |
| | WPT | 0.4079 | 0.3952 | 0.1614 | 6.6385 | 2.1949 | 0.2745 | 0.2738 |
| | NSCT | 0.4669 | 0.5281 | 0.2595 | 6.6961 | 2.2633 | 0.2940 | 0.3145 |
| | NSST | 0.4653 | 0.5231 | 0.2570 | 6.6858 | 2.2575 | 0.2902 | 0.3103 |
| | Proposed | **0.4757** | **0.5356** | **0.2689** | **6.7359** | **2.4707** | **0.3177** | **0.3626** |
| Infrared intensity-polarization | DWT | 0.3853 | 0.4206 | 0.1676 | 6.4782 | 2.2664 | 0.3476 | 0.2196 |
| | DTCWT | 0.3944 | 0.4585 | **0.2089** | 6.5707 | 2.3415 | 0.4684 | 0.2437 |
| | SWT | 0.3875 | 0.4391 | 0.1931 | 6.4730 | 2.3429 | 0.3308 | 0.2300 |
| | WPT | 0.3469 | 0.3439 | 0.1198 | 6.4052 | 2.2917 | 0.4437 | 0.1972 |
| | NSCT | 0.4133 | 0.4675 | 0.1977 | 6.5646 | 2.3917 | 0.4585 | 0.2574 |
| | NSST | **0.4138** | 0.4641 | 0.1995 | 6.5740 | 2.3898 | 0.4597 | 0.2592 |
| | Proposed | 0.4134 | **0.4690** | 0.2013 | **6.6580** | **2.6241** | **0.5478** | **0.3137** |
3.3.3 Comparison of computational efficiency
To verify the efficiency of the proposed method, an experiment is conducted on the image sequences named "Nato_camp", "Tree", and "Dune" from the TNO Image Fusion Dataset [34]. Table 4 shows the average processing time per frame for all methods. Compared with the DWT, DTCWT, SWT, and WPT methods, the proposed method is more time-consuming because these four methods contain one type of multi-scale decomposition while the proposed method contains two, i.e., the multi-scale Gaussian decomposition and the multi-scale morphological decomposition, as described in Sec. 2. Compared with the NSCT and NSST methods, which also contain two kinds of multi-scale decomposition, the proposed method is far more efficient, mainly because the design of the multi-directional filter banks for NSCT and NSST is relatively complex and multi-directional filtering is much slower than multi-scale morphological operations.

Table 4. Average processing time (unit: s) comparison of the seven methods. Each value represents the average run time per frame of a given sequence.

| Image sequences | DWT | DTCWT | SWT | WPT | NSCT | NSST | Proposed |
|---|---|---|---|---|---|---|---|
| Nato_camp | 0.0180 | 0.0362 | 0.0647 | 0.1401 | 24.5173 | 2.3072 | 0.1419 |
| Tree | 0.0165 | 0.0357 | 0.0643 | 0.1398 | 24.8215 | 2.2923 | 0.1411 |
| Dune | 0.0171 | 0.0361 | 0.0641 | 0.1406 | 24.5841 | 2.2881 | 0.1412 |
4 Conclusions
Experiments on both visual quality and objective assessment demonstrate that, although it adopts the simple AVG-ABS rule, the proposed method does not generate noticeable artifacts or distortions and performs very well in information preservation and contrast improvement. Under the premise of ensuring fusion quality, the proposed method also proves computationally efficient. It thus provides an option for fusion scenarios that demand both high quality and high computational efficiency, such as fast high-resolution image fusion and video fusion.