| Type | Typical methods | References | Characteristic |
|---|---|---|---|
| Input method | Single channel | [10], [45], [52], [54-55], [79], [82], [83] | Cascades the source images, mining the fusion ability of the network |
| | Multi-channel | [46], [47], [50-51], [53], [56-72] | Distinguishes the source images, but a fusion strategy must be designed |
| | Multi-image multi-channel | [67], [88-89] | Inputs the source images in proportion, keeping the same category information of the source images |
| | Preprocessed image | [70-71], [86], [89-90] | Provides more useful information for the fused image |
| Common block | Attention network | [45], [51], [53], [63], [65], [85], [87] | Enhances feature maps along the channel and spatial dimensions; can be embedded in any network |
| | Nest network | [63-65] | Complex network structure; focuses on the shallow and middle layers of the network |
| | Skip connection | [59], [68], [77], [87] | Based on residual and dense networks; prevents loss of useful shallow information |
| Loss function | Perceptual loss | [55], [66], [82], [87] | Balances the feature error between the reconstructed image and the input |
| | TV loss | [47], [79] | Constrains the fused image to exhibit gradient variation similar to the visible image |
| | Edge detail loss | [69], [82], [83-84] | Enhances edge detail in the fused image |
| | Semantic loss | [72] | More targeted to different information of the scene |
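To make the TV loss row above concrete, here is a minimal numpy sketch of one common gradient-consistency formulation. This is an illustrative assumption: the cited works may use squared gradients, weighting terms, or a different norm, and the function name `tv_loss` is hypothetical.

```python
import numpy as np

def tv_loss(fused, visible):
    """Sketch of a TV-style loss: penalize the difference between the
    gradient fields of the fused image and the visible image, so the
    fused result keeps gradient variation similar to the visible input.
    (Assumed L1 formulation; exact definitions vary across papers.)"""
    # Finite-difference gradients along width (axis=1) and height (axis=0).
    dfx, dfy = np.diff(fused, axis=1), np.diff(fused, axis=0)
    dvx, dvy = np.diff(visible, axis=1), np.diff(visible, axis=0)
    # Mean absolute gradient discrepancy over both directions.
    return np.abs(dfx - dvx).mean() + np.abs(dfy - dvy).mean()

# A fused image identical to the visible image incurs zero loss.
vis = np.linspace(0.0, 1.0, 64).reshape(8, 8)
print(tv_loss(vis, vis))  # → 0.0
```

A flat (e.g. all-zero) fused image yields a positive loss against a textured visible image, which is exactly the behavior the table describes: the constraint pushes the fused output toward the visible image's gradient structure.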