Photonics Research, Vol. 13, Issue 7, 1902 (2025)

Super-wide-field-of-view long-wave infrared gaze polarization imaging embedded in a multi-strategy detail feature extraction and fusion network

Dongdong Shi1,†, Jinhang Zhang1,†, Jun Zou1, Fuyu Huang1,2,*, Limin Liu1,3,*, Li Li1, Yudan Chen1, Bing Zhou1, and Gang Li1
Author Affiliations
  • 1Shijiazhuang Campus, Army Engineering University of PLA, Shijiazhuang 050000, China
  • 2e-mail: hfyoptics@163.com
  • 3e-mail: lk0256@163.com
    Figures & Tables
    Fig. 1. Concept of SWFOV LWIR gaze polarization imaging. (a) Equipment overview. The system integrates a vanadium oxide uncooled IR focal-plane detector with an independently designed SWFOV LWIR gaze polarization lens. The focal-plane resolution of the IR core is 1280×1024, the pixel size is 17 μm, and the frame rate is up to 30 Hz. (b) The polarization lens is coupled with a rotatable holographic wire-grid IR polarizer to achieve a 150°×120° wide-field acquisition capability. (c) The polarization lens is designed on the non-similar imaging principle and adopts an isometric (equidistant) projection to optimize SWFOV imaging performance. (d) Stokes vectors are obtained by parsing the polarization components of the radiation information, revealing the target's polarization properties. (e) Image fusion is used to improve image quality and enhance the value of the fused images in subsequent applications.
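The projection choice in (c) is what makes a 150°×120° field practical: a conventional perspective mapping y = f·tan θ diverges toward the field edge, while an angle-proportional f–θ mapping keeps image height bounded. A minimal Python comparison, assuming the caption's isometric (equidistant) projection denotes the f–θ mapping; the focal length is illustrative, not the paper's design value:

```python
import math

f_mm = 10.0  # illustrative focal length; NOT the paper's design value
for deg in (15, 37.5, 60, 75):  # up to the 75 deg half field of the 150 deg FOV
    theta = math.radians(deg)
    print(f"{deg:5.1f} deg:  f*theta = {f_mm * theta:6.2f} mm,"
          f"  f*tan(theta) = {f_mm * math.tan(theta):7.2f} mm")
```

At the 75° half field, the perspective mapping needs nearly three times the image height of the f–θ mapping, which is why SWFOV designs abandon similar (perspective) imaging.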
    Fig. 2. Performance evaluation of the SWFOV LWIR gaze polarization lens. (a) Spot diagram: the diffuse-spot characteristics are demonstrated by evaluating the blur spots at four FOV angles with reference to the chief ray. (b) Percentage of total enclosed energy at the four FOV angles, compared with the diffraction-limit curve (the aberration-free response). (c) Diffraction MTF computed by the FFT method for the four FOV angles over the spatial-frequency range [0, 29] lp/mm (the Nyquist frequency is 29 lp/mm). (d) Relative illuminance, calculated by integrating the effective area of the exit pupil seen from the image point (performed in cosine space). The effective F/# is inversely proportional to the square root of the solid angle subtended by the exit pupil in cosine space, weighted by the polarization-lens transmittance. (e) Field-curvature curve showing the offsets of the paraxial image plane. (f) Distortion curve showing the corresponding distortion offsets.
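The Nyquist frequency quoted in (c) follows directly from the 17 μm pixel pitch given in Fig. 1, f_N = 1/(2p); a one-line sanity check in Python:

```python
pixel_pitch_mm = 0.017  # 17 um pixel size from Fig. 1
nyquist_lp_per_mm = 1.0 / (2.0 * pixel_pitch_mm)
print(f"Nyquist frequency = {nyquist_lp_per_mm:.1f} lp/mm")  # -> 29.4 lp/mm
```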
    Fig. 3. SWFOV LWIR gaze polarization imaging test. By rotating the holographic wire-grid IR polarizer in the lens head, image data are captured at four orientations: 0°, 45°, 90°, and 135°. Stokes vectors are calculated for each pixel; the s0 parameter corresponds directly to the SWFOV LWIR image, and the SWFOV LWIR DoLP images are derived from the remaining components. The SWFOV LWIR images and SWFOV LWIR DoLP images are presented in a checkerboard layout.
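For reference, the per-pixel computation described above reduces to the standard linear-Stokes relations for a 0°/45°/90°/135° analyzer sequence. A minimal NumPy sketch (function names and the checkerboard tile size are our choices, not the paper's code):

```python
import numpy as np

def stokes_from_stack(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities at analyzer angles 0/45/90/135 deg."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity -> the SWFOV LWIR image
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def dolp(s0, s1, s2, eps=1e-8):
    """Degree of linear polarization per pixel."""
    return np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)

def checkerboard(a, b, tile=64):
    """Interleave two same-shaped gray images in a checkerboard layout for display."""
    yy, xx = np.mgrid[0:a.shape[0], 0:a.shape[1]]
    mask = ((yy // tile + xx // tile) % 2).astype(bool)
    out = a.copy()
    out[mask] = b[mask]
    return out
```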
    Fig. 4. Image fusion network. (a) The fusion network contains the LLRR model, the Detail CNN, the ADFU model, and a decoder network. The LLRR model extracts sparse features SIR and SDoLP from the IR and IR DoLP images. The Detail CNN model extracts high-frequency detail features ΨIRD and ΨDoLPD. The primary detail features, obtained by adding S and Ψ, are delivered to the attention-based ADFU model, which produces new detail features FDoLPD of salient targets. Meanwhile, the LLRR model and the Base Poolformer model extract the low-rank features LIR and low-frequency features BIR of the IR image, respectively, which are combined into the background features FIRB. Finally, the detail features FDoLPD and the background features FIRB are fused by the decoder network to obtain the fusion image. (b) LLRR model structure. The input data is X. The activation function is hξ(x) = sign(x)·max(|x|−ξ, 0), and Z0 = hξ(V0*X). (c) ADFU model structure. The primary detail features are segmented into two parts, A and B. A 3×3 depthwise convolution (DWConv) is applied to A to generate non-local features, and adaptive maximum pooling (Pooling) extracts low-frequency information. For B, a 3×3 depthwise convolution is followed directly by a 1×1 convolution and an activation function to obtain enhanced high-frequency features. The features from the A and B paths are added to obtain the improved detail features. The Detail CNN model inside the ADFU model then performs a third detail-feature extraction on the improved detail features to obtain the final detail features FDoLPD. (d) Base Poolformer model structure. This model acquires background features; pooling operations enhance the perception of feature information, and PoolMLP applies two 1×1 convolutional layers with activations to the features.
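To make the dataflow in (b)–(d) concrete, here is a minimal PyTorch sketch of the soft-threshold activation hξ and the two split-path blocks. Channel widths, the even channel split, and the GELU activation are our assumptions; this sketches the described structure, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def h_xi(x, xi=0.01):
    # LLRR activation: h_xi(x) = sign(x) * max(|x| - xi, 0), i.e. soft thresholding
    return torch.sign(x) * torch.clamp(torch.abs(x) - xi, min=0.0)

class ADFUSketch(nn.Module):
    """Split-path detail refinement loosely following the ADFU description (Fig. 4c)."""

    def __init__(self, channels=64):  # width is an illustrative assumption
        super().__init__()
        half = channels // 2
        self.dw_a = nn.Conv2d(half, half, 3, padding=1, groups=half)  # 3x3 DWConv, path A
        self.dw_b = nn.Conv2d(half, half, 3, padding=1, groups=half)  # 3x3 DWConv, path B
        self.pw_b = nn.Conv2d(half, half, 1)                          # 1x1 conv, path B

    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)  # segment primary detail features into A and B
        h, w = a.shape[-2:]
        # Path A: depthwise conv, then adaptive max pooling for low-frequency information
        low = F.adaptive_max_pool2d(self.dw_a(a), (h // 2, w // 2))
        a = F.interpolate(low, size=(h, w), mode="nearest")
        # Path B: depthwise conv, then 1x1 conv + activation -> enhanced high-frequency features
        b = F.gelu(self.pw_b(self.dw_b(b)))
        return a + b  # A and B features added = improved detail features

class PoolMLPSketch(nn.Module):
    """Base Poolformer mixing (Fig. 4d): pooling as token mixer + two 1x1 conv layers."""

    def __init__(self, channels=64):
        super().__init__()
        self.pool = nn.AvgPool2d(3, stride=1, padding=1)
        self.mlp = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.GELU(),
                                 nn.Conv2d(channels, channels, 1))

    def forward(self, x):
        x = x + (self.pool(x) - x)  # pooling enhances perception of feature information
        return x + self.mlp(x)      # PoolMLP: two 1x1 convolutions with activation
```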
    Fig. 5. Fusion effects in the ablation study. (a) Fusion images and close-ups from the baseline model, the LLRR model, the ADFU model, and the LLRR + ADFU model. Three scenes are rendered in pseudo-color for viewing convenience. (b) Loss curves of the image fusion network. (c) Parameter counts of the four ablation models.
    Fig. 6. Qualitative comparison of different fusion methods for three scenes. (a) IR. (b) IR DoLP. (c) RFN-Nest. (d) SwinFusion. (e) YDTR. (f) CDDFuse. (g) CMTFusion. (h) DIF-Fusion. (i) DCSFuse. (j) Ours.
    Fig. 7. Testing our dataset with different fusion methods. The YOLOX detection network is used to detect car targets in SWFOV LWIR images, SWFOV LWIR DoLP images, and SWFOV fused images; the average precision and confidence of car detection are shown.
    Fig. 8. Car detection at different distances. Cars are detected with the YOLOX detection network in SWFOV LWIR images, SWFOV LWIR DoLP images, and SWFOV fused images.
    Fig. 9. IR camouflage target recognition experiments. (a) Experimental setup. (b) Visible images of a subject wearing the IR camouflage suit in bush, building, and grass environments. (c) Recognition tests of IR targets and IR camouflaged targets in road and grass environments, respectively. The SWFOV LWIR images, SWFOV LWIR DoLP images, and SWFOV fused images are shown from left to right. The gray values of the rectangular contour-marked regions are counted and annotated with the HSI color model; in the close-ups, bar height indicates gray value. (d) Pixel distributions of the SWFOV LWIR images, SWFOV LWIR DoLP images, and SWFOV fused images for the four scenes in Fig. 3.
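The gray-value statistics in (c) and the pixel distributions in (d) amount to simple region histograms. A small NumPy sketch, with a hypothetical box convention of ours:

```python
import numpy as np

def region_gray_stats(image, box):
    """Gray-value statistics of a rectangular contour-marked region.

    box = (top, left, height, width) is a hypothetical convention;
    image is a 2-D 8-bit gray-scale array.
    """
    t, l, h, w = box
    patch = image[t:t + h, l:l + w].astype(np.float64)
    hist, _ = np.histogram(patch, bins=256, range=(0, 255))
    return {"mean": patch.mean(), "std": patch.std(),
            "max_minus_min": patch.max() - patch.min(), "hist": hist}
```

A larger mean gap between the target and background regions corresponds to the taller bars seen in the close-ups.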
    • Table 1. Quantitative Evaluation of Ablation Study on the Test Dataset

      Method      | CER (%)  | DR (%)   | PID      | PSNR    | En
      Baseline    | 74.5631  | 74.915   | 299.857  | 2.8916  | 3.7342
      LLRR        | 128.8028 | 76.1781  | 259.3154 | 4.0078  | 3.0442
      ADFU        | 17.3612  | 128.8947 | 86.6665  | 12.9008 | 6.7563
      LLRR + ADFU | 32.0598  | 147.1214 | 81.3571  | 13.9214 | 6.5471
    • Table 2. Quantitative Evaluation of Different Fusion Methods on the Test Dataset

      Method     | CER (%) | DR (%)   | PID      | PSNR    | En
      RFN-Nest   | 5.1064  | 121.9426 | 99.7914  | 11.8487 | 6.7448
      SwinFusion | 4.5682  | 109.5763 | 188.0744 | 6.177   | 7.2394
      YDTR       | 19.1432 | 81.1398  | 166.2386 | 7.3832  | 7.0926
      CDDFuse    | 6.0211  | 125.2308 | 186.945  | 6.2367  | 7.3335
      CMTFusion  | 19.2669 | 131.9363 | 114.8855 | 10.6667 | 6.9971
      DIF-Fusion | 9.5311  | 126.0948 | 196.8322 | 5.9109  | 7.1988
      DCSFuse    | 23.8086 | 113.7567 | 250.0697 | 4.1158  | 7.2151
      Ours       | 32.0598 | 147.1214 | 81.3571  | 13.9214 | 6.5471
    Citation: Dongdong Shi, Jinhang Zhang, Jun Zou, Fuyu Huang, Limin Liu, Li Li, Yudan Chen, Bing Zhou, and Gang Li, "Super-wide-field-of-view long-wave infrared gaze polarization imaging embedded in a multi-strategy detail feature extraction and fusion network," Photonics Res. 13, 1902 (2025).
    Paper Information

    Category: Image Processing and Image Analysis

    Received: Feb. 18, 2025

    Accepted: Apr. 24, 2025

    Published Online: Jul. 1, 2025

    Corresponding authors: Fuyu Huang (hfyoptics@163.com), Limin Liu (lk0256@163.com)

    DOI: 10.1364/PRJ.559833

    CSTR: 32188.14.PRJ.559833
