Research on Multiframe Lane Detection Method Using Swin Transformer Embedded with Attention

Yanhui Li; Zhongchun Fang; Hairong Li

doi:10.3788/LOP241332

Laser & Optoelectronics Progress, Vol. 62, Issue 4, 0412007 (2025)

Yanhui Li¹,*, Zhongchun Fang², and Hairong Li²

¹School of Digital and Intelligent Industry (School of Cyber Science and Technology), Inner Mongolia University of Science & Technology, Baotou 014000, Inner Mongolia, China

²Engineering Training Center (College of Innovation and Entrepreneurship Education), Inner Mongolia University of Science & Technology, Baotou 014000, Inner Mongolia, China

Figures & Tables(18)

Fig. 1. Overall framework diagram of the proposed model

Fig. 2. Structure of the Swin Transformer network

Fig. 3. Framework of the Swin Transformer block

Fig. 4. Structure of the ST-LSTM

Fig. 5. Structure of the CA module

Fig. 6. Scene distribution of the CULane dataset

Fig. 7. Scene distribution of the VIL-100 dataset

Fig. 8. Detection results by the proposed model on the CULane dataset

Fig. 9. Detection results by the proposed model on the Tusimple dataset

Fig. 10. Detection results by the proposed model on the VIL-100 dataset

Table 1. Sampling method for successive input images

Ground truth	Stride	Sampled frames
15th	1	12th, 13th, 14th, 15th
15th	2	9th, 11th, 13th, 15th
15th	3	6th, 9th, 12th, 15th
19th	1	16th, 17th, 18th, 19th
19th	2	13th, 15th, 17th, 19th
19th	3	10th, 13th, 16th, 19th
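
The sampling rule in Table 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `sample_frames` and the parameter `num_frames` are hypothetical, and the sketch assumes four input frames that end at the labeled (ground-truth) frame, as the table rows suggest.

```python
from typing import List


def sample_frames(ground_truth: int, stride: int, num_frames: int = 4) -> List[int]:
    """Return 1-based frame indices, spaced by `stride`, ending at `ground_truth`.

    Hypothetical helper: the name and the default of four frames are
    assumptions read off Table 1, not taken from the paper's code.
    """
    if stride < 1 or num_frames < 1:
        raise ValueError("stride and num_frames must be positive")
    # Index of the earliest frame in the sampled window.
    first = ground_truth - stride * (num_frames - 1)
    if first < 1:
        raise ValueError("not enough preceding frames for this stride")
    # Step forward by `stride` until the labeled frame is included.
    return list(range(first, ground_truth + 1, stride))


if __name__ == "__main__":
    for gt in (15, 19):
        for stride in (1, 2, 3):
            print(gt, stride, sample_frames(gt, stride))
```

Running the script prints one line per row of Table 1, so the output can be compared with the table directly.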

Table 2. Structure and contents of the Tusimple dataset
Dataset	Type	Lanes	Environment	Labeled frames	Labeled images
Training set	Tusimple	≤4	Highway	15th and 19th	7252
Test set	Test set 1	≤4	Highway	15th and 19th	2465
Test set	Test set 2	≤4	Highway	All frames	781

Table 3. Experimental results on the Tusimple dataset by different models

Model	Accuracy /%	Precision	Recall	F1 score
SCNN [20]	96.53	0.654	0.808	0.722
RESA [21]	96.80	0.761	0.729	0.745
LaneNet [22]	96.38	0.875	0.927	0.884
SegNet [23]	96.05	0.796	0.956	0.838
U-Net [24]	96.40	0.790	0.953	0.867
SegNet-convLSTM [19]	97.10	0.852	0.964	0.901
UNet-convLSTM [19]	97.20	0.857	0.958	0.904
ADNet [25]	96.23	—	—	—
Res18_UFLD [26]	95.95	—	—	0.8836
Proposed	97.60	0.868	0.971	0.907

Table 4. Comparison of detection performance across scenes on the CULane dataset for different algorithms

Algorithm	F1 (Normal)	F1 (Crowded)	F1 (Dazzle light)	F1 (Shadow)	F1 (No line)	F1 (Arrow)	F1 (Curve)	F1 (Night)	Overall	N_FP of Crossroad	FPS /(frame·s^-1)
U-Net [24]	0.9110	0.6820	0.6120	0.6710	0.4330	0.8540	0.6370	0.6730	0.7230	2044	50.1
LaneNet [22]	0.9170	0.7030	0.6030	0.6750	0.4460	0.8520	0.6510	0.6740	0.7370	2005	48.8
SCNN [20]	0.9060	0.6970	0.5850	0.6690	0.4340	0.8410	0.6440	0.6610	0.7130	1990	8.2
ADNet [25]	0.9192	0.7581	0.6939	0.7621	0.5175	0.8771	0.6884	0.7233	0.7756	1133	87.0
Res18_UFLD [26]	0.8906	0.6776	0.5526	0.6483	0.3900	0.8381	0.5839	0.6407	0.6969	2215	6.9
Proposed	0.9430	0.7750	0.7230	0.7910	0.5620	0.8810	0.6720	0.7230	0.8310	1951	63.5

Table 5. Experimental results on the VIL-100 dataset by different models
Model	mIoU	F1 score
LaneNet [22]	0.633	0.721
RESA [21]	0.702	0.874
LaneATT [17]	0.664	0.823
MMA-Net [27]	0.705	0.839
Proposed	0.726	0.895

Table 6. Comparison of the accuracy on the Tusimple dataset

Model	Training set accuracy, easy lanes /%	Training set accuracy, hard lanes /%	Test set accuracy, easy lanes /%	Test set accuracy, hard lanes /%
SCNN [20]	94.12	92.85	94.25	93.12
ENet [28]	94.98	93.62	94.73	93.74
U-Net [24]	95.42	94.14	95.53	94.08
SegNet [23]	95.79	94.57	95.61	94.52
LaneNet [22]	96.56	95.35	96.48	95.42
Proposed	97.20	97.10	97.19	97.12

Table 7. Experimental results using different Q values
Value of Q	Accuracy /%	Precision	Recall	F1 score
1	96.9	0.862	0.956	0.871
2	97.4	0.867	0.961	0.876
3	97.5	0.869	0.968	0.905
4	97.6	0.868	0.971	0.907
5	97.3	0.866	0.970	0.904

Table 8. Results of the ablation experiments
Baseline	Transformer	Vision Transformer	Swin Transformer	CBAM	CA	F1 score (CULane)	F1 score (Tusimple)	F1 score (VIL-100)
√	√					0.721	0.836	0.821
√		√				0.758	0.875	0.859
√			√			0.775	0.887	0.864
√			√	√		0.789	0.895	0.876
√			√		√	0.791	0.907	0.895

Yanhui Li, Zhongchun Fang, Hairong Li. Research on Multiframe Lane Detection Method Using Swin Transformer Embedded with Attention[J]. Laser & Optoelectronics Progress, 2025, 62(4): 0412007
