Semantic Segmentation for Road Scene Based on Multiscale Feature Fusion

Fig. 4. Heat maps produced by the fusion module under different processing. (a) Input image; (b) heat map without the Laplace operator; (c) heat map with the Laplace operator

Fig. 5. Semantic segmentation results on Cityscapes dataset

Table 1. Results of different networks with LFE-B

Network	Speed, original /（frame·s^-1）	Speed, LFE-B /（frame·s^-1）	Parameters, original /M	Parameters, LFE-B /M	mIoU, original /%	mIoU, LFE-B /%	GFLOPs, original	GFLOPs, LFE-B
DABNet^［5］	106.00	104.41	0.76	0.69	69.1	70.46	11.18	9.61
ERFNet^［7］	58.57	41.01	2.07	0.75	70.0	71.49	26.86	9.84
LEDNet^［8］	58.94	72.01	0.95	0.60	70.6	70.00	11.51	7.65
ESNet^［19］	51.39	51.72	1.66	1.38	70.7	71.12	24.35	14.29
MIFNet（proposed）	—	73.68	—	0.82	—	72.50	—	12.03

Table 2. Results of different configurations of ESF
Configuration	Speed /（frame·s^-1）	Parameters /M	mIoU /%
None	74.97	0.82	71.71
3×3	71.80	0.83	71.65
Prewitt	71.39	0.82	71.79
Sobel	72.74	0.82	72.16
Laplace	73.68	0.82	72.50
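
The Prewitt, Sobel, and Laplace configurations in Table 2 are named after classical fixed edge-detection operators. The following is a minimal sketch of what such operators compute, using the standard textbook 3×3 kernels; the exact kernel variants and the way the ESF applies them are not given in this section, so the kernels, the helper `filter2d`, and the toy input `STEP` are illustrative assumptions rather than the authors' implementation.

```python
# Standard textbook 3x3 kernels (assumed; the paper's exact variants
# are not stated in this section).
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
LAPLACE_4 = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def filter2d(image, kernel):
    """Valid-mode 2D cross-correlation of a list-of-lists image with a 3x3 kernel."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Hypothetical 5x5 input with a vertical step edge between columns 1 and 2.
STEP = [[0, 0, 1, 1, 1] for _ in range(5)]

print(filter2d(STEP, PREWITT_X))  # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
print(filter2d(STEP, SOBEL_X))    # [[4, 4, 0], [4, 4, 0], [4, 4, 0]]
print(filter2d(STEP, LAPLACE_4))  # [[1, -1, 0], [1, -1, 0], [1, -1, 0]]
```

On this toy input the two gradient operators respond with a single sign near the edge, while the Laplace response changes sign across it. Because all three kernels are fixed, they add no learnable weights, which is consistent with the unchanged 0.82 M parameter count reported for these rows.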

Table 3. Results of different decoders on MIFNet
Decoder	Speed /（frame·s^-1）	Parameters /M	mIoU /%
None	88.20	0.77	71.10
ERFD^［7］	52.91	1.03	72.10
PAD^［5］	75.23	0.77	71.59
APN^［8］	67.29	0.78	69.51
MAFD（proposed）	73.68	0.82	72.50

Table 4. Performance comparison of different network models on Cityscapes test set

Network	Pretrain	Speed /（frame·s^-1）	Parameters /M	mIoU（test）/%	GFLOPs
ENet^［6］	No	41.70	0.36	58.3	4.35
ESPNet^［22］	No	146.00	0.36	60.3	3.50
CGNet^［23］	No	44.70	0.50	65.6	7.00
ContextNet^［10］	No	176.60	0.88	65.5	1.78
EDANet^［24］	No	105.50	0.68	67.3	9.00
ERFNet^［7］	No	58.57	2.07	68.0	26.90
FastSCNN^［9］	No	198.41	1.10	62.8	1.76
LEDNet^［8］	No	58.94	0.95	69.2	11.50
DABNet^［5］	No	106.20	0.64	71.2	10.50
ESNet^［19］	No	51.39	1.66	70.7	24.40
LRNNet_C^［14］	No	71.00	0.68	72.2	8.58
BiSeNetV1_X^［20］*	ImageNet	105.80^*	5.80	68.4	14.90
BiSeNetV1_R^［20］*	ImageNet	65.50^*	49.00	74.7	55.30
BiSeNetV2^［21］	No	156.00	—	72.6	21.15
BiSeNetV2_L^［21］	No	47.30	—	75.3	118.51
MIFNet（proposed）	No	73.68	0.82	73.1	12.03

Table 5. Performance comparison of different network models on CamVid test set

Network	Input size /pixel	Speed /（frame·s^-1）	Parameters /M	mIoU（test）/%	GFLOPs
ENet^［6］	360×480	61.00	0.36	51.3	1.44
ERFNet^［7］	360×480	64.30	2.07	67.1	8.80
DABNet^［5］	360×480	117.00	0.64	64.6	3.20
LEDNet^［8］	360×480	58.94	0.95	66.6	11.50
EKENet^［13］	360×480	38.00	1.20	67.5	—
ESPNet^［22］	360×480	132.00	0.36	55.6	1.10
EDANet^［24］	360×480	163.00	0.68	66.4	2.90
CGNet^［23］	360×480	112.00	0.50	65.6	65.60
LRNNet_C^［14］	360×480	76.50	0.68	69.2	—
BiSeNetV1_X^［20］*	720×960	175.00^*	5.80	65.6	8.70
BiSeNetV1_R^［20］*	720×960	116.30^*	49.00	68.7	32.40
BiSeNetV2^［21］	720×960	124.50	—	72.4	21.15
BiSeNetV2_L^［21］	720×960	32.70	—	73.2	118.51
MIFNet（proposed）	720×960	55.02	0.81	71.1	15.86
MIFNet（proposed）	360×480	85.16	0.81	67.7	3.90
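
All accuracy figures in Tables 1 to 5 are reported as mIoU. As a reminder of how this metric is usually defined, the sketch below computes the mean intersection over union from a class confusion matrix. The function name `mean_iou` and the two-class matrix are hypothetical, and the benchmarks' official evaluation scripts may differ in details such as the handling of ignored labels.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical two-class example: IoU_0 = 8 / 11, IoU_1 = 9 / 12.
toy = [[8, 2],
       [1, 9]]
print(round(100 * mean_iou(toy), 2))  # 73.86
```

The per-class IoU is averaged without weighting by class frequency, so small classes count as much as large ones.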

Qingming Yi, Wenting Zhang, Min Shi, Jialin Shen, Aiwen Luo. Semantic Segmentation for Road Scene Based on Multiscale Feature Fusion[J]. Laser & Optoelectronics Progress, 2023, 60(12): 1210006
