Cross-scale and cross-dimensional adaptive transformer network for colorectal polyp segmentation

Algorithm 1: Spatial attention bridge block
Inputs: The feature maps of the four channel attention bridge blocks $C_{i}$, i=1, 2, 3, 4
Outputs: $S_{i}$, i=1, 2, 3, 4
1: $\chi_{mean}^{i}$ = AvgPool($C_{i}$) /average pooling/
2: $\chi_{max}^{i}$ = MaxPool($C_{i}$) /max pooling/
3: $\chi_{s}^{i}$ = Concat($\chi_{mean}^{i}$, $\chi_{max}^{i}$) /concatenate the two pooled feature maps/
4: $\alpha$ = $Conv_{7 \times 7}(\chi_{s}^{i})$ /7×7 convolution/
5: $\varepsilon$ = $\sigma(\alpha)$ /after the sigmoid, the feature map becomes $C \times H \times 1$/
6: $S_{i}$ = $\varepsilon$ * $C_{i}$ + $C_{i}$ /weight the original feature by the sigmoid output, then add the original feature/
End
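
The six steps of Algorithm 1 can be sketched in NumPy for a single feature map. This is a minimal illustration, not the authors' implementation. It assumes that the pooling in steps 1 and 2 runs along the channel axis (as in common spatial-attention designs), that the 7×7 convolution uses zero padding and stride 1, and that the kernel has no bias. The function names, the weight shape (2, 7, 7), and the input size are hypothetical. Under these assumptions the sigmoid output has shape 1 × H × W, which may not match the shape annotation in the listing.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, weight):
    """Zero-padded, stride-1 convolution (cross-correlation, as in deep learning).

    x: (C_in, H, W), weight: (C_in, k, k) with odd k. Returns an (H, W) map.
    """
    _, h, w = x.shape
    k = weight.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            # Contribution of kernel tap (dy, dx), summed over input channels.
            out += np.tensordot(weight[:, dy, dx], xp[:, dy:dy + h, dx:dx + w], axes=1)
    return out


def spatial_attention_bridge(c_i, weight):
    """c_i: (C, H, W) feature map; weight: (2, 7, 7) kernel (hypothetical shape)."""
    chi_mean = c_i.mean(axis=0, keepdims=True)            # step 1: average pooling
    chi_max = c_i.max(axis=0, keepdims=True)              # step 2: max pooling
    chi_s = np.concatenate([chi_mean, chi_max], axis=0)   # step 3: concatenation
    alpha = conv2d_same(chi_s, weight)                    # step 4: 7x7 convolution
    eps = sigmoid(alpha)[None]                            # step 5: gate of shape (1, H, W)
    return eps * c_i + c_i                                # step 6: reweight, then residual add


# Usage with random data; the sizes are illustrative only.
rng = np.random.default_rng(0)
c_1 = rng.standard_normal((64, 44, 44))
w_7x7 = rng.standard_normal((2, 7, 7)) * 0.01
s_1 = spatial_attention_bridge(c_1, w_7x7)
print(s_1.shape)  # (64, 44, 44)
```

Because the gate lies in (0, 1) and the original feature is added back, each output value stays between one and two times the input value, so the block can emphasize locations without suppressing any of them entirely.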

Table 1. Segmentation results of different networks on Kvasir and CVC-ClinicDB datasets

Dataset	Method	Dice	MIoU	SE	PC	F2	MAE
Kvasir	U-Net	0.818	0.746	0.856	0.857	0.827	0.055
	EUNet	0.908	0.854	0.934	0.911	0.919	0.028
	PraNet	0.898	0.840	0.911	0.916	0.901	0.032
	CaraNet	0.918	0.867	0.912	0.938	0.914	0.023
	PolypPVT	0.917	0.864	0.913	0.947	0.914	0.023
	SSFormer-L	0.918	0.865	0.897	0.957	0.904	0.022
	MSRAFormer	0.923	0.873	0.915	0.952	0.917	0.024
	Ours	0.932	0.883	0.933	0.944	0.931	0.021
CVC-ClinicDB	U-Net	0.823	0.755	0.834	0.839	0.827	0.019
	EUNet	0.902	0.846	0.959	0.880	0.926	0.011
	PraNet	0.899	0.849	0.910	0.907	0.905	0.009
	CaraNet	0.936	0.887	0.955	0.928	0.948	0.007
	PolypPVT	0.937	0.889	0.949	0.936	0.945	0.006
	SSFormer-L	0.906	0.855	0.897	0.931	0.898	0.008
	MSRAFormer	0.924	0.874	0.945	0.920	0.932	0.008
	Ours	0.942	0.896	0.964	0.927	0.954	0.006
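
The column headers of the result tables are not defined in this excerpt. Assuming they carry their usual meanings in polyp segmentation (Dice coefficient, intersection over union, sensitivity SE, precision PC, F2 score, and mean absolute error MAE), the sketch below computes them for one prediction and its ground-truth mask. The function name, the 0.5 threshold, and the choice to compute MAE on the unthresholded probability map are assumptions, and the MIoU column would then be the IoU averaged over a test set.

```python
import numpy as np


def segmentation_metrics(pred, target, threshold=0.5, eps=1e-8):
    """pred: probability map in [0, 1]; target: binary ground-truth mask."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # MAE on the raw probability map (assumption: before thresholding).
    mae = np.abs(pred - target).mean()

    p = pred >= threshold
    t = target >= 0.5
    tp = np.sum(p & t)    # polyp pixels correctly predicted
    fp = np.sum(p & ~t)   # background predicted as polyp
    fn = np.sum(~p & t)   # polyp pixels missed

    se = tp / (tp + fn + eps)                  # sensitivity (recall)
    pc = tp / (tp + fp + eps)                  # precision
    return {
        "Dice": 2 * tp / (2 * tp + fp + fn + eps),
        "IoU": tp / (tp + fp + fn + eps),
        "SE": se,
        "PC": pc,
        "F2": 5 * pc * se / (4 * pc + se + eps),  # F-beta with beta = 2
        "MAE": mae,
    }


# Toy example: one true positive, one false positive, one false negative.
scores = segmentation_metrics([[0.9, 0.8], [0.2, 0.1]], [[1, 0], [1, 0]])
print({name: round(float(value), 3) for name, value in scores.items()})
```

F2 weights sensitivity more heavily than precision, which suits polyp detection, where a missed lesion is usually costlier than a false alarm.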

Table 2. Segmentation results of different networks on CVC-ColonDB and ETIS datasets

Dataset	Method	Dice	MIoU	SE	PC	F2	MAE
CVC-ColonDB	U-Net	0.512	0.444	0.523	0.621	0.510	0.061
	EUNet	0.756	0.681	0.849	0.758	0.788	0.044
	PraNet	0.712	0.640	0.739	0.755	0.717	0.043
	CaraNet	0.773	0.689	0.857	0.753	0.796	0.042
	PolypPVT	0.808	0.727	0.821	0.849	0.809	0.031
	SSFormer-L	0.802	0.721	0.791	0.864	0.787	0.031
	MSRAFormer	0.782	0.707	0.803	0.874	0.787	0.028
	Ours	0.811	0.731	0.823	0.844	0.813	0.027
ETIS	U-Net	0.398	0.335	0.482	0.439	0.429	0.036
	EUNet	0.687	0.609	0.871	0.635	0.749	0.066
	PraNet	0.628	0.567	0.686	0.628	0.649	0.031
	CaraNet	0.747	0.672	0.811	0.731	0.777	0.017
	PolypPVT	0.787	0.706	0.867	0.774	0.820	0.013
	SSFormer-L	0.796	0.720	0.830	0.794	0.807	0.014
	MSRAFormer	0.750	0.679	0.811	0.745	0.777	0.013
	Ours	0.805	0.729	0.887	0.770	0.842	0.012

Algorithm 2: Channel attention bridge block
Inputs: The feature maps of the four stages $E_{i}$, i=1, 2, 3, 4
Outputs: $C_{i}$, i=1, 2, 3, 4
1: $h_{mean}^{i}$ = AvgPool($E_{i}$) /average pooling/
2: $h_{c}$ = Concat($h_{mean}^{1}$, $h_{mean}^{2}$, $h_{mean}^{3}$, $h_{mean}^{4}$) /concatenate the average-pooled feature maps/
3: $\beta$ = $Conv_{3 \times 3}(h_{c})$ /3×3 convolution/
4: $\gamma$ = $\sigma(\beta)$ /after the sigmoid, the feature map becomes $C \times H \times 1$/
5: $C_{i}$ = $\gamma$ * $E_{i}$ + $E_{i}$ /weight the original feature by the sigmoid output, then add the original feature/
End
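
Algorithm 2 can be sketched along similar lines. The listing leaves several details open, so the code below makes three assumptions that should not be read as the authors' design: average pooling is global, giving one value per channel; the convolution in step 3 is applied as a size-3 kernel sliding along the concatenated channel vector; and the resulting gate is split back to the four stages by channel count. The function name and the channel sizes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_bridge(stages, kernel):
    """stages: four arrays E_i of shape (C_i, H_i, W_i); kernel: (3,) weights (assumed 1-D)."""
    # Step 1: global average pooling, one descriptor value per channel.
    h_mean = [e.mean(axis=(1, 2)) for e in stages]
    # Step 2: concatenate the four descriptors into one vector.
    h_c = np.concatenate(h_mean)
    # Step 3: size-3 convolution along the channel axis with zero padding.
    # The kernel is flipped so that np.convolve performs cross-correlation.
    beta = np.convolve(h_c, kernel[::-1], mode="same")
    # Step 4: sigmoid gate in (0, 1).
    gamma = sigmoid(beta)
    # Assumption: split the joint gate back into per-stage gates by channel count.
    sizes = [e.shape[0] for e in stages]
    gates = np.split(gamma, np.cumsum(sizes)[:-1])
    # Step 5: reweight each stage channel-wise, then add the original feature.
    return [g[:, None, None] * e + e for g, e in zip(gates, stages)]


# Usage with random data; channel counts and resolutions are illustrative only.
rng = np.random.default_rng(0)
shapes = [(8, 32, 32), (16, 16, 16), (32, 8, 8), (64, 4, 4)]
e_stages = [rng.standard_normal(s) for s in shapes]
c_stages = channel_attention_bridge(e_stages, np.array([0.25, 0.5, 0.25]))
print([c.shape for c in c_stages])  # [(8, 32, 32), (16, 16, 16), (32, 8, 8), (64, 4, 4)]
```

Concatenating the descriptors before the convolution is what lets information from one scale influence the gate of a neighbouring scale at the stage boundaries, which matches the cross-scale intent in the title.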

Table 3. Performance comparison of different networks (CVC-ClinicDB)

Method	Parameters/M	GFLOPs	Train/(round·s^-1)
U-Net	34.53	65.52	309
EU-Net	31.36	12.31	284
PraNet	30.50	6.96	90
CaraNet	44.54	11.45	256
Polyp-PVT	25.12	5.30	233
SSFormer-L	65.96	17.29	220
MSRAformer	68.03	21.29	199
Ours	24.99	10.01	127

Table 4. Ablation results of each module on the Kvasir and CVC-ColonDB datasets

Dataset	Method	Dice	MIoU	SE	PC	F2
Kvasir	M1	0.906	0.851	0.900	0.931	0.901
	M2	0.921	0.871	0.930	0.931	0.926
	M3	0.928	0.877	0.934	0.936	0.928
	M4	0.932	0.883	0.933	0.944	0.931
CVC-ColonDB	M1	0.786	0.705	0.7918	0.835	0.785
	M2	0.789	0.706	0.8337	0.803	0.802
	M3	0.810	0.730	0.841	0.797	0.806
	M4	0.811	0.731	0.823	0.844	0.813

Liming LIANG, Anjun HE, Renjie LI, Jian WU. Cross-scale and cross-dimensional adaptive transformer network for colorectal polyp segmentation[J]. Optics and Precision Engineering, 2023, 31(18): 2700

Paper Information

Category: Information Sciences

Received: Mar. 15, 2023

Published Online: Oct. 12, 2023

DOI:10.37188/OPE.20233118.2700
