Working Condition Recognition Based on Lightweight Convolution Vision Transformer Network for Antimony Flotation Process

Fig. 4. L-CVT network structure. (a) L-Conv module (depthwise convolution with stride 1); (b) L-Conv module (depthwise convolution with stride 2); (c) Conv-VIT module

Fig. 5. Depthwise separable convolution
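
The saving that motivates a depthwise separable convolution can be illustrated with a parameter count. The sketch below is a generic illustration, not the authors' L-Conv implementation: the 3×3 kernel and the 48 → 64 channel pair are assumed values chosen for the example, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed example values: 3 x 3 kernel, 48 input channels, 64 output channels.
c_in, c_out, k = 48, 64, 3
std = standard_conv_params(c_in, c_out, k)
dws = depthwise_separable_params(c_in, c_out, k)
ratio = dws / std  # equals 1 / c_out + 1 / k**2

print(std, dws, round(ratio, 4))
```

For these assumed sizes the separable form needs roughly one eighth of the weights of the standard convolution.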

Fig. 6. Transformer. (a) Transformer structure; (b) multi-head attention

Fig. 7. Global representation of pixel information by Conv-VIT module

Fig. 8. Process of flotation data collection. (a) Flotation site; (b) flotation tank; (c) collection terminal of flotation data

Fig. 9. Image flip transformation. (a) Original image; (b) horizontal flip; (c) vertical flip; (d) 90° clockwise rotation
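
The three transformations named in this caption can be sketched on a small image stored as a list of pixel rows. This is a minimal illustration with hypothetical helper names (`hflip`, `vflip`, `rot90_cw`), not the preprocessing code used in the paper.

```python
def hflip(img):
    """Horizontal flip: reverse the pixel order within every row."""
    return [row[::-1] for row in img]


def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def rot90_cw(img):
    """Rotate 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


# Toy 2 x 3 "image" used only for illustration.
img = [[1, 2, 3],
       [4, 5, 6]]

print(hflip(img))     # [[3, 2, 1], [6, 5, 4]]
print(vflip(img))     # [[4, 5, 6], [1, 2, 3]]
print(rot90_cw(img))  # [[4, 1], [5, 2], [6, 3]]
```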

Fig. 10. MixUp augmentation result
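
MixUp, in its usual formulation, blends two samples and their one-hot labels with a weight drawn from a Beta distribution. The sketch below assumes that standard formulation; the value `alpha = 0.2`, the flattened two-pixel "images", and the helper names are illustrative assumptions rather than settings reported in the paper.

```python
import random

NUM_CLASSES = 6  # Classes I to VI in Table 1


def one_hot(label, num_classes=NUM_CLASSES):
    """Return a one-hot list for an integer class index."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two flattened images and their one-hot labels with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# Hypothetical usage: alpha = 0.2 is an assumed value, not taken from the paper.
alpha = 0.2
lam = random.betavariate(alpha, alpha)
x_mix, y_mix = mixup([0.0, 1.0], one_hot(0), [1.0, 1.0], one_hot(5), lam)
```

The mixed label stays a valid probability vector because the two weights sum to one.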

Fig. 11. CutMix augmentation result
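
CutMix, as commonly defined, pastes a rectangular patch from one image into another and weights the labels by the area that each image contributes. The following sketch assumes that common definition; the fixed patch position and the 4×4 toy images are hypothetical, and in practice the patch would be sampled at random.

```python
def cutmix(img_a, img_b, top, left, height, width):
    """Paste a patch of img_b into a copy of img_a.

    Returns the mixed image and lam, the fraction of pixels still taken
    from img_a (used as the label weight for img_a). Assumes both images
    have the same size and that top and left are non-negative.
    """
    h, w = len(img_a), len(img_a[0])
    bottom = min(top + height, h)   # clip the patch to the image border
    right = min(left + width, w)
    mixed = [row[:] for row in img_a]
    for r in range(top, bottom):
        mixed[r][left:right] = img_b[r][left:right]
    patch_area = max(bottom - top, 0) * max(right - left, 0)
    lam = 1.0 - patch_area / (h * w)
    return mixed, lam


# Toy 4 x 4 images: all zeros and all ones.
img_a = [[0] * 4 for _ in range(4)]
img_b = [[1] * 4 for _ in range(4)]
mixed, lam = cutmix(img_a, img_b, top=1, left=1, height=2, width=2)
print(mixed, lam)
```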

Fig. 12. Recognition accuracy curves for antimony flotation conditions obtained with different networks

Fig. 13. Confusion matrix results of different models. (a) L-CVT; (b) AlexNet; (c) VGG16; (d) ResNet18

Fig. 14. ROC curves and AUC values of different networks. (a) L-CVT; (b) AlexNet; (c) VGG16; (d) ResNet18

Fig. 15. Feature-map visualizations for the four networks

Table 1. Feature description of different operating conditions

Category	Flotation condition	Category feature description
Class Ⅰ	Abnormal	The bubbles are very sparse; the particle loading is much lower than normal and the bubbles have a gray appearance
Class Ⅱ	Poor	The bubbles are sparse; the particle loading is slightly lower than normal and the bubbles have a gray-black appearance
Class Ⅲ	Qualified	The bubbles are medium in size and irregularly distributed; the particle loading is normal and the bubbles have a black appearance
Class Ⅳ	Medium	The bubbles are medium in size and evenly distributed; the particle loading is normal and the bubbles have a bright appearance
Class Ⅴ	Good	The bubbles are large and evenly distributed; the particle loading is higher than normal and the bubbles have a bright appearance
Class Ⅵ	Excellent	The bubbles are the largest; the particle loading is much higher than normal and the froth is viscous; the bubbles have a water-shiny appearance

Table 2. Experiment of data augmentation comparison
Method	Top-1 accuracy /%
Flip	89.64
MixUp	90.83
CutMix	91.39
Flip+MixUp+CutMix	93.56
None	86.55

Table 3. Network parameters of L-CVT

Layer name	Output size	Output channels	Number
Conv-3×3	128×128	32	1
L-Conv (stride 2)	64×64	48	1
L-Conv (stride 1)	64×64	48	2
L-Conv (stride 2)	32×32	64	1
Conv-VIT (M=2)	32×32	64	1
L-Conv (stride 2)	16×16	96	1
Conv-VIT (M=4)	16×16	96	1
L-Conv (stride 2)	8×8	128	1
Conv-VIT (M=4)	8×8	128	1
Conv-1×1	8×8	384	1
MLP	1×1	6	1
FLOPs: 6.01×10¹⁰	Params: 2.33 MB
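
The output sizes in Table 3 follow from the usual convolution size formula if each stride-2 L-Conv halves the feature map. The sketch below traces the sizes from the 128×128 output of the first Conv-3×3 layer; the 3×3 kernel with padding 1 inside L-Conv is an assumption made for the calculation, and the Conv-VIT modules are treated as size-preserving, as the table indicates.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer name, stride); None marks a module that keeps the spatial size.
layers = [
    ("L-Conv (stride 2)", 2),
    ("L-Conv (stride 1)", 1),
    ("L-Conv (stride 1)", 1),
    ("L-Conv (stride 2)", 2),
    ("Conv-VIT (M=2)", None),
    ("L-Conv (stride 2)", 2),
    ("Conv-VIT (M=4)", None),
    ("L-Conv (stride 2)", 2),
    ("Conv-VIT (M=4)", None),
]

size = 128  # output of the first Conv-3x3 layer in Table 3
sizes = []
for name, stride in layers:
    if stride is not None:
        # Assumed 3 x 3 kernel with padding 1 in the L-Conv depthwise step.
        size = conv_out(size, kernel=3, stride=stride, padding=1)
    sizes.append(size)
    print(f"{name}: {size}x{size}")
```

Under these assumptions the traced sizes agree with the Output size column: 64, 64, 64, 32, 32, 16, 16, 8, 8.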

Table 4. Main parameters of AlexNet, VGG16, ResNet18

AlexNet	VGG16	ResNet18
Layer-1: 11×11, 96; maxpool-3×3	Layer-1: $\left[\begin{array}{l} 3 \times 3, 64 \\ 3 \times 3, 64 \end{array}\right]$; maxpool-2×2	Layer-1: 7×7, 64; maxpool-3×3
Layer-2: 5×5, 96; maxpool-3×3	Layer-2: $\left[\begin{array}{l} 3 \times 3, 128 \\ 3 \times 3, 128 \end{array}\right]$; maxpool-2×2	Layer-2: $\left[\begin{array}{l} 3 \times 3, 64 \\ 3 \times 3, 64 \end{array}\right] \times 3$
Layer-3: 3×3, 384	Layer-3: $\left[\begin{array}{l} 3 \times 3, 256 \\ 3 \times 3, 256 \\ 3 \times 3, 256 \end{array}\right]$; maxpool-2×2	Layer-3: $\left[\begin{array}{l} 3 \times 3, 128 \\ 3 \times 3, 128 \end{array}\right] \times 4$
Layer-4: 3×3, 384	Layer-4: $\left[\begin{array}{l} 3 \times 3, 512 \\ 3 \times 3, 512 \\ 3 \times 3, 512 \end{array}\right]$; maxpool-2×2	Layer-4: $\left[\begin{array}{l} 3 \times 3, 256 \\ 3 \times 3, 256 \end{array}\right] \times 6$
Layer-5: 3×3, 256; maxpool-3×3	Layer-5: $\left[\begin{array}{l} 3 \times 3, 512 \\ 3 \times 3, 512 \\ 3 \times 3, 512 \end{array}\right]$; maxpool-2×2	Layer-5: $\left[\begin{array}{l} 3 \times 3, 512 \\ 3 \times 3, 512 \end{array}\right] \times 3$
FC-1: 2048	FC-1: 4096	Pooling layer: average pool
FC-2: 2048	FC-2: 4096	FC-1: 6
FC-3: 6	FC-3: 6	Classifier: Softmax
Classifier: Softmax	Classifier: Softmax	

Table 5. Identification accuracy of antimony flotation condition based on different networks
Network	L-CVT	AlexNet	VGG16	ResNet18
Top-1 accuracy /%	93.56	78.33	85.11	88.33

Table 6. Computational complexity of different networks
Network	Params /MB	FLOPs /10⁹
L-CVT	2.38	60.10
AlexNet	33.66	93.31
VGG16	46.29	5146.08
ResNet18	11.18	152.02

Table 7. Evaluation results of different networks
Network	Precision /%	Recall /%	F1-Score /%
L-CVT	93.56	94.51	94.03
AlexNet	78.33	77.37	77.85
VGG16	85.11	85.46	85.28
ResNet18	88.33	89.74	89.03
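
The F1-Score column is consistent with the harmonic mean of the tabulated precision and recall. The short check below assumes that relationship (the averaging scheme behind the reported precision and recall is not stated here) and recomputes F1 from the two other columns.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) copied from Table 7.
table7 = {
    "L-CVT": (93.56, 94.51),
    "AlexNet": (78.33, 77.37),
    "VGG16": (85.11, 85.46),
    "ResNet18": (88.33, 89.74),
}

f1 = {name: round(f1_score(p, r), 2) for name, (p, r) in table7.items()}
for name, value in f1.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the recomputed values match the F1-Score column for all four networks.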

Table 8. Ablation experiment
Method	Accuracy /%	F1-score /%
Base	75.22	74.60
Base + A	88.78	88.70
Base + B	81.44	81.70
Base + A + B (proposed network)	93.56	94.03

Yifei Chen, Yaoyi Cai, Shiwen Li. Working Condition Recognition Based on Lightweight Convolution Vision Transformer Network for Antimony Flotation Process[J]. Laser & Optoelectronics Progress, 2023, 60(6): 0615002
