Optical Instruments, Vol. 46, Issue 5, 1 (2024)
Architectural style classification algorithm fusing CNN and Transformer
[2] XU Z, TAO D C, ZHANG Y, et al. Architectural style classification using multinomial latent logistic regression[C]//13th European Conference on Computer Vision – ECCV 2014. Zurich, Switzerland: Springer, 2014: 600-615.
[5] WANG R, GU D H, WEN Z J, et al. Intra-class classification of architectural styles using visualization of CNN[C]//5th International Conference on Artificial Intelligence and Security. New York: Springer, 2019: 205-216.
[7] ZHAO H S, JIA J Y, KOLTUN V. Exploring self-attention for image recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10073-10082.
[8] RAMACHANDRAN P, PARMAR N, VASWANI A, et al. Stand-alone self-attention in vision models[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: ACM, 2019: 7.
[10] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Annual Conference on Neural Information Processing Systems 2017. Long Beach: NIPS, 2017: 5998-6008.
[11] PENG Z L, HUANG W, GU S Z, et al. Conformer: Local features coupling global representations for visual recognition[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 357-366.
[12] CHEN Y P, DAI X Y, CHEN D D, et al. Mobile-Former: bridging MobileNet and transformer[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5260-5269.
[13] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510-4520.
[14] CORDONNIER J B, LOUKAS A, JAGGI M. On the relationship between self-attention and convolutional layers[C]//8th International Conference on Learning Representations. Addis Ababa: ICLR, 2020.
[15] SRINIVAS A, LIN T Y, PARMAR N, et al. Bottleneck transformers for visual recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 16514-16524.
[16] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[C]//International Conference on Machine Learning. PMLR, 2021: 10347-10357.
[17] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 770-778.
[18] SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 815-823.
[19] BARZ B, DENZLER J. WikiChurches: A fine-grained dataset of architectural styles with real-world challenges[J]. arXiv preprint arXiv:2108.06959, 2021.
[20] ZHANG H Y, CISSÉ M, DAUPHIN Y N, et al. mixup: Beyond empirical risk minimization[C]//6th International Conference on Learning Representations. Vancouver: ICLR, 2018.
[21] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2818-2826.
[22] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992-10002.
[23] CHEN Z S, XIE L X, NIU J W, et al. Visformer: The vision-friendly transformer[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 569-578.
Dong LIU, Rongfu ZHANG, Junxiang QIN, Junzhe GONG, Zhibin CAO. Architectural style classification algorithm fusing CNN and Transformer[J]. Optical Instruments, 2024, 46(5): 1
Received: Aug. 16, 2023
Published Online: Jan. 3, 2025
The Author Email: ZHANG Rongfu (zrf@usst.edu.cn)