Shanghai Urban Planning Review, Issue 2, 25 (2025)
Paradigm Shift of Street Visual Intelligence in Urban Planning
[9] WHYTE W H. The social life of small urban spaces[M]. Washington, DC: Conservation Foundation, 1980.
[10] JACOBS J. The death and life of great American cities[M]. New York: Random House, 1961.
[11] LYNCH K. The image of the city[M]. Cambridge, MA: MIT Press, 1960.
[12] RUNDLE A G, BADER M D M, RICHARDS C A, et al. Using Google Street View to audit neighborhood environments[J]. American Journal of Preventive Medicine, 2011, 40(1): 94-100.
[13] BADLAND H M, OPIT S, WITTEN K, et al. Can virtual streetscape audits reliably replace physical streetscape audits?[J]. Journal of Urban Health, 2010, 87(6): 1007-1016.
[14] NAIK N, PHILIPOOM J, RASKAR R, et al. Streetscore - predicting the perceived safety of one million streetscapes[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Columbus, OH, USA: IEEE, 2014: 793-799.
[15] LI X, ZHANG C, LI W, et al. Assessing street-level urban greenery using Google Street View and a modified green view index[J]. Urban Forestry & Urban Greening, 2015, 14(3): 675-685.
[16] LI X, ZHANG C, LI W. Building block level urban land-use information retrieval based on Google Street View images[J]. GIScience & Remote Sensing, 2017, 54(6): 819-835.
[17] MURILLO A C, SINGH G, KOSECKA J, et al. Localization in urban environments using a panoramic gist descriptor[J]. IEEE Transactions on Robotics, 2013, 29(1): 146-160.
[18] CAMPBELL A, BOTH A, SUN Q. Detecting and mapping traffic signs from Google Street View images using deep learning and GIS[J]. Computers, Environment and Urban Systems, 2019, 77: 101350.
[19] DAI Y, LIU L, WANG K, et al. Using computer vision and street view images to assess bus stop amenities[J]. Computers, Environment and Urban Systems, 2025, 117: 102254.
[20] PENG X, SONG R, CAO Q, et al. Real-time illegal parking detection algorithm in urban environments[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(11): 20572-20587.
[21] FAN Z, ZHANG F, LOO B P Y, et al. Urban visual intelligence: uncovering hidden city profiles with street view images[J]. Proceedings of the National Academy of Sciences, 2023, 120(27): e2220417120.
[22] ZHANG F, ZHANG D, LIU Y, et al. Representing place locales using scene elements[J]. Computers, Environment and Urban Systems, 2018, 71: 153-164.
[24] ZHANG F, ZHOU B, LIU L, et al. Measuring human perceptions of a large-scale urban region using machine learning[J]. Landscape and Urban Planning, 2018, 180: 148-160.
[25] HUANG J, FEI T, KANG Y, et al. Estimating urban noise along road network from street view imagery[J]. International Journal of Geographical Information Science, 2024, 38(1): 128-155.
[26] ZHANG F, WU L, ZHU D, et al. Social sensing from street-level imagery: a case study in learning spatio-temporal urban mobility patterns[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 153: 48-58.
[27] KANG J, KÖRNER M, WANG Y, et al. Building instance classification using street view images[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 145: 44-59.
[28] YAO Y, DONG A, LIU Z, et al. Extracting the pickpocketing information implied in the built environment by treating it as the anomalies[J]. Cities, 2023, 143: 104575.
[30] GUI J, CHEN T, ZHANG J, et al. A survey on self-supervised learning: algorithms, applications, and future trends[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 9052-9071.
[31] WANG Z, LI H, RAJAGOPAL R. Urban2Vec: incorporating street view imagery and POIs for multi-modal urban neighborhood embedding[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(1): 1013-1020.
[32] LI Y, HUANG Y, MAI G, et al. Learning street view representations with spatiotemporal contrast[J]. arXiv, 2025: 2502.04638.
[33] LI H, DEUSER F, YIN W, et al. Cross-view geolocalization and disaster mapping with street-view and VHR satellite imagery: a case study of Hurricane IAN[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2025, 220: 841-854.
[34] LI Y, HUANG W, CONG G, et al. Urban region representation learning with OpenStreetMap building footprints[C]//Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery, 2023: 1363-1373.
[35] WANG J, HUANG W, BILJECKI F. Learning visual features from figure-ground maps for urban morphology discovery[J]. Computers, Environment and Urban Systems, 2024, 109: 102076.
[36] QI M, HANKEY S. Using street view imagery to predict street-level particulate air pollution[J]. Environmental Science & Technology, 2021, 55(4): 2695-2704.
[37] SWERDLOW A, XU R, ZHOU B. Street-view image generation from a bird's-eye view layout[J]. IEEE Robotics and Automation Letters, 2024, 9(4): 3578-3585.
[38] PANG H E, BILJECKI F. 3D building reconstruction from single street view images using deep learning[J]. International Journal of Applied Earth Observation and Geoinformation, 2022, 112: 102859.
[39] LIU Z, LI T, REN T, et al. Day-to-night street view image generation for 24-hour urban scene auditing using generative AI[J]. Journal of Imaging, 2024, 10(5): 112.
[40] HOU C, ZHANG F, LI Y, et al. Urban sensing in the era of large language models[J]. The Innovation, 2025, 6(1): 100749.
[42] JANG K M, KIM J. Multimodal large language models as built environment auditing tools[J]. The Professional Geographer, 2025, 77(1): 84-90.
[44] HUANG W, WANG J, CONG G. Zero-shot urban function inference with street view images through prompting a pretrained vision-language model[J]. International Journal of Geographical Information Science, 2024, 38(7): 1414-1442.
[45] WU M, HUANG Q, GAO S, et al. Mixed land use measurement and mapping with street view images and spatial context-aware prompts via zero-shot multimodal learning[J]. International Journal of Applied Earth Observation and Geoinformation, 2023, 125: 103591.
[46] CHEN M, LI Z, HUANG W, et al. Profiling urban streets: a semi-supervised prediction model based on street view imagery and spatial topology[C]//Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery, 2024: 319-328.
[47] XU S, ZHANG C, FAN L, et al. AddressCLIP: empowering vision-language models for city-wide image address localization[C]//LEONARDIS A, RICCI E, ROTH S, et al. Computer Vision - ECCV 2024. Cham: Springer Nature Switzerland, 2025: 76-92.
[48] LIANG H, ZHANG J, LI Y, et al. Automatic estimation for visual quality changes of street space via street-view images and multimodal large language models[J]. IEEE Access, 2024, 12: 87713-87727.
[49] ZHOU Z, WANG Q, LIN B, et al. UNIAA: a unified multi-modal image aesthetic assessment baseline and benchmark[J]. arXiv, 2024: 2404.09619.
[50] BLEČIĆ I, SAIU V, TRUNFIO G A. Enhancing urban walkability assessment with multimodal large language models[C]//GERVASI O, MURGANTE B, GARAU C, et al. Computational Science and Its Applications - ICCSA 2024 Workshops. Cham: Springer Nature Switzerland, 2024: 394-411.
[51] YU D, BAO R, MAI G, et al. Spatial-RAG: spatial retrieval augmented generation for real-world spatial reasoning questions[J]. arXiv, 2025: 2502.18470.
YIN Hanyu, SUN Yumei, WU Lun, ZHANG Fan. Paradigm Shift of Street Visual Intelligence in Urban Planning[J]. Shanghai Urban Planning Review, 2025, (2): 25
Accepted: Aug. 22, 2025
Published Online: Aug. 22, 2025
Corresponding author email: ZHANG Fan (doctoral supervisor), fanzhanggis@pku.edu.cn