Journal of Geo-information Science, Vol. 22, Issue 9, 1753 (2020)

Identifying Metro Trip Purpose using Multi-source Geographic Big Data and Machine Learning Approach

Pengjun ZHAO* and Yushu CAO
Author Affiliations
  • The Centre for Urban Planning and Transport Studies, College of Urban and Environmental Sciences, Peking University, Beijing 100871, China
    Figures & Tables(13)
    Spatial distribution of land-use related variables
    Spatial distribution of metro trip records in a week
    Schematic diagram of estimating trip purpose of the smart card transactions
    Feature importance (MDA values) in the RF classifier
    OOB accuracy of the RF classifier as a function of the number of features
    Training convergence of the random forest classifier and selection of the optimal number of trees
    Temporal distribution of metro trip departures and arrivals for different travel purposes
    Spatial distribution of metro trip departures and arrivals for different travel purposes
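    • Table 1. Data sources and brief description

      Data type | Data description | Data year | Data source
      Household travel survey data | Residents' one-day trip chains | 2015 (reflecting residents' travel in Beijing in 2014) | Beijing Municipal Commission of Transport (http://jtw.beijing.gov.cn/)
      Smart card data (SCD) | About 14.34 million metro trip records in total | 2018 (July 1 to July 7) | Beijing Municipal Commission of Transport (http://jtw.beijing.gov.cn/)
      Baidu POI data | Reflects the spatial distribution of urban service facilities | 2015 | Baidu Maps Open Platform (http://lbsyun.baidu.com/)
      Metro station data | Spatial distribution of metro stations in Beijing | 2014, 2018 | Beijing Subway (https://www.bjsubway.com/)
      Housing transaction price data | Transaction price per unit area | 2015 | Lianjia Beijing (https://bj.lianjia.com/)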
    • Table 2. Number and proportion of metro trips by purpose in traffic survey data

      Trip purpose | Number of samples (records) | Proportion (%)
      Home | 3270 | 58.76
      Other | 498 | 8.95
      Work | 1797 | 32.29
      Total | 5565 | 100.00
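    • Table 3. Variables included in the random forest classifier

      Feature name | Feature description
      Trip purpose | Target variable to be identified (work, home, other)
      Trip characteristics | Departure time, arrival time, trip duration
      Land-use characteristics | Kernel density of high-income and low-income workplace POIs around the origin and destination
       | Kernel density of residential POIs and housing prices around the origin and destination
       | Kernel density of public service and daily-life service facility POIs around the origin and destination
       | Euclidean distance from the origin and destination to the city center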
    • Table 4. Random forest classifier confusion matrix results

      True purpose | Classified as "home" (records) | Classified as "other" (records) | Classified as "work" (records) | Correctly predicted samples (%)
      Home | 782 | 38 | 14 | 93.76
      Other | 8 | 45 | 23 | 59.21
      Work | 0 | 42 | 439 | 91.27
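
The last column of Table 4 can be reproduced from the counts in each row: it is the share of each true class that landed on the diagonal. The following is a minimal Python sketch of that arithmetic, not code from the article. The function names are illustrative assumptions, and the overall accuracy it prints is derived here from the table counts rather than reported in the table.

```python
# Counts transcribed from Table 4 (rows: true purpose, columns: predicted purpose).
LABELS = ["home", "other", "work"]
CONFUSION = [
    [782, 38, 14],   # true "home"
    [8, 45, 23],     # true "other"
    [0, 42, 439],    # true "work"
]


def per_class_accuracy(matrix):
    """Percentage of each true class that was classified correctly (row-wise)."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Percentage of all samples that lie on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


if __name__ == "__main__":
    for label, acc in zip(LABELS, per_class_accuracy(CONFUSION)):
        print(f"{label}: {acc:.2f}%")
    print(f"overall (derived, not reported in the table): {overall_accuracy(CONFUSION):.2f}%")
```

The per-class values match the percentages listed in the table (93.76, 59.21, 91.27), which also confirms how the flattened digits in each row split into counts.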
    • Table 5. Comparison of confusion matrix between random forest classifier with or without travel-related variables

      True purpose | Classified as "home" (records) | Classified as "other" (records) | Classified as "work" (records) | Correctly classified, travel-related variables only (%) | Correctly classified, initial RF classifier (%)
      Home | 765 | 38 | 15 | 93.52 | 93.76
      Other | 22 | 31 | 33 | 36.05 | 59.21
      Work | 3 | 56 | 428 | 87.89 | 91.27
    Get Citation

    Pengjun ZHAO, Yushu CAO. Identifying Metro Trip Purpose using Multi-source Geographic Big Data and Machine Learning Approach[J]. Journal of Geo-information Science, 2020, 22(9): 1753

    Paper Information

    Received: Mar. 22, 2019

    Accepted: --

    Published Online: Apr. 23, 2021

    The Author Email: ZHAO Pengjun (pengjun.zhao@pku.edu.cn)

    DOI:10.12082/dqxxkx.2020.200134
