Journal of Terahertz Science and Electronic Information Technology, Volume 21, Issue 3, 378 (2023)

Micro-blog hot topic detection method based on improved K-means

CHEN Yangjian1,* and WEN Qiuhua2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]

    Micro-blog text data is high-dimensional and exhibits pronounced synonymy and polysemy. Traditional topic detection methods that combine the Vector Space Model (VSM) with K-means suffer from low accuracy, high computational complexity, and difficulty in determining the cluster centers. A VSM method optimized by the Relevance Vector Machine (RVM) is proposed for text vectorization. First, the dimension of the VSM feature vector is reduced automatically using the adaptive feature selection ability of RVM. Principal Component Analysis (PCA) is then applied to determine the cluster centers for the K-means clustering algorithm, which is used to obtain the clustering results. Finally, a heat index is computed for each topic from the number of micro-blog forwards and comments, and the topic with the largest heat index is taken as the current hot topic. The results show that, compared with two traditional methods, the proposed method improves accuracy by 7.3% and 1.1% and real-time performance by 45% and 53%, respectively.
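
    The last two stages described in the abstract, K-means clustering from pre-chosen centers and selection of the topic with the largest heat index, can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the authors' method: the abstract does not give the heat index formula, so a weighted sum of forwards and comments with hypothetical weights `w_forward` and `w_comment` is used; the initial centers are supplied by hand as a stand-in for the PCA-derived centers; and the RVM-based feature reduction is omitted entirely. The function names and toy data are illustrative only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's K-means starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


def hottest_topic(labels, forwards, comments, w_forward=1.0, w_comment=1.0):
    """Sum an assumed heat score per cluster and return the hottest cluster.

    The weighted-sum formula and both weights are assumptions; the abstract
    only says the heat index depends on forwards and comments.
    """
    heat = {}
    for lab, f, c in zip(labels, forwards, comments):
        heat[lab] = heat.get(lab, 0.0) + w_forward * f + w_comment * c
    return max(heat, key=heat.get), heat


# Toy 2-D "post vectors" (already reduced), with hand-picked initial centers
# standing in for centers that would come from PCA.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
initial_centers = [(0.0, 0.0), (5.0, 5.0)]
forwards = [3, 1, 40, 25, 2]
comments = [1, 0, 12, 8, 1]

labels, centers = kmeans(points, initial_centers)
topic, heat = hottest_topic(labels, forwards, comments)
print(labels, topic, heat)
```

    In this toy run the two posts with many forwards and comments fall into the same cluster, so that cluster is reported as the hot topic. Changing the weights shifts how much comments count relative to forwards.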

    CHEN Yangjian, WEN Qiuhua. Micro-blog hot topic detection method based on improved K-means[J]. Journal of Terahertz Science and Electronic Information Technology, 2023, 21(3): 378

    Paper Information

    Category:

    Received: Sep. 14, 2020

    Accepted: --

    Published Online: Apr. 12, 2023

    The Author Email: Yangjian CHEN (Chenyangjian688@163.com)

    DOI:10.11805/tkyda2020457
