Journal of Terahertz Science and Electronic Information Technology, Volume 21, Issue 3, 378 (2023)
Micro-blog hot topic detection method based on improved K-means
Micro-blog text data is high-dimensional and exhibits pronounced synonymy and polysemy. Traditional topic detection based on the Vector Space Model (VSM) combined with K-means suffers from low accuracy, high computational cost, and difficulty in determining the cluster centers. A VSM method optimized with a Relevance Vector Machine (RVM) is proposed for text vectorization. First, the adaptive feature selection ability of the RVM is used to reduce the dimension of the VSM feature vector automatically. Principal Component Analysis (PCA) is then applied to determine the cluster centers for the K-means algorithm, which produces the clustering results. Finally, a heat index is computed for each topic from the numbers of forwards and comments on its micro-blog posts, and the topic with the largest heat index is taken as the current hot topic. The results show that, compared with two traditional methods, the proposed method improves accuracy by 7.3% and 1.1% and real-time performance by 45% and 53%, respectively.
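The final step, choosing the hot topic from forward and comment counts, can be illustrated with a minimal sketch. The abstract does not give the heat index formula, so the weighted sum used here, the function name `hottest_topic`, the weights `w_forward` and `w_comment`, and the sample data are all assumptions made for illustration, not the authors' definition.

```python
from collections import defaultdict

def hottest_topic(posts, labels, w_forward=1.0, w_comment=1.0):
    """Return (topic_label, heat) for the cluster with the largest heat index.

    posts  : list of (forwards, comments) tuples, one per micro-blog post
    labels : cluster label of each post (e.g. produced by K-means)
    The weighted sum below is an ASSUMED heat index, not the paper's formula.
    """
    if len(posts) != len(labels):
        raise ValueError("posts and labels must have the same length")
    heat = defaultdict(float)
    for (forwards, comments), label in zip(posts, labels):
        # Accumulate each post's contribution into its topic's heat.
        heat[label] += w_forward * forwards + w_comment * comments
    # The topic with the largest accumulated heat is taken as the hot topic.
    best = max(heat, key=heat.get)
    return best, heat[best]

# Hypothetical data: five posts assigned to three topic clusters.
posts = [(120, 30), (15, 4), (300, 90), (8, 1), (45, 10)]
labels = [0, 1, 0, 1, 2]
topic, score = hottest_topic(posts, labels)
print(topic, score)
```

With equal weights, topic 0 accumulates the most forwards and comments in this sample and is selected. Changing the weights shifts the emphasis between forwarding and commenting without altering the selection rule.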
CHEN Yangjian, WEN Qiuhua. Micro-blog hot topic detection method based on improved K-means[J]. Journal of Terahertz Science and Electronic Information Technology, 2023, 21(3): 378
Received: Sep. 14, 2020
Published Online: Apr. 12, 2023
The Author Email: Yangjian CHEN (Chenyangjian688@163.com)