Ao, Xiong (Beijing University of Posts and Telecommunications (BUPT), Beijing, China)
Yu, Xin (Beijing University of Posts and Telecommunications, Beijing, China)
Liu, Derong (Beijing University of Posts and Telecommunications, Beijing, China)
Tian, Hongkang (Beijing University of Posts and Telecommunications (BUPT), Beijing, China)
The TextRank algorithm tends to extract frequently occurring words as keywords, while TF-IDF considers only word-frequency relations within the text library. To combine the advantages of the two algorithms, this paper proposes Weighted TF-IDF of the Same Category Historical News Library and TextRank (TFSL-TR). The method first uses an LSTM-based classification model to classify the news item, then computes each word's TF-IDF value against the news library of the target category and uses it as the word's first weight. Next, it uses the TextRank algorithm to compute each word's second weight. Finally, it takes a weighted sum of the two weights and selects the top-K words, in descending order of score, as keywords. Experiments on a Chinese news library show that, compared with the traditional methods, TFSL-TR effectively improves the accuracy of keyword extraction.
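The weighting scheme outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the LSTM classifier has already assigned a category and that `library` holds tokenized historical news from that category. The smoothed IDF formula, the co-occurrence window, the max-normalization step, and the mixing weight `alpha` are all assumptions, since the abstract does not specify them, and every function name here is hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(doc, library):
    """First weight: TF-IDF of each word in `doc`, with IDF computed over
    `library` (assumed: tokenized same-category historical news)."""
    tf = Counter(doc)
    n_docs = len(library)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for other in library if word in other)
        # Smoothed IDF (an assumption; the abstract gives no formula).
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        weights[word] = (count / len(doc)) * idf
    return weights


def textrank_weights(doc, window=3, damping=0.85, iterations=50):
    """Second weight: TextRank scores on an undirected co-occurrence graph.
    Two words are linked if they appear within `window` tokens of each other."""
    neighbors = defaultdict(set)
    for i, word in enumerate(doc):
        for other in doc[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in set(doc)}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in scores
        }
    return scores


def normalize(weights):
    """Scale weights to [0, 1] so the two scores are comparable (assumed step)."""
    top = max(weights.values(), default=0.0)
    return {w: (v / top if top else 0.0) for w, v in weights.items()}


def tfsl_tr_keywords(doc, library, alpha=0.5, k=3):
    """Weighted sum of the two weights, then the top-k words by score.
    `alpha` is a hypothetical mixing weight for the TF-IDF term."""
    first = normalize(tfidf_weights(doc, library))
    second = normalize(textrank_weights(doc))
    combined = {w: alpha * first[w] + (1 - alpha) * second[w] for w in first}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(combined, key=lambda w: (-combined[w], w))[:k]


# Toy data, invented purely for illustration.
doc = ["bank", "raises", "rates", "bank", "rates", "market"]
library = [["market", "falls"], ["market", "bank"], ["market", "trade"]]
print(tfsl_tr_keywords(doc, library, alpha=0.5, k=3))
```

Setting `alpha=1.0` reduces the ranking to category-restricted TF-IDF alone, and `alpha=0.0` to TextRank alone, which makes it easy to compare the combined score against either baseline.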