Wang, Yu-Chun (Institute of Information Science, Academia Sinica; Department of Electrical Engineering, National Taiwan University)
Lee, Yi-Hsun (Institute of Information Science, Academia Sinica)
Lin, Chu-Cheng (Institute of Information Science, Academia Sinica; Department of Computer Science and Information Engineering, National Taiwan University)
Tsai, Richard Tzong-Han (Department of Computer Science and Engineering)
Hsu, Wen-Lian (Institute of Information Science, Academia Sinica)
Named entity translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating person names, the most common type of named entity in Korean-Chinese cross-language information retrieval (KCIR). Unlike other languages, Chinese uses characters (ideographs), which makes person name translation difficult because one syllable may map to several Chinese characters. We propose an effective hybrid person name translation method to improve the performance of KCIR. First, we use Wikipedia as a translation tool, based on the inter-language links between the Korean edition and the Chinese or English editions. Second, we adopt the Naver people search engine to find the query name's Chinese or English translation. Third, we extract Korean-English transliteration pairs from Google snippets and then search for the English-Chinese transliteration in the database of Taiwan's Central News Agency or in Google. The performance of KCIR using our method is over five times better than that of a dictionary-based system. The mean average precision is 0.3490 and the average recall is 0.7534. The method can handle the translation of Chinese, Japanese, Korean, and non-CJK person names from Korean to Chinese. Hence, it substantially improves the performance of KCIR.
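The three-step hybrid method described above can be read as a cascade of lookups. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the sources are tried in order and the first hit wins (the abstract lists the steps but does not state the fallback policy), and it replaces the real Wikipedia, Naver, Google, and Central News Agency lookups with small in-memory tables. All function names and table contents are hypothetical and for illustration only.

```python
from typing import Callable, Dict, Optional, Sequence

# A lookup takes a Korean person name and returns a Chinese translation, or None.
Lookup = Callable[[str], Optional[str]]


def make_table_lookup(table: Dict[str, str]) -> Lookup:
    """Hypothetical stand-in for a direct source (Wikipedia links, Naver people search)."""
    return lambda name: table.get(name)


def make_pivot_lookup(ko_en: Dict[str, str], en_zh: Dict[str, str]) -> Lookup:
    """Hypothetical stand-in for step three: Korean -> English, then English -> Chinese."""
    def lookup(name: str) -> Optional[str]:
        english = ko_en.get(name)
        return en_zh.get(english) if english else None
    return lookup


def translate_person_name(name: str, stages: Sequence[Lookup]) -> Optional[str]:
    """Try each stage in order and return the first non-empty translation."""
    for stage in stages:
        result = stage(name)
        if result:
            return result
    return None


# Illustrative toy data only; the real sources are web services and databases.
stages = [
    make_table_lookup({"김대중": "金大中"}),            # assumed Wikipedia inter-language link
    make_table_lookup({"이영애": "李英愛"}),            # assumed Naver people search result
    make_pivot_lookup({"빌 클린턴": "Bill Clinton"},    # assumed snippet-derived pair
                      {"Bill Clinton": "柯林頓"}),       # assumed news-database pair
]

for query in ["김대중", "이영애", "빌 클린턴", "홍길동"]:
    print(query, "->", translate_person_name(query, stages))
```

In this sketch a name that no source covers yields `None`, at which point a real system would have to fall back to some other strategy; the abstract does not say which.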
Proposed method
The mean average precision is 0.3490, which is even better than that of the Chinese monolingual IR system. The proposed method can handle the translation of Chinese, Japanese, Korean, and non-CJK person names. Hence, it substantially improves the performance of KCIR.
Target data
To evaluate our KCIR system, we use the topic collection and document collection of the NTCIR-5 and NTCIR-6 CLIR tasks. The document collection is the Chinese Information Retrieval Benchmark (CIRB) 4.0, which contains news articles published in four Taiwanese newspapers from 2000 to 2001. The topics have four fields: title, description, narration, and concentrate words.
Performance/Effect
The evaluation results demonstrate that our method improves KCIR substantially, as its performance is more than five times better than that of the baseline system. Interestingly, it is even better than Chinese monolingual IR.
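For readers unfamiliar with the two reported metrics, the sketch below shows how mean average precision and recall are commonly computed from ranked result lists. It assumes the standard uninterpolated definitions with binary relevance; the source does not spell out the exact evaluation procedure, and the document IDs and judgments here are invented for illustration.

```python
from typing import List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def recall(ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents that appear anywhere in the ranked list."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Average of per-topic average precision over all topics."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical topics: (ranked result list, set of relevant document IDs).
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"}),
    (["d9", "d2"], {"d2"}),
]
print(round(mean_average_precision(runs), 4))
print([round(recall(r, rel), 4) for r, rel in runs])
```

Under these definitions, a system that is "five times better" in mean average precision ranks relevant documents much higher on average across topics, not merely retrieves more of them.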