Method and device for mining comparable network language materials

Bibliographic Details
Main Authors: ZENG WEIHUI, ZENG XINHUA, ZHANG JIAN, LI HUALONG, ZHU ZEDE, GAO HUIYI, DONG HANLIN, CHEN LEI, ZHENG SHOUGUO, WU NA, LI MIAO, BIAN CHENGFEI, YANG ZHENXIN, CHEN SHENG, WENG SHIZHUANG, HU ZELIN
Format: Patent
Language: Chinese; English
Description
Summary: The invention relates to a method for mining comparable network language materials, that is, comparable corpora drawn from the web. The method comprises the following steps: acquiring source-language web pages with web crawlers and preprocessing them to obtain source-language documents; analyzing the cross-language topic probabilities of the source-language documents and generating corresponding target-language query phrases; submitting the target-language query phrases to search engines and selecting the top N returned documents to form a set of candidate similar target-language documents; and computing the similarity between each source-language document and the candidate target-language documents, keeping the documents with high similarity, and building a comparable corpus from them. The invention further discloses a device for implementing the method. The method and the device have the advantage that the problem of ambiguity or long time consumption due to vocabulary translation can be
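The steps in the summary can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the record does not say which topic model, query-construction rule, or similarity measure the invention uses, so the cosine similarity over topic distributions, the function names (`build_query`, `cosine`, `mine_pairs`), the `search` and `infer_topics` callables, and the default values of `top_n` and `threshold` are all assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple


def build_query(doc_topics: Sequence[float],
                topic_terms: Sequence[Sequence[str]],
                n_topics: int = 2,
                n_terms: int = 2) -> str:
    """Join the top target-language terms of the document's most probable topics.

    Hypothetical helper: topic_terms[t] lists target-language terms for
    cross-language topic t, ordered from most to least representative.
    """
    ranked = sorted(range(len(doc_topics)),
                    key=lambda t: doc_topics[t], reverse=True)
    terms: List[str] = []
    for t in ranked[:n_topics]:
        terms.extend(topic_terms[t][:n_terms])
    return " ".join(terms)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two topic distributions (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_pairs(doc_topics: Sequence[float],
               topic_terms: Sequence[Sequence[str]],
               search: Callable[[str], List[str]],
               infer_topics: Callable[[str], Sequence[float]],
               top_n: int = 10,
               threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Query, keep the top N hits, and retain those similar enough to the source."""
    query = build_query(doc_topics, topic_terms)
    candidates = search(query)[:top_n]  # candidate similar document set
    scored = [(doc, cosine(doc_topics, infer_topics(doc))) for doc in candidates]
    return [(doc, score) for doc, score in scored if score >= threshold]
```

Here `search` stands in for a call to a real search engine and `infer_topics` for the inference step of whatever cross-language topic model is used; both are passed in as plain callables so the sketch stays independent of any particular service or library. Because the query is assembled from target-language topic terms rather than from translated source words, the sketch involves no word-by-word translation step.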