Method and device for mining comparable network language materials
Main authors: | , , , , , , , , , , , , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online access: | Order full text |
Summary: | The invention relates to a method for mining comparable network language materials (comparable corpora from the web). The method comprises: acquiring source-language web pages with web crawlers and preprocessing them to obtain source-language documents; analyzing the cross-language topic probabilities of the source-language documents and generating corresponding target-language query phrases; submitting the target-language query phrases to search engines and selecting the top N returned documents to form a set of candidate similar target-language documents; and computing the similarity between the source-language documents and the candidate target-language documents, filtering for documents with high similarity, and constructing a comparable language material bank (comparable corpus). The invention further discloses a device for implementing the method. The method and the device have the advantage that the problem of ambiguity or long time consumption due to vocabulary translation can be |
---|
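
The final step in the summary (scoring the top N candidate documents against a source document and keeping the highly similar ones) can be sketched as follows. This is a minimal illustration, not the patented method: the record does not state which similarity measure is used, so cosine similarity over topic-probability vectors is an assumption, and the names `cosine_similarity`, `select_comparable`, `top_n`, and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two equal-length topic-probability vectors.

    Assumption: the record does not name a similarity measure; cosine
    similarity is used here only as a common, simple choice.
    """
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_p * norm_q)


def select_comparable(
    source_topics: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    top_n: int = 10,
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Keep candidates among the first top_n whose similarity meets threshold.

    candidates is a list of (doc_id, topic_vector) pairs in search-engine
    rank order. Both top_n and threshold are hypothetical parameters.
    """
    kept = []
    for doc_id, topics in candidates[:top_n]:
        score = cosine_similarity(source_topics, topics)
        if score >= threshold:
            kept.append((doc_id, score))
    # Highest-scoring documents first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


# Toy data: three topics, three candidate documents.
source = [0.7, 0.2, 0.1]
candidates = [
    ("doc_a", [0.6, 0.3, 0.1]),
    ("doc_b", [0.1, 0.1, 0.8]),
    ("doc_c", [0.7, 0.2, 0.1]),
]
result = select_comparable(source, candidates, top_n=3, threshold=0.8)
kept_ids = [doc_id for doc_id, _ in result]
```

With this toy data, `doc_b` falls below the threshold and is dropped, while `doc_c` and `doc_a` are retained in descending order of similarity. In practice the threshold would need to be tuned against the topic model actually used.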