Chinese semantic knowledge representation and overlap measure for Chinese documents

Bibliographic details
Main authors: Xu Li, Xiaoqiang Yu, Chunlong Yao, Xiuyan Zhao
Format: Conference proceeding
Language: English
Description
Abstract: Document copy detection judges whether a given query document plagiarizes content from other documents in a database. Plagiarism can take several forms, such as duplicating part or all of a document's content, or using different words or sentences to express the same meaning as the text of earlier documents. Matching hashed chunks is relatively simple and suffices for reliably detecting exact overlaps. Detecting paraphrased overlap, however, is subtle. To address this problem, a frame-based Chinese semantic knowledge representation and an overlap measure for Chinese documents are proposed. The experimental results show that the method can identify complicated plagiarism patterns, such as single-word synonym substitution, voice changes, part-of-speech changes, and the splitting of long sentences.
DOI:10.1109/ICICIP.2012.6391442
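
The hashed-chunk matching that the abstract describes as sufficient for exact overlaps can be sketched as below. This is only an illustration, not the authors' method: the record does not say how chunks are formed, so the character n-gram chunking, the SHA-1 hash, the containment score, and the names `chunk_hashes` and `containment` are all assumptions made for this example.

```python
import hashlib


def chunk_hashes(text, n=5):
    """Return the set of hashes of all n-character chunks in text.

    Assumption: chunks are overlapping character n-grams, which avoids
    the need for a Chinese word segmenter in this sketch.
    """
    # Drop whitespace so spacing and line breaks do not affect matching.
    cleaned = "".join(text.split())
    if not cleaned:
        return set()
    if len(cleaned) < n:
        # Text shorter than one chunk: hash it as a single chunk.
        return {hashlib.sha1(cleaned.encode("utf-8")).hexdigest()}
    return {
        hashlib.sha1(cleaned[i:i + n].encode("utf-8")).hexdigest()
        for i in range(len(cleaned) - n + 1)
    }


def containment(query, source, n=5):
    """Fraction of the query's chunks that also occur in the source."""
    q = chunk_hashes(query, n)
    s = chunk_hashes(source, n)
    return len(q & s) / len(q) if q else 0.0
```

A verbatim copy, or a verbatim excerpt, of a source scores 1.0 under this measure. A paraphrase that substitutes synonyms or reorders a sentence changes most of the chunks it touches, so the score drops even though the meaning is preserved. That gap is the one the abstract says a semantic representation is meant to close.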