Methods for identifying versioned and plagiarized documents

Bibliographic details
Published in:	Journal of the American Society for Information Science and Technology, 2003-02, Vol. 54 (3), pp. 203-215
Main authors:	Hoad, Timothy C.; Zobel, Justin
Format:	Article
Language:	English
Subject terms:	Anchors; Cheating; Coderivations; Collections; Derivatives; Developmental Stages; Documents; Exact sciences and technology; Fingerprinting; Identification methods; Identification systems; Information and communication sciences; Information retrieval; Information retrieval systems. Information and document management system; Information science. Documentation; Interfaces. Software; Internet; Linux; Online information retrieval; Parameter identification; Plagiarism; Queries; Ranking; Sciences and techniques of general use; Searching; Strategy; Studies

Description
Abstract:	The widespread use of on‐line publishing of text promotes storage of multiple versions of documents and mirroring of documents in multiple locations, and greatly simplifies the task of plagiarizing the work of others. We evaluate two families of methods for searching a collection to find documents that are coderivative, that is, are versions or plagiarisms of each other. The first, the ranking family, uses information retrieval techniques; extending this family, we propose the identity measure, which is specifically designed for identification of coderivative documents. The second, the fingerprinting family, uses hashing to generate a compact document description, which can then be compared to the fingerprints of the documents in the collection. We introduce a new method for evaluating the effectiveness of these techniques, and demonstrate it in practice. Using experiments on two collections, we demonstrate that the identity measure and the best fingerprinting technique are both able to accurately identify coderivative documents. However, for fingerprinting, parameters must be carefully chosen, and even so the identity measure is clearly superior.
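The general idea behind the fingerprinting family (hash pieces of a document, keep a compact subset, compare subsets) can be illustrated with a minimal sketch. It assumes word n-gram chunks, MD5 as the hash, a "keep hashes that are 0 mod p" selection rule, and a Jaccard-style overlap score. These choices, and the names `fingerprint` and `resemblance`, are hypothetical illustrations. They are not the specific techniques or parameter settings evaluated by Hoad and Zobel.

```python
import hashlib


def chunks(text: str, n: int) -> list[str]:
    """Split text into overlapping word n-grams (hypothetical chunking choice)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 0))]


def chunk_hash(chunk: str) -> int:
    """Stable 64-bit hash of a chunk (MD5 truncated; an assumed choice)."""
    return int.from_bytes(hashlib.md5(chunk.encode("utf-8")).digest()[:8], "big")


def fingerprint(text: str, n: int = 3, modulus: int = 4) -> set[int]:
    """Compact description: keep only chunk hashes with h % modulus == 0.

    modulus=1 keeps every chunk; larger values give smaller fingerprints
    at the cost of possibly missing shared passages.
    """
    return {h for h in (chunk_hash(c) for c in chunks(text, n)) if h % modulus == 0}


def resemblance(a: set[int], b: set[int]) -> float:
    """Jaccard overlap of two fingerprints (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    edited = "the quick brown fox jumps over the lazy cat"
    fa = fingerprint(original, n=3, modulus=1)
    fc = fingerprint(edited, n=3, modulus=1)
    print(resemblance(fa, fc))  # 6 shared chunks out of 8 distinct -> 0.75
```

In this sketch the chunk length `n` and the selection `modulus` are exactly the kind of parameters the abstract warns must be chosen carefully. Short chunks or aggressive selection can miss or misjudge coderivative text.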
ISSN:	1532-2882 2330-1635 1532-2890 2330-1643
DOI:	10.1002/asi.10170