Compressing a plurality of documents
Format: | Patent |
---|---|
Language: | English |
Abstract: | Documents are compressed. A partially compressed document is obtained, in which one or more code words replace one or more common tokens of a document to be compressed. The common tokens are tokens common to a plurality of documents and are included in a common dictionary, which provides a mapping of code words to common tokens. A document-associated dictionary is created from the non-common tokens of the document to be compressed; it provides another mapping, of other code words to the non-common tokens. A compressed document is then created by replacing one or more non-common tokens of the partially compressed document with one or more of the other code words of the document-associated dictionary. The compressed document thus includes both the code words of the partially compressed document and the other code words of the document-associated dictionary. |
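
The two-dictionary scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The whitespace tokenization, the threshold of two documents for a token to count as "common", the `C`/`D` code-word naming, and all function names are assumptions made for illustration only.

```python
from collections import Counter


def build_common_dictionary(documents, min_docs=2):
    """Map code words to tokens found in at least min_docs documents.

    Assumption: "common to a plurality of documents" is read here as
    "appears in two or more documents"; tokens are whitespace-separated.
    """
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.split()))  # count each token once per document
    common = sorted(tok for tok, n in doc_freq.items() if n >= min_docs)
    return {f"C{i}": tok for i, tok in enumerate(common)}


def partially_compress(text, common_dict):
    """Replace common tokens with their code words.

    Each item is (is_code_word, value), so remaining raw tokens can never
    be confused with code words.
    """
    code_for = {tok: code for code, tok in common_dict.items()}
    return [(True, code_for[t]) if t in code_for else (False, t)
            for t in text.split()]


def finish_compression(partial):
    """Build the document-associated dictionary and replace non-common tokens."""
    doc_dict = {}   # code word -> non-common token
    code_for = {}   # non-common token -> code word
    codes = []
    for is_code, value in partial:
        if not is_code:
            if value not in code_for:
                code = f"D{len(code_for)}"
                code_for[value] = code
                doc_dict[code] = value
            value = code_for[value]
        codes.append(value)
    return codes, doc_dict


def decompress(codes, common_dict, doc_dict):
    """Look every code word up in the two dictionaries and rejoin the tokens."""
    lookup = {**common_dict, **doc_dict}
    return " ".join(lookup[c] for c in codes)


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat on the log"]
    common = build_common_dictionary(docs)
    codes, doc_dict = finish_compression(partially_compress(docs[0], common))
    print(codes, doc_dict)
    print(decompress(codes, common, doc_dict))
```

In this sketch the common dictionary is shared by all documents and stored once, while each compressed document carries only its own small dictionary for the tokens that are not common. The code words are shown as readable strings; a real encoder would presumably use compact binary codes.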