Intra-Category Aware Hierarchical Supervised Document Hashing

Bibliographic details
Published in:	IEEE Transactions on Knowledge and Data Engineering, 2023-06, Vol. 35 (6), pp. 6003-6013
Authors:	Guo, Jia-Nan, Mao, Xian-Ling, Wei, Wei, Huang, Heyan
Format:	Article
Language:	English
Keywords:	Benchmark testing, Categories, Codes, Decoding, document retrieval, Documents, Errors, hierarchical categories, Semantic hashing, Semantics, Toy manufacturing industry, Training, Transforms

Description
Abstract:	Document hashing is a powerful paradigm for document retrieval: it maps high-dimensional documents to compact hash codes while preserving the similarity of the original data. While fairly successful, existing document hashing methods do not consider the relevance relationship among different documents from the same category or the hierarchical relationship among categories. Intuitively, the intra-category relevance connects related concepts among different documents, which can supplement the omitted information for each document; meanwhile, the hierarchical categories can help to identify whether mistakes occur in leaf categories or in parent categories, and this can be used to reduce mistakes in parent categories, which are often more serious. Inspired by the above intuitions, we propose a novel Intra-category aware Hierarchical supervised Document Hashing method, called IHDH. Specifically, IHDH is a binary autoencoder architecture equipped with two novel components: an intra-category component and a hierarchy component. The intra-category component exploits the differences among the latent semantic representations of different documents from a category to supplement the omitted information for each document. The hierarchy component utilizes the hierarchical structure to transform the probabilities of leaf categories into the probabilities of parent categories by a union operation, and then applies a further parent-level penalty to reduce the mistakes occurring in parent categories. Extensive experiments over three benchmark datasets show that IHDH significantly outperforms the state-of-the-art baselines.
ISSN:	1041-4347, 1558-2191
DOI:	10.1109/TKDE.2022.3161807
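The abstract describes the hierarchy component only at a high level: leaf-category probabilities are turned into parent-category probabilities by a union operation, and a parent-level penalty is added. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes a softmax over mutually exclusive leaf categories, so the "union" of a parent's leaves reduces to summing their probabilities, and it assumes both penalties are cross-entropy terms. The names `parent_probabilities`, `hierarchical_loss`, `leaf_to_parent`, and `parent_weight` are hypothetical, and the real model operates on mini-batches inside a neural binary autoencoder rather than on plain Python lists.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over leaf-category logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def parent_probabilities(leaf_probs: Sequence[float],
                         leaf_to_parent: Sequence[int],
                         num_parents: int) -> List[float]:
    """Hypothetical 'union' step: because softmax leaf events are assumed
    mutually exclusive, P(parent) is the sum of its children's probabilities."""
    parent = [0.0] * num_parents
    for leaf, p in enumerate(leaf_probs):
        parent[leaf_to_parent[leaf]] += p
    return parent


def hierarchical_loss(logits: Sequence[float],
                      leaf_label: int,
                      leaf_to_parent: Sequence[int],
                      num_parents: int,
                      parent_weight: float = 1.0) -> float:
    """Leaf-level cross-entropy plus an extra parent-level cross-entropy
    penalty (assumed form; parent_weight is a hypothetical trade-off knob)."""
    eps = 1e-12  # guards against log(0)
    leaf_probs = softmax(logits)
    parent_probs = parent_probabilities(leaf_probs, leaf_to_parent, num_parents)
    leaf_loss = -math.log(leaf_probs[leaf_label] + eps)
    parent_loss = -math.log(parent_probs[leaf_to_parent[leaf_label]] + eps)
    return leaf_loss + parent_weight * parent_loss


if __name__ == "__main__":
    # Four leaves under two parents: leaves 0,1 -> parent 0; leaves 2,3 -> parent 1.
    tree = [0, 0, 1, 1]
    sibling_mistake = [0.0, 2.0, 0.0, 0.0]  # mass on a sibling of the true leaf 0
    parent_mistake = [0.0, 0.0, 2.0, 0.0]   # mass on a leaf under the wrong parent
    print(hierarchical_loss(sibling_mistake, 0, tree, 2))
    print(hierarchical_loss(parent_mistake, 0, tree, 2))
```

In this toy example both predictions give the true leaf the same probability, so the leaf-level term is identical; only the parent-level term separates them, and it charges more when the probability mass falls under the wrong parent. That mirrors the abstract's motivation that parent-category mistakes are the more serious kind.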