Set similarity query algorithm based on length partition
Main Authors: | , , , , , , , |
---|---|
Format: | Patent |
Language: | Chinese ; English |
Summary: | The invention relates to a set similarity query method based on length partitioning, in the field of data mining and information retrieval. The method comprises the following steps: sorting and numbering the sets (records) in a data set; constructing an inverted index structure for the sorted data set, together with a length mapping table; and, for a given query q and similarity threshold t, retrieving all records whose similarity with q is greater than or equal to t, using the inverted index structure and the length mapping table. The method combines the idea of length partitioning with the classic similarity query algorithm ScanCount. Through data preprocessing, length partitioning, and an efficient index structure, records that cannot meet the similarity threshold are filtered out quickly, which improves the efficiency of the algorithm. In addition, a simpler counting array is designed, which reduces the space overhead. Therefore, the method provided by the inven |
---|
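The steps named in the summary (sort and number the records, build an inverted index and a length table, then count token hits only within the admissible length range) can be illustrated with a minimal Python sketch. This is not the patented implementation. It rests on several assumptions that the record does not state: similarity is taken to be Jaccard similarity, the length mapping table is modeled as a sorted list of record lengths searched with `bisect`, and the "simpler counting array" is modeled as an array covering only the candidate id range. The names `build_index` and `query` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
import math


def build_index(records):
    """Sort records by size, number them, and build the lookup structures."""
    # Sorting by length gives records of similar size consecutive ids.
    sorted_sets = sorted((frozenset(r) for r in records), key=len)
    inverted = defaultdict(list)  # token -> ascending list of record ids
    for rid, s in enumerate(sorted_sets):
        for tok in s:
            inverted[tok].append(rid)
    # Assumed stand-in for the length mapping table: lengths[rid] = |record|.
    lengths = [len(s) for s in sorted_sets]
    return sorted_sets, inverted, lengths


def query(q, t, sorted_sets, inverted, lengths):
    """Return (record, similarity) pairs with Jaccard(record, q) >= t, 0 < t <= 1."""
    q = frozenset(q)
    if not q or not sorted_sets:
        return []
    # Jaccard length filter (assumption): t*|q| <= |r| <= |q|/t.
    eps = 1e-9  # guards against floating-point rounding at the bounds
    min_len = math.ceil(t * len(q) - eps)
    max_len = math.floor(len(q) / t + eps)
    lo = bisect_left(lengths, min_len)   # first id inside the length range
    hi = bisect_right(lengths, max_len)  # one past the last id inside it
    # Counting array sized to the candidate range only, ScanCount style.
    counts = [0] * (hi - lo)
    for tok in q:
        postings = inverted.get(tok, ())
        start = bisect_left(postings, lo)  # skip ids below the range
        for rid in postings[start:]:
            if rid >= hi:
                break
            counts[rid - lo] += 1
    results = []
    for offset, overlap in enumerate(counts):
        if overlap == 0:
            continue
        record = sorted_sets[lo + offset]
        similarity = overlap / (len(q) + len(record) - overlap)
        if similarity >= t:
            results.append((record, similarity))
    return results


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "c", "d"}, {"x", "y"}, {"a"}]
    sets, inv, lens = build_index(data)
    for record, sim in query({"a", "b", "c"}, 0.6, sets, inv, lens):
        print(sorted(record), round(sim, 3))
```

Because the ids follow length order, the length bounds translate into one contiguous id range, so both the posting-list scan and the counting array can ignore every record outside it. A different similarity function would need different length bounds.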