Column Stores versus Search Engines and Applications to Search in Social Networks

Search engines and database systems both play important roles as we store and organize ever increasing amounts of information and still require the information to be easily accessible. Research on these two types of systems has traditionally been partitioned into two fields, information retrieval an...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
1. Verfasser: Bjørklund, Truls A
Format: Dissertation
Sprache:eng
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Search engines and database systems both play important roles as we store and organize ever increasing amounts of information and still require the information to be easily accessible. Research on these two types of systems has traditionally been partitioned into two fields, information retrieval and databases, and the integration of these two fields has been a popular research topic. Rather than attempting to integrate the two fields, this thesis begins with a comparison of the technical similarities between search engines and a specific type of database system often used in decision support systems: column stores. Based on an initial assessment of the technical similarities, which includes an evaluation of the feasibility of creating a hybrid system that supports both workloads, the papers in this thesis investigate how the identi_ed similarities can be used as a basis for improving the effciency of the different systems. To improve the efficiency of processing decision support workloads, the use of inverted indexes as an alternative to bitmap indexes is evaluated. We develop a query processing framework for compressed inverted indexes in decision support workloads and find that it outperforms state-of-the-art compressed bitmap indexes by being significantly more compact, and also improves the query processing e_ciency for most queries. Keyword search in social networks with access control is also addressed in this thesis, and a space of solutions is developed along two axes. One of the axes defines the set of inverted indexes that are used in the solution, and the other defines the meta-data used to filter out inaccessible results. With a exible and efficient search system based on a column-oriented storage system, we conduct a thorough set of experiments that illuminate the trade-offs between different extremes in the solution space. We also develop a hybrid scheme in between two of the best extremes. The hybrid approach uses cost models to find the most efficient solution for a particular workload. Together with an effcient query processing framework based on our novel HeapUnion operator, this results in a system that is e_cient for a wide range of workloads that consist of updates and searches with access control in a social network.