Rethinking the Data Model: The Drillbit Proof-of-Concept Library
Published in: | Journal of physics. Conference series 2014-01, Vol.513 (4), p.42016-7 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | The focus of many software architectures of the LHC experiments is to deliver a well-designed Event Data Model (EDM). Changes and additions to the stored data are often expensive, requiring large amounts of CPU time, disk storage and manpower. In addition, differing needs between groups of physicists lead to a tendency for common data formats to grow in terms of contained information while still failing to serve all needs. We introduce a new way of thinking about the data model based on the Dremel column store architecture published by Google. We present an EDM concept based on Dremel, which has the potential to significantly reduce the storage requirement for these common formats, decrease the time needed for independent physicists to compare their results and improve the speed at which data reprocessing can feasibly take place. The Dremel low-level encoding is implemented in a proof-of-concept C++ library called Drillbit, and it is shown that using a different encoding of the current data could save as much as 20% of disk space on average across a wide range of real-world derived data sets. |
---|---|
ISSN: | 1742-6588 1742-6596 |
DOI: | 10.1088/1742-6596/513/4/042016 |