Histograms are a lossy compression technique widely applied in various application contexts, like query optimization, statistical and temporal databases, OLAP applications, data streams, and so on. In most cases, accuracy in reconstructing from the histogram some original information, plays a crucial role. Thus, several proposals for constructing histograms trying to maximize their accuracy, have been given in the recent past. Besides bucket-based histograms (i.e., histograms whose construction is driven by the search of a ‘‘good’’ domain partition), there are different new histograms, characterized by more complex structures (like, for instance, wavelet-based histograms). This paper presents a new histogram, called nLT, belonging to the latter class. It is based on a hierarchical decomposition of the original data distribution kept in a full binary tree. This tree, containing a set of pre-computed hierarchical queries, uses bit saving for representing integer numbers, so that the reduced storage space allows us to increase the tree resolution and, consequently, its accuracy. Experimental comparison shows the superiority of nLT w.r.t. the state-of-the-art histograms.
Fast Range Query Estimation by N-Level Tree Histograms / Buccafurri, Francesco; Lax, Gianluca. - In: DATA & KNOWLEDGE ENGINEERING. - ISSN 0169-023X. - 51(2):(2004), pp. 257-275. [10.1016/j.datak.2004.04.004]
Fast Range Query Estimation by N-Level Tree Histograms
BUCCAFURRI, Francesco;LAX, Gianluca
2004-01-01
Abstract
Histograms are a lossy compression technique widely applied in various application contexts, like query optimization, statistical and temporal databases, OLAP applications, data streams, and so on. In most cases, accuracy in reconstructing from the histogram some original information, plays a crucial role. Thus, several proposals for constructing histograms trying to maximize their accuracy, have been given in the recent past. Besides bucket-based histograms (i.e., histograms whose construction is driven by the search of a ‘‘good’’ domain partition), there are different new histograms, characterized by more complex structures (like, for instance, wavelet-based histograms). This paper presents a new histogram, called nLT, belonging to the latter class. It is based on a hierarchical decomposition of the original data distribution kept in a full binary tree. This tree, containing a set of pre-computed hierarchical queries, uses bit saving for representing integer numbers, so that the reduced storage space allows us to increase the tree resolution and, consequently, its accuracy. Experimental comparison shows the superiority of nLT w.r.t. the state-of-the-art histograms.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.