ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Optimal Histograms with Quality Guarantees.

H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel: Optimal Histograms with Quality Guarantees. VLDB 1998: 275-286
@inproceedings{DBLP:conf/vldb/JagadishKMPSS98,
  author    = {H. V. Jagadish and
               Nick Koudas and
               S. Muthukrishnan and
               Viswanath Poosala and
               Kenneth C. Sevcik and
               Torsten Suel},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {Optimal Histograms with Quality Guarantees},
  booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {275-286},
  ee        = {db/conf/vldb/JagadishKMPSS98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Histograms are commonly used to capture attribute value distribution statistics for query optimizers. More recently, histograms have also been considered as a way to produce quick approximate answers to decision support queries. This widespread interest in histograms motivates the problem of computing histograms that are good under a given error metric. In particular, we are interested in an efficient algorithm for choosing the bucket boundaries in a way that either minimizes the estimation error for a given amount of space (number of buckets) or, conversely, minimizes the space needed for a given upper bound on the error. Under the assumption that finding optimal bucket boundaries is computationally inefficient, previous research has focused on heuristics with no provable bounds on the quality of the solutions.

In this paper, we present algorithms for computing optimal bucket boundaries in time proportional to the square of the number of distinct data values, for a broad class of optimality metrics. This class includes the V-Optimality constraint, which has been shown to result in the most accurate histograms for several selectivity estimation problems. Through experiments, we show that optimal histograms can achieve substantially lower estimation errors than histograms produced by popular heuristics. We also present new heuristics with provably good space-accuracy tradeoffsthat are significantly faster than the optimal algorithm. Finally, we present an enhancement to traditional histograms that allows us to provide quality guarantees on individual selectivity estimates. In our experiments, these quality guarantees were highly effective in isolating outliers in selectivity estimates.

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

References

[CdB72]
...
[dB97]
...
[GES85]
...
[HS92]
Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ioa93]
Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[IP95]
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[JKS98]
...
[Koo80]
Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LNS90]
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MCS88]
Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MPS98]
...
[OR86]
Frank Olken, Doron Rotem: Simple Random Sampling from Relational Databases. VLDB 1986: 160-169 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PI97]
Viswanath Poosala, Yannis E. Ioannidis: Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997: 486-495 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PIHS96]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SC84]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Zip49]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Fri Mar 12 17:22:56 2010 by Michael Ley (ley@uni-trier.de)