ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Practical Selectivity Estimation through Adaptive Sampling.

Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11
@inproceedings{DBLP:conf/sigmod/LiptonNS90,
  author    = {Richard J. Lipton and
               Jeffrey F. Naughton and
               Donovan A. Schneider},
  editor    = {Hector Garcia-Molina and
               H. V. Jagadish},
  title     = {Practical Selectivity Estimation through Adaptive Sampling},
  booktitle = {Proceedings of the 1990 ACM SIGMOD International Conference on
               Management of Data, Atlantic City, NJ, May 23-25, 1990},
  publisher = {ACM Press},
  year      = {1990},
  pages     = {1-11},
  ee        = {http://doi.acm.org/10.1145/93597.93611, db/conf/sigmod/LiptonNS90.html},
  crossref  = {DBLP:conf/sigmod/90},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm, in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide "sanity bounds" to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.

Copyright © 1990 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 2, SIGMOD '75-'92" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Hector Garcia-Molina, H. V. Jagadish (Eds.): Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, May 23-25, 1990. ACM Press 1990 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML, SIGMOD Record 19(2), June 1990
Contents

Online Edition: ACM Digital Library


References

[BDT83]
Dina Bitton, David J. DeWitt, Carolyn Turbyfill: Benchmarking Database Systems A Systematic Approach. VLDB 1983: 8-19 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Chr83a]
Stavros Christodoulakis: Estimating Block Transfers and Join Sizes. SIGMOD Conference 1983: 40-54 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Chr83b]
Stavros Christodoulakis: Estimating Block Selectivities. Inf. Syst. 9(1): 69-79,(1984) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Dem80]
Robert Demolombe: Estimation of the Number of Tuples Satisfying a Query Expressed in Predicate Calculus Language. VLDB 1980: 55-63 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Fed84]
Jane Fedorowicz: Database Evaluation Using Multiple Regression Techniques. SIGMOD Conference 1984: 70-76 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Fel68]
...
[HOT88]
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Statistical Estimators for Relational Algebra Expressions. PODS 1988: 276-287 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HOT89]
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Processing Aggregate Relational Queries with Hard Time Constraints. SIGMOD Conference 1989: 68-77 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[HTY82]
Larry Kerschberg, Peter D. Ting, S. Bing Yao: Query Optimization in Star Computer Networks. ACM Trans. Database Syst. 7(4): 678-711(1982) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[KK85]
Nabil Kamel, Roger King: A Model of Data Distribution Based on Texture Analysis. SIGMOD Conference 1985: 319-325 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Koo80]
Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LN89]
Richard J. Lipton, Jeffrey F. Naughton: Estimating the Size of Generalized Transitive Closures. VLDB 1989: 165-171 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LN90]
Richard J. Lipton, Jeffrey F. Naughton: Query Size Estimation by Adaptive Sampling. PODS 1990: 40-46 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LNS90]
...
[Lyn88]
Clifford A. Lynch: Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values. VLDB 1988: 240-251 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MCS88]
Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MD88]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MDL83]
Anthony Y. Montgomery, Daryl J. D'Souza, S. B. Lee: The Cost of Relational Algebraic Operations on Skewed Data: Estimates and Experiments. IFIP Congress 1983: 235-241 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MK85]
...
[OR86]
Frank Olken, Doron Rotem: Simple Random Sampling from Relational Databases. VLDB 1986: 160-169 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[OR89]
Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PSC84]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SAC+79]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Fri Mar 12 17:21:28 2010 by Michael Ley (ley@uni-trier.de)