ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Similarity Search in High Dimensions via Hashing.

Aristides Gionis, Piotr Indyk, Rajeev Motwani: Similarity Search in High Dimensions via Hashing. VLDB 1999: 518-529
@inproceedings{DBLP:conf/vldb/GionisIM99,
  author    = {Aristides Gionis and
               Piotr Indyk and
               Rajeev Motwani},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {Similarity Search in High Dimensions via Hashing},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {518-529},
  ee        = {db/conf/vldb/GionisIM99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. Unfortunately, all known techniques for solving this problem fall prey to the "curse of dimensionality." That is, the data structures scale poorly with data dimensionality; in fact, if the number of dimensions exceeds 10 to 20, searching in k-d trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than brute-force linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather heuristic, determining an approximate nearest neighbor should suffice for most practical purposes. In this paper, we examine a novel scheme for approximate similarity search based on hashing. The basic idea is to hash the points from the database so as to ensure that the probability of collision is much higher for objects that are close to each other than for those that are far apart. We provide experimental evidence that our method gives significant improvement in running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition. Experimental results also indicate that our scheme scales well even for a relatively large number of dimensions (more than 50).

Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.): VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK. Morgan Kaufmann 1999, ISBN 1-55860-615-7
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

References

[1]
...
[2]
Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, Angela Y. Wu: An Optimal Algorithm for Approximate Nearest Neighbor Searching. SODA 1994: 573-582 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
Jon Louis Bentley: Multidimensional Binary Search Trees Used for Associative Searching. Commun. ACM 18(9): 509-517(1975) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Stefan Berchtold, Daniel A. Keim: High-Dimensional Index Structures, Database Support for Next Decade's Applications (Tutorial). SIGMOD Conference 1998: 501 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
...
[6]
Timothy M. Chan: Approximate Nearest Neighbor Queries Revisited. Symposium on Computational Geometry 1997: 352-358 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[7]
R. Scott Cost, Steven Salzberg: A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features. Machine Learning 10: 57-78(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: Random Sampling for Histogram Construction: How much is enough? SIGMOD Conference 1998: 436-447 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[9]
...
[10]
Paolo Ciaccia, Marco Patella, Pavel Zezula: A Cost Model for Similarity Queries in Metric Spaces. PODS 1998: 59-68 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, Richard A. Harshman: Indexing by Latent Semantic Analysis. JASIS 41(6): 391-407(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[12]
...
[13]
...
[14]
...
[15]
Christos Faloutsos, Ron Barber, Myron Flickner, Jim Hafner, Wayne Niblack, Dragutin Petkovic, William Equitz: Efficient and Effective Querying by Image Content. J. Intell. Inf. Syst. 3(3/4): 231-262(1994) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[16]
...
[17]
Myron Flickner, Harpreet S. Sawhney, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, Peter Yanker: Query by Image and Video Content: The QBIC System. IEEE Computer 28(9): 23-32(1995) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[18]
Jerome H. Friedman, Jon Louis Bentley, Raphael A. Finkel: An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Trans. Math. Softw. 3(3): 209-226(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[19]
...
[20]
...
[21]
Trevor Hastie, Robert Tibshirani: Discriminant Adaptive Nearest Neighbor Classification. KDD 1995: 142-149 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[22]
...
[23]
...
[24]
Piotr Indyk, Rajeev Motwani: Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998: 604-613 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[25]
Kothuri Venkata Ravi Kanth, Divyakant Agrawal, Ambuj K. Singh: Dimensionality Reduction for Similarity Searching in Dynamic Databases. SIGMOD Conference 1998: 166-176 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[26]
...
[27]
...
[28]
Norio Katayama, Shin'ichi Satoh: The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries. SIGMOD Conference 1997: 369-380 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[29]
Nathan Linial, Eran London, Yuri Rabinovich: The geometry of graphs and some of its algorithmic applications. FOCS 1994: 577-591 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[30]
...
[31]
...
[32]
...
[33]
Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay: Approximate Medians and other Quantiles in One Pass and with Limited Memory. SIGMOD Conference 1998: 426-435 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[34]
Yossi Matias, Jeffrey Scott Vitter, Min Wang: Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998: 448-459 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[35]
Rajeev Motwani, Prabhakar Raghavan: Randomized Algorithms. Cambridge University Press 1995, ISBN 0-521-47465-5
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[36]
...
[37]
Alex Pentland, Rosalind W. Picard, Stan Sclaroff: Photobook: Tools for Content-Based Manipulation of Image Databases. Storage and Retrieval for Image and Video Databases (SPIE) 1994: 34-47 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[38]
Gerard Salton, Michael McGill: Introduction to Modern Information Retrieval. McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[39]
Hanan Samet: The Design and Analysis of Spatial Data Structures. Addison-Wesley 1990
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[40]
...
[41]
Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos: Multidimensional Access Methods: Trees Have Grown Everywhere. VLDB 1997: 13-14 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[42]
...
[43]
John R. Smith, Shih-Fu Chang: Visually Searching the Web for Content. IEEE MultiMedia 4(3): 12-20(1997) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[44]
...
[45]
Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB 1998: 194-205 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Tue Mar 16 02:22:08 2010 by Michael Ley (ley@uni-trier.de)