ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Semantic Compression and Pattern Extraction with Fascicles.

H. V. Jagadish, J. Madar, Raymond T. Ng: Semantic Compression and Pattern Extraction with Fascicles. VLDB 1999: 186-198
@inproceedings{DBLP:conf/vldb/JagadishMN99,
  author    = {H. V. Jagadish and
               J. Madar and
               Raymond T. Ng},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {Semantic Compression and Pattern Extraction with Fascicles},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {186-198},
  ee        = {db/conf/vldb/JagadishMN99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Often many recoords in a database share similar values for several attributes. If one is able to identify and group together records that share similar values for some - even if not all - attributes, one can both obtain a more parsimonious representation of the data, and gain useful insight into the data from a mining perspective.

In this paper, we introduce the notion of fascicles. A fascicle F(k,t) is a subset of records that have k compact attributes. An attribute A of a collection F of records is compact if the width of the range of A-values (for numeric attributes) or the number of distinct A-values (for categorial attributes) of all the records in F does not exceed t. We introduce and study two problems related to fascicles. First, we consider how to find fascicles such that the total storage of the relation is minimized. Second, we study how best to extract fascicles whose sizes exceed a given minimum threshold (i.e., support) and that represent patterns of maximal quality, where quality is measured by the pair (k,t). We develop algorithms to attack both of the above problems. We show that these two problems are very hard to solve optimally. But we demonstrate empirically that good solutions can be obtained using our algorithms.

Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.): VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK. Morgan Kaufmann 1999, ISBN 1-55860-615-7
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

References

[1]
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. SIGMOD Conference 1998: 94-105 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[2]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur: Dynamic Itemset Counting and Implication Rules for Market Basket Data. SIGMOD Conference 1997: 255-264 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[6]
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996: 226-231 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[7]
Christos Faloutsos, King-Ip Lin: FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. SIGMOD Conference 1995: 163-174 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
...
[9]
...
[10]
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim: CURE: An Efficient Clustering Algorithm for Large Databases. SIGMOD Conference 1998: 73-84 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
Theodore Johnson, Tamraparni Dasu: Comparing Massive High-Dimensional Data Sets. KDD 1998: 229-233 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[12]
...
[13]
Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo: Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994: 401-407 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[14]
Flip Korn, H. V. Jagadish, Christos Faloutsos: Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences. SIGMOD Conference 1997: 289-300 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[15]
...
[16]
Heikki Mannila, Hannu Toivonen: Levelwise Search and Borders of Theories in Knowledge Discovery. Data Min. Knowl. Discov. 1(3): 241-258(1997) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[17]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[18]
Wee Keong Ng, Chinya V. Ravishankar: Block-Oriented Compression Techniques for Large Statistical Databases. IEEE Trans. Knowl. Data Eng. 9(2): 314-328(1997) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[19]
Frank Olken, Doron Rotem: Rearranging Data to Maximize the Efficiency of Compression. PODS 1986: 78-90 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[20]
Michael B. Smyth: Power Domains. J. Comput. Syst. Sci. 16(1): 23-36(1978) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[21]
Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD Conference 1996: 103-114 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Fri Mar 12 17:22:57 2010 by Michael Ley (ley@uni-trier.de)