Tutorial 1:
Foundations of Automated Database Tuning
Tuesday, September 12, 11:00-12:30 and 14:00-15:30
Room 320

Instructors: Surajit Chaudhuri, Gerhard Weikum

Abstract:
One way of reducing the total cost of ownership for information systems infrastructure is to make systems more self-managing. A particularly difficult piece of this ambitious vision is the automation of database performance tuning. In this tutorial, we will study the progress made thus far on this important problem and try to identify its key principles and paradigms. The tutorial is addressed to graduate students and researchers who are considering working on automated database tuning.

Instructor Biography:
Surajit Chaudhuri leads the Data Management and Exploration Group in Microsoft Research. He started the AutoAdmin research project in 1996 focusing on self-tuning database systems. Index Tuning Wizard (SQL Server 7.0 and SQL Server 2000) and Database Tuning Advisor in SQL Server 2005 are built using technology from the AutoAdmin project. Surajit is an ACM Fellow and was awarded the SIGMOD Contributions Award in 2004.

Gerhard Weikum is a Scientific Director at the Max-Planck Institute for Informatics in Saarbruecken, Germany, where he is leading the research group on databases and information systems. He received the VLDB 2002 ten-year award for his work on automated tuning, and he is an ACM Fellow.


Tutorial 2: Streaming in a Connected World: Querying and Tracking Distributed Data Streams
Wednesday, September 13, 11:00-12:30 and 14:00-15:30  
Room 320

Instructors: Graham Cormode, Minos Garofalakis

Abstract:
Today, a majority of data is fundamentally distributed in nature. Data for almost any task is collected over a broad area and streams in at a much greater rate than ever before. In particular, advances in sensor technology and miniaturization have led to the concept of the sensor network: a (typically wireless) collection of sensing devices that gather detailed data about their surroundings. A fundamental question arises: how do we query and monitor this rich new source of data? Similar scenarios emerge in monitoring more traditional, wired networks, and in other emerging models such as P2P networks and grid-based computing. In all cases, if we can perform more computational work within the network to reduce the communication needed, then we can reduce bandwidth usage and improve performance.

We consider two broad classes of approaches to such in-network query processing, by analogy to query types in traditional DBMSs. In the one-shot model, a query is issued by a user at some site and must be answered based on the current state of the data in the network. We identify several possible approaches to this problem. For simple queries, partial computation of the result over a tree can significantly reduce the data transferred. For "holistic" queries, such as medians and count-distinct, clever composable summaries give a compact way to accurately approximate query answers. Lastly, careful modeling of correlations between measurements and other trends in the data can further reduce the number of sensors probed.

In the continuous model, a user places a query whose answer must be available continuously. This yields further challenges, since even with tree computation, summarization, and modeling, we cannot afford to communicate every time new data is received by one of the remote sites. Instead, work on this problem has produced a new tradeoff: reduced accuracy in the query answer in exchange for reduced communication cost. This has led to a variety of techniques, for different query types, that apportion the available "uncertainty" in the query answer between the different sites, and that model the evolution of measured values to anticipate future values and so reduce communication further. Our objective in this tutorial is to discuss the algorithmic foundations of this new world, illustrate some of the powerful techniques that have been developed to address these challenges, and outline interesting directions for future work in the area.
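To give a flavor of the composable summaries mentioned above, here is a minimal sketch (illustrative only, not code from the tutorial) of a Flajolet-Martin-style distinct-count summary: each site summarizes its local stream, and summaries merge with a bitwise OR, so a coordinator can estimate the number of distinct items network-wide without collecting raw data.

```python
import hashlib

class FMSketch:
    """Flajolet-Martin-style distinct-count sketch. Sketches of two
    streams merge by bitwise OR, so partial summaries can be combined
    up an aggregation tree instead of shipping raw tuples."""

    def __init__(self, num_hashes=64):
        self.num_hashes = num_hashes
        self.bitmaps = [0] * num_hashes

    def _rho(self, value, seed):
        # Index of the lowest set bit of a 64-bit hash of (seed, value).
        h = int.from_bytes(
            hashlib.sha256(f"{seed}:{value}".encode()).digest()[:8], "big")
        return (h & -h).bit_length() - 1 if h else 63

    def add(self, value):
        for seed in range(self.num_hashes):
            self.bitmaps[seed] |= 1 << self._rho(value, seed)

    def merge(self, other):
        # The key "composable" property: OR-ing bitmaps yields exactly
        # the sketch of the union of the two streams.
        for i in range(self.num_hashes):
            self.bitmaps[i] |= other.bitmaps[i]

    def estimate(self):
        # Average lowest-unset-bit position, scaled by ~1/0.77351
        # (the classical Flajolet-Martin correction factor).
        total = 0
        for bm in self.bitmaps:
            r = 0
            while bm & (1 << r):
                r += 1
            total += r
        return (2 ** (total / self.num_hashes)) / 0.77351

# Two "sensor sites" observe overlapping sets of item ids.
site_a, site_b = FMSketch(), FMSketch()
for x in range(0, 600):
    site_a.add(x)
for x in range(400, 1000):
    site_b.add(x)

site_a.merge(site_b)             # coordinator combines the two summaries
print(round(site_a.estimate()))  # roughly 1000 distinct items
```

Each site ships only a fixed-size bit array regardless of stream length, which is precisely the communication saving that in-network aggregation exploits.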

Instructor Biography:
Graham Cormode is a Member of Technical Staff in the Communication Protocols and Internetworking Research Center at Bell Laboratories. He joined Bell Labs in 2004 after a postdoctoral position at the DIMACS center at Rutgers University; he received his Ph.D. in computer science from the University of Warwick in 2002.

Minos Garofalakis is a Senior Research Scientist with Intel Research Berkeley. He obtained his Ph.D. from the University of Wisconsin-Madison in 1998, and joined Intel Research in July 2005, after spending 6.5 years as a Member of Technical Staff with Bell Labs in Murray Hill, NJ. His current research interests include data streaming, approximate query processing, network-data management, and XML databases. He has presented tutorials on data streaming and approximate query processing at several of the top data-management forums (including VLDB'2001, ACM SIGMOD'2002, VLDB'2002, and ACM SIGKDD'2002), and is the co-editor of an upcoming Springer-Verlag volume on Data-Stream Management. In addition to serving on a number of program committees for conferences in the data-management area, Minos currently serves on the editorial boards of Foundations and Trends in Databases and the IEEE Data Engineering Bulletin, and is the Core Database Technology PC Chair for the VLDB'2007 conference in Vienna, Austria.


Tutorial 3: Query Co-Processing on Commodity Processors
Thursday, September 14, 11:00-12:30 and 14:00-15:30
Room 320

Instructors: Anastassia Ailamaki, Naga Govindaraju, Stavros Harizopoulos, Dinesh Manocha

Abstract:
The rapid increase in data volumes over the past few decades has intensified the need for high processing power in database and data mining applications. Researchers have actively sought to design and develop new architectures to improve performance. Recent research shows that performance can be significantly improved using either (a) effective utilization of the architectural features and memory hierarchies of conventional processors, or (b) the high computational power and memory bandwidth of commodity hardware such as network processing units (NPUs), Cell processors, and graphics processing units (GPUs). This tutorial will survey the micro-architectural and architectural differences across these processors with data management in mind. We will briefly survey the computer architecture and database literature on evaluating database application performance on conventional processors. We will describe strategies to reduce memory and resource stalls using data-parallel algorithms, cache-conscious data structures, instruction buffering algorithms, and better storage models. We will also describe how many of the essential computational components of database and data mining algorithms, such as relational database operations, stream data mining, linear algebra, and sorting, can be efficiently implemented on NPUs, GPUs, and Cell processors; in many cases, these implementations outperform the fastest CPU-based ones. We will discuss the performance growth characteristics of the three processor types in light of computer architecture principles such as power consumption and memory-processor speed differences. Finally, we will discuss open problems and future opportunities for extending query processing algorithms to hardware other than general-purpose processors.
Beyond the database community, we intend to increase awareness in the computer architecture community of opportunities to construct heterogeneous chips (chip multiprocessors combining different architectures).
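To make the data-parallel flavor of GPU-style query operators concrete (an illustrative sketch, not taken from the tutorial material), a relational selection can be phrased as three branch-free primitives: map, prefix-sum, and scatter, each of which parallelizes cleanly on wide hardware.

```python
def parallel_select(values, predicate):
    """Selection expressed as map / exclusive prefix-sum / scatter,
    the pattern commonly used for relational operators on GPUs:
    every step treats all elements independently, with no
    data-dependent branching in the inner loop."""
    # Map: evaluate the predicate on every element (one "thread" each).
    flags = [1 if predicate(v) else 0 for v in values]

    # Exclusive prefix-sum: each qualifying element learns its output
    # position. (Written sequentially here; scan itself runs in
    # O(log n) parallel steps on real hardware.)
    positions, running = [], 0
    for f in flags:
        positions.append(running)
        running += f

    # Scatter: qualifying elements write themselves to their slots.
    out = [None] * running
    for v, f, p in zip(values, flags, positions):
        if f:
            out[p] = v
    return out

print(parallel_select([7, 2, 9, 4, 11], lambda v: v > 5))  # [7, 9, 11]
```

The design choice worth noting is that no step depends on the order in which elements are processed, which is what lets the same logic map onto thousands of GPU threads.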

Instructor Biography:
Anastassia (Natassa) Ailamaki received a B.Sc. degree in Computer Engineering from the Polytechnic School of the University of Patras, Greece, M.Sc. degrees from the Technical University of Crete, Greece and from the University of Rochester, NY, and a Ph.D. degree in Computer Science from the University of Wisconsin-Madison. In 2001, she joined the Computer Science Department at Carnegie Mellon University as an Assistant Professor. Her research interests are in the broad area of database systems and applications, with emphasis on database system behavior on modern processor hardware and disks. Her projects at Carnegie Mellon (including Staged Database Systems, Cache-Resident Data Bases, and the Fates Storage Manager) aim at building systems to strengthen the interaction between the database software and the underlying hardware and I/O devices. In addition, she is working on automated schema design and computational database support for scientific applications, storage device modeling and performance prediction, as well as internet query caching. Natassa has received a Sloan Research Fellowship (2005), six best-paper awards (VLDB 2001, Performance 2002, VLDB PhD Workshop 2003, ICDE 2004, FAST 2005, and ICDE 2006 (demo)), an NSF CAREER award (2002), and IBM Faculty Partnership awards in 2001, 2002, and 2003. She is a member of IEEE and ACM, and has also been a CRA-W mentor.

Naga Govindaraju is currently a research assistant professor in the Department of Computer Science at the University of North Carolina, Chapel Hill. He received a B.Tech. degree in Computer Science from the Indian Institute of Technology, Bombay in 2001, and M.S. and Ph.D. degrees in Computer Science from the University of North Carolina at Chapel Hill in 2003 and 2004, respectively. Naga's research focuses on the effective utilization of commodity graphics processors to solve computational problems in computer graphics, databases, and high performance computing. He received the IEEE VR PRESENCE best paper award in 2005 and the Indy PennySort award in 2006 for designing the world's fastest price-to-performance large data management system. He has published over 30 peer-reviewed articles in major graphics, database, and HPC conferences such as ACM SIGGRAPH, ACM SIGMOD, and ACM SuperComputing. Naga has organized and presented tutorials at ACM SIGGRAPH, Eurographics, and IEEE ICDE. He has also served on the program committees of many conferences.

Stavros Harizopoulos is currently a postdoctoral researcher in the database group at MIT. He received a Diploma in Electrical and Computer Engineering from the Technical University of Crete, Greece in 1998, and M.Sc. and Ph.D. degrees in Computer Science from Carnegie Mellon University in 2000 and 2005, respectively. His research interests are in database system design, implementation, and performance. Stavros's Ph.D. thesis research focused on improving database system performance by taking into account modern hardware architectures. Currently, he is investigating column-oriented database system designs for high-performance data warehouses. Stavros is a recipient of a best-demonstration award (ICDE 2006), an IBM Ph.D. fellowship, a Lilian Voudouri Foundation Ph.D. fellowship, and the Best Polytechnic Student Award from the Technical Chamber of Greece.

Dinesh Manocha is currently the Mason Distinguished Professor of Computer Science at the University of North Carolina at Chapel Hill. He received his B.Tech. degree in computer science and engineering from the Indian Institute of Technology, Delhi in 1987, and M.S. and Ph.D. degrees in computer science from the University of California at Berkeley in 1990 and 1992, respectively. During the summers of 1988 and 1989, he was a visiting researcher at the Olivetti Research Lab and the General Motors Research Lab, respectively. He received the Alfred and Chella D. Moore Fellowship and an IBM Graduate Fellowship in 1988 and 1991, respectively, and a Junior Faculty Award in 1992. He was selected as an Alfred P. Sloan Research Fellow, and received an NSF CAREER Award in 1995, an Office of Naval Research Young Investigator Award in 1996, a Honda Research Initiation Award in 1997, and the Hettleman Prize for scholarly achievement at UNC Chapel Hill in 1998. He has also received eight best paper and panel awards at the ACM SuperComputing, ACM Multimedia, ACM Solid Modeling, Pacific Graphics, IEEE VR, IEEE Visualization, and Eurographics conferences. His research interests include geometric and solid modeling, interactive computer graphics, physically-based modeling, virtual environments, robotics, and scientific computation. His research has been sponsored by ARO, DARPA, DOE, Honda, Intel, NSF, ONR, and the Sloan Foundation. He has published more than 200 papers in leading conferences and journals on computer graphics, geometric and solid modeling, robotics, symbolic and numeric computation, virtual reality, molecular modeling, and computational geometry. He has served as a program committee member for many leading conferences on virtual reality, computer graphics, computational geometry, geometric and solid modeling, animation, and molecular modeling.
He was the program co-chair of the first ACM SIGGRAPH workshop on simulation and interaction in virtual environments and program chair of the first ACM Workshop on Applied Computational Geometry. He was a guest co-editor of special issues of the International Journal of Computational Geometry and Applications. He has served on the editorial boards of IEEE Transactions on Visualization and Computer Graphics, Graphical Models and Image Processing, and the Journal of Applicable Algebra.


Tutorial 4: A Decade of Progress in Indexing and Mining Large Time Series Databases
Thursday, September 14, 16:00-17:30    
Room 320

Instructor: Eamonn Keogh

Abstract:
Time series data is ubiquitous; large volumes of time series data are routinely created in scientific, industrial, entertainment, medical, and biological domains. Examples include gene expression data, electrocardiograms, electroencephalograms, gait analysis, stock market quotes, and space telemetry. Although statisticians have worked with time series for more than a century, many of their techniques hold little utility for researchers working with massive time series databases. A decade ago, a seminal paper by Faloutsos, Ranganathan, and Manolopoulos appeared in SIGMOD. That paper, "Fast Subsequence Matching in Time-Series Databases," has spawned at least a thousand references and extensions in the database, data mining, and information retrieval communities. This tutorial will summarize the decade of progress since this influential paper appeared.
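The core trick behind much of this indexing work, sketched here for illustration with Piecewise Aggregate Approximation (the original paper used DFT coefficients; PAA is a later simplification), is a reduced representation whose distance provably never exceeds the true Euclidean distance, so an index can prune candidates with no false dismissals.

```python
import math

def paa(series, k):
    """Piecewise Aggregate Approximation: represent a length-n series
    by the means of k equal-length segments."""
    n = len(series)
    return [sum(series[i * n // k:(i + 1) * n // k]) * k / n
            for i in range(k)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lb_paa(pa, pb, n, k):
    """Distance between the reduced representations, scaled so it is
    guaranteed not to exceed the true Euclidean distance -- the
    'no false dismissals' property that makes index pruning sound."""
    return math.sqrt((n / k) * sum((x - y) ** 2 for x, y in zip(pa, pb)))

# An index would compare 8-number summaries instead of 64-point series,
# discarding any candidate whose lower bound already exceeds the
# best match found so far.
n, k = 64, 8
q = [math.sin(i / 4) for i in range(n)]
c = [math.sin(i / 4 + 0.5) + 0.1 for i in range(n)]
assert lb_paa(paa(q, k), paa(c, k), n, k) <= euclidean(q, c)
```

The same lower-bounding contract underlies the dozens of alternative representations proposed in the decade this tutorial surveys.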

Instructor Biography:
Dr. Keogh's research interests are in Data Mining, Machine Learning, and Information Retrieval. He has published papers on time series in the VLDB, SIGMOD, SIGKDD, SIGIR, SIGGRAPH, ICML, EDBT, PKDD, PAKDD, IEEE ICDM, IEEE ICDE, SIAM SDM, IDEAL, FQAS, SSDM, AI, and INTERFACE conferences and in the TODS, DMKD, KAIS, INFORMATION VISUALIZATION, VLDB, and IJTAI journals. Several of his papers have won "best paper" awards. In addition, he has won several teaching awards. He is the recipient of a 5-year NSF CAREER Award for "Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases" and a grant from Aerospace Corp to develop a time series visualization tool for monitoring space launch telemetry. His papers on time series data mining have been referenced well over 2,000 times (see www.cs.ucr.edu/~eamonn/selected_publications.htm).



Tutorial 5: Randomized Algorithms for Matrices and Massive Data Sets
Friday, September 15, 11:00-12:30
Room 320

Instructors: Petros Drineas, Michael Mahoney

Abstract:
The tutorial will cover randomized sampling algorithms that extract structure from very large data sets modeled as matrices or tensors. Both provable algorithmic results and recent work on applying these methods to large biological and internet data sets will be discussed.
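As a concrete taste of the sampling paradigm (an illustrative sketch of the well-known column/row-sampling approach to approximate matrix multiplication, not the tutorial's own material): sample a few column-row pairs with probability proportional to their norms and rescale, so the expectation equals the exact product while only a fraction of the data is touched.

```python
import random, math

def approx_matmul(A, B, s, seed=0):
    """Randomized approximation of the matrix product A @ B
    (A is m x n, B is n x p, given as lists of lists): sample s
    column/row index pairs with probability proportional to the
    product of their norms, and rescale so the estimate is unbiased."""
    random.seed(seed)
    m, n, p = len(A), len(A[0]), len(B[0])
    # Sampling probabilities proportional to ||A[:,j]|| * ||B[j,:]||.
    weights = [
        math.sqrt(sum(A[i][j] ** 2 for i in range(m))) *
        math.sqrt(sum(B[j][t] ** 2 for t in range(p)))
        for j in range(n)
    ]
    total = sum(weights)
    probs = [w / total for w in weights]
    C = [[0.0] * p for _ in range(m)]
    for _ in range(s):
        j = random.choices(range(n), probs)[0]
        scale = 1.0 / (s * probs[j])   # rescaling keeps the estimate unbiased
        for i in range(m):
            for t in range(p):
                C[i][t] += scale * A[i][j] * B[j][t]
    return C

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
approx = approx_matmul(A, B, s=2000)
# approx is close to the exact product [[4, 5], [10, 11]]
```

For massive matrices, s can be far smaller than n, trading a provably bounded error for reading only a small sample of the columns and rows.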

Instructor Biography:
Petros Drineas is an Assistant Professor in the Computer Science Department at Rensselaer Polytechnic Institute. His research interests include the design and analysis of randomized and approximation algorithms for linear algebra problems, and their applications in information retrieval and data mining. He received his Ph.D. from the Department of Computer Science at Yale University with a dissertation on randomized algorithms for matrix operations and their applications. His web page is: http://www.cs.rpi.edu/~drinep.

Michael W. Mahoney is a Senior Research Scientist at Yahoo Research; prior to joining Yahoo in August 2005, he was an Assistant Professor in the Department of Mathematics at Yale University. His research interests include the design and analysis of randomized and approximation algorithms for large linear algebra problems, and the application of these algorithms to the analysis of massive scientific and internet data sets. He received his Ph.D. from the Department of Physics at Yale University with a dissertation on computational liquid state statistical mechanics. His web page is: http://www.cs.yale.edu/homes/mmahoney/.