30th International Conference on
Very Large Data Bases
Royal York Hotel
29 August - 3 September 2004
Toronto, Canada






Keynote Speech 1: Databases in a Wireless World

David Yach, VP Software, Research In Motion

Ballroom, Tuesday, 9:00-10:30

The traditional view of distributed databases is based on a number of database servers with regular communication. Today, information is stored not only in these central databases but also on a myriad of computers and computer-based devices. These range from desktop and laptop computers to PDAs and wireless devices such as cellular phones and BlackBerry devices. The combination of large centralized databases with a large number and variety of associated edge databases effectively forms a large distributed database, but one where many of the traditional rules and assumptions for distributed databases no longer hold. This keynote will discuss some of the new and challenging attributes of this environment, focusing particularly on the challenges of wireless and occasionally connected devices. It will look at the new constraints, how they affect the traditional distributed database model, and the techniques and heuristics being used to work within them, and it will identify potential areas where future research might help tackle these difficult issues.

David Yach, VP Software, Research In Motion
David is Senior Vice President of Software at Research In Motion. He oversees and manages the development of software that has helped Research In Motion become a world leader in the mobile data communications market. With a Bachelor's degree in Math and an MBA, David has a solid portfolio of technical, operational and managerial experience from several senior-level positions. Prior to joining Research In Motion, David held the position of Vice President and Chief Architect at Sybase Inc.


Keynote Speech 2: Structures, Semantics and Statistics

Alon Halevy, University of Washington

Ballroom, Wednesday, 9:00-10:30

Integration of data from multiple sources is one of the longest-standing problems facing the database research community. In addition to being a problem in most enterprises and in large-scale science projects, research on this topic has been fueled by the promise of querying the WWW. I will begin by highlighting some of the significant recent achievements in the field of data integration, and will then focus on what I consider to be the main challenges going forward, namely, large-scale reconciliation of semantic heterogeneity and on-the-fly information integration. To address these challenges, I argue for an approach based on computing statistics over corpora of database structures. At a fundamental level, the key challenge in data integration is to reconcile the semantics of disparate data sets, each expressed with different database structures. Computing statistics over a large corpus of schemas and mappings offers a powerful methodology for producing semantic mappings, the expressions that specify such reconciliation. In essence, the statistics offer hints about the semantics of the symbols in the structures, thereby making it possible to detect when two symbols from disparate schemas should be matched to each other. The same methodology can be applied to several other data management tasks that involve search in a space of complex structures. I will illustrate several examples where this approach has been successful.

Alon Halevy, University of Washington
Alon Halevy is an Associate Professor in the Computer Science and Engineering Department at the University of Washington. His interests are in data integration, management of XML data, peer-data management systems, query optimization, and the intersection of database and AI technologies. His research has produced several systems, such as the Information Manifold and Tukwila data integration systems, the Strudel web-site management system, and the Piazza peer-data management system. In 1999, Dr. Halevy co-founded Nimble Technology, a data integration company, and in 2004 he founded Transformic Inc., a company focused on tools for bridging semantic heterogeneity. Dr. Halevy was a Sloan Fellow and received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2000. He serves on the editorial board of the VLDB Journal and the advisory board of the Journal of Artificial Intelligence Research, and served as the program chair of SIGMOD 2003. He received his B.Sc. from the Hebrew University in Jerusalem in 1988 and his Ph.D. in Computer Science from Stanford University in 1993.


10 Year Best Paper Award: Whither Data Mining?

Rakesh Agrawal, Ramakrishnan Srikant (IBM Almaden Research Center)

Ballroom, Thursday, 9:00-10:30

The last decade has witnessed tremendous advances in data mining. We take a retrospective look at these developments, focusing on association rules discovery, and discuss the challenges and opportunities ahead.

Rakesh Agrawal (IBM Almaden Research Center)
Rakesh Agrawal is an IBM Fellow who leads Intelligent Information Systems Research at the IBM Almaden Research Center. His current research interests include privacy and security technologies for data systems, web technologies, data mining and OLAP. He has pioneered fundamental concepts in data privacy, including Hippocratic Database, Sovereign Information Sharing, and Privacy-Preserving Data Mining. He earlier developed key data mining concepts and technologies; IBM's commercial data mining product, Intelligent Miner, grew out of this work. Rakesh Agrawal has published more than 100 research papers and has been granted more than 50 patents. He is the recipient of the ACM-SIGKDD First Innovation Award, the ACM-SIGMOD Edgar F. Codd Innovations Award, and the ACM-SIGMOD Test of Time Award. He is also a Fellow of IEEE and a Fellow of ACM. Scientific American named him to its list of 50 top scientists and technologists in 2003.

Ramakrishnan Srikant (IBM Almaden Research Center)
Dr. Ramakrishnan Srikant is a Research Staff Member in the Intelligent Information Systems Research department at IBM Almaden Research Center, San Jose, CA. His research interests include privacy technologies for data systems, data mining, and web technologies. Dr. Srikant received the 2002 ACM Grace Murray Hopper Award "for his seminal work on mining association rules". Dr. Srikant has published more than 30 research papers that have been extensively cited. He has been granted 12 patents and has another 9 patent applications pending. Dr. Srikant was a key architect for IBM's commercial data mining product, Intelligent Miner. He was named an IBM Research Division Master Inventor in 1999 and has also received two Outstanding Technical Achievement Awards for his contributions to Intelligent Miner. Dr. Srikant was the Program co-Chair of SIGKDD 2001. He has also served as the Tutorials Chair of KDD 2003, the Industrial Track co-Chair of PAKDD 2003, and co-Chair of the 1999 SIGMOD DMKD Workshop. Dr. Srikant received his M.S. and Ph.D. degrees in Computer Science from the University of Wisconsin, Madison in 1996. He also has a B.Tech. degree in Computer Science & Engineering from the Indian Institute of Technology, Madras, India.