30th International Conference on
Very Large Data Bases
Royal York Hotel
29 August - 3 September 2004
Toronto, Canada





Panel 1: Biological Data Management: Research, Practice and Opportunities

Territories, Tuesday 16:00-17:30

Moderator: Thodoros Topaloglou
Panelists: Susan B. Davidson, H. V. Jagadish, Victor M. Markowitz, Evan W. Steeg, Mike Tyers.

Biological research and drug development are routinely producing terabytes of data that need to be organized, queried and reduced to useful scientific knowledge. Although data management technology can provide solutions to these problems, in practice the data needs of biomedical research are not well served. The goal of this panel is to expose the barriers blocking the effective application of advanced data management technology to biological data.
The current state of biological data management ranges from "malpractice" of database principles, to the reinvention of well-known data management methods, to the contribution of valuable new ideas to the state of the art in database research. On the other side, database research is often incompatible with the production requirements of biomedical data operations: integration and interoperability of biological data sources, support for more meaningful data types, domain-aware querying interfaces, practical workflow management, and methods for evaluating data quality, including data provenance, are still considered open problems in biological data management, just as they were a decade ago. Our panelists bring a wealth of experience in translating database research into biological data management tools and in communicating data requirements back to the database community. They will identify and debate the key challenges and opportunities to which the database community should contribute.

Thodoros Topaloglou, MDS Proteomics
Thodoros Topaloglou has worked with a wide range of biological data types including sequence, micro-array, proteomics and clinical data for which he developed querying and integration tools, and production infrastructures. He is presently a Senior Vice President of Scientific Computing at MDS Proteomics. In the past decade Thodoros held senior technical positions in several biotechnology and research organizations. From 2001 to 2002, he was Director of Data Management in Target Validation and Drug Discovery at Incyte Genomics. He was the Director of Gene Expression Data Management and played a lead role in the development of Gene Logic's Gene Express product between 1997 and 2001. He also held staff scientist positions at Lawrence Berkeley Laboratory and Affymetrix. Thodoros holds a PhD in Computer Science from the University of Toronto.

Susan B. Davidson, University of Pennsylvania (UPenn)
Susan B. Davidson received the B.A. degree in Mathematics from Cornell University, Ithaca, NY, in 1978, and the M.A. and Ph.D. degrees in Electrical Engineering and Computer Science from Princeton University, Princeton, NJ, in 1980 and 1982. Dr. Davidson is currently a Professor in the Department of Computer and Information Science at the University of Pennsylvania (UPenn), where she has been since 1982. She is an ACM Fellow, a Fulbright scholar, and recently stepped down as founding co-Director of the Center for Bioinformatics at UPenn (PCBI). Preceding the formation of the PCBI, Dr. Davidson was involved with planning and administering a 1994 NSF-funded research training program in computational biology. She also helped establish degree programs in bioinformatics and computational biology run through the departments of Biology and Computer and Information Science at UPenn. Dr. Davidson's research interests include database systems, database modeling, database integration, distributed systems, bioinformatics and real-time systems. Within bioinformatics she is best known for her work with the Kleisli data integration system, and more recently with XML as a data exchange and integration strategy.

H. V. Jagadish, University of Michigan, Ann Arbor
H. V. Jagadish has been Professor of Electrical Engineering and Computer Science and Professor of Bioinformatics at the University of Michigan, Ann Arbor, since 1999. Prior to joining the University of Michigan, he worked at AT&T, where he headed the database department. He has recently co-organized a workshop on data management issues for molecular and cell biology and guest edited a special issue, devoted to data management for biology, of the OMICS journal on integrative biology. He obtained his Ph.D. from Stanford in 1985.

Victor M. Markowitz, JGI and Lawrence Berkeley National Laboratory
Victor M. Markowitz, D.Sc., is Chief Informatics Officer at JGI and Head of the recently established Biological Data Management and Technology Center at Lawrence Berkeley National Laboratory. Until 2003 he was Chief Information Officer and Senior Vice President, Data Management Systems, at Gene Logic Inc., where he was responsible for the development and deployment of Gene Logic's data management products and software tools. Prior to joining Gene Logic, Dr. Markowitz was at Lawrence Berkeley National Laboratory, where he led the development of the Object Protocol Model data management and integration tools that have been used for developing public and commercial genome databases. Dr. Markowitz received his M.Sc. and D.Sc. degrees in computer science from Technion, the Israel Institute of Technology. Dr. Markowitz has authored articles and book chapters on various aspects of data management. He has served on review panels and program committees for database and bioinformatics programs and conferences.

Evan W. Steeg
Evan W. Steeg has developed algorithms and software for pattern discovery and computational molecular biology for twenty years in academia, government and industry. From DNA sequence analysis to protein structure prediction and x-ray crystallography, to microarray experiments and clinical outcome prediction, Dr. Steeg has faced many of the challenges of pulling valuable knowledge from biological databases. Formerly a scientific co-founder and President & CEO of Molecular Mining Corporation, he now consults and is writing a book on life sciences data mining for John Wiley & Sons. He received a B.A. in Mathematics from Cornell University, and M.Sc. and Ph.D. degrees in Computer Science from the University of Toronto.

Mike Tyers, Samuel Lunenfeld Research Institute and U. of Toronto
Mike Tyers is a Senior Investigator at the Samuel Lunenfeld Research Institute of Mount Sinai Hospital and Professor in the Department of Medical Genetics and Microbiology at the University of Toronto. He received his Ph.D. in Biochemistry from McMaster University in 1989 and was a postdoctoral fellow at Cold Spring Harbor Laboratory from 1989 to 1992. Dr. Tyers currently holds a Canada Research Chair in Proteomics, Functional Genomics and Bioinformatics. His laboratory uses biochemical, genetic and genome-wide approaches to elucidate mechanisms that control the cell division cycle. Current research interests include ubiquitin-dependent degradation pathways that control the G1/S transition, genome stability and gene expression, phosphorylation-dependent protein interactions, mechanisms that couple growth and division, high-throughput analysis, archival and visualization of protein and genetic interactions, and mathematical modeling of signal transduction events.

Panel 2: Where is Business Intelligence taking today's database systems?

Confederation 5 & 6, Thursday 11:00-12:30

Moderator: William O'Connell
Panelists: Andy Witkowski, Ramesh Bhashyam, Surajit Chaudhuri, Nigel Campbell

The invention of technology made Business Intelligence (BI) possible over relational engines, but the experience of putting that technology into production has unearthed a new set of problems in need of further invention. Over the past few years, academia has provided high-performance and storage-efficient technologies for fundamental BI objects. The database industry has either incorporated some of these algorithms (such as data mining algorithms and OLAP engines) into its SQL engines, tried to better integrate stand-alone BI engines such as OLAP, or provided its own unique solutions for BI. This panel will discuss a few of these emerging issues and trends. The intent is not to survey individual products or solutions, nor to provide background on BI solutions, but to point out select trends and issues, as well as old issues that are still very real. The panelists will also describe why these issues are important for the research community. The initial questions posed to the panelists will be twofold. First, where should the BI community go in extending the SQL language bindings (such as for data mining, reporting, and data analysis), and what are the associated DBMS implications of supporting new and existing BI extensions? Second, what does XML mean to BI, and what are the associated DBMS implications?

Bill O'Connell, IBM Toronto Lab
Bill O'Connell is the lead BI architect for DB2 database engineering. Bill is a Senior Technical Staff Member within the IBM Toronto Lab and is responsible for DB2 BI and Warehousing technology for the database server. He is also driving DB2 enablement within the end-to-end BI Data Warehousing solution stack. Bill has been working in database development for 18 years, and received his Ph.D. at Illinois Institute of Technology. Bill has also worked at NCR-Teradata, as well as AT&T Bell Labs, in database development.

Andy Witkowski, Oracle
Andy Witkowski is an Architect for the Oracle Data Warehousing and Business Intelligence Groups. He has actively worked on extending RDBMS with language, algorithms and data structures for BI for the past six years. His interests include materialized views, Business Intelligence calculations in SQL such as SQL Spreadsheet, Analytic and Statistical functions, compressed cubes, and query optimization. Andy has been working in database technology for 15 years, and received his Ph.D. at the University of Southern California.

Ramesh Bhashyam, NCR-Teradata
Ramesh Bhashyam is the CTO for Teradata Database Engineering at NCR-Teradata. He is responsible for all the database server aspects of Teradata DBMS engineering. Ramesh has a Master's in Computer Science and a Bachelor's in Electrical Engineering.

Surajit Chaudhuri, Microsoft Research
Surajit Chaudhuri leads the Data Management and Exploration Group at Microsoft Research (http://research.microsoft.com/dmx). In 1996, Surajit started the AutoAdmin project on self-tuning database systems at Microsoft Research and developed novel automated physical design tuning technology for SQL Server 7.0, SQL Server 2000 and SQL Server 2005. Technology on data mining from the group was incorporated in SQL Server 2000. More recently, Surajit has initiated work in the area of data cleaning and integration. Surajit's most recent project is Data Exploration, which looks at the problem of flexible querying of information that spans text as well as relational data. Surajit received his Ph.D. from Stanford University in 1991 and worked at Hewlett-Packard Laboratories, Palo Alto, from 1991 to 1995. He has published widely in major database conferences.