1: Biological Data Management: Research, Practice and Opportunities
Panelists: Susan B. Davidson, H. V. Jagadish, Victor M. Markowitz, Evan
W. Steeg, Mike Tyers.
and drug development are routinely producing terabytes of data that need
to be organized, queried and reduced to useful scientific knowledge. Although
data management technology can provide solutions to many of these problems, in practice
the data needs of biomedical research are not well served. The goal of
this panel is to expose the barriers blocking the effective application
of advanced data management technology to biological data.
The current state of biological data management ranges from "malpractice"
of database principles, to the reinvention of well-known data management
methods, to the contribution of valuable new ideas to the state of the
art in database research. On the other side, database research is often
incompatible with the production requirements of biomedical data operations.
Integration and interoperability of biological data sources, support for
more meaningful data types, domain-aware querying interfaces, practical
workflow management, and methods for evaluating data quality, including
data provenance, are still considered open problems in biological data
management, just as they were a decade ago. Our panelists bring a wealth of experience
in translating database research into biological data management tools
and in communicating data requirements back to the database community.
They will identify and debate the key challenges and opportunities to
which the database community should contribute.
Thodoros Topaloglou has worked with a wide range of biological data types
including sequence, micro-array, proteomics and clinical data for which
he developed querying and integration tools, and production infrastructures.
He is presently a Senior Vice President of Scientific Computing at MDS
Proteomics. Over the past decade, Thodoros has held senior technical positions
in several biotechnology and research organizations. From 2001 to 2002,
he was Director of Data Management in Target Validation and Drug Discovery
at Incyte Genomics. From 1997 to 2001, he was Director of Gene Expression
Data Management at Gene Logic, where he played a lead role in the development
of the Gene Express product. He also held staff scientist positions
at Lawrence Berkeley Laboratory and Affymetrix. Thodoros holds a PhD in
Computer Science from the University of Toronto.
Susan B. Davidson, University of Pennsylvania (UPenn)
Susan B. Davidson received the B.A. degree in Mathematics from Cornell
University, Ithaca, NY, in 1978, and the M.A. and Ph.D. degrees in Electrical
Engineering and Computer Science from Princeton University, Princeton,
NJ, in 1980 and 1982, respectively. Dr. Davidson is currently a Professor in the Department
of Computer and Information Science at the University of Pennsylvania
(UPenn), where she has been since 1982. She is an ACM Fellow, a Fulbright
scholar, and recently stepped down as founding co-Director of the Center
for Bioinformatics at UPenn (PCBI). Preceding the formation of the PCBI,
Dr. Davidson was involved in planning and administering a 1994 NSF-funded
research training program in computational biology. She also helped establish
degree programs in bioinformatics and computational biology run through
the departments of Biology and Computer and Information Science at UPenn.
Dr. Davidson's research interests include database systems, database modeling,
database integration, distributed systems, bioinformatics and real-time
systems. Within bioinformatics she is best known for her work with the
Kleisli data integration system, and more recently with XML as a data
exchange and integration strategy.
H. V. Jagadish, University of Michigan, Ann Arbor
H. V. Jagadish has been Professor of Electrical Engineering and Computer Science
and Professor of Bioinformatics at the University of Michigan, Ann Arbor, since
1999. Prior to joining the University of Michigan, he worked at AT&T,
where he headed the database department. He recently co-organized
a workshop on data management issues for molecular and cell biology and
guest-edited a special issue of the OMICS journal on integrative biology,
devoted to data management for biology. He obtained his Ph.D. from
Stanford in 1985.
Victor M. Markowitz, JGI and Lawrence Berkeley National Laboratory
Victor M. Markowitz, D.Sc., is Chief Informatics Officer at JGI and Head
of the recently established Biological Data Management and Technology
Center at Lawrence Berkeley National Laboratory. Until 2003 he was Chief
Information Officer and Senior Vice President, Data Management Systems,
at Gene Logic Inc., where he was responsible for the development and deployment
of Gene Logic's data management products and software tools. Prior to
joining Gene Logic, Dr. Markowitz was at Lawrence Berkeley National Laboratory,
where he led the development of the Object Protocol Model data management
and integration tools that have been used for developing public and commercial
genome databases. Dr. Markowitz received his M.Sc. and D.Sc. degrees in
computer science from Technion, the Israel Institute of Technology. Dr.
Markowitz has authored articles and book chapters on various aspects of
data management. He has served on review panels and program committees
for database and bioinformatics programs and conferences.
Evan W. Steeg
Evan W. Steeg has developed algorithms and software for pattern discovery
and computational molecular biology for twenty years in academia, government
and industry. From DNA sequence analysis to protein structure prediction
and X-ray crystallography, to microarray experiments and clinical outcome
prediction, Dr. Steeg has faced many of the challenges of pulling valuable
knowledge from biological databases. Formerly a scientific co-founder
and President & CEO of Molecular Mining Corporation, he now consults
and is writing a book on life sciences data mining for John Wiley &
Sons. He received a B.A. in Mathematics from Cornell University, and M.Sc.
and Ph.D. degrees in Computer Science from the University of Toronto.
Mike Tyers, Samuel Lunenfeld Research Institute and U. of Toronto
Mike Tyers is a Senior Investigator at the Samuel Lunenfeld Research Institute
of Mount Sinai Hospital and Professor in the Department of Medical Genetics
and Microbiology at the University of Toronto. He received his Ph.D. in
Biochemistry from McMaster University in 1989 and was a postdoctoral fellow
at Cold Spring Harbor Laboratory from 1989 to 1992. Dr. Tyers currently holds
a Canada Research Chair in Proteomics, Functional Genomics and Bioinformatics.
His laboratory uses biochemical, genetic and genome-wide approaches to
elucidate mechanisms that control the cell division cycle. Current research
interests include ubiquitin-dependent degradation pathways that control
the G1/S transition, genome stability and gene expression, phosphorylation-dependent
protein interactions, mechanisms that couple growth and division, high-throughput
analysis, archiving and visualization of protein and genetic
interactions, and mathematical modeling of signal transduction events.
2: Where is Business Intelligence taking today's database systems?
5 & 6, Thursday 11:00-12:30
Panelists: Andy Witkowski, Ramesh Bhashyam, Surajit Chaudhuri, Nigel Campbell
The invention of technology
made Business Intelligence (BI) possible over relational engines, but
now the experience of putting BI systems into production has unearthed a new
set of problems in need of further invention. Over the past few
years, academia has provided high-performance, storage-efficient technologies
for fundamental BI objects. Database vendors have either incorporated some of
these technologies (such as data mining algorithms and OLAP engines) into
their SQL engines, tried to better integrate stand-alone BI engines such as
OLAP servers, or provided their own unique solutions for BI. This panel will discuss
a few of these emerging issues and trends. The intent is not to survey
individual products or solutions, nor to provide background on BI
solutions, but to point out selected trends and issues, as well as
old issues that are still very real. The panelists will also describe
why these issues are important for the research community. The initial
questions posed to the panelists will be twofold. First, where should
the BI community go in extending the SQL language bindings (such as for
data mining, reporting, and data analysis), and what are the associated DBMS
implications of supporting new and existing BI extensions? Second, what
does XML mean for BI, and what are its DBMS implications?
Bill O'Connell, IBM Toronto Lab
Bill O'Connell is the lead BI architect for DB2 database engineering.
Bill is a Senior Technical Staff Member at the IBM Toronto Lab and is
responsible for DB2 BI and Warehousing technology for the database server.
He is also driving DB2 enablement within the end-to-end BI Data Warehousing
solution stack. Bill has been working in database development for 18 years
and received his Ph.D. from Illinois Institute of Technology. He has also
worked in database development at NCR-Teradata and AT&T Bell Labs.
Andy Witkowski is an Architect for the Oracle Data Warehousing and Business
Intelligence Groups. He has actively worked on extending RDBMS with language,
algorithms and data structures for BI for the past six years. His interests
include materialized views, Business Intelligence calculations in SQL
such as SQL Spreadsheet, analytic and statistical functions, compressed cubes,
and query optimization. Andy has been working in database technology for
15 years and received his Ph.D. from the University of Southern California.
Ramesh Bhashyam is the CTO for Teradata Database Engineering at NCR-Teradata.
He is responsible for all the database server aspects of Teradata DBMS
engineering. Ramesh has a Master's in Computer Science and a Bachelor's
in Electrical Engineering.
Surajit Chaudhuri, Microsoft Research
Surajit Chaudhuri leads the Data Management and Exploration Group at Microsoft
Research (http://research.microsoft.com/dmx). In 1996, Surajit started the
AutoAdmin project on self-tuning database systems at Microsoft Research
and developed novel automated physical design tuning technology for SQL
Server 7.0, SQL Server 2000 and SQL Server 2005. Data mining technology
from the group was incorporated into SQL Server 2000. More recently, Surajit
has initiated work in the area of data cleaning and integration. Surajit's
most recent project is Data Exploration, which looks at the problem of
flexible querying of information that spans text as well as relational
data. Surajit received his Ph.D. from Stanford University in 1991 and worked
at Hewlett-Packard Laboratories, Palo Alto, from 1991 to 1995. He has published
widely in major database conferences.