Tutorial 1: Semistructured Data and XML
This tutorial describes two topics which, until recently, evolved in parallel: semistructured data and XML. XML is the recently adopted W3C standard intended to be the universal Web data exchange format. Semistructured data describes data that is irregular, schema-less, and self-describing, and has been studied by the database research community. There are some overlaps, but also some fundamental differences between the two approaches, due to the different cultures of the communities where they originated: the database and the document communities. This tutorial compares and contrasts the two developments. It will cover in some detail: data models (OEM, XML, RDF), query languages (Lore, UnQL, StruQL, XML-QL, XSL), schema formalisms (including DTDs and RDF-Schema), and it will discuss some systems issues.
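As a small, hypothetical illustration (the element names and values below are invented), semistructured data represented in XML is self-describing and irregular: two records of the same kind need not share a structure, and the tags themselves carry the schema-like information.

```xml
<!-- Two "person" records with different structure: no schema forces
     them to agree, and the tags themselves describe the data. -->
<people>
  <person>
    <name>Alice</name>
    <email>alice@example.org</email>
  </person>
  <person>
    <name><first>Bob</first><last>Smith</last></name>
    <phone>555-0100</phone>
    <phone>555-0101</phone>
  </person>
</people>
```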
Dan Suciu is a member of the technical staff at AT&T Labs - Research. His current interests are semistructured data, query optimisation and processing, and the interaction between query languages and programming languages. Dan Suciu has been involved in several projects related to semistructured data: UnQL, Strudel, and XML-QL, and is currently writing a book with Serge Abiteboul and Peter Buneman, entitled "Data on the Web: From Relations to Semistructured Data and XML", to be published by Morgan Kaufmann.
Dan Suciu has served on several program committees and coedited two special journal issues related to semistructured data and the Web. He received his PhD from the University of Pennsylvania and his BS from the Polytechnic University of Bucharest, Romania.
Tutorial 2: Metasearch Engines: Solutions and Challenges
Many information search services, or search engines, have been installed on the Internet in recent years. Different search engines may employ different techniques to represent and rank documents, and they usually provide access to different sets of documents of diverse interest. Frequently, the information a user needs is spread across the databases of multiple local search engines. As the number of search engines increases, it becomes inconvenient for an ordinary user to query them all and identify useful documents among the results returned. Consequently, there is an increasing need for automatic search brokers (metasearch engines) that can invoke multiple search engines on the user's behalf. Through a search broker, a user needs to submit only a single query to retrieve the desired documents.
This tutorial will provide an overview of proposed methods for building an effective and efficient metasearch engine. The tutorial will specifically focus on different solutions to two problems. The first is database selection, the identification of search engines that are likely to return useful results to a given query. The second is collection fusion, the determination of which documents should be retrieved from each identified search engine and how to merge results from multiple search engines into a single ranked list. The tutorial will also point out some problems that need to be further researched.
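One simple collection-fusion strategy can be sketched as follows. This is an illustrative sketch only, not a method endorsed by the tutorial: each engine's raw scores are min-max normalised to [0, 1], and results are then merged into a single ranked list. The engine names and scores are invented for illustration.

```python
def normalise(results):
    """results: list of (doc_id, raw_score) pairs from one search engine."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return [(doc, (s - lo) / span) for doc, s in results]

def fuse(engine_results):
    """engine_results: dict mapping engine name -> ranked (doc, score) list."""
    merged = {}
    for results in engine_results.values():
        for doc, score in normalise(results):
            # Keep the best normalised score for documents returned
            # by more than one engine.
            merged[doc] = max(merged.get(doc, 0.0), score)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

ranked = fuse({
    "engine_a": [("d1", 12.0), ("d2", 7.0), ("d3", 2.0)],
    "engine_b": [("d2", 0.9), ("d4", 0.5)],
})
print([doc for doc, _ in ranked])  # → ['d1', 'd2', 'd3', 'd4']
```

Real metasearch engines must also cope with engines that return no scores at all, which is one reason collection fusion remains a research problem.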
Clement Yu is currently a professor in the Department of Electrical Engineering and Computer Science at the University of Illinois at Chicago, USA. He was the chairman of the ACM Special Interest Group on Information Retrieval (SIGIR) and has chaired several national and international conferences and workshops on databases and information/multimedia systems. His professional editorial activities include IEEE Transactions on Knowledge and Data Engineering, Distributed and Parallel Databases, and the International Journal of Software Engineering and Knowledge Engineering. He has served as an advisory committee member for the U.S. National Science Foundation.

Weiyi Meng is currently an associate professor in the Department of Computer Science at the State University of New York at Binghamton, USA. His main research areas are Internet-based information retrieval and multidatabase query processing. He has published many papers in leading journals and major conference proceedings (including TKDE, TODS, VLDB, and ICDE).
Tutorial 3: CORBA and Databases
There are many boundaries that can seriously increase the cost of software development and of integrating existing systems: technical boundaries (such as multiple programming languages or operating systems) and social/organisational boundaries (between teams, departments, or companies, and their web users). CORBA exists to help customers bridge these boundaries. It is rare for a complex CORBA system not to require some or all of its servers to use a DBMS. However, a vast array of choices has to be taken into account as the complexity of the system increases. For example, there is a bewildering number of approaches to storing CORBA objects in a DBMS.
This tutorial has two aims. The first is to give an update on the CORBA standard, as well as a commentary on parts that are of particular importance. New areas will be covered, such as Objects By Value, the Portable Object Adapter, the Notification Service, and CORBA Components, including their close relationship to Enterprise JavaBeans (EJB). The second aim is to describe in detail how CORBA relates to databases, including the recently announced Persistent State Service (PSS). The tutorial will assume no prior knowledge of CORBA, but it will nevertheless cover many of the new standards that are only now appearing in products.
Sean Baker is a founder and director of IONA Technologies, the designers of the Orbix implementation of CORBA. His original experience in distributed computing was gained by working as a faculty staff member in the Distributed Systems Group, Department of Computer Science, Trinity College Dublin. There he worked on many projects, designing and implementing support environments for distributed object systems, and also their relationships to databases and persistence in general. In 1992 he co-founded IONA Technologies to exploit this experience. He is the author of "CORBA Distributed Objects" (Addison Wesley Longman and ACM Press) and co-author of "CORBA Fundamentals and Programming" (Wiley).
Tutorial 4: Using SQLJ for Enterprise Database Applications: Access, Procedures and Storage
Thousands of businesses and organisations world-wide use SQL to effectively manipulate and manage the quantities of data stored in very large databases. Developers programming data-intensive applications in general-purpose languages such as C and COBOL have long been able to use SQL for this data manipulation. Until now, however, Java has been unable to benefit from the strengths of SQL, with the result that Java has seen slower acceptance as a suitable language for mainstream business applications. The SQLJ effort is driven by major industry vendors such as Oracle, Sybase, Tandem, JavaSoft, IBM, Informix and others. The SQLJ specifications describe Embedded SQL in Java, Java Stored Procedures, Java UDFs, and Java Data Types. This tutorial introduces SQLJ, explains how it works and how it relates to JDBC, and highlights the benefits it can provide to those writing data-intensive applications.
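As a brief, hypothetical illustration of Embedded SQL in Java, the fragment below declares an SQLJ named iterator and uses a host variable (marked with a leading colon). The EMP table, its columns, and the method name are invented for this sketch, and the fragment requires an SQLJ translator, which rewrites the #sql clauses into JDBC calls; it is not plain compilable Java.

```java
// Declare a named iterator type; the SQLJ translator checks the
// column names and types against the database schema at translation time.
#sql iterator Emps (String ename, double sal);

public static void printHighEarners(double minSal) throws java.sql.SQLException {
    Emps emps;
    // :minSal is a Java host variable embedded in the SQL statement.
    #sql emps = { SELECT ENAME, SAL FROM EMP WHERE SAL > :minSal };
    while (emps.next()) {
        System.out.println(emps.ename() + " earns " + emps.sal());
    }
    emps.close();
}
```

The compile-time schema checking is one of the principal advantages SQLJ offers over building the equivalent JDBC statement strings by hand.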
Ekkehard Rohwedder is one of the lead developers of Oracle Corporation's SQLJ translator. He has also designed and developed the SQLJ translator reference implementation, which is freely available and serves as the basis for many leading DBMS vendors' SQLJ offerings. As a member of the SQLJ working group, he contributes to the ongoing multi-vendor effort to define the SQLJ language, now an ANSI standard. Ekkehard is responsible for developing a standard SQL syntax and semantics checking framework for the SQLJ translator. He is pursuing a PhD in Computer Science at Carnegie Mellon University.
Julie Basu has nearly 10 years of design and hands-on development experience at Oracle and has worked on various projects relating to database programming interfaces, financial applications, and EDI. She was formerly the project lead for Oracle's SQLJ product. Among her current interests are web applications and electronic commerce. Julie holds a Master's degree and a PhD in Computer Science from Stanford University, and a Master's degree from the Indian Institute of Science, Bangalore. She has published several papers in database conferences and journals - details are available on her home page: http://www-db.stanford.edu/~basu/index.html.
Tutorial 5: Design and Perception in Information Visualisation
This tutorial provides an overview of the use of 2D and 3D graphics in interaction with complex information. Information visualisation is seen as one way to address the problem of handling the data generated by web, data mining and database tools. Techniques for visualising a spectrum of data structures - from 1D and 2D data to high-dimensional and heterogeneous data - will be described, demonstrated and critiqued. The aim is not simply to describe the key 2D and 3D visualisation techniques, including those just making their way out of research labs, but also to help you understand the perceptual and design issues involved in visualisation, and the characteristic strengths and weaknesses of this approach to interaction. A number of prototype and commercial systems will be discussed and demonstrated, with examples taken from visualising data in web and banking applications.
Matthew Chalmers gained his PhD on ray tracing and concurrent object systems at the University of East Anglia. He spent four years at Xerox EuroPARC in Cambridge working on CSCW, graph layout algorithms and information visualisation. He started the visualisation group at Ubilab, Union Bank of Switzerland's research lab in Zurich, working there for over three years. After a research fellowship at U. Hokkaido, Japan, he recently joined the University of Glasgow. His current research focuses on social and perceptual issues in visualisation and collaborative filtering, and, more theoretically, relating linguistics and philosophy to information representation. He is on programme committees for the IEEE Visualisation Conference, the IEEE Information Visualisation Symposium, the ACM Information Retrieval Conference (SIGIR), and the European Conference on Computer Supported Collaborative Work (ECSCW), as well as the 1999 Workshop on User Interfaces to Data Intensive Systems.