Title: Data Integration: The Teenage Years
Alon Halevy, Google
Anand Rajaraman, Kosmix Corp.
Joann J. Ordille, Avaya Labs
Thursday, 9:00-10:30
Room 401


Data integration is a pervasive challenge faced in applications that need to query across multiple autonomous and heterogeneous data sources. It is crucial in large enterprises that own a multitude of data sources; for progress in large-scale scientific projects, where data sets are produced independently by multiple researchers; for better cooperation among government agencies, each with its own data sources; and for offering good search quality across the millions of structured data sources on the World-Wide Web. Ten years ago we published "Querying Heterogeneous Information Sources using Source Descriptions", a paper describing some aspects of the Information Manifold data integration project. The Information Manifold and many other projects conducted at the time have led to tremendous progress on data integration and even to quite a few commercial data integration products. This talk and the associated paper offer a perspective on the contributions of the Information Manifold and its peers, describe some of the important bodies of work in the data integration field in the last ten years, and outline some challenges to data integration research today.

Profile of Speakers:



Dr. Alon Halevy received his Bachelor's degree in Computer Science and Mathematics from the Hebrew University in Jerusalem in 1988, and his Ph.D. in Computer Science from Stanford University in 1993. From 1993 to 1997, Dr. Halevy was a principal member of technical staff at AT&T Bell Laboratories, and then at AT&T Laboratories. He joined the faculty of the Computer Science and Engineering Department at the University of Washington in 1998. Dr. Halevy's research interests are in data integration, semantic heterogeneity, personal information management, management of XML data, web-site management, peer-data management systems, query optimization, database theory, knowledge representation, and more generally, the intersection between Database and AI technologies. His research has produced several systems, such as the Information Manifold data integration system, the Strudel web-site management system, and the Tukwila XML data integration system. He was also a co-developer of XML-QL, which later contributed to the development of the XQuery standard for querying XML data. In 1999, Dr. Halevy co-founded Nimble Technology, one of the first companies in the Enterprise Information Integration space. In 2004, Dr. Halevy founded Transformic Inc., a company that creates search engines for the deep web, that is, content residing in databases behind web forms. Dr. Halevy was a Sloan Fellow (1999-2000), and received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2000. He serves on the editorial boards of the VLDB Journal, the Journal of Artificial Intelligence Research (currently, a member of the advisory committee), and ACM Transactions on Internet Technology. He served as the program chair for the ACM SIGMOD 2003 Conference, and has given several keynotes at top conferences.


Anand Rajaraman is a co-founder of Kosmix, a company creating the next generation of web search technology. Anand is also Founding Partner of Cambrian Ventures, a venture capital firm that invests in disruptive new technologies. Before Cambrian, Anand was Director of Technology at Amazon.com, where he was responsible for technology strategy. Anand helped launch the transformation of Amazon.com from a retailer into a retail platform, enabling third-party retailers to sell on Amazon.com's website. Third-party transactions now account for almost 25% of all US transactions, and represent Amazon's fastest-growing and most profitable business segment. Anand came to Amazon.com in 1998 through the acquisition of Junglee, a database technology company where he was co-founder and CTO. Junglee created Virtual Databases that combined information from across the web, and pioneered several markets, including online comparison shopping and online classifieds search.
Anand has extensive research and development experience at Stanford University, AT&T Bell Labs, and Xerox PARC, with numerous publications, patents, and awards at leading academic and industry forums. He is the recipient of the Best Paper Award at ACM SIGMOD 1996, the 10-year Best Paper Award at SIGMOD 2006, and the Best Student Paper Award at ICDT 1997. He obtained his Bachelor's degree in Computer Science and Engineering from the Indian Institute of Technology, Madras, where he won the President of India Gold Medal for graduating at the top of his class, and his M.S. and Ph.D. in Computer Science from Stanford University. Anand serves as Consulting Assistant Professor at Stanford University's Computer Science Department, and as investor, advisor, and Board member to several Silicon Valley startups.

Dr. Joann J. Ordille is a consulting research scientist in the Software Technology Research Department of Avaya Labs. Her research focuses on inventing communication services that leverage the convergence of voice with data, and of wired with wireless, to provide applications that have not yet been imagined. Dr. Ordille leads the Rome Research Project in right-time communication for the enterprise. Through Rome, Dr. Ordille is creating new technologies that give us choice and power in how we communicate about what is most important and urgent to us. The results of the Rome Project include a state-of-the-art notification and response system called Via, a publish-subscribe system with special features to foster collaboration and enhance security for organizations, and an event-driven multi-modal collaboration system. Dr. Ordille was an early innovator in making it easier to search, integrate, and use information available on the Internet, for which she and her co-authors are receiving the "10 Year Best Paper Award" at VLDB 2006. She invented meta-directories, for which she received the "Best Paper Award" at the International Conference on Distributed Computing Systems. Several commercial products, including meta-directories, publish-subscribe services, and notification and response systems, have resulted from Dr. Ordille's research. Dr. Ordille has given talks on the future of Internet technologies around the world as a featured speaker in the Bell Labs Seminar Series. She joined Avaya Labs at its birth in 2000 in a spin-off from Bell Labs, having joined Bell Labs in 1993 after completing her Ph.D. in Computer Science at the University of Wisconsin-Madison. She also holds an M.S. in Computer Science from the University of Wisconsin-Madison, an M.A. in the History and Philosophy of Science from the University of Pittsburgh, and a B.A. in Applied Mathematics and Philosophy from The George Washington University in Washington, DC.
