Hanan Samet

TITLE: Sorting in Space: A Foundation for Multidimensional and Metric Data Structures for Applications in Spatial Databases and Similarity Searching

Hanan Samet*
Computer Science Department
Center for Automation Research
Institute for Advanced Computer Studies
University of Maryland***
College Park, Maryland 20742, USA
hjs@cs.umd.edu

* Currently a Science Foundation of Ireland (SFI) Walton Fellow at the Centre for Geocomputation at the National University of Ireland at Maynooth (NUIM).

ABSTRACT
A foundation, based in part on representation issues, for applications in spatial databases, which include geographic information systems (GIS), and similarity searching is presented. We start with an introduction to the spatial database issues involved in the design of geographic information systems (GIS) from the perspective of a computer scientist. Some of the topics to be discussed include the nature of a GIS and the functionalities that are desired in such systems. One of the major issues is the representation of spatial data. A wide number of representations is currently in use. Recently, there has been much interest in hierarchical data structures such as quadtrees, octrees, R-trees, etc. The key advantage of these representations is that they provide a way to index into space. In fact, they are little more than multidimensional sorts and this approach will be central to the presentation. They are compact and depending on the nature of the spatial data they save space as well as time and also facilitate operations such as search. We will provide brief overview of hierarchical spatial data structures and related research results. In addition we demonstrate the SAND Browser and the VASCO JAVA applet which illustrate these methods.

We continue with a discussion of similarity searching which is usually achieved by means of nearest neighbor finding. Existing methods for handling similarity search in this setting fall into one of two classes. The first is based on mapping to a low-dimensional vector space which is then indexed using representations such as k-d trees, R-trees, quadtrees, etc. The second directly indexes the the objects based on distances using representations such as the vp-tree, M-tree, etc. Mapping from a high-dimensional space into a low-dimensional space is known as dimensionality reduction and is achieved using SVD, DFT, etc. At times, when we just have distance information, the data objects are embedded in a vector space so that the distances of the embedded objects as measured by the distance metric in the embedding space approximate the actual distance. The search in the embedding space uses conventional indexing methods which are often coupled with dimensionality reduction. Some commonly known embedding methods are multidimensional scaling, Lipschitz embeddings, and FastMap. These topics will be discussed at varying levels of detail.

BIOGRAPHY
Hanan Samet received the B.S. degree in engineering from the University of California, Los Angeles, and the M.S. Degree in operations research and the M.S. and Ph.D. degrees in computer science from Stanford University, Stanford, CA. He is a Fellow of the IEEE, ACM, and IAPR (International Association for Pattern Recognition), and was also elected to the ACM Council in 1989-1991 where he served as the Capital Region Representative.

In 1975 he joined the Computer Science Department at the University of Maryland, College Park, where he is now a Professor. He is a member of the Computer Vision Laboratory of the Center for Automation Research and also has an appointment in the University of Maryland Institute for Advanced Computer Studies. At the Computer Vision Laboratory he leads a number of research projects on the use of hierarchical data structures for geographic information systems. His research group has developed the QUILT system which is a GIS based on hierarchical spatial data structures such as quadtrees and octrees, the SAND system which integrates spatial and non-spatial data, the SAND Browser which enables browsing through a spatial database using a graphical user interface, the VASCO spatial indexing applet, and a symbolic image database system.

His research interests are data structures, computer graphics, geographic information systems, computer vision, robotics, and database management systems. He is the author of the recent book titled "Foundations of Multidimensional and Metric Data Structures" published by Morgan-Kaufmann, an imprint of Elsevier, in 2006, an award winner in the 2006 best book in Computer and Information Science competition of the Professional and Scholarly Publishers (PSP) Group of the American Publishers Association (AAP), and of the first two books on spatial data structures titled "Design and Analysis of Spatial Data Structures", and "Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS", both published by Addison-Wesley in 1990. He was the co-general chair of 15th ACM International Conference on Advances in Geographic Information Systems (ACM GIS 2007) and the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008). He is the founding chair of ACM SIGSPATIAL, and received best paper awards in the 2008 SIGMOD Conference, the 2008 SIGSPATIAL Conference, and the 2007 Computers & Graphics Journal.