Entity Linkage: Similarity Measures and Algorithms - Divesh Srivastava
Abstract: A central problem in data quality research is entity linkage: given two large multi-attribute data sets, identify all pairs of entities across the two sets that are approximately the same. The problem arises in data cleaning, heterogeneous data integration, flexible querying, and a variety of other data-centric applications. This tutorial provides a comprehensive and cohesive overview of the key research results on entity linkage, focusing on similarity measures and efficient algorithms.
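To make the problem statement concrete, here is a minimal sketch of naive entity linkage. It assumes one particular similarity measure, Jaccard similarity over character 3-grams averaged across attributes, and a fixed threshold of 0.5. The measure, the threshold, and the function names (qgrams, jaccard, record_similarity, link) are illustrative choices, not the tutorial's method. The all-pairs loop compares every record in one set with every record in the other, which is exactly the quadratic cost that efficient linkage algorithms aim to avoid on large data sets.

```python
from itertools import product


def qgrams(s: str, q: int = 3) -> set[str]:
    """Return the set of character q-grams of s, padded with '#' so short strings still yield grams."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def record_similarity(r1: tuple[str, ...], r2: tuple[str, ...], q: int = 3) -> float:
    # Assumption: combine attributes by averaging per-attribute q-gram Jaccard scores;
    # weighted sums or other combination rules are equally possible.
    scores = [jaccard(qgrams(x, q), qgrams(y, q)) for x, y in zip(r1, r2)]
    return sum(scores) / len(scores)


def link(left, right, threshold: float = 0.5, q: int = 3) -> list[tuple[int, int]]:
    """Naive all-pairs linkage: return index pairs (i, j) whose similarity meets the threshold.

    Performs len(left) * len(right) comparisons, so it does not scale to large data sets.
    """
    return [
        (i, j)
        for (i, r1), (j, r2) in product(enumerate(left), enumerate(right))
        if record_similarity(r1, r2, q) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical two-attribute records (name, location) for illustration only.
    left = [("AT&T Labs-Research", "Florham Park"), ("University of Wisconsin", "Madison")]
    right = [("AT&T Labs Research", "Florham Park NJ"), ("Indian Institute of Technology", "Bombay")]
    print(link(left, right))  # prints [(0, 0)]
```

Here the first records in each set differ only by a hyphen and a trailing "NJ", so their averaged 3-gram similarity (about 0.69) clears the threshold, while every other pair scores far below it.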
Bio: Divesh Srivastava is the head of the Database Research Department at AT&T Labs-Research. He received his Ph.D. from the University of Wisconsin, Madison, and his B.Tech. from the Indian Institute of Technology, Bombay, India. His current research interests include data quality and data stream management systems.