Seminar: Explore or Exploit? Effective Strategies for Disambiguating Large Databases

Date: 1 June 2011, 3:00pm-4:00pm

Speaker: Dr. Reynold Cheng (Department of Computer Science, University of Hong Kong)

Venue: 78-622, The University of Queensland, St Lucia

Title: Explore or Exploit? Effective Strategies for Disambiguating Large Databases
Abstract:
Data ambiguity is inherent in applications such as data integration, location-based services, and sensor monitoring. In many situations, it is possible to “clean”, or remove, ambiguities from these databases. For example, the GPS location of a user is inexact due to measurement errors, but context information (e.g., what the user is doing) can be used to reduce the imprecision of the location value. To obtain a higher-quality database, we study how to disambiguate a database by appropriately selecting candidates to clean. This problem is challenging because cleaning involves a cost, is limited by a budget, may fail, and may not remove all ambiguities. Moreover, the statistical information about how likely database objects are to be cleaned successfully may not be precisely known. We tackle these challenges by proposing two kinds of algorithms. The first type uses greedy heuristics to make sensible decisions; however, these algorithms do not make use of cleaning information, and they require user-supplied parameters to achieve high cleaning effectiveness. The second type is the Explore-Exploit (or EE) algorithm, which gathers valuable information during the cleaning process to determine how the remaining cleaning budget should be invested.

We also study how to fine-tune the parameters of EE in order to achieve optimal cleaning effectiveness. Experimental evaluations on real and synthetic datasets validate the effectiveness and efficiency of our approaches.
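
The explore-then-exploit idea in the abstract can be illustrated with a small sketch. This is not the EE algorithm presented in the talk; it is a simplified, hypothetical strategy written only to make the idea concrete. It assumes that objects fall into groups, that each cleaning attempt costs one unit of budget, and that each group has a hidden success probability. The function name, the `explore_fraction` parameter, and the grouping are all assumptions introduced for this example.

```python
import random


def explore_exploit_clean(success_probs, budget, explore_fraction=0.2, seed=0):
    """Hypothetical sketch of an explore-then-exploit cleaning strategy.

    success_probs: hidden per-group probability that one cleaning attempt
        succeeds (assumed for simulation; the strategy never reads it directly).
    budget: total number of unit-cost cleaning attempts allowed.
    explore_fraction: share of the budget spent estimating success rates.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    attempts = [0] * n
    successes = [0] * n

    def try_clean(group):
        # Simulate one cleaning attempt, which may fail.
        attempts[group] += 1
        if rng.random() < success_probs[group]:
            successes[group] += 1

    # Explore: try every group in round-robin order to gather statistics.
    explore_budget = min(budget, max(n, int(budget * explore_fraction)))
    for i in range(explore_budget):
        try_clean(i % n)

    # Exploit: invest the remaining budget in the group with the best
    # observed success rate so far.
    def observed_rate(group):
        return successes[group] / attempts[group] if attempts[group] else 0.0

    best = max(range(n), key=observed_rate)
    for _ in range(budget - explore_budget):
        try_clean(best)

    return {"attempts": attempts, "successes": successes, "exploited_group": best}


if __name__ == "__main__":
    print(explore_exploit_clean([0.3, 0.5, 0.7], budget=50))
```

The `explore_fraction` parameter plays the role of a tunable trade-off: a larger value gives better estimates of which group is worth cleaning, at the price of less budget left to act on them.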

Biography:
Dr. Reynold Cheng is an Assistant Professor in the Department of Computer Science at the University of Hong Kong. He received his BEng (Computer Engineering) in 1998 and his MPhil (Computer Science and Information Systems) in 2000, both from the Department of Computer Science at the University of Hong Kong. He then obtained his MSc and PhD from the Department of Computer Science at Purdue University in 2003 and 2005, respectively. Dr. Cheng was an Assistant Professor in the Department of Computing at the Hong Kong Polytechnic University. He was a visiting scientist at the Institute of Parallel and Distributed Systems at the University of Stuttgart during the summer of 2006.

Dr. Cheng was the recipient of the 2010 Research Output Prize in the Department of Computer Science of HKU. He also received the Universitas 21 Fellowship in 2011, and the Performance Reward from the Hong Kong Polytechnic University in 2006 and 2007. He is a member of the IEEE, the ACM, the Special Interest Group on Management of Data (ACM SIGMOD), and the UPE (Upsilon Pi Epsilon Honor Society). He is also a guest editor for a special issue of TKDE. He was a keynote speaker at the First International Workshop on Quality of Context (QuaCon '09), and he received an Outstanding Service Award at the CIKM 2009 conference. He has served as a PC member and reviewer for international conferences and journals including TODS, TKDE, TMC, VLDBJ, IS, DKE, KAIS, VLDB, ICDE, ICDM, DEXA and DASFAA.

Attachment: UQ-lecture2-cleaning-reynold.pdf (4.96 MB)