Common Sense as a Service
Abstract: The explosion of digitized data has brought great opportunity for gaining a better understanding of human communication. To a certain degree, it revives the semantic web movement, except that instead of taking the bottom-up approach, which requires users to manually annotate text, the new top-down approach focuses on automatically gaining understanding from natural language text. In this short course, I will first survey work in this area, some from as early as 20 years ago, including Cyc and Open Mind Common Sense, and others more recent, including Wikipedia, Freebase, YAGO, etc. I will introduce the scope of each project, its strengths and weaknesses, and the applications built on top of these systems. I will also survey applications that try to leverage these techniques, in particular start-up companies that build semantic search engines, including Powerset, Evri, etc. Furthermore, I will introduce the Probase project at Microsoft Research, which focuses on building a large-scale knowledge base. Unlike projects such as Cyc and Open Mind Common Sense, which rely on human effort to create a knowledge base, we leverage machine learning techniques and large text corpora (including the web, the search log, existing structured data, dictionaries, and encyclopedias). I will also discuss the potential of Probase, which is to enable common sense computing. I will show that many applications, including NLP and search, can benefit from such a knowledge base, which reveals human intent in communication by reducing ambiguity.
Speaker's Bio: Haixun Wang joined Microsoft Research Asia in Beijing, China in 2009, and he leads research in data management. Before joining Microsoft, he had been a research staff member at IBM T. J. Watson Research Center for 9 years. He was Technical Assistant to Stuart Feldman (Vice President of Computer Science of IBM Research) from 2006 to 2007, and Technical Assistant to Mark Wegman (Head of Computer Science of IBM Research) from 2007 to 2009. He received the Ph.D. degree in computer science from the University of California, Los Angeles in 2000. He has published more than 120 research papers in refereed international journals and conference proceedings. He was PC Vice Chair of KDD'10, ICDM'09, SDM'08, and KDD'08, and he served as demo/workshop/sponsor Chair of various conferences, including SIGMOD'08, ICDM'08, ICDE'09, ICDM'11, etc. He serves on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (TKDE) and Journal of Computer Science and Technology (JCST). He is an adjunct professor at Nanjing University and Renmin University of China.
| Attachment | Size |
|---|---|
| WANGPh.D.Workshop.1.pdf | 2.84 MB |
| WANGPh.D.Workshop.2.pdf | 1.01 MB |
| WANGPh.D.Workshop.3.pdf | 1.66 MB |
| WANGPh.D.Workshop.4.pdf | 3.14 MB |
