Statistical Language Models for Information Retrieval and Text Mining
Abstract: Statistical language models have recently been applied successfully to many problems in information retrieval and text mining, two key technologies for text information management. In information retrieval, a great deal of recent work has shown that statistical language models not only achieve superior empirical performance, but also facilitate parameter tuning and provide a principled, general way to model various kinds of complex and non-traditional retrieval problems. In text mining, topic language models have been applied to discover topics in text and analyze their variations over various kinds of context, such as time and location. The purpose of this tutorial is to systematically review the recent progress in applying statistical language models to information retrieval and text mining, with an emphasis on the underlying principles and framework, empirically effective language models, and language models developed for non-traditional retrieval and text mining tasks. School attendees can expect to learn the major principles and methods of applying statistical language models to information retrieval and text mining and the outstanding problems in this area, and to obtain comprehensive pointers to the research literature. Attendees are assumed to know basic probability and statistics.
Speaker's Bio: ChengXiang Zhai is an Associate Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he also holds a joint appointment at the Institute for Genomic Biology, Statistics, and the Graduate School of Library and Information Science. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, and bioinformatics. He has published over 80 papers in these areas. He serves on the editorial boards of ACM Transactions on Information Systems and Information Retrieval Journal, and is a program co-chair of ACM CIKM 2004, NAACL HLT 2007, and ACM SIGIR 2009. He is an ACM Distinguished Scientist, and received the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), the ACM SIGIR 2004 Best Paper Award, and an Alfred P. Sloan Research Fellowship in 2008.
| Attachment | Size |
|---|---|
| zhai-slm-notes.pdf | 1.24 MB |
| zhai-slm-slides.pdf | 1.81 MB |
