Week 6: 11/4/08

 "Harvesting Semantic Content from the Web" 

Eduard Hovy 

3p-4:30p, Tuesday, November 4, 2008, Lecture Hall 3

Research in natural language processing (NLP) over the past fifteen years has produced impressive practical results using statistical methods.  But increasingly there are signs that continued quality improvement in language processing applications (including QA, summarization, information extraction, opinion mining, and machine translation) requires deeper and richer representations, possibly even (shallow) semantics of text meaning.  Although theories of semantics (formal and informal) abound, no one has yet built a resource of semantic symbols that effectively supports NLP, that is empirically based, and that has been validated through human-agreement scores.  Can this be done?

This talk describes the harvesting of semantic knowledge from the web, and the reformulation of that knowledge into the Omega ontology, to support various NLP applications.  We will explore a series of increasingly detailed experiments in knowledge harvesting and organization: from fully automated, through partly automated, to work requiring manual annotation.  The first two make extensive use of the web; the third is part of the OntoNotes project, a large collaborative effort to build a manually annotated corpus of one million words of English, Chinese, and Arabic text, with an accompanying ontology for the senses of nouns and verbs.

Throughout the lecture, we will touch on some problematic aspects of the semantics and semantic representations that must support robust large-scale reasoning and other applications.  We will see examples of cases where traditional formal semantics simply does not work, and where what does work instead looks woefully simplistic.

The Speaker:  

Eduard Hovy leads the Natural Language Research Group at the Information Sciences Institute of the University of Southern California, and is Deputy Director of the Intelligent Systems Division, as well as a research associate professor in the Computer Science Department of USC and Advisory Professor of the Beijing University of Posts and Telecommunications.  He completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987, and his research focuses on information extraction, automated text summarization, the semi-automated construction of large lexicons and ontologies, machine translation, question answering, and digital government.  He is the author or co-editor of five books and over 180 technical articles.  Dr. Hovy regularly serves in an advisory capacity to funders of NLP research in the US and EU.  In 2001 Dr. Hovy served as President of the Association for Computational Linguistics (ACL) and in 2001–03 as President of the International Association for Machine Translation (IAMT); he currently serves as President of the Digital Government Society of North America (DGSNA).  He regularly co-teaches a course in the Master’s Degree Program in Computer Science at the University of Southern California, as well as occasional short courses on machine translation and other topics at universities and conferences.  He has served on the Ph.D. and M.S. committees for students from USC, Carnegie Mellon University, National Taiwan University, and the Universities of Toronto, Karlsruhe, Pennsylvania, Stockholm, Waterloo, Nijmegen, Pretoria, and Ho Chi Minh City.

Further information: http://www.isi.edu/natural-language/nlp-at-isi.html and http://www.isi.edu/~hovy.html

Associated reading: 
The first three readings below are required reading for the Week 6 Data and Information seminar.  The other three are optional reading.

See the file list below for a printable announcement for this lecture.

6Hovy.doc (36.5 KB)