Week 4: 10/21/08

"The Past and Future of Search (Why Search is Hard, and Why Language Matters)"

Shauna Eggers

3p-4:30p, Tuesday, October 21, 2008, Lecture Hall 3

This talk describes the challenges of building a prototypical search engine and shows how knowledge of language can inform improvements to search. Specific applications of natural language processing to search include machine translation, query disambiguation, information extraction, and question answering. The talk gives an overview of Google's work on this front and contrasts a sketch of search engines ten years ago with what we can expect them to look like ten years from now.

The Speaker:
Shauna Eggers is a software engineer at Google, Inc., where she has worked since 2007 on AdWords and AdSense. Before joining Google, she received a B.S. in Computer Science and a B.A. in Linguistics from the University of Arizona, and participated in the first year of the Computational Linguistics master's program at the University of Washington. At Arizona, she worked as a researcher at the MIS Artificial Intelligence lab on NLP applications for information retrieval.

Associated reading:

  • "The Future of Search", from The Official Google Blog, 9/10/2008.  http://googleblog.blogspot.com/2008/09/future-of-search.html 
  • Sergey Brin, Lawrence Page.  "The Anatomy of a Large-Scale Hypertextual Web Search Engine", 1998.  http://infolab.stanford.edu/%7Ebackrub/google.html  (Full version in: Computer Networks and ISDN Systems, Volume 30, Issues 1-7, April 1998, pages 107–117.)  This classic paper describes key concepts of modern search engines: crawling, parsing, creating an inverted index, and using link structure for ranking.
  • More advanced optional reading (e.g., for those pursuing a project on this topic):  Enrique Alfonseca, Slaven Bilac, Stevan Pharies, "Decompounding Query Keywords from Compounding Languages", Proceedings of ACL-08: HLT, Short Papers (Companion Volume), pp. 253-256, Columbus, OH, USA, June 2008.  © 2008 Association for Computational Linguistics.  http://www.aclweb.org/anthology-new/P/P08/P08-2064.pdf  This paper gives an example of how domain (i.e., language) knowledge helps address a current challenge in search technology.
  • Chapters 18, 19, and 21, and Case Study 1, from Lakoff's Women, Fire, and Dangerous Things.  (This is the Week 4 seminar reading for the Data and Information program; it is not otherwise associated with the week's PLATO lecture.)  

See the file list below for a printable announcement for this lecture.

4Eggers.doc (32 KB)