Linguistics | Advanced Natural Language Processing
L645 | 2934 | John Paolillo


Linguistics L645 Natural Language Processing
Jointly Listed with SLIS L641 Information Storage and Retrieval Theory

05:45P-08:30P T, Main Library 002 (SLIS Library Lab)

In recent years, the fields of Natural Language Processing
and Information Retrieval have experienced a convergence of
interests. For researchers in NLP, IR represents a promising
real-world application.  For researchers in IR, improving
retrieval systems and building new experimental question-answering
systems require insights from NLP.  Moreover, both NLP and IR
have converged on the use
of probabilistic models and statistical methods of analysis.

This course examines the convergence of NLP and IR through
the lens of large-scale analysis of texts, otherwise known
as corpus linguistics.  We examine the construction of corpora,
the types of linguistic analysis that they afford, and the
interesting properties of language that can be observed in
studies of corpora.  In addition we consider probabilistic
models of language that help account for properties of corpora,
and the nature of IR systems as linguistic corpora.  These
perspectives help us better understand the design and
performance aspects of IR systems.

Readings for the course are drawn from current research
literature in NLP, corpus linguistics and IR.  There are
a small number of focused assignments intended to
illustrate principles of corpus construction and analysis,
and probabilistic models of language.  The final course
project may be either a substantial (~15-page) research
paper or a programming project.