Linguistics | Topics in Linguistics: Corpus Linguistics
L485 | 22548 | Sandra Kuebler
L485 also meets with L700
Advances in computer technology have revolutionized the ways
linguists can approach their data. By using computers, we can access
large bodies of text (corpora) and search for the phenomena in which
we are interested. Corpora give us a chance to move away from our
famous examples, such as "Bill hit John" or "Every farmer who owns a
donkey beats it," toward more natural examples. In the process, we may
find that language can be considerably more complex than the examples
that come to mind when we have to make them up.
Corpus linguistics, like many emerging fields, often seems to be
accessible only to initiated researchers. Even if you would like to
work with real-life examples, it is often difficult to get started
and to find answers to all the questions one should be clear
about. In this course, we will address many of these questions, such
as the following: What exactly is a corpus, and what isn't? Are
there corpora around that I could use? Where do I find them? How do
I create my own corpus? What is XML, and why do I need it? How do I
find a specific phenomenon in millions of words? Can I work with
multilingual data? What is a concordancer good for? Do I need
syntactic annotation? Are there programs that do the annotation for
me? Are there tools that help me search in linguistically annotated
corpora?
No programming experience is assumed, but computer experience is
presupposed.
We will work with the following textbook:
Tony McEnery, Richard Xiao, Yukio Tono (2006) Corpus-Based Language
Studies. Routledge.