Building a parallel, annotated corpus: English and Late Latin
Time: 12:00pm - 01:00pm
Place: Memorial Hall 401
Olga Scrivner and Eric Baucom
This study describes the implementation of a resource-light approach, cross-language transfer, to build a parallel English-Latin annotated corpus. Our data come from a Late Latin text, "Peregrinatio Aetheriae" (Pilgrimage of Etheria), written at the end of the 4th century. This document is one of the latest attestations of Latin and displays some characteristics of Romance languages: i) the increasing frequency of Subject-Verb-Object word order, compared to Subject-Object-Verb order in Classical Latin (Bauer, 1995) (1); ii) the rise of overt neutral pronominal subjects (2), among other features. Syntactic annotation of such a corpus would therefore be a very useful tool for Historical Linguistics. In addition, a parallel-aligned Latin-English corpus would enable researchers without prior knowledge of Latin to use and understand this text.
(1) sic benedicet fideles...
    then blesses faithful...
    ‘then he blesses the faithful’
(2) intrat intra cancellos intra Anastasim, id est intra speluncam
    enters into rails in Anastasis, it is in cave
    ‘he enters within the rails in the Anastasis, that is in the cave’
The compilation of our corpus consisted of several phases. The first phase included paragraph alignment, tokenization, and sentence alignment. The second phase was word alignment, which is extremely challenging because Latin, unlike English, has free word order. We used the Berkeley Aligner (Liang et al., 2006), supplemented by a Latin-English dictionary (FreeLang). With both texts aligned, we proceeded to syntactic annotation. Since dependency relations "deal especially well with languages involving relatively free word order" (Bamman and Crane, 2011), such as Latin, we parsed the English text with the dependency parser MaltParser (Nivre et al., 2007), trained on the Penn Treebank (Marcus et al., 1993). Finally, the dependency relations were transferred from English to Latin via the alignments.
While annotations in a historical corpus require high accuracy, we have shown that one can start building a corpus with resource-light methods such as cross-language transfer.
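The final transfer step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `project_dependencies`, the hand-written English parse, the relation labels, and the one-to-one alignment are all assumptions made for illustration, using the sentence from example (1). A real pipeline would take the parse from the parser output and the alignment from the aligner output.

```python
from typing import Dict, List, Optional, Tuple

ROOT = -1  # sentinel head index for the root of the sentence


def project_dependencies(
    src_heads: List[int],
    src_labels: List[str],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[Optional[Tuple[int, str]]]:
    """Copy (head, label) pairs from source tokens to aligned target tokens.

    alignment maps a source token index to a target token index and is
    assumed to be one-to-one. A target token stays None when it is
    unaligned or when its source head has no aligned counterpart.
    """
    projected: List[Optional[Tuple[int, str]]] = [None] * tgt_len
    for src_dep, tgt_dep in alignment.items():
        src_head = src_heads[src_dep]
        if src_head == ROOT:
            projected[tgt_dep] = (ROOT, src_labels[src_dep])
        elif src_head in alignment:
            projected[tgt_dep] = (alignment[src_head], src_labels[src_dep])
    return projected


# Hypothetical hand-written parse of the English translation of example (1).
english = ["then", "he", "blesses", "the", "faithful"]
eng_heads = [2, 2, ROOT, 4, 2]  # index of each token's head
eng_labels = ["advmod", "nsubj", "root", "det", "dobj"]  # illustrative labels

latin = ["sic", "benedicet", "fideles"]
# Hypothetical alignment: then-sic, blesses-benedicet, faithful-fideles.
alignment = {0: 0, 2: 1, 4: 2}

latin_deps = project_dependencies(eng_heads, eng_labels, alignment, len(latin))
for token, dep in zip(latin, latin_deps):
    print(token, dep)
```

In this sketch the English words "he" and "the" have no Latin counterpart, so their relations are simply dropped. Tokens left as None would need manual annotation or a fallback heuristic, which is one reason projected annotations typically require later correction.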
In category: Computational linguistics
Multilingualism: Understanding linguistic diversity
Time: 04:00pm - 05:30pm
Place: Indiana Memorial Union (IMU) Georgian Room
John Edwards (St. Francis Xavier University)
The general intent of this lecture is two-fold. The first goal is to present an outline picture of global linguistic diversity, with some of its important ramifications and consequences. The second goal is to point out that the most compelling aspects of this diversity are not linguistic at all. They have to do, rather, with the symbolic and group-identity-marking features of language.
In category: Sociolinguistics and pragmatics