Linguistic Variation, Theory-building and Statistics: Toward an Integrated Perspective

John C. Paolillo, Indiana University

The study of language variation and change is closely associated with the application of logistic regression models to linguistic data, so much so that "Varbrul analysis" is often taken to be a distinctive trademark of "variationist" linguistics. Originally, this mode of analysis was strongly coupled with formal linguistics and linguistic theory building, via a tightly argued connection between linguistic processes and observed outcomes. Over the past 20 years (at least), conceptions of linguistic process have changed, while the variationist statistical methodology has been retained relatively unchanged. Consequently, the statistics of the method have become progressively decoupled from grammatical theory building, and are treated as informative mainly about the social distribution of linguistic variables. Hence, the empirical approach offered by variationist linguistics is under-utilized in linguistic theory building.

This workshop aims to address this situation by placing variationist methods in the context of a broader range of statistical methods, and framing them in terms of their utility for linguistic theory building. Doing so makes it possible to re-introduce the relevance of research design, enabling the criticism and refinement of linguistic hypotheses and research questions in empirical terms. Traditional variationist research questions, investigated with logistic regression, are shown to occupy one specific place in linguistic theory building. Alternative conceptions of linguistic processes (e.g. constraint interaction, or psycholinguistic instead of historical processes) may alter how logistic regression and similar models are applied and interpreted, but similar observational methods are relevant to all questions of this type. Specifically, these questions require some circumstances of the observations to be controlled, i.e. subject to manipulation or a priori exclusion. Research questions of this sort, using regression methods, are appropriate when a highly refined theoretical backdrop is presupposed, and questions are to be answered only within the parameters provided by that theoretical backdrop.
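
As a minimal sketch of the regression-type analysis described above, the following Python fragment fits a logistic regression to a binary linguistic variable by Newton-Raphson. Everything specific in it is an assumption made for illustration: the factor names (following_vowel, casual_style), the coefficient values, and the simulated tokens are hypothetical and do not come from any study, and the fitting routine is a generic one, not the Varbrul program itself.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson.

    X is a tokens-by-factors design matrix that already contains an
    intercept column; y holds 0/1 outcomes (variant absent/present).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        w = p * (1.0 - p)                     # binomial variance weights
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated tokens (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 2000
following_vowel = rng.integers(0, 2, n)   # hypothetical linguistic factor
casual_style = rng.integers(0, 2, n)      # hypothetical contextual factor
X = np.column_stack([np.ones(n), following_vowel, casual_style])

true_beta = np.array([-0.5, -1.0, 0.8])   # assumed log-odds effects
p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.random(n) < p_true).astype(float)

beta_hat = fit_logistic(X, y)
print(dict(zip(["intercept", "following_vowel", "casual_style"], beta_hat)))
```

The estimated coefficients are on the log-odds scale. The model can only answer the question built into the design matrix, which is the sense in which such an analysis presupposes its theoretical backdrop.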

Ontologically prior to these questions is a range of other questions regarding the variation shared among a number of linguistic variables; such questions presuppose less detailed understandings of how the linguistic variables studied relate to one another. Addressing them requires multivariate methods (e.g. principal components and factor analysis), and their answers can bear upon larger, architectural characteristics of a linguistic theory, or basic conceptions of language and linguistic processes. These types of questions require entirely different research designs from regression-type analysis, and raise different issues for data sampling. By appreciating the distinct contributions of these two types of statistical model, and their relevance to linguistic theory building, both types of model can be used alongside one another for a more rigorous empirical approach to linguistic theory building.
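
The multivariate type of question can be sketched in the same hedged way. The Python fragment below runs a principal components analysis, via the singular value decomposition, on a speakers-by-variables table. The three variables and the simulated values are hypothetical: two are constructed to share a common source of variation and the third to vary independently, so that the example shows what shared variation looks like in the output.

```python
import numpy as np

def pca(data):
    """Principal components of a cases-by-variables matrix.

    Columns are standardized first, so the analysis works on correlations.
    Returns the component loadings (one row per component) and the
    proportion of total variance each component explains.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt, explained

# Simulated speakers (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n_speakers = 500
latent = rng.normal(size=n_speakers)                   # assumed shared source
var_a = latent + 0.3 * rng.normal(size=n_speakers)     # hypothetical variable A
var_b = latent + 0.3 * rng.normal(size=n_speakers)     # hypothetical variable B
var_c = rng.normal(size=n_speakers)                    # unrelated variable C
data = np.column_stack([var_a, var_b, var_c])

components, explained = pca(data)
print("variance explained:", explained)
print("first component loadings (A, B, C):", components[0])
```

Here no variable is singled out as the outcome. The first component loads mainly on the two variables that vary together, which is the kind of pattern that bears on how variables are grouped in a theory, not on the factors conditioning any one of them.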

These two types of models and their applications will be discussed in the context of a range of supporting examples. Time will be provided to allow workshop participants to bring problems of their own, with the aim of identifying appropriate statistical methods for their analysis, and seeing where those problems fit within the larger context of empirical theory building in linguistics.