
4.4 Learning phonology

Babies learning phonology

Since a newborn infant is (equally) capable of learning any human language, what abilities would an infant have to come equipped with to allow it to learn phonology?

The beginning

As we've seen, languages differ with respect to what is contrastive: all languages treat some distinctions as significant and others as non-significant. In English the difference between [ɪ] and [i] matters — it distinguishes words from one another — but in Spanish it doesn't. In Amharic the difference between [k] and [k'] matters; in English it doesn't. When an infant is born, it is capable of learning any human language. How could it figure out which distinctions are contrastive and which aren't?

Obviously the only information the infant gets to help it is the language that it hears being spoken around it. If it is to figure out which distinctions matter, it has to be able to hear those distinctions. So for example, it has to be able to hear the difference between [i] and [ɪ] or the difference between [k] and [k']. In fact experiments with very young infants indicate that they can perceive all of the distinctions that matter in the world's languages. This is quite impressive since, as we'll see, they tend to lose the ability to hear many of these distinctions later on.

Babies start learning the sounds of the language around them long before they understand any of it.

So what happens when an infant is first exposed to a particular language? For at least six months after birth, the baby makes the same sorts of sounds regardless of which language it is hearing. (Here's an example of what a baby sounds like at three months.) That is, if we looked only at the baby's production, we would see no sign that learning is taking place. But it is. Within a few months after birth, babies can distinguish the language around them from other languages that they haven't heard. This means that they have already become sensitive to some of the properties that distinguish languages from each other.

What sorts of properties? Experiments with eight-month-olds show that they are capable of learning the frequencies at which different phones occur together. Nine-month-olds can distinguish sequences of sounds that obey the phonotactics of their language from those that don't, and they can also distinguish sequences that are frequent in their language from sequences that are phonotactically legal but infrequent (for example, /cʌn/ vs. /tʌš/ in English). Babies apparently start learning phonotactics at a very early age. As we saw in the last section, knowing the phonotactics of the target language comes in handy in learning the phonemes of the language because it makes clear what the possible contexts are.
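The kind of statistic these experiments point to can be illustrated with a toy model. This is a hypothetical sketch, not a model of the infant studies themselves: it treats each character of a string as one phone, uses a small invented corpus, counts how often each pair of adjacent phones occurs (with # marking word edges), and scores a new sequence by its average pair count, giving 0 to any sequence that contains a pair never heard.

```python
from collections import Counter

def learn_bigrams(words):
    """Count adjacent phone pairs; '#' marks the beginning and end of a word."""
    counts = Counter()
    for word in words:
        padded = "#" + word + "#"
        counts.update(zip(padded, padded[1:]))
    return counts

def familiarity(word, counts):
    """Average pair count for a word, or 0 if any pair was never heard
    (a crude stand-in for 'phonotactically illegal')."""
    padded = "#" + word + "#"
    pairs = list(zip(padded, padded[1:]))
    if any(counts[pair] == 0 for pair in pairs):
        return 0
    return sum(counts[pair] for pair in pairs) / len(pairs)

# Hypothetical input heard by the learner, one character per phone.
corpus = ["kan", "kan", "kat", "tan", "san", "sit"]
counts = learn_bigrams(corpus)

print(familiarity("kan", counts))  # frequent pattern: 3.5
print(familiarity("tat", counts))  # legal but infrequent: 1.25
print(familiarity("nka", counts))  # contains an unheard pair: 0
```

Nothing here is told which sequences are legal; the three-way distinction between frequent, infrequent, and impossible sequences falls out of the counts alone.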


Starting at around their sixth month, the sounds that babies make start to take on a different character. This stage, which normally lasts about 12 months, is called babbling. (Here's an example of what a baby learning English sounds like at nine months.) Babies start producing simple syllables, such as [ba], and these may include a very wide variety of consonants and vowels, including many not found in the target language. Later, several things happen. Babies begin to string the syllables together in sequences, such as [bababa] and [batabatabata]. And the sounds they are producing begin to resemble the target language more and more. What happens with deaf children is somewhat more complicated. They begin this phase like hearing children, producing a range of simple syllables. But the sounds they make never become more language-like and never come to resemble the spoken language around them. However, deaf children who are exposed to a sign language from birth do go through a kind of sign babbling phase, "babbling" with their hands.

In the second half of their first year, babies sound more and more like the language around them.

What is going on during babbling? It seems to serve three related functions. First, it should be clear from what we've learned about phonology that producing language is very complex and requires a great deal of coordination. In this sense, babbling may be a form of practice: the baby is figuring out how to use its articulatory apparatus fluently.

Second, the baby has to learn to tie what it hears to what it says. The auditory and the articulatory properties of linguistic sounds are totally different things, but, as we've seen, a particular phoneme needs to be associated with both. How could the baby learn to make this association? It's possible that during babbling the baby tries out various articulatory positions and movements and then listens to the auditory consequences, associating the behavior with the sounds each time this happens.

Third, the baby has to learn to sound more like the target language. This may work through a mechanism known as reinforcement learning. The baby tries out a particular articulatory pattern, listens to the consequences, and if these sound close to the kinds of sounds it is hearing around it, that articulatory pattern gets reinforced for the baby. The result is that the baby is more likely to produce this pattern later on. If, on the other hand, the sound that is produced sounds very different from the linguistic sounds in its environment, that articulatory pattern fails to get reinforced, or is penalized. In that case, the baby will be less likely to produce the pattern later on.
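The reinforcement mechanism just described can be sketched in a few lines. This is a deliberately simplified, hypothetical model: the baby's repertoire is a handful of syllables, "sounding close to" the ambient language is reduced to a yes/no check of whether a syllable occurs in it, and the learning rate is an arbitrary choice. Each babbled syllable that matches becomes more likely to be produced again; each one that doesn't becomes less likely.

```python
import random

def babble(repertoire, ambient, trials=200, rate=0.1, seed=0):
    """Return each syllable's production weight after a run of babbling.

    A syllable's chance of being produced is proportional to its weight.
    Matching the ambient language multiplies the weight by (1 + rate)
    (reinforcement); failing to match multiplies it by (1 - rate).
    """
    rng = random.Random(seed)
    weights = {syl: 1.0 for syl in repertoire}
    for _ in range(trials):
        syllables = list(weights)
        syl = rng.choices(syllables, weights=[weights[s] for s in syllables])[0]
        if syl in ambient:
            weights[syl] *= 1 + rate
        else:
            weights[syl] *= 1 - rate
    return weights

# Hypothetical repertoire: [qa] and [xa] do not occur in the ambient (English) input.
repertoire = ["ba", "da", "ma", "qa", "xa"]
ambient = {"ba", "da", "ma"}
weights = babble(repertoire, ambient)

# The ambient syllables start with 3/5 = 0.60 of the weight; the share grows with practice.
share = sum(weights[s] for s in ambient) / sum(weights.values())
print(f"share of production weight on ambient syllables: {share:.2f}")
```

Because weights only ever grow for matching syllables and shrink for non-matching ones, the simulated output drifts toward the target language, which is the pattern described above for the later part of the babbling stage.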

While this is going on, there are also changes in what the baby can perceive. As it begins to learn the phonemes of the language, it begins to lose the ability to hear distinctions that are not contrastive in that language. One well-known example of this phenomenon is the distinction between [r] and [l], a phonemic distinction in English, but not in Japanese, Lingala, Inuktitut, or many other languages. For many speakers of languages like these, the ability to hear the distinction is lost. We'll see more examples when we consider what happens in second language learning below.


Children start producing recognizable words around the beginning of their second year. In the beginning what they produce only very roughly approximates the forms they are hearing. Partly this may come from not having worked out what the phonemes of the language are yet; this may take several years. But it also results from the inability to produce some of the distinctions that the child does hear. Production lags behind perception throughout the learning of phonology.

But the forms the child produces in its second, third, and fourth years do not deviate from the adult forms in random ways. Rather they can be seen as simplifications of the adult forms. The kinds of simplification include the following.

  • Some phones are inherently more difficult to produce than others, so there tends to be an order in which phonemes are learned. For example, stops and nasals are easier to produce than fricatives. For stops the articulators are brought together completely, whereas for fricatives they must be brought close enough together to yield the characteristic fricative turbulence but not so close as to block the passage of the air completely. The result is that young children may replace fricatives with stops ([mɛti] messy) or replace all fricatives with one fricative that they've mastered ([mɛfi] messy).
  • Syllable structure may be simplified. As we saw when we looked at the phonotactics of different languages, the simplest syllables are those consisting of a vowel preceded by at most one consonant. Thus young children sometimes simplify syllables ending in one or more consonants by dropping them ([kæ] cat) or simplify syllables beginning with consonant clusters by dropping all but one of the consonants ([tɑp] stop).
  • In words of more than one syllable, children may replace one of the phonemes with another one found in the word ([kiki] kitty), drop a syllable, or combine two syllables ([bænə] banana).
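The first two kinds of simplification can be expressed as simple rewrite rules. The sketch below is hypothetical and ignores many details: each character is one phone, the stop substitute chosen for each fricative is a rough assumption, and the rule for reducing an onset cluster (keep the least sonorous consonant on an assumed scale of stops < fricatives < nasals < liquids < glides) is chosen because it reproduces [tɑp] for stop. Each function models one process; a given child may use some of them and not others.

```python
VOWELS = set("aeiouæɑɛɪʊəʌ")
FRICATIVES = set("fvszʃʒθð")
# Rough stop substitutes for each fricative (an assumption).
STOP_FOR = {"f": "p", "v": "b", "s": "t", "z": "d",
            "ʃ": "t", "ʒ": "d", "θ": "t", "ð": "d"}
# Rough sonority scale: lower numbers are less sonorous (an assumption).
SONORITY = {**{c: 1 for c in "ptkbdg"}, **{c: 2 for c in FRICATIVES},
            **{c: 3 for c in "mnŋ"}, **{c: 4 for c in "lr"}, **{c: 5 for c in "wj"}}

def stop_fricatives(word):
    """Replace each fricative with a stop ([mɛti] for messy)."""
    return "".join(STOP_FOR.get(ph, ph) for ph in word)

def use_one_fricative(word, mastered):
    """Replace every fricative with the one fricative already mastered."""
    return "".join(mastered if ph in FRICATIVES else ph for ph in word)

def drop_final_consonants(word):
    """Drop any consonants after the last vowel ([kæ] for cat)."""
    end = len(word)
    while end > 0 and word[end - 1] not in VOWELS:
        end -= 1
    return word[:end]

def reduce_onset_cluster(word):
    """Keep only the least sonorous consonant before the first vowel."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    if i <= 1:
        return word
    keep = min(word[:i], key=lambda c: SONORITY.get(c, 0))
    return keep + word[i:]

print(stop_fricatives("mɛsi"))         # mɛti (messy)
print(use_one_fricative("mɛsi", "f"))  # mɛfi (messy)
print(drop_final_consonants("kæt"))    # kæ   (cat)
print(reduce_onset_cluster("stɑp"))    # tɑp  (stop)
```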

Adults learning phonology

We learned that the Spanish voiced stop phonemes /b, d, g/ are pronounced as approximants when they follow vowels. Yet English speakers learning Spanish tend to pronounce them as stops in all contexts. Why might this be?

When a person learns a language later than in the first few years of life, their success depends on their age, as well as on a range of other factors, though why age matters and which ages, if any, are crucial remain hotly debated issues. Here we'll just consider what happens in adult language learning. I'll be using the term "second language" to refer to any language other than the learner's native language since in most ways the learning of third and later languages follows the same pattern as the learning of a second language.

The learning of second-language phonology seems to be quite independent of the learning of grammar and vocabulary. Most of us are familiar with people whose grammar and vocabulary are indistinguishable from those of a native speaker but who still have a noticeable foreign accent. So what I'll be discussing in this section applies only to the learning of pronunciation. We'll look briefly at the learning of second-language grammar later on.

The phonological learning task of the adult learner is the same as that of the baby: to figure out what distinctions in the target language are contrastive and to learn how to produce and recognize the different phonemes in different contexts. But what we see in adult phonological learning looks quite different from what we see with babies. One clear difference is the amount of variability. Normal children learning their first language end up roughly equivalent in their ability to pronounce and understand the language. Adults, on the other hand, differ dramatically from one another. While the great majority of adults never achieve native-like proficiency in the pronunciation of a second language, no matter how much they are exposed to it, the degree of foreign accent they exhibit varies a lot from one learner to another.

By comparing the phonology of the first and second languages, we can predict the kinds of errors that second-language learners will make.

An even more important difference between child and adult language learners stems from the fact that adults already know the phonology of at least one language. This can both help and hinder them in their learning of the new phonological system. In general the influence of a body of knowledge on the learning of new knowledge is called transfer.

The crucial issue is how closely the phonology of the first language agrees with that of the second language. When they agree more or less perfectly, we can expect positive transfer. That is, knowledge of the first language makes the target language easier to learn than it would be for learners with other first languages. For example, as we saw in the section on vowels, Spanish and Japanese vowels are quite similar. In general, it is easier for a Spanish speaker to learn Japanese vowels or for a Japanese speaker to learn Spanish vowels than it is for an English speaker to learn the vowels of either of these languages or for speakers of either of these languages to learn English vowels. Similarly, English and Spanish both have the phoneme /f/, realized in virtually identical ways, whereas Japanese has no such phoneme. In general, then, it is harder for Japanese speakers to learn this aspect of English or Spanish than it is for English or Spanish speakers to learn this aspect of each other's languages.

Much more noticeable in second language phonology are the consequences of differences between the languages. Differences may result in negative transfer, that is, interference from the first language to the target language. As you will see in the section on English accents, accents or languages can differ phonologically in several ways. These differences can often predict some features of foreign accent and areas of phonological difficulty for second language learners.
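A very rough version of this kind of prediction can be sketched by comparing phoneme inventories. The sets below are deliberately partial, hypothetical excerpts chosen around the examples in this section (such as /f/ and /l/ vs. /r/), not full inventories. A comparison of inventories alone can flag only phonemic differences; the phonetic, phonotactic, and suprasegmental differences discussed below need other kinds of comparison, and a symbol shared by two sets (such as "r") may still be pronounced quite differently in the two languages.

```python
# Partial, illustrative consonant sets (not complete inventories).
ENGLISH = {"p", "t", "k", "b", "d", "g", "f", "v", "s", "z", "m", "n", "l", "r"}
SPANISH = {"p", "t", "k", "b", "d", "g", "f", "s", "m", "n", "ɲ", "l", "r"}
JAPANESE = {"p", "t", "k", "b", "d", "g", "s", "z", "m", "n", "r"}

def compare(first, second):
    """Split the second language's phonemes into those the learner already
    has (candidates for positive transfer) and those that are new
    (candidates for difficulty)."""
    return {"shared": first & second, "new": second - first}

print(sorted(compare(JAPANESE, ENGLISH)["new"]))  # ['f', 'l', 'v']
print(sorted(compare(SPANISH, ENGLISH)["new"]))   # ['v', 'z']
print("f" in compare(SPANISH, ENGLISH)["shared"])  # True
```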

Phonetic differences

One possible difference is purely phonetic. The first language and the second language both have a similar phoneme P, distinguished from other similar phonemes, but the phoneme differs in the details of how it is produced. Either it is always pronounced differently in the two languages, or it is pronounced differently in some contexts. Learners will tend to pronounce the phoneme as it is pronounced in their first language.

English /r/, Spanish /r/ ([ɾ])
Though English /r/ and Spanish /r/ are pronounced quite differently, they have similar auditory properties, so learners will tend to map them onto each other. English speakers will tend to pronounce Spanish /r/ as [r] (in marido and cortar, for example), and Spanish speakers will tend to pronounce English /r/ as [ɾ] (in marry and quarter, for example).
English /k/, Spanish /k/
In both languages, this voiceless stop is distinguished from the corresponding voiced stop, /g/, but English /k/ has an aspirated allophone [kh] that is never used in Spanish. Learners should have no problems when /k/ does not begin a stressed syllable. But English speakers learning Spanish will tend to aspirate Spanish /k/ when it begins a stressed syllable, as in que and como, for example, and Spanish speakers learning English will tend not to aspirate English /k/ in any context, including those where it would be aspirated in English, as in come and quick, for example.

Phonemic differences

Second-language learners may have difficulty hearing and producing a distinction that matters in the second language.

Another possible difference is phonemic. The second language makes a distinction that is not made in the first language. Learners may fail to hear the distinction and will tend to pronounce the two forms in the same way. Because this sort of difference can interfere with communication, it is more serious than problems of the first type.

Japanese /r/, English /l/ and /r/
While English makes a distinction between /r/ and /l/, Japanese has a single phoneme /r/ that is usually pronounced like an alveolar tap ([ɾ]) but sometimes takes the form of an alveolar lateral (similar to English /l/) or a phone that is somewhere in between these two. Which form it takes may be difficult to predict; it depends on the phonetic context, the speaker, and even the situation. Japanese speakers learning English may fail to hear the difference between English /l/ and /r/, so they may not be able to distinguish right from light. And they will tend to pronounce both /l/ and /r/ as either [ɾ] or [l].
English /k/, Amharic /k/ and /k'/
As we saw in our discussion of ejectives, Amharic and many other languages make a distinction between plain, non-glottalized voiceless stops and ejective voiceless stops (for example, between /k/ and /k'/), a distinction not made in English. English speakers learning Amharic often have difficulty hearing the difference, so they may not be able to distinguish /kɛbɛro/ 'drum' from /k'ɛbɛro/ 'fox'. And they will tend to pronounce /k/ and /k'/ in the same way, like English /k/.
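Both cases come down to the same thing: the learner hears each second-language phone as the closest first-language category, so words that differ only in that phone become indistinguishable. A minimal sketch of this, in which the transcriptions are simplified and the mappings from second-language phones to first-language categories are hypothetical stand-ins for what the text describes:

```python
def perceive(phones, category_map):
    """Replace each phone with the first-language category it is heard as."""
    return tuple(category_map.get(ph, ph) for ph in phones)

def merged_words(lexicon, category_map):
    """Group words that a learner with this category map would hear as identical."""
    heard = {}
    for gloss, phones in lexicon.items():
        heard.setdefault(perceive(phones, category_map), []).append(gloss)
    return [sorted(group) for group in heard.values() if len(group) > 1]

# Japanese-speaking learner of English: /l/ and /r/ both heard as [ɾ].
english_words = {"right": ("r", "a", "i", "t"), "light": ("l", "a", "i", "t")}
print(merged_words(english_words, {"r": "ɾ", "l": "ɾ"}))  # [['light', 'right']]

# English-speaking learner of Amharic: ejective /k'/ heard as plain /k/.
amharic_words = {
    "drum": ("k", "ɛ", "b", "ɛ", "r", "o"),
    "fox": ("k'", "ɛ", "b", "ɛ", "r", "o"),
}
print(merged_words(amharic_words, {"k'": "k"}))  # [['drum', 'fox']]
```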

Phonotactic differences

The languages may also differ in their phonotactics. If the second language has more complicated syllables than the first language, in particular if it allows more complicated clusters of consonants at the beginnings and ends of syllables, it may present special difficulties for the learner. These learners may drop consonants, replace one consonant with another, or add vowels to break up consonant clusters. Another potential problem is a difference in the range of phones that can appear in a particular position, for example, the vowels in unstressed syllables. If the first language is more constrained, the learner may tend to follow those constraints in speaking the second language.

Beginnings of syllables in English and Japanese
As we saw in the section on syllables, English allows a variety of consonant clusters at the beginnings of syllables, while some other languages do not. Except for clusters ending in the semivowel /y/, Japanese allows no more than one consonant at the beginning of a syllable, and Japanese speakers may tend to insert vowels between the consonants at the beginnings of English syllables, for example, [gu'ɾeet] for great. Japanese, on the other hand, permits more consonant+/y/ clusters than English does. In particular, Japanese allows syllables to begin with /ry/ ([ɾy]), whereas /ry/ is not a possible syllable beginning in English. English speakers learning Japanese tend to insert a vowel between /r/ and /y/. That is, they may pronounce the words /ryuu/ 'dragon' and /riyuu/ 'reason' the same way, [ri'yu], and may also be unable to hear the difference between these words.
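The vowel insertion described for Japanese-speaking learners can be sketched as a rule that breaks up a word-initial consonant cluster while leaving consonant+/y/ clusters alone, since Japanese allows them. This is a simplified, hypothetical model: it handles only the beginning of the word, leaves the [r]/[ɾ] difference aside, and always inserts [u], as in the [gu'ɾeet] example, although the vowel Japanese speakers actually insert varies with the preceding consonant.

```python
VOWELS = set("aeiou")

def break_initial_cluster(phones, epenthetic="u"):
    """Insert a vowel between consonants at the start of a word,
    except before /y/ (Japanese allows consonant + /y/)."""
    i = 0
    while i < len(phones) and phones[i] not in VOWELS:
        i += 1
    onset, rest = phones[:i], phones[i:]
    repaired = []
    for j, consonant in enumerate(onset):
        repaired.append(consonant)
        is_last = j == len(onset) - 1
        if not is_last and onset[j + 1] != "y":
            repaired.append(epenthetic)
    return repaired + rest

print("".join(break_initial_cluster(["g", "r", "e", "e", "t"])))  # gureet (great)
print("".join(break_initial_cluster(["k", "y", "o", "o"])))       # kyoo (left as is)
```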
Frequent [ə] is a feature of English-accented Japanese or Spanish.
Unstressed vowels in English and other languages
English has a strong tendency for unstressed vowels to be pronounced with the vowel /ə/. This is not true for many other languages with stress, for example, Spanish. This means that English speakers will tend to use /ə/ for vowels in some unstressed syllables in Spanish. For example, they may say [,ɛnčə'lɑdə] for [ɛnči'laða] enchilada. In word-final position, English does commonly have /i/ in unstressed syllables, for example, in happy and hurry, and English speakers will sometimes replace /ɛ/ or /e/ in final unstressed syllables with /i/. We sometimes see this in the pronunciation of words borrowed from Japanese into English. Even though Japanese does not have stress, English speakers impose stress on words borrowed from Japanese, treating some syllables as stressed and others as unstressed. So the word karaoke is usually pronounced /,kɛri'oki/ in English, whereas the original Japanese word was [kaɾaokɛ].

Suprasegmental differences

English and Japanese use pitch in very different ways, leading to difficulties for second-language learners.

As we saw in the section on syllables, languages can use suprasegmental dimensions, especially pitch, in very different ways. It is used as one component of stress in some languages and as a signal for tone or pitch accent in other languages, and it is the major component of intonation in all languages. English and Spanish speakers have little difficulty learning stress in each other's languages, but they may have great difficulty learning pitch accent in a language like Japanese or tone in a language like Mandarin Chinese or Lingala. The situation becomes more complex because of the way intonation interacts with stress, pitch accent, and tone. Thus English uses a sharply falling intonation pattern on words that are being emphasized in statements, for example, when contradicting the hearer, as in the following conversation.

A: That cookie looks good.
B: It's not a cookie; it's a candy.

The first syllable of candy would be pronounced more loudly and with the pitch falling to the next syllable: candy.

In Japanese, the second part of B's line would be

ame da yo
candy is assrt
'It's (a) candy.'

The word yo at the end of the sentence makes the sentence more assertive.

The word ame means 'candy', but its lexical pitch pattern is one that rises from the first to second syllable: ame. In this emphatic context, the word would be pronounced more loudly, but the pitch pattern would not change because it is part of the word, though the pitch rise might be more exaggerated:

ame da yo. 'It's candy.'

An English speaker learning Japanese will tend to use the characteristic falling English intonation here to signal the emphasis on ame:

ame da yo.

But ame with pitch falling from the first to second syllable is a different Japanese word, meaning 'rain', so the English speaker's Japanese would come out as 'it's not a cookie; it's rain'. Here's what the last part of that sentence sounds like in Japanese:

ame da yo. 'It's rain.'

Japanese does not have many minimal pairs such as this differing only in their pitch pattern, but pitch errors of English-speaking learners could still be expected to make their language less comprehensible, especially when combined with other errors at the level of individual phones.
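The candy/rain example can be summarized by treating the pitch pattern as part of each word's lexical entry. In this hypothetical sketch, pitch patterns are reduced to two labels, LH (rising from the first to the second syllable) and HL (falling), and the English speaker's emphatic intonation is modeled simply as imposing HL on the emphasized word.

```python
# Lexical entries keyed by segments plus pitch pattern (LH = rising, HL = falling).
LEXICON = {("ame", "LH"): "candy", ("ame", "HL"): "rain"}

def native_pitch(lexical_pitch, emphatic):
    """A Japanese speaker keeps the lexical pattern even under emphasis
    (the rise may be exaggerated, but it is still a rise), so `emphatic`
    does not change the result."""
    return lexical_pitch

def english_learner_pitch(lexical_pitch, emphatic):
    """An English-speaking learner overlays the English emphatic fall."""
    return "HL" if emphatic else lexical_pitch

def heard_as(segments, pitch):
    return LEXICON.get((segments, pitch), "(no such word)")

print(heard_as("ame", native_pitch("LH", emphatic=True)))           # candy
print(heard_as("ame", english_learner_pitch("LH", emphatic=True)))  # rain
```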
