3.1  Phonemes

Vowel categories

Each word in a language needs its own form. From the perspective of the Hearer, what should be generally true for these forms? From the perspective of the Speaker, what should be true?

We saw in the last chapter how there may be an infinite range of possible things in the world, even an infinite range of perceivable things, and that people cope with this by categorizing what they experience, by grouping the very large set of possible things into a relatively small set of categories. We saw also that the categories that people have for things are learned, that people seem to be very good at learning categories for what's around them, in particular for the categories that they label with words. There are good reasons for these categories. People can learn a particular response or set of responses for a category and then apply it to all instances of that category. They don't have to start over with each new thing they face. For language this means that Speakers can use a known word for a novel situation and that Hearers can understand known words when applied to novel situations.

Now let's consider the other end of words, their form. Linguistic forms require two channels, one for production, one for perception. The production channel could theoretically be any means we have of creating distinguishable signals. Probably because of constraints related to the human body and how it is controlled by the human nervous system, the only known human production channels are speech, signing, and writing. The perception channel makes use of a sensory system capable of distinguishing the signals produced on the production channel. This sensory system is audition (hearing) for speech and vision for signing and writing.

The simplest sounds we can make

Let's return to our imaginary tribe of Lexies, who are enjoying the communicative advantages brought by the invention of nouns. We've had nothing to say about how these words are produced. Our Lexies could have started with signing, using the hands, arms, upper body, and face, and we'll return to this possibility later. But for now let's assume they started with speech, using the vocal tract: the lips, tongue, oral (mouth) cavity, nasal (nose) cavity, throat, larynx (voice box), and lungs. In the beginning the idea is relatively simple, consisting of the following actions.

  1. Opening the oral cavity some amount
  2. Placing the tongue in some position within the oral cavity that allows air to pass through
  3. Tightening the vocal cords in the larynx
  4. Expelling air from the lungs to cause the tightened vocal cords to vibrate

By varying 1 and 2, the Lexies get different sounds. We'll call these sounds vowels. Here are some examples of the vowels they get.

Playing around with this mechanism, the Lexies realize that there are very many different sounds, maybe even an infinite number of sounds, that they can produce. Here is what the playing around sounds like.

Vowel categories and word form categories

But what the Lexies require is not an infinite number of word forms; they require exactly one word form for each word. Just as they group the very large number of possible things in the world into a relatively small set of categories, they need to group the very large number of possible word forms into a small set of categories, one for each word. Let's assume that early on in the history of their developing language, they have only four words, two proper nouns naming important individuals in the tribe and two common nouns naming important categories of things in their world, tiger and apple.

Given their vowel mechanism, they could easily develop four separate categories for the four words. Let's say the four categories are centered on vowels 1-4 above; that is, these vowels are the prototypes for the categories. But recall what it means to be a category. Each actual pronunciation of one of the four words is an instance of one of the four word form categories. No two of these instances are likely to ever be exactly the same. So even if I know how to produce the four word forms, there will be some variation. For example, the following might be some possible instances of the word meaning apple, which is centered on vowel 1 above.

The differences between the vowels are very real differences, both in terms of how they are produced and what they sound like, but these differences would not matter to the Lexies because the meaning in each case is the same, apple. On the other hand, the difference between any of the above vowels and the following vowel would matter because this is an instance of the word form that means tiger.

Differences that matter and differences that don't

The vowel in the word for tiger contrasts with the vowels in the three instances of the word for apple because it makes a difference in the meaning of the word. The vowels in the three instances of the word for apple do not contrast with each other, even though they do differ from each other, because they do not make a difference in the meaning of the word.
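The distinction between differences that contrast and differences that don't can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each vowel is reduced to two made-up numbers (jaw opening and tongue position, each on a 0 to 1 scale), the prototype values are hypothetical, and a vowel is assigned to whichever prototype it is closest to. Real vowel perception is certainly more complicated than this.

```python
import math

# Hypothetical prototypes: (jaw opening, tongue position), each on a 0-1 scale.
# The numbers are invented for illustration; they are not measurements.
PROTOTYPES = {
    "apple": (0.9, 0.5),
    "tiger": (0.2, 0.9),
}

def categorize(vowel):
    """Return the word whose prototype vowel is closest to the given vowel."""
    return min(PROTOTYPES, key=lambda word: math.dist(vowel, PROTOTYPES[word]))

def contrasts(vowel_a, vowel_b):
    """Two vowels contrast only if they belong to different word form categories."""
    return categorize(vowel_a) != categorize(vowel_b)

# Three slightly different pronunciations of the word for apple,
# and one pronunciation of the word for tiger.
apple_instances = [(0.85, 0.5), (0.95, 0.45), (0.8, 0.6)]
tiger_instance = (0.25, 0.85)

for vowel in apple_instances:
    print(vowel, "->", categorize(vowel))
print(tiger_instance, "->", categorize(tiger_instance))
```

In this sketch the three apple vowels all differ from one another, but `contrasts` treats any pair of them as the same, while each of them contrasts with the tiger vowel.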

Given the very simple language at this stage, let's consider what the members of the tribe need to store in long-term memory. To say the words, they need to remember, for each one, roughly how to place their tongue and their jaw, and to recognize the words (in order to be able to understand them), they need to remember, for each one, roughly what it sounds like. I won't go into the acoustics of vowels here; all you need to know is that the way a vowel sounds corresponds roughly to the placement of the tongue. While this is quite a lot of information to remember for each vowel, there are only four of them, so it does not tax long-term memory too severely.

Syllable and consonant categories

What would the disadvantage be of a language whose word forms consisted only of single vowels?

Now let's see what might happen when the Lexies need more words in order to refer to more and more individuals and categories. They might get to fifteen or twenty words using the approach we've described so far, but at this point they'd begin to run into problems as the categories become more and more similar. From the Speaker's perspective, it will be more and more difficult to produce the sounds accurately enough so that they don't overlap. More importantly, from the Hearer's perspective, it will be more and more difficult to distinguish the sounds as they become closer and closer to one another acoustically.
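A rough way to see the crowding problem is to pretend, purely for illustration, that vowels vary along a single 0 to 1 scale, that the prototypes are spread evenly along it, and that every pronunciation strays from its prototype by at most some fixed amount. All three assumptions are simplifications introduced here, and the value of `SPREAD` is hypothetical.

```python
def prototype_gap(n_categories):
    """Distance between neighbouring prototypes spread evenly over a 0-1 scale.

    Assumes at least two categories.
    """
    return 1.0 / (n_categories - 1)

def distinguishable(n_categories, spread):
    """True if neighbouring categories, each varying by +/- spread, cannot overlap."""
    return prototype_gap(n_categories) > 2 * spread

SPREAD = 0.05  # hypothetical amount of variation around each prototype

for n in (4, 10, 20):
    print(n, "categories: gap", round(prototype_gap(n), 3),
          "distinguishable:", distinguishable(n, SPREAD))
```

With these made-up numbers, four categories are comfortably separated, while twenty categories sit closer together than the variation in their own pronunciation.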

Clearly the Lexies need a way to come up with more easily distinguishable words. Through experimenting more with their vocal tracts, they discover that they can produce a much larger set of sounds by moving their lips, tongue, and/or jaw during the production of the word. Each of these new sounds is produced by performing the following sequence of actions.

  1. Closing the vocal tract at some point using the lips or tongue.
  2. Beginning the vibration of the vocal cords.
  3. Opening the vocal tract by releasing the contact that was made by the lips or tongue.
  4. Moving the tongue and jaw to the position of one of the vowels as the vocal cords continue to vibrate.

I'll use the term syllables for the new sounds that result from movement of the lips, tongue, and/or jaw, along with the simpler ones that don't involve movement, that is, that consist just of vowels. Here is a syllable that results when the contact at the beginning is made with the lips.

Here are three syllables with the contact made by the tip of the tongue, all with the same vowel.

Here are three syllables with the contact made by the middle and back of the tongue, also with the same vowel.

The important point about these different ways of making the contact is that they result in qualitatively different sounds to hearers. Thus using syllables greatly increases the number of distinguishable words the Lexies have access to. But note that, as with vowels, there is a very large number of possibilities for how the lips and tongue are placed for the contact. Since the Lexies only need a different syllable for each word, what is called for again are categories. Each syllable representing a word is a separate category. To remember a word, then, a Lexie would need to remember how the syllable for that word sounds and how to produce the contact at the beginning and the vowel part at the end. Again this is a lot to remember for each syllable, but if there aren't too many, it might be manageable.

Syllable categories consisting of consonant categories and vowel categories

Let's say the Lexies make use of five easily distinguished vowels, and five easily distinguished initial contacts, which we will call consonants. In all that gives them 25 (5 × 5) possible syllables. To remember all of these, they'd have to store 25 separate sets of production instructions and acoustic data in long-term memory. But there is a more efficient way; they could treat each of the five contact possibilities as a category and each of the five vowels as a category, which leaves only ten things to remember. Then to produce or recognize a syllable, they would only need to combine the information for the consonant and the vowel. And to remember a word form, they would only need to remember what combination of consonant and vowel was used.
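The saving can be checked with a short sketch. The labels `C1` to `C5` and `V1` to `V5` are placeholders invented here for the five consonant and five vowel categories; they do not stand for any particular sounds.

```python
from itertools import product

# Placeholder labels for the five consonant and five vowel categories.
consonants = ["C1", "C2", "C3", "C4", "C5"]
vowels = ["V1", "V2", "V3", "V4", "V5"]

# Every syllable is one consonant followed by one vowel.
syllables = [c + v for c, v in product(consonants, vowels)]

# Option 1: store each syllable as an unanalyzed whole.
stored_as_wholes = len(syllables)

# Option 2: store the consonant and vowel categories and combine them as needed.
stored_as_parts = len(consonants) + len(vowels)

print("syllables available:", len(syllables))
print("items to remember as wholes:", stored_as_wholes)
print("items to remember as parts:", stored_as_parts)
```

The gap widens quickly as the inventory grows: the number of syllables is a product, while the number of categories to remember is only a sum.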

Unfortunately it's not quite that simple, especially on the perception (acoustic) end, because the way a consonant sounds depends a lot on the vowel that follows it. This means that it is probably impossible for people to remember the acoustic properties of consonants independently from vowels. And this is why there are people who argue that the basic categories making up word forms are syllables, rather than consonants and vowels.


Why do you think the writing systems of some languages, such as English, Spanish, Lingala, and Tzeltal, use separate characters for the consonants and vowels rather than separate characters for each syllable? (Note: the writing systems of many languages, for example, Amharic, Japanese, Chinese, and Inuktitut, do use separate characters for each syllable.)

Let's recap where the Lexies are. Their words, whose meanings we learned about in the last section, need forms. Each word form needs to be stored in long-term memory; they need to be able to remember how to produce it and what it sounds like. Each word form is a category because it represents a set of possible forms, a very large, possibly infinite, set. Word forms consist of syllables, each representing for the Speaker either a particular open position of the vocal tract (that is, a vowel) or a closing of the vocal tract followed by an opening, ending in a particular position (that is, a consonant followed by a vowel). The most efficient way for the Lexies to remember all of the possible syllable categories is to remember the consonant categories and the vowel categories.

Thus the consonant and vowel categories make up a kind of alphabet to be used in making new words. Given a concept that the Lexies would like to refer to, all they need to do is find a new (previously unused) combination of a consonant and a vowel to make the new word form. The elements of this basic alphabet for linguistic form are called phonemes; each vowel or consonant category is a phoneme of the language. The phoneme is one of the most important concepts in modern linguistics, and we will spend a lot of time getting a handle on it. For now, the important points to keep in mind are the following.

Variability within languages and between languages
  1. Each phoneme is a category; that is, it represents a cluster of possible consonant or vowel instances, centered on a prototype.
  2. The way the space of possible sounds (consonants and vowels) is divided up into phonemes is to some extent arbitrary. That is, it can be expected to vary from language to language; the phonemes that a given language has are conventions. What is important is that the phonemes are distinctive enough to be distinguished by hearers. Thus one tribe of Lexies might end up with five vowel phonemes, another tribe might also end up with five vowel phonemes but centered on different sounds from the first tribe, and a third tribe might end up with eight vowel phonemes.
  3. Speakers can produce differences that contrast, that is, that represent different phonemes and make a difference in meaning, and differences that do not contrast, that is, that represent different ways of producing the same phoneme and do not change the meaning.
  4. Phonemes are not letters. Letters are the basic elements of a writing system, which may or may not have been designed to represent the phonemes of a language. But the phonemes of a language such as English are represented only very imperfectly by the English alphabet, as we will see, and other languages, such as Mandarin Chinese, have writing systems that do not even pretend to represent phonemes. Equally importantly, most languages are not written at all.

The word forms in all modern languages can be described in terms of a small set of phonemes, though the set is different for each language (and may vary between dialects of the same language, as we will see for English dialects). The number of phonemes varies from about 13 to about 150; English has about 40, depending on how we count and which dialect we are considering.

Extending the lexicon

More vowels, more consonants, more syllables

With only 25 or so different syllables to choose from, the Lexies will only be able to make use of a very small set of words. There are several ways in which they can extend the set of possible word forms. One is to produce more possible vowels by varying something other than the position of the tongue and opening of the jaw. For example, they could round their lips during the production of the vowel and get a different effect, or they could move their tongue, jaw, and/or lips during the production of the vowel to get a combination of vowel sounds. Another way to extend the number of possible word forms is to produce more possible consonants by varying something other than how the contact is made at the beginning of the syllable. For example, they could make the contact incomplete, allowing some air to pass through the gap that is made, or they could delay the beginning of the vibration of their vocal cords until after the contact is released. A third possibility is to produce more complicated syllables, for example, by allowing the vocal tract to close at the end as well as the beginning of the syllable. A final possibility is to combine syllables to make longer word forms. If there are only 25 possible syllables, there are 625 (25 × 25) possible two-syllable words.

All four of these ways of coming up with more word forms are used in modern languages. We will be discussing them all at length later in this chapter.
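The effect of each of these extensions on the number of available word forms can be worked out with simple arithmetic, sketched below. The function names are made up for this illustration. The figure of eight vowels is just an example value, and the sketch assumes that an optional closing consonant would be drawn from the same five consonants used at the beginning of the syllable, which the text does not specify.

```python
def syllable_count(n_consonants, n_vowels, final_consonant=False):
    """Number of consonant + vowel syllables, optionally with a closing consonant."""
    count = n_consonants * n_vowels
    if final_consonant:
        # Assumption: the syllable ends either with no consonant
        # or with any one of the same consonants that can begin it.
        count *= n_consonants + 1
    return count

def word_form_count(n_syllables, word_length):
    """Number of distinct word forms made of exactly word_length syllables."""
    return n_syllables ** word_length

base = syllable_count(5, 5)
print("basic syllables:", base)
print("with eight vowels instead of five:", syllable_count(5, 8))
print("with an optional closing consonant:", syllable_count(5, 5, final_consonant=True))
print("two-syllable word forms:", word_form_count(base, 2))
```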

Visual/spatial phonemes

There seems to be nothing in sign languages corresponding to vowels and consonants, though there are syllables. What dimensions do you think would distinguish different sign syllables from one another?

Sign languages also have categories of form.

One tribe of Lexies takes a different approach from the others. Instead of using their vocal tracts to produce words, they use their hands and arms. Rather than beginning like the speaking Lexies with static vowel-like patterns to represent the first words, these signing Lexies realize that hand/arm movements would be easier to see than static hand/arm configurations. As with speaking, however, it is clear that there are very many possible movements, and they realize they need movement categories. So they settle on a small set of such categories and use these in combination to make word forms.

Like spoken languages, modern sign languages also have phonemes as the units that words (signs) are built up from. As with the signing Lexies, in modern sign languages, the phonemes consist of movements of the fingers, hands, or arms. Corresponding to dimensions such as the position of the tongue or lips for spoken language are the particular configuration of the hands, for example, which fingers are extended; the position of the hands with respect to the body; and the direction of movement of the hands, fingers, and arms. Also, as with spoken languages, we can expect the specific set of phonemes to vary from one sign language to another. However, research on the phonemes of sign languages is still relatively new, so there is not yet as much agreement on their properties as there is for spoken languages. In the rest of this chapter, we will be focusing mainly on the phonemes of spoken languages and how they are combined to form syllables and words.
