How Language Works

Mike Gasser
Indiana University

3 Word forms: units (Part 2: 3.4, 3.5, 3.6)

3.4 English consonants

We have seen how the vowels of languages can be described in terms of values on a small number of dimensions, or equivalently, features. In this section we will see that the same holds true for consonants. Languages vary a great deal with respect to how many vowel and consonant phonemes they have, but all languages seem to have more consonant than vowel phonemes. Not surprisingly, more consonant dimensions than vowel dimensions are contrastive for languages. We'll first look at the dimensions and the values on those dimensions that are relevant for English. In the next section, we'll see how the consonants of some other languages differ from those of English.

Stops

With your tongue touching your upper teeth, pronounce a syllable like dah. Then do the same thing with the tip of your tongue touching a position a little further back. Continue moving your tongue back a little at a time, and see how many distinguishable consonants you can produce, some like English, some not. Now try the same thing with the body of your tongue (you can put the tip of your tongue behind your lower teeth to keep it out of the way). The furthest forward you can put it is probably close to the position for the consonant at the beginning of the word church.

Let's start with the simple consonants that the speaking Lexies developed. Recall that these are produced with a complete closure of the vocal tract, blocking the passage of air. Consonants made by completely closing the vocal tract are called stops. As we discussed informally, different consonants can be produced by varying the place where the closure occurs. This consonant dimension is known as place of articulation; we will see later that the place of articulation is also relevant when there isn't a complete closure of the vocal tract. In one sense place of articulation is really six different dimensions because it involves the independent movement of six separate parts of the vocal tract. Each of these structures is called an articulator. The articulators relevant for place of articulation are the lips, the tongue tip, the tongue body, the tongue root, the pharynx (the region behind and below the oral cavity), and the glottis (the gap between the vocal cords). Each of these, except the glottis, is indicated in this figure from the last section.

You get different stops by making the contact at different places in the vocal tract.

Each language makes use of several places of articulation, usually between three and six, to distinguish its consonant phonemes. In this section, we'll only consider those places that are relevant for English.

There are two possible places of articulation involving the lips as articulators. For bilabial place of articulation, the lips are brought together (or for non-stops, as we'll see later, close together). The first and last consonants in the word bib are bilabial stops. The symbol for this consonant is /b/, so the pronunciation of bib is written /bIb/. A further possible position of the lips is contact between the lips and upper teeth; this is not used for English stops, though it is used for other English consonants. It is discussed below.

With one part or another of the tongue as articulator, there is a continuous range of possible places for contact with the roof of the mouth, beginning with the upper teeth and extending back to the uvula at the back of the mouth. All languages apparently make use of at least two positions within this range. For English stops, two positions are relevant. One of these is contact between the tip of the tongue and the ridge that is just behind the upper teeth, the alveolar ridge. This is referred to as alveolar place of articulation. It is a feature of the first and last consonants in the word did. The symbol for these stops is /d/, so the pronunciation of the word did is written /dId/. The configuration of the vocal tract for the pronunciation of /d/ is shown in the figure below.

[Figure: configuration of the vocal tract for /d/]

There is a further possibility for contact between the tongue and the roof of the mouth that is used in most languages. The back of the tongue body contacts the region near the back of the roof of the mouth, near the structure called the velum, which I'll have more to say about below. This is called velar place of articulation. It is a feature of the first and last consonants in the word gag. The symbol for these stops is /g/, so the pronunciation of gag is written /gæg/. The position of the vocal tract for the pronunciation of /g/ is shown in the figure below.

[Figure: configuration of the vocal tract for /g/]

So English distinguishes the stops /b, d, g/ along the dimension of place of articulation. Other places of articulation are utilized for other English consonants that are not stops and for consonants in other languages.

Now notice what these three stops share. Like all stops, they are produced with complete contact between the articulators. In addition, all of them are accompanied by voicing, that is, vibration of the vocal cords, during the contact. Thus when they appear at the beginning of a word, the voicing starts before the contact is released (and continues through the following vowel), and when they appear at the end of a word, the voicing continues after the contact is made. (Note that the voicing can't start too much before the beginning of a word like bib or stop too much after the end of a word like bib because the air being passed through the vocal cords can't escape, and pressure builds up quickly behind the point of contact, preventing more air from being expelled from the lungs. You can verify this for yourself by trying to pronounce a long /b/, /d/, or /g/ sound without releasing the contact between the articulators.)

You get different stops by varying when you start or stop voicing.

But this is not the only possibility for how the voicing and the beginning and end of the contact can be timed. When the stop consonant comes at the beginning of the word, we get a different effect when the voicing begins after the release of the contact; listen to the difference between the words bay and pay. Similarly at the end of the word, the effect is different if the voicing ends before the contact or roughly at the same time as the contact; compare add and at. The dimension that distinguishes these pairs of words from each other is called voicing. For the moment we will consider only two values for this dimension, voiced and voiceless, but, as we will see later, voicing is actually more complicated than this.

Just as English has voiced stops at the bilabial, alveolar, and velar places of articulation, it also has voiceless stops at these places. The voiceless bilabial stop is illustrated at the beginning and end of the word pep. It is symbolized with /p/, so the pronunciation of pep is written /pεp/. The voiceless alveolar stop is illustrated at the beginning and end of the word tot. It is symbolized with /t/, so the pronunciation of tot is written /tαt/. The voiceless velar stop is illustrated at the beginning and end of the word kick. It is symbolized with /k/, so the pronunciation of kick is written /kIk/. There is also a voiceless stop with its place of articulation at the glottis; this is referred to as glottal place of articulation. For a glottal stop, the vocal cords are brought together, blocking the airstream as for other stops, and then released suddenly. The glottal stop may appear at the beginning of English words that begin with a vowel, and it appears in the middle of the word uh-oh. It is not normally considered an English phoneme, however, because it is not used to create new English words or to distinguish English words from one another. The symbol for a glottal stop in this book is /?/ (the IPA symbol is like a question mark with no dot at the bottom), so the pronunciation of uh-oh is written /?^?o/.

Fricatives and affricates

Produce the syllables /ba/, /da/, /ga/. Now do the same thing but instead of making a complete closure for the stops at the beginnings of the syllables, leave a little gap between the articulators, and see what consonants result. Do the same thing for /pa/, /ta/, /ka/. (Note that the resulting consonants should sound like English only for /da/ and /ta/.)

So far all of the consonants we have looked at have involved a complete closure of the vocal tract, blocking air from passing out. But this is not the only way to make consonants. In fact we need a new dimension for the various possibilities (really a whole cluster of dimensions); this is called manner of articulation. One crucial variable within manner of articulation is the distance between the articulators. For stops, the closure is complete, but there are two further possibilities. One, discussed in this subsection, involves a narrow, but not complete, closure that allows air to pass through the aperture but with accompanying noise. The other, discussed in the next subsection, involves an opening that is wide enough for the air to pass through unimpeded.

Stops and fricatives are different manners of articulation.

Consider what happens when you bite your lower lip with your upper teeth and then blow air out. Unless you're biting too hard, some of the air can pass between your teeth and lip, creating a sound like that at the beginning and end of the word fife. A phone made like this, with an incomplete or approximate closure that permits air to pass through and produces a noisy sound due to the resulting turbulence, is called a fricative. The fricative at the beginning and end of the word fife is voiceless because the fricative sound is not accompanied by voicing. That is, the voicing starts after the vocal tract is opened up for the vowel and stops just before the closure is made again at the end of the word. The place of articulation for this consonant is one we didn't encounter for English stops; it is the second of the possible places associated with the lips (in addition to bilabial place of articulation). It is called labiodental place of articulation. The symbol for the voiceless labiodental fricative is /f/, so the pronunciation of the word fife is written /fayf/. English also has a phoneme that is the same as /f/, but voiced. This is the sound at the beginning and end of the word verve. It is symbolized by /v/, so the pronunciation of verve is written /v@rv/.

English has a pair of fricatives at another place of articulation where there are no English stops. Try putting your tongue between your teeth or against the back of your upper teeth and then expelling air from your mouth. Again if the contact is not too tight, some air should pass between your tongue and your teeth, generating turbulence that results in the consonant that appears at the beginning of the word thing and at the end of the word both. The place of articulation for this consonant is called dental place of articulation. The symbol for the voiceless dental fricative is /θ/, so the pronunciation of the word both is written /boθ/. English also has the corresponding voiced phoneme; it is the initial consonant in the word this and the final consonant in the word bathe. It is symbolized by /ð/ in this book, so the pronunciation of bathe is written /beð/.

Although English has no bilabial fricatives, it does have alveolar fricatives. When the tongue is allowed to approach but not quite come in contact with the alveolar ridge, we get the consonants in the word sauce if it is not accompanied by voice and the consonants in the word zoos if it is accompanied by voice. The symbols for these alveolar fricatives are /s/ and /z/, so the pronunciation of sauce is written /s⊃s/, and the pronunciation of zoos is written /zuz/.

Most English fricatives are produced at different places of articulation than English stops.

Somewhat behind the alveolar ridge, it is possible to bring part of the body of the tongue near the roof of the mouth and produce voiceless and voiced fricatives that are distinguishable from /s/ and /z/. The voiceless fricative appears at the beginning and end of the word shush. It is symbolized by /š/, so the pronunciation of shush is written /š^š/. The voiced fricative at this place of articulation is a somewhat marginal phoneme in English, and it does not normally appear at the beginnings of words. It is the consonant in the middle of the word Asia. The symbol for this consonant in this book is /3/ (somewhat like the IPA symbol), so the pronunciation of Asia is written /e3@/. /š/ and /3/ are produced at what is called the postalveolar place of articulation.

English does not have velar fricatives, but it does have a voiceless glottal fricative, produced by making the glottis narrow enough for a breathy sound to be created. This is the consonant at the beginning of the word hot; this phoneme does not occur at the end of English words. It is symbolized by /h/, so the pronunciation of hot is written /hαt/.

We have seen that stops involve complete closure, and fricatives involve approximate closure. It is also possible to combine these two by beginning with a complete (stop) closure and ending with an approximate (fricative) closure. Such phones are called affricates. English has two of them, voiced and voiceless affricates produced at the postalveolar place of articulation. The voiceless postalveolar affricate is the first and last consonant in the word church; it is symbolized by /c/ in this book, so the pronunciation of church is written /c@rc/. The voiced postalveolar affricate is the first and last consonant in the word judge; it is symbolized by /j/ in this book, so the pronunciation of judge is written /j^j/. Notice how /c/ is similar to a /t/ followed by a /š/ and how /j/ is similar to a /d/ followed by a /3/; in fact, an alternate way to write these affricates is /tš/ and /d3/.

Sonorants

Pronounce the syllable /ba/ while holding your nose. Now try the same thing, replacing the /b/ with an "m" sound (as in mama). What can you conclude about the difference between /b/ and the sound of "m"? Try the same thing with /d/ and the sound of "n". Also try to pronounce the word sing while holding your nose, and notice what happens to the final consonant (written with the letter combination "ng").

English nasal and lateral consonants have a complete stop-like contact in one place, but the air escapes somewhere else.

One way to produce a sonorant consonant is to completely close the oral cavity, just as for a stop, but to open up the nasal cavity, the empty region behind the nostrils and above the oral cavity. This is achieved by lowering the velum, the flap at the back of the roof of the mouth. The nasal cavity and velum are shown in the figure below, in which the vocal tract is configured for the production of the sound at the beginning and end of the word mom. Such phones are called nasal consonants (as we will see later on, there are also nasal vowels). For nasal consonants, the air is allowed to pass through the nasal cavity, but it also resonates in the oral cavity, and the place of articulation (within the oral cavity) distinguishes different nasal consonants from one another. English has three nasal consonant phonemes, at the bilabial, alveolar, and velar places of articulation. The bilabial nasal is the one at the beginning and end of the word mom; it is symbolized by /m/, so the pronunciation of mom is written /mαm/. The alveolar nasal is the one at the beginning and end of the word none; it is symbolized by /n/, so the pronunciation of none is written /n^n/. The velar nasal is the one at the end of the word sing (this phoneme does not appear at the beginning of words in English); it is symbolized in this book by /ŋ/, which is close to the IPA symbol, so the pronunciation of sing is written /sIŋ/.

[Figure: configuration of the vocal tract for /m/, showing the nasal cavity and the lowered velum]

The other way to produce a sonorant is to leave an opening in the oral cavity that is wide enough so that there is none of the noise that characterizes fricatives. These consonants are called approximants because the approach of the articulators is only approximate. One way to achieve this is to make a complete contact as for a stop consonant but release the air at one or both sides of the tongue. Such a sound is called a lateral approximant. English has one lateral approximant phoneme, with the contact at the alveolar place of articulation. This is the sound at the beginning and end of the word lull; it is symbolized with /l/, so the pronunciation of the word lull is written /l^l/.

A further possibility is for no closure of the oral cavity at all. English has three such consonants. One is produced with the tip of the tongue curled somewhat back and approaching the roof of the mouth behind the alveolar ridge. This is the sound at the beginning and end of the word rear (as pronounced by most North Americans); it is symbolized by /r/. Sounds produced with the tongue curved in this way are called retroflex; we can treat this as a particular place of articulation (though not everyone does).

The two other approximants are produced similarly to high vowels, except that the articulators are usually not brought as close together as they would be for vowels. One of these consonants approaches the vowels /u/ and /U/. It is the consonant found at the beginning of the word we and is symbolized by /w/, so the pronunciation of we is written /wi/. Note that this phoneme has two simultaneous places of articulation: velar, because the back of the tongue approaches the velar region, just as for /u/, and bilabial, because the lips are rounded and brought close together, as for /u/. The other English approximant resembles the vowels /i/ and /I/. It is the consonant found at the beginning of the word you and is symbolized by /y/, so the pronunciation of you is written /yu/. (Note that in IPA, this consonant is symbolized by /j/.) The place of articulation for this consonant is one we haven't seen yet for any consonants; as for the vowel /i/, the middle of the tongue approaches the region in the middle of the roof of the mouth. This is referred to as the palatal place of articulation.

It should be clear by now that the distinction between vowels and consonants is really a matter of degree. The consonants that are the least vowel-like are stops, which involve a complete closure of the vocal tract and cannot be pronounced continuously. A little more like vowels are fricatives, which can be pronounced continuously but which still have the characteristic fricative noise resulting from the narrow opening in the vocal tract. Closest to vowels are sonorants. All of these can be pronounced continuously. In fact, the English sonorants /m/, /n/, /ŋ/, /r/, and /l/ can all be pronounced as separate syllables by themselves, in which case they behave something like vowels. We have already seen how the combination /@r/, as in burn, is sometimes treated as a separate English vowel phoneme. We can treat it as a retroflex vowel or a syllabic retroflex consonant; it really doesn't matter for our purposes. We find the syllabic variant of /n/ in the usual pronunciation of the word button; unless the word is pronounced very carefully, there is no /@/ between the /t/ and the /n/. The same is true for the /l/ in saddle. In this book, to indicate that a sonorant is syllabic, a [.] is added after the consonant symbol, as in [sædl.].

Not all phones are unambiguously vowels or consonants.

The phones /w/ and /y/ are the closest of all to vowels. Each has a place and manner of articulation very similar to a high vowel. The main difference between these phones and "true vowels" is that, unlike the other sonorants, they cannot be syllabic; they always require a vowel before or after them to create a syllable. So possible English syllables include [kway], [pyus], and [yaw], but not [kw], [py], or [y]. Because of their similarity to "true vowels", [w] and [y] are sometimes called semivowels. For our purposes (and maybe for anyone's purposes) there will be no point to arguing about whether semivowels are vowels or consonants. When I am emphasizing their consonant properties, I will speak of them as consonants; when I am emphasizing their vowel properties, I will speak of them as vowels.

The tables below summarize the English consonants in terms of three dimensions: place of articulation (POA), manner of articulation (MOA), and voicing. Recall, however, that place and manner of articulation are more precisely viewed as multiple dimensions.


Phoneme  /p/      /b/      /m/      /f/      /v/      /θ/      /ð/
POA      bil      bil      bil      lbdn     lbdn     dent     dent
MOA      stop     stop     nas      fric     fric     fric     fric
Voice    vcls     vcd      vcd      vcls     vcd      vcls     vcd

Phoneme  /t/      /d/      /s/      /z/      /n/      /l/
POA      alv      alv      alv      alv      alv      alv
MOA      stop     stop     fric     fric     nas      lat
Voice    vcls     vcd      vcls     vcd      vcd      vcd

Phoneme  /c/      /j/      /š/      /3/      /r/      /y/
POA      postalv  postalv  postalv  postalv  rflex    pal
MOA      affr     affr     fric     fric     approx   approx
Voice    vcls     vcd      vcls     vcd      vcd      vcd

Phoneme  /k/      /g/      /ŋ/      /w/      /h/      (/?/)
POA      vel      vel      vel      vel+bil  glot     glot
MOA      stop     stop     nas      approx   fric     stop
Voice    vcls     vcd      vcd      vcd      vcls     vcls
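
For readers who like to experiment, the same summary can be held in a small lookup table, which makes it easy to pull out every phoneme that shares a value on some dimension. What follows is only a sketch in Python, not part of the analysis itself: the names Consonant, CONSONANTS, and find are invented for this illustration, the feature values are written out in full instead of abbreviated, and the symbols are the ones used in this book, not IPA.

```python
from typing import NamedTuple, Optional, Tuple

class Consonant(NamedTuple):
    places: Tuple[str, ...]   # usually one place; /w/ has two
    manner: str
    voiced: bool

# Entries copied from the summary tables, in the same order.
CONSONANTS = {
    "p": Consonant(("bilabial",), "stop", False),
    "b": Consonant(("bilabial",), "stop", True),
    "m": Consonant(("bilabial",), "nasal", True),
    "f": Consonant(("labiodental",), "fricative", False),
    "v": Consonant(("labiodental",), "fricative", True),
    "θ": Consonant(("dental",), "fricative", False),
    "ð": Consonant(("dental",), "fricative", True),
    "t": Consonant(("alveolar",), "stop", False),
    "d": Consonant(("alveolar",), "stop", True),
    "s": Consonant(("alveolar",), "fricative", False),
    "z": Consonant(("alveolar",), "fricative", True),
    "n": Consonant(("alveolar",), "nasal", True),
    "l": Consonant(("alveolar",), "lateral", True),
    "c": Consonant(("postalveolar",), "affricate", False),
    "j": Consonant(("postalveolar",), "affricate", True),
    "š": Consonant(("postalveolar",), "fricative", False),
    "3": Consonant(("postalveolar",), "fricative", True),
    "r": Consonant(("retroflex",), "approximant", True),
    "y": Consonant(("palatal",), "approximant", True),
    "k": Consonant(("velar",), "stop", False),
    "g": Consonant(("velar",), "stop", True),
    "ŋ": Consonant(("velar",), "nasal", True),
    "w": Consonant(("velar", "bilabial"), "approximant", True),
    "h": Consonant(("glottal",), "fricative", False),
    "?": Consonant(("glottal",), "stop", False),  # marginal in English
}

def find(place: Optional[str] = None,
         manner: Optional[str] = None,
         voiced: Optional[bool] = None) -> list:
    """Return the symbols that match every dimension that is specified."""
    return [
        symbol for symbol, c in CONSONANTS.items()
        if (place is None or place in c.places)
        and (manner is None or c.manner == manner)
        and (voiced is None or c.voiced == voiced)
    ]

print(find(manner="stop", voiced=True))   # the voiced stops
print(find(place="velar"))                # everything with a velar place
```

Storing the place as a tuple is one possible design choice; it lets /w/ turn up in a search for either of its two places of articulation.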

The table below gives examples for each English consonant phoneme. As with vowels, each phoneme may vary somewhat from word to word.

/p/ pin, spin, lap
/b/ bin, lab
/m/ man, ham
/f/ fin, if
/v/ vine, live
/θ/ thin, both, ether
/ð/ this, bathe, either
/t/ talked, stone, lit
/d/ den, lid, hugged
/s/ sin, kiss, lips
/z/ zoo, easy, lose, eggs
/n/ pin, manner, listen
/l/ lip, sell, castle
/c/ church, nature
/j/ gene, jar, gradual
/š/ shin, mission, nation, fish, machine
/3/ leisure, garage (for some speakers)
/r/ rip, narrow, year
/y/ year, cute /kyut/
/k/ kin, call, lick, chemical
/g/ get, anger, leg
/ŋ/ sing, anger, anchor
/w/ witch, which, reward
/h/ hip
/?/ uh-oh /?^?o/

In this section we have seen how the consonants of one language, English, can be described in terms of three dimensions. These three dimensions do most of the work in distinguishing the consonants of the world's languages, but, as we will see in the next section, others are also required. Furthermore, as you can see from the tables above, there are gaps in English, combinations of features on the three dimensions for which there is no English phoneme. For example, English has no velar fricative, either voiced or voiceless. To some extent, this is just an accident of the history of English, and other languages will fill those gaps (and have gaps of their own elsewhere). The next section gives examples.
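
One way to make such gaps visible is to lay out every combination of place and manner and list the ones for which there is no phoneme. The sketch below does this in Python for just three places and three manners, ignoring voicing. The names PLACES, MANNERS, ATTESTED, and gaps are invented for the illustration, and the attested combinations are simply read off the tables above.

```python
from itertools import product

PLACES = ["bilabial", "alveolar", "velar"]
MANNERS = ["stop", "fricative", "nasal"]

# (place, manner) pairs for which English has at least one phoneme.
ATTESTED = {
    ("bilabial", "stop"), ("bilabial", "nasal"),
    ("alveolar", "stop"), ("alveolar", "fricative"), ("alveolar", "nasal"),
    ("velar", "stop"), ("velar", "nasal"),
}

def gaps(places, manners, attested):
    """List the place/manner combinations with no phoneme."""
    return [pair for pair in product(places, manners) if pair not in attested]

print(gaps(PLACES, MANNERS, ATTESTED))
# [('bilabial', 'fricative'), ('velar', 'fricative')]
```

The same function could be given another language's set of attested combinations to show where that language's gaps fall.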

3.5 Consonants in other languages

Listen to this phrase in Amharic, meaning 'he finished carefully'. See what consonant sounds you can pick out that do not occur in English.

Recall how vowel phonemes in different languages differ from each other. One possibility is that one of the vowel dimensions may be organized differently. For example, the backness dimension has two contrastive values (front and back) in Spanish and Japanese but three contrastive values (front, central, and back) in English (and Amharic). A second possibility is that there is a gap in one system that is filled in the other. For example, Spanish and Japanese have low vowels and front vowels, but they have no low front vowel (/æ/), whereas English does have such a vowel. A third possibility is that one language may use a dimension contrastively which is not used contrastively at all in other languages. Thus the dimension of tenseness distinguishes English vowels from one another (/i/ from /I/ and /u/ from /U/), while this dimension is irrelevant for Spanish and Japanese vowels.

Organization of dimensions

As with vowels, a language may make fewer distinctions on a given dimension than other languages make. Consider Lingala, which, like English, has bilabial, alveolar, and velar stops, nasals, and fricatives, but, unlike English, Spanish, Japanese, Amharic, and Tzeltal, makes no use of the postalveolar place of articulation. That is, Lingala has no phonemes like /c/, /j/, /š/, and /3/.

How the same phonemic symbols (/d/) can stand for somewhat different sounds

Another possibility is that two languages make the same number of distinctions along a dimension but not the same distinctions. Consider place of articulation for stops, affricates, and nasals. English stops, affricates, and nasals (other than the marginal glottal stop) appear at four places of articulation: bilabial (/p/, /b/, /m/), alveolar (/t/, /d/, /n/), postalveolar (/c/, /j/), and velar (/k/, /g/, /ŋ/). Spanish and Japanese also have stops and affricates at four different positions, and three of these are roughly the same as for English, but alveolar is replaced by dental place of articulation, that is, with the tongue tip against the upper teeth rather than against the alveolar ridge. (Recall that English has fricatives at this place of articulation (/θ/, /ð/), but no stops or nasals.) When we are concerned only about the phonemes within a language, we can use the same symbols that we use for the English alveolar phonemes (/t/, /d/, /n/) for the dental phonemes in these languages, because it is then only necessary to keep each phoneme of the language distinct from the others. However, when it is important to make it clear that the place of articulation is dental rather than alveolar, I will use the special symbols [t̪], [d̪], [n̪]. See if you can hear the difference between the alveolar and dental places of articulation in the following syllables: [tata], [t̪at̪a], [dada], [d̪ad̪a].

Now consider voicing. Recall that English consonants are either voiced, with voicing during the production of the consonant, or voiceless, with voicing beginning after or ending before (or simultaneously with) the consonant. Spanish also has voiced and voiceless consonants, but it differs in the details. Listen to your pronunciation of the word pie. The lips are brought together for the /p/. Next the lips are opened with a kind of explosive puff of air (which you can feel if you put your hand in front of your mouth). Then the vocal cords begin to vibrate and the vowel /ay/ is produced. Now listen to the pronunciation of the Spanish word pai, a word borrowed from English with the meaning 'pie': /pai/. The consonant at the beginning sounds something like English /p/, but the release of the lip closure and the beginning of voicing happen almost simultaneously for Spanish, and there is no puff of air. We call the English voiceless stop in pie aspirated, and when we need to distinguish it from consonants like those in Spanish, we use [h] following the consonant symbol, for example, [ph]. So if we want to show the detailed pronunciations of the English and Spanish words, we would write them [phay] and [pay] respectively.

As in English, the Spanish /p/ is distinct from its voiced counterpart, /b/, as in the word vaya, pronounced /baya/. That is, Spanish and English both make a two-way distinction in voicing. The pattern that holds for /p/ and /b/ in the two languages also holds for the other stops and affricates. So English /t/ is aspirated, while Spanish /t/ is not; English /k/ is aspirated, while Spanish /k/ is not. In brief, then, both English and Spanish have voiced and voiceless stops, but for Spanish voiceless stops there is no lag between the release and the voicing as there is for English voiceless stops.
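
The difference between the two languages can be stated as a small rule that turns a phonemic form into a more detailed phonetic one. The following Python sketch is deliberately limited: it looks only at a voiceless stop at the very beginning of a word, as in pie, and it uses this book's [h] notation for aspiration. The names VOICELESS_STOPS and narrow are invented for the illustration.

```python
VOICELESS_STOPS = {"p", "t", "k"}

def narrow(phonemic: str, language: str) -> str:
    """Return a detailed transcription of a phonemic form.

    Assumption for this sketch: only a word-initial voiceless stop is
    considered, and it is aspirated in English but not in Spanish.
    """
    if language == "English" and phonemic[:1] in VOICELESS_STOPS:
        return "[" + phonemic[0] + "h" + phonemic[1:] + "]"
    return "[" + phonemic + "]"

print(narrow("pay", "English"))  # [phay]
print(narrow("pay", "Spanish"))  # [pay]
```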

We can see that the voicing dimension is really a continuous dimension with many different possibilities for the relative timing of the release of the consonant closure and the voicing. When we think of voicing in this way, the dimension is sometimes called voice onset time. Voice onset time is illustrated for three bilabial stops in the figure below. The top line shows the closure (single line) and opening (double line) of the lips. Each of the three other lines shows when voicing begins relative to the opening of the lips (the dashed vertical line) for three different stops, [ph], as in English pie; [p], as in Spanish pai; and [b], as in English buy and Spanish vaya.

[Figure: voice onset time for [ph], [p], and [b]]

How learning a language can lead to auditory illusions

The figure shows only three of many possible times for voicing to begin. But both English and Spanish have exactly two categories along this continuum. This means that English and Spanish hearers perceive discontinuity where there is continuity. An English hearer would perceive some of the cases as /p/ and some as /b/. The differences between the different /p/ cases and the differences between the different /b/ cases might not be perceived at all; the /p/s and the /b/s would tend to sound the same. At the same time, the differences between the /p/s and the /b/s would be exaggerated; they would tend to sound more different than they actually are. This phenomenon is referred to as categorical perception. Both Spanish and English hearers experience categorical perception for voice onset time, but the line dividing their categories is in different places. As we will see in the section on phonetic contexts, the situation in English is somewhat more complicated than what we've seen so far; English speakers actually produce a range of voiceless stops that include ones like the Spanish voiceless stops.
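
The idea of two hearers cutting the same continuum at different points can be sketched in a few lines of Python. Voice onset time is treated here as a number of milliseconds from the release of the closure to the start of voicing, negative when voicing starts before the release. The boundary values are placeholders invented for this illustration, not measurements, and the names BOUNDARY_MS and perceive are likewise hypothetical.

```python
# Placeholder category boundaries in milliseconds; illustrative only.
BOUNDARY_MS = {"English": 25.0, "Spanish": 0.0}

def perceive(vot_ms: float, language: str) -> str:
    """Map a continuous voice onset time onto one of two categories."""
    return "p" if vot_ms > BOUNDARY_MS[language] else "b"

for vot in (-60, -20, 10, 40, 80):
    print(vot, perceive(vot, "English"), perceive(vot, "Spanish"))
```

Under these assumed boundaries, the stop with a voice onset time of 10 falls in the /b/ category for the English hearer and in the /p/ category for the Spanish hearer, while all the other values are heard the same way by both. Differences within a category, such as that between 40 and 80, are lost altogether.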

You should not be surprised to know that other languages divide up the voice onset time dimension differently from English or Spanish. In Mandarin Chinese there is also a two-way distinction for stops (and affricates), but the distinction is between voiceless aspirated stops (like English [ph]) and voiceless unaspirated stops (like Spanish [p]). In problems later on, we will see how other languages treat voice onset time.

What about manner of articulation, the third major dimension distinguishing consonants? While all languages make use of different manners of articulation, some make use of more possibilities. Recall that to some extent, manner of articulation can be seen as a collection of possible ways of configuring the vocal tract to produce sounds. The possibilities we have seen are stops, fricatives, affricates, nasals, and approximants, including lateral approximants.

Two more ways to make a complete contact between two articulators

There are two other possibilities that involve bringing the articulators into contact, as for stops and affricates. For stops and affricates the articulators are brought together and held there till the release. A different approach is for one articulator to quickly tap against the other but not remain in contact with it. This is known as a tap. This is the manner of articulation used for the second consonant in the Spanish word pero 'but'. For this consonant, the tip of the tongue strikes the alveolar ridge quickly but does not remain there. A similar sound is also used in Japanese and Amharic. When it is not necessary to distinguish it from a sound like the English retroflex /r/, it is usual to symbolize the alveolar tap with /r/ too. When it is necessary to distinguish the two sounds, I will use the symbol [ɾ] for the alveolar tap.

The other possibility is for one articulator to be brought quickly in contact with the other several times in succession. This is known as a trill. This is the manner of articulation used for the second consonant in the Spanish word perro 'dog'. The place of articulation is the same as for [ɾ], but in this case the tongue strikes the alveolar ridge several times. A similar sound is also used in Amharic; it appears in the Amharic phrase referred to in the box at the beginning of this section. It is conventional to symbolize the alveolar trill with a double /r/. So the pronunciation of the Spanish word perro is written /perro/.

Gaps

Now consider how some languages fill the gaps of other languages. Notice that in the alveolar place of articulation, English has stops, fricatives, and a nasal. In the bilabial place of articulation, it has stops and a nasal but no fricatives, though there are fricatives in the nearby labiodental place of articulation. In the velar place of articulation, English has stops and a nasal but no fricatives in this or any nearby position. Notice also that while English has bilabial, alveolar, and velar nasals, it has no nasal phoneme in the postalveolar or palatal places of articulation.

These last two gaps are filled in Spanish. Spanish has a voiceless velar fricative, the consonant in the middle of the word México. This sound is symbolized by /x/, so the pronunciation of the word México is written /mexiko/. Spanish also has a palatal nasal, the consonant in the middle of the word año 'year'. This consonant is symbolized in this book by /ñ/, so the pronunciation of the word año is written /año/. But Spanish has gaps of its own. There is no velar nasal phoneme, a gap which is filled by the English phoneme /ŋ/. And while Spanish has a postalveolar affricate (/c/), it has no postalveolar fricative (/š/), unlike English, Hindi, Japanese, Amharic, and Tzeltal.
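The comparison of gaps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it covers just the four place-and-manner combinations mentioned in this section, it records only whether a language has some phoneme in that cell, and the names CELLS, FILLED, gaps, and gaps_filled_by are made up for the example.

```python
# Cells of the consonant chart, written as (place, manner) pairs.
# Only the four cells discussed in the text are included; this is not
# a full inventory of either language.
CELLS = {
    ("velar", "fricative"),
    ("palatal", "nasal"),
    ("velar", "nasal"),
    ("postalveolar", "fricative"),
}

# Cells in which each language has at least one phoneme, as stated in the text.
FILLED = {
    "English": {("velar", "nasal"), ("postalveolar", "fricative")},
    "Spanish": {("velar", "fricative"), ("palatal", "nasal")},
}


def gaps(language):
    """Return the cells in which the language has no phoneme."""
    return CELLS - FILLED[language]


def gaps_filled_by(language, other):
    """Return the gaps of one language that the other language fills."""
    return gaps(language) & FILLED[other]


print(gaps_filled_by("English", "Spanish"))
print(gaps_filled_by("Spanish", "English"))
```

Treating each cell as simply filled or empty is enough for this comparison; a fuller sketch would store the phonemes themselves in each cell.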

New dimensions

A single new dimension with only two values can add a number of new phonemes to a language.

Just as for vowels, one language may make use of a consonant dimension which other languages do not. Two of the languages on our list, Amharic and Tzeltal, have an alternate way of producing stops and affricates. This mode of production involves a buildup of pressure behind the point of contact and an explosive release accompanied by a glottal stop; such consonants are referred to as ejective consonants. I will use a following /'/ to indicate these sounds and will refer to this dimension as glottalization. (Note that we have to consider this a new dimension rather than just a new value for manner of articulation because it is possible with different manners of articulation; there are glottalized stops, fricatives, affricates, and sonorants in different languages.)

Listen to the following contrasts, and see if you can hear the difference between ejectives and plain, non-glottalized stops and affricates: [papa], [p'ap'a], [tata], [t'at'a], [kaka], [k'ak'a], [caca], [c'ac'a]. Glottalization is used contrastively in Amharic and Tzeltal. That is, the sounds /t'/, /c'/, and /k'/ are used, like /t/, /c/, and /k/, to make distinct words in these languages. Three of these sounds appear in the Amharic phrase mentioned in the box at the beginning of this section, which it may be worth transcribing now: /bεt'InIkk'ak'e c'εrrεsε/. To show that glottalization is contrastive in Amharic, we can cite pairs like /kok/ 'peach' vs. /k'ok'/ 'partridge' and /tIl/ 'worm' vs. /t'Il/ 'quarrel'.
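To see how one two-valued dimension multiplies an inventory, here is a small Python sketch. It assumes a hypothetical starting set of four plain consonants (/p/, /t/, /k/, /c/) and uses the following /'/ from this section to mark the glottalized value; the function name add_binary_dimension is invented for the illustration.

```python
from itertools import product

# Plain (non-glottalized) stops and the affricate, in this book's symbols.
# This starting set is an assumption made for the example.
PLAIN = ["p", "t", "k", "c"]


def add_binary_dimension(phonemes, mark="'"):
    """Pair every phoneme with both values of a two-valued dimension.

    The unmarked value leaves the symbol as it is; the marked value
    adds the mark after it, as with ejectives in this section.
    """
    return [
        phoneme + (mark if marked else "")
        for phoneme, marked in product(phonemes, (False, True))
    ]


inventory = add_binary_dimension(PLAIN)
print(inventory)
print(len(PLAIN), "plain consonants become", len(inventory), "phonemes")
```

The sketch treats the new dimension as freely combinable with every consonant in the starting set, which a real language need not allow.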

This section has not been a complete survey of possible consonants in human languages, or even in the ten languages discussed in this book. There are other places of articulation, other manners of articulation, and even other dimensions. The point has been to show how consonant systems differ along the basic dimensions of place of articulation and manner of articulation and what all languages share: a small set of phonemes produced with different kinds of constrictions within the vocal tract. In the next section we'll see how these consonant phonemes are combined with the vowel phonemes we discussed earlier to form syllables and how languages resemble and differ from each other in how this is done.

3.6 Syllables

We have seen how each spoken language has a set of consonant and vowel categories that are used by its speakers and hearers to distinguish the words of the language. The consonants and vowels in turn are combined into larger units, syllables. Syllables are distinguished from one another in terms of the consonants and vowels that they consist of. But syllables can also be distinguished from one another in other ways, and some of these ways are very commonly used contrastively, that is, to distinguish words from each other. We will look at some of these "suprasegmental" features of language in this section. Languages also differ in terms of how consonants and vowels can be combined into syllables, the "phonotactics" of the language, and we will also look at this property of languages in this section.

Suprasegmentals

Let's go back to our Lexies in an early stage of their word development. They have vowels and consonants, and their word forms consist of one or two syllables. Consider the possible word form /bago/. Without changing the vowels and consonants, how would it be possible to make this pair of syllables into more than one distinguishable word?

We've discussed vowels and consonants, and in the section on phonemes we looked briefly at how they are combined to form syllables such as /pa/, /bi/, and /ne/. We've also discussed the dimensions that distinguish different vowels from each other and the dimensions that distinguish different consonants from each other. Now we consider the dimensions and features that distinguish syllables from each other, independently of the consonants and vowels in them. Since consonants and vowels are sometimes referred to as "segments", these dimensions and features are referred to as suprasegmentals, that is, 'above the segments'.

One property that clearly characterizes a syllable and that could distinguish one syllable from another is loudness. A particular one-syllable word could be spoken more loudly than other words, or a two-syllable word could have one syllable spoken more loudly than the other. In this case we would be concerned with relative rather than absolute loudness. That is, we would only care that the first syllable of a two-syllable word is louder than the second, not that the first syllable has a particular loudness.

Another property of syllables is their length (though this may amount to the same thing as vowel length). One syllable in a word may be held for a longer time than the other(s). Again what seems to matter for language is relative, rather than absolute, length.

Finally syllables may differ from one another in their pitch, that is, the dimension that distinguishes musical notes from one another. Once again, what we will care about is relative pitch; if absolute pitch mattered, as it does in music, women, men, and young children would be unable to achieve the same effects. One syllable can have a higher pitch than another. A syllable can also be characterized by a particular pitch movement, say, rising or falling, rather than a level pitch. Note that movement is a separate dimension from overall relative pitch; a pitch fall could start and end relatively high or relatively low.

The main question that should concern us, because our focus in this chapter is what distinguishes word forms from one another, is whether any of these suprasegmental dimensions is used contrastively. Let's start with English. Consider the two instances of permit in the following sentence.

  1. Without a permit, they wouldn't permit me to participate.

English and Spanish use syllable prominence to distinguish words.

Both of these words would be transcribed with the same set of consonant and vowel phonemes: /p@rmIt/. But they are pronounced differently. There is more "effort" expended on the first than the second syllable in the first word and on the second than the first syllable in the second word. The actual difference may involve loudness, length, and pitch: the first syllable in the first permit is probably louder and longer than the second, and the first syllable probably involves a fall from a relatively high to a low pitch while the second syllable is more or less level and low. The reverse is probably true for the second permit.

This suprasegmental dimension of English is called stress. Because there are words like the two permits in English, we can see that this dimension is contrastive in English. English may have as many as three different values (levels) of stress within a word. I will symbolize them with /'/ before a syllable with high ("primary") stress, /,/ before a syllable with medium ("secondary") stress, and nothing before a syllable with weak stress. Thus the two words we have been discussing would be written /'p@rmIt/ and /p@r'mIt/, and the word constitution would be written /,kαnst@'tuš@n/. Other English examples in which stress alone distinguishes words are torment (/'t⊃rmεnt/ and /t⊃r'mεnt/) and survey (/'s@rve/ and /s@r've/). Spanish also has contrastive stress. For example, the words canto 'I sing' and cantó 'he sang' differ only in stress: /'kanto/ and /kan'to/ respectively.
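The claim that two words differ in stress alone can be checked mechanically. The Python sketch below assumes transcriptions written as in this section, with /'/ and /,/ placed before stressed syllables; the function names are invented for the example. Because /'/ is also used after a consonant for ejectives in this book, the sketch is only meant for languages without ejectives, such as English and Spanish.

```python
# Stress marks used in this section: ' (primary) and , (secondary).
STRESS_MARKS = "',"


def strip_stress(transcription):
    """Remove stress marks, leaving only the consonant and vowel symbols."""
    return "".join(ch for ch in transcription if ch not in STRESS_MARKS)


def differ_only_in_stress(a, b):
    """True if two transcriptions have the same segments but different stress."""
    return a != b and strip_stress(a) == strip_stress(b)


print(differ_only_in_stress("'p@rmIt", "p@r'mIt"))  # the two permits
print(differ_only_in_stress("'kanto", "kan'to"))    # canto and cantó
```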

Now let's look at how pitch alone behaves in some languages, for example, Lingala. Consider the following words, in which /´/ over a character indicates a relatively high pitch and no mark over a character indicates a relatively low pitch.

  1. /moto/ 'person', /motó/ 'head'
  2. /ebóló/ 'piece of cloth', /eboló/ 'skull', /ebólo/ 'group'
  3. /moluka/ 'fishing', /molúka/ 'canoe trip', /molúká/ 'river'

Lingala and Japanese use syllable pitch to distinguish words.

Clearly pitch alone is enough to distinguish words in Lingala. That is, pitch is used contrastively in this language. This use of pitch is called tone. In a tone language such as Lingala, Mandarin Chinese, or one of the thousands of other tone languages of Africa, Asia, or the Americas, each syllable has an associated tone, that is, a pitch level or movement. Each tone language has a small set of tone categories, or tonemes, which are used to distinguish words in the language just as phonemes are (and, as we'll see later, in languages like Lingala also to distinguish grammatical forms). In Lingala the basic tonemes are high and low tone; there are also somewhat marginal rising and falling tones. Note that in a tone language, it is relative pitch that matters. "High tone" means high relative to the pitch of the speaker's voice and to the pitch of the rest of the utterance in which the syllable occurs, not a particular pitch or range of pitches.
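The tone pattern of each Lingala example can be read off the transcription automatically. The following Python sketch assumes the marking convention used above (an acute accent for high tone, no mark for low tone) and assumes that every vowel letter is its own syllable, which holds for these examples but not for Lingala in general; the name tone_pattern is made up for the illustration.

```python
import unicodedata

# Vowel letters that occur in the examples above (an assumption; this is
# not meant as the full Lingala vowel inventory).
VOWELS = set("aeiou")
ACUTE = "\u0301"  # combining acute accent, marking high tone


def tone_pattern(word):
    """Return one letter per syllable: H for high tone, L for low tone."""
    tones = []
    # NFD splits an accented vowel into the plain vowel plus the accent.
    for ch in unicodedata.normalize("NFD", word):
        if ch in VOWELS:
            tones.append("L")   # low unless an accent follows
        elif ch == ACUTE and tones:
            tones[-1] = "H"     # the accent belongs to the preceding vowel
    return "".join(tones)


for word in ["moto", "motó", "ebóló", "eboló", "ebólo"]:
    print(word, tone_pattern(word))
```

Two words with the same segments but different patterns, such as /moto/ and /motó/, are exactly the pairs that show tone is contrastive.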

Japanese also uses pitch alone to distinguish words, but the system works somewhat differently from that in a tone language like Lingala. In Japanese, for words of a given number of syllables there are a small number of possible pitch patterns. Rather than specify the pitch of every syllable for a given word, we just need to specify (and the learner needs to remember) which of the pitch patterns is used for that word. A language like this is called a pitch accent language. The following examples illustrate the three possible patterns for a two-syllable noun followed by the word wa, which indicates that the meaning of that noun is the "topic" of the sentence. After each transcription, the pitch of the three syllables is given as L (low) or H (high).

  1. [haši wa] (L H H) 'edge TOPIC'
  2. [haši wa] (L H L) 'bridge TOPIC'
  3. [haši wa] (H L L) 'chopsticks TOPIC'

Because these three phrases are distinguished by pitch alone, we can see that pitch is used contrastively in Japanese, as it is in Lingala.

All languages apparently use pitch and loudness in their grammar.

The suprasegmental dimensions of pitch, loudness, and length also play a somewhat different role in languages. Consider the following English sentences, in which the word in capitals is emphasized.

  1. LOIS married Clark.
  2. Lois MARRIED Clark.
  3. Lois married CLARK.
  4. LOIS married Clark?
  5. Lois MARRIED Clark?
  6. Lois married CLARK?

Notice how suprasegmentals (loudness, length, and pitch) are used to emphasize different words in the sentences and to indicate whether the sentence is a statement or a question. These uses of suprasegmentals are referred to as intonation. All human languages appear to use intonation.

Phonotactics

Consider the following made-up words, each written both how it might be spelled in English and with phonetic symbols.

  1. glooce /glus/
  2. verm /v@rm/
  3. binzle /bInz@l/
  4. fkotch /fkαc/
  5. sreep /srip/
  6. noo /nU/
  7. taheh /tαε/
  8. lingg /lIŋg/

Do all of these seem like possible English words to you? If some don't, what about them seems to be impossible in English?

As we have seen, each spoken language has an "alphabet" of form categories — consonant and vowel phonemes — which are combined to form the syllables that make up words. But languages differ not only in the particular vowel and consonant phonemes they have. They also differ with respect to how the vowels and consonants may be combined to form syllables.

Some English consonants (like /ŋ/) and some English vowels (like /ε/) are limited in where they can appear.

Let's start with simple English syllables consisting of a consonant followed by a vowel; I'll abbreviate this as "CV". First, can any consonant appear in the "C" position? Taking the vowel as the constant /o/, certainly all of the following are possible syllables in English: /po/, /bo/, /mo/, /vo/, /to/, /co/, /šo/, /ko/, /lo/, /ro/, /wo/, /ho/. But what about /ŋo/? A complete search of the English lexicon reveals that there are no English words that have syllables beginning with the phoneme /ŋ/. Although other nasal consonants (/m/ and /n/) and other velar consonants (/k/ and /g/) can appear at the beginnings of syllables, English seems to constrain syllables to not begin with the phoneme /ŋ/.

What about the vowels in a CV syllable? Let's be more specific and assume that the syllable is stressed and comes at the end of an English word. Keeping the consonant as the constant /b/, all of the following seem possible: /bi/, /be/, /bu/, /bo/, /b⊃/, /bay/, /baw/, /b⊃y/. (For speakers who do not make the distinction between /⊃/ and /α/, /bα/ would also be possible.) But what about the following: /bI/, /bε/, /bæ/, /bU/, /b^/, /bα/ (for speakers who distinguish /α/ and /⊃/)? None of these syllables seems possible. Again there is apparently a sort of prohibition on the kinds of phonemes that can appear in English syllables. In this case, the most efficient way to state the prohibition is to say that English forbids lax vowels, other than /⊃/, from appearing at the ends of syllables (at least stressed syllables at the end of words). Note that /⊃/ presents a problem for the generalization; this is one of the ways in which this vowel does not quite fit into the lax/tense, short/long distinction.
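The prohibition just stated can be written as a one-line test. This Python sketch assumes the vowel symbols used in this book and the accent of speakers who distinguish /α/ and /⊃/; it also assumes its input is a single stressed, word-final syllable with no stress mark. The names LAX_VOWELS and ends_in_lax_vowel are invented for the example.

```python
# Lax vowels that may not end a stressed word-final syllable, in this
# book's symbols. /⊃/ is left out because the text treats it as an exception.
LAX_VOWELS = set("IεæU^α")


def ends_in_lax_vowel(syllable):
    """True if the syllable ends in one of the prohibited lax vowels."""
    return bool(syllable) and syllable[-1] in LAX_VOWELS


for syllable in ["bi", "be", "b⊃", "bay", "bI", "bε", "bU"]:
    status = "not possible" if ends_in_lax_vowel(syllable) else "possible"
    print(syllable, status)
```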

Thus English has constraints on the structure of syllables. Such constraints are referred to as phonotactics. It's beyond our goals to go into English phonotactics in detail, but let's investigate a bit further what the bounds are on English syllables.

What about syllables with more than one consonant at the beginning? In general, clusters of consonants not separated by vowels are more difficult for speakers to produce than consonants that are separated by vowels. This is because the articulators must move from one consonant position to another without opening up in between (because the opening would be realized as a vowel). And the difficulty of particular combinations varies considerably. Thus we should expect more constraints on what is possible in clusters than for single consonants. An examination of the English lexicon reveals that the following consonant clusters can appear at the beginnings of General American English syllables (my accent) if we count the semivowels /w/ and /y/ as consonants.

  1. /tw/, /dw/, /kw/, /gw/
  2. /by/, /py/, /my/, /fy/, /vy/, /ky/, /hy/
  3. /pl/, /bl/, /fl/, /kl/, /gl/, /sl/, /šl/
  4. /pr/, /br/, /fr/, /θr/, /tr/, /dr/, /kr/, /gr/
  5. /sp/, /st/, /sk/, /sm/, /sn/, /šp/
  6. /spl/, /spr/, /str/, /skl/, /skr/

We can see some patterns in what is possible. /s/ seems to be special. If we leave it out, we see that all of the clusters end in a sonorant consonant, /w/, /y/, /l/, or /r/. Clusters of three consonants must consist of /s/ followed by a voiceless stop followed by either /l/ or /r/. In fact, for this and other reasons, /l/ and /r/ are often treated as forming a category in their own right.
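The list of clusters can be turned into a rough checker for the beginnings of the made-up words at the start of this part of the section. The Python sketch below is an illustration only: it assumes one symbol per phoneme, as in this book's transcriptions, it accepts any single consonant other than /ŋ/ at the beginning of a syllable, and it looks only at the beginning of a word, so it says nothing about constraints on vowels or on the ends of syllables. The names VOWELS, CLUSTERS, onset, and is_possible_onset are made up for the example.

```python
# Vowel symbols used in this book's English transcriptions.
VOWELS = set("iIeεæuUo⊃α@^a")

# The syllable-initial clusters listed above.
CLUSTERS = set("""
tw dw kw gw
by py my fy vy ky hy
pl bl fl kl gl sl šl
pr br fr θr tr dr kr gr
sp st sk sm sn šp
spl spr str skl skr
""".split())


def onset(word):
    """Return the consonants that come before the first vowel of the word."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[:i]


def is_possible_onset(consonants):
    """Check a word-initial consonant sequence against the list above."""
    if len(consonants) <= 1:
        # No consonant at all, or any single consonant except /ŋ/.
        return consonants != "ŋ"
    return consonants in CLUSTERS


for word in ["glus", "v@rm", "bInz@l", "fkαc", "srip"]:
    print(word, onset(word), is_possible_onset(onset(word)))
```

On this test /fkαc/ and /srip/ fail because /fk/ and /sr/ are not in the list, while /glus/, /v@rm/, and /bInz@l/ pass.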

A consonant can constrain the vowels that precede it.

English has a range of more detailed constraints when it comes to which vowels can occur before which consonants. Consider syllables ending in a vowel and a consonant. Some syllable-final consonants, for example, /t/, permit any English vowel before them. But before /r/, the possibilities are quite restricted. In my accent, only the following vowels are possible before /r/: /I/ (pier), /ε/ (pair), /α/ (par), /U/ (poor), /⊃/ (pour), /@/ (per). In other words, none of the tense vowels may appear before /r/. Another way to see this is as the neutralization of the lax-tense distinction before /r/; that is, the distinction between tense and lax vowels has disappeared in the context before an /r/. Evidence for this is that the vowel in the word pour is actually somewhere between the usual /⊃/ and the usual /o/ in this accent, and we could actually represent it with either symbol when we are just representing the phonemes of the dialect.

To some extent, the constraints on English consonant clusters seem to be related to what is easy to do. A cluster such as /mk/ or /lpr/, not possible in English, is quite difficult to produce. But the constraints also seem somewhat arbitrary. For example, there is no reason to believe that /šk/, which is not possible in English, is any more difficult than /sk/, which is possible. And /bw/, which does not occur, seems no more difficult to produce than /tw/, which does.

Languages differ a lot in how many syllable types they allow.

If we examine the phonotactics of other languages, we see these same basic properties. In addition, we see that the degree of complexity which is permitted for syllable structure is specific to the language (and varies considerably between languages). Let's look at Japanese. A Japanese syllable can begin with at most one consonant; no consonant clusters are permitted. A Japanese syllable can end with a vowel, or /n/, or, if the syllable is not at the end of a word, with the consonant that begins the next syllable (but only if that consonant is /p/, /t/, /k/, /s/, /š/, or /c/). Thus the first group below includes possible Japanese words, and the second group includes impossible Japanese words.

  1. /e/, /se/, /te/, /ten/, /kantan/, /henkai/, /nattoo/, /makka/
  2. /nat/, /mak/, /bum/, /nas/, /ste/

We can see that Japanese draws the line between what is phonotactically possible and what is not in a very different place than English does.
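The Japanese constraints as stated above are simple enough to write as a pattern. The following Python sketch is a simplified rendering of just those rules, not a full account of Japanese phonotactics: it assumes one symbol per phoneme, the consonant set in CONSONANTS is my own assumption about which symbols to allow, and the names are invented for the example.

```python
import re

# Symbols assumed for this sketch (one symbol per phoneme).
CONSONANTS = "pbtdkgszšcjmnrhwy"
VOWELS = "aeiou"
# Consonants that may end a syllable when the next syllable begins with
# the same consonant, as listed in the text.
DOUBLING = "ptksšc"

# A syllable: at most one consonant, then one or more vowels, then an
# optional ending that is either n or a consonant from DOUBLING that is
# immediately repeated at the start of the next syllable.
SYLLABLE = rf"[{CONSONANTS}]?[{VOWELS}]+(?:n|([{DOUBLING}])(?=\1))?"
WORD = re.compile(rf"(?:{SYLLABLE})+")


def is_possible_japanese_word(transcription):
    """True if the whole transcription is a sequence of allowed syllables."""
    return WORD.fullmatch(transcription) is not None


for word in ["e", "ten", "kantan", "nattoo", "makka", "nat", "bum", "ste"]:
    print(word, is_possible_japanese_word(word))
```

The lookahead makes sure that a syllable-final /t/ or /k/ is accepted only when the same consonant follows, which is why /nattoo/ passes and /nat/ does not.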

Spanish fits somewhere between English and Japanese. Spanish permits at most one consonant at the end of a syllable, and this consonant can only be one of the following: /d, s, n, l, r/. (If you know some Spanish and think that /m/ can appear at the end of a syllable, as in comprender, you're right in a way. Later we'll see that this "m" can be seen as a kind of variant of /n/.) Spanish does permit consonant clusters at the beginnings of words but no clusters beginning with /s/. It does permit two-consonant clusters ending in /l/ and /r/, however, much as English does. And, because Spanish has a number of diphthongs beginning with /w/ and /y/, Spanish syllables permit more clusters at the beginning of syllables ending in /w/ and /y/ than English does, for example, /fw/ and /sy/. In fact, Spanish permits some three-element clusters consisting of two consonants followed by /w/ or /y/, for example, /prw/, as in the word prueba /prweba/ 'test'.

Not surprisingly, there are languages which are more extreme than English in terms of the complexity they permit in syllables, though none of these is among the eight other spoken languages discussed in this book. Among these languages with more complex syllables are familiar languages like Russian. To take perhaps the most extreme of all, in the Canadian Indian language Nuxalk, a word may consist of as many as four consonants and no vowel, for example, /sk'st/.

Speaker-oriented and hearer-oriented phonotactics

We have seen that each language has its own idea about what counts as a good syllable; that is, each language has a syllable category. Though we will not have time to go into it, it turns out that syllables are grouped together into higher-level units and that languages also differ in the ways this can be done. Some of the constraints make good sense from the perspective of the Hearer. It turns out that for a hearer, it is easier to distinguish consonants at the beginnings than at the ends of syllables. Thus it should not be surprising that more different consonants are possible at the beginnings than the ends of syllables in many languages; making distinctions that are hard to hear would not serve any function.

What about the constraints that distinguish one language from another? Why do languages seem so different when it comes to phonotactics? These differences seem to be related again to the Hearer-oriented pressure to make syllables more distinctive so that they are easier to distinguish along with the opposing Speaker-oriented pressure to make words easy to pronounce. The more different consonants and consonant clusters are possible at the beginnings of syllables, the more distinct syllables are, but, at the same time, the more difficult syllables become to produce. Different languages have sorted out the conflict in different ways.

So how do languages with relatively constrained syllable structure deal with the need for word forms to be distinct? One strategy is tone; more different syllables are possible if each syllable has an associated tone as well as a sequence of consonants and vowels. Mandarin Chinese is a language with relatively simple phonotactics and hence relatively few possible syllable types, but it compensates for this by having four separate tones.

Another strategy is words consisting of more than one syllable. Compare these Japanese and English nouns referring to basic body parts: atama, head; kokoro, heart; karada, body; ashi, leg/foot; te, hand; hana, nose; mimi, ear; mune, chest; koshi, hip. Only one of these Japanese words consists of one syllable, while only one of the English words consists of more than one syllable. Japanese has fewer distinct syllable types than English, so it compensates by making words out of longer sequences of syllables than English does.
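The arithmetic behind both strategies can be sketched briefly. In the Python fragment below the number of syllable types (100) is a made-up figure chosen only to make the arithmetic easy to follow, not a count for any real language, and the sketch assumes that every syllable type can combine with every tone and with every other syllable, which no real language allows. The function name distinct_word_forms is invented for the example.

```python
def distinct_word_forms(syllable_types, tones=1, max_syllables=1):
    """Count the possible word forms of up to max_syllables syllables.

    Each syllable is one of syllable_types segment sequences combined
    with one of the given number of tones.
    """
    per_syllable = syllable_types * tones
    return sum(per_syllable ** n for n in range(1, max_syllables + 1))


# A hypothetical language with 100 syllable types.
print(distinct_word_forms(100))                   # one syllable, no tone
print(distinct_word_forms(100, tones=4))          # one syllable, four tones
print(distinct_word_forms(100, max_syllables=2))  # up to two syllables, no tone
```

Even with these toy figures, allowing longer words multiplies the number of possible forms far more quickly than adding tones does, though longer words cost the Speaker more effort per word.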