3.6  Syllables

We have seen how each spoken language has a set of consonant and vowel categories that are used by its speakers and hearers to distinguish the words of the language. The consonants and vowels are in turn combined into larger units, syllables. Syllables are distinguished from one another by the consonants and vowels they consist of. But syllables can also be distinguished in other ways, and some of these ways are very commonly used contrastively, that is, to distinguish words from each other. We will look at some of these "suprasegmental" features of language in this section. We will also look at how languages differ in the ways consonants and vowels can be combined into syllables, the "phonotactics" of the language.


Let's go back to our Lexies in an early stage of their word development. They have vowels and consonants, and their word forms consist of one or two syllables. Consider the possible word form /bago/. Without changing the vowels and consonants, how would it be possible to make this pair of syllables into more than one distinguishable word?

We've discussed vowels and consonants, and in the section on phonemes we looked briefly at how they are combined to form syllables such as /pa/, /bi/, and /ne/. We've also discussed the dimensions that distinguish different vowels from each other and the dimensions that distinguish different consonants from each other. Now we consider the dimensions and features that distinguish syllables from each other, independently of the consonants and vowels in them. Since consonants and vowels are sometimes referred to as "segments", these dimensions and features are referred to as suprasegmentals, that is, 'above the segments'.

One property that clearly characterizes a syllable and that could distinguish one syllable from another is loudness. A particular one-syllable word could be spoken more loudly than other words, or a two-syllable word could have one syllable spoken more loudly than the other. In this case we would be concerned with relative rather than absolute loudness. That is, we would only care that the first syllable of a two-syllable word is louder than the second, not that the first syllable has a particular loudness.

Another property of syllables is their length (though this may amount to the same thing as vowel length). One syllable in a word may be held for a longer time than the other(s). Again what seems to matter for language is relative, rather than absolute, length.

Finally, syllables may differ from one another in their pitch, that is, the dimension that distinguishes musical notes from one another. Once again, what we care about is relative pitch; if absolute pitch mattered, as it does in music, women, men, and young children would be unable to achieve the same effects. One syllable can have a higher pitch than another. A syllable can also be characterized by a particular pitch movement, say, rising or falling, rather than a level pitch. Note that movement is a separate dimension from overall relative pitch; a pitch fall could start and end relatively high or relatively low.

The main question that should concern us, because our focus in this chapter is what distinguishes word forms from one another, is whether any of these suprasegmental dimensions is used contrastively. Let's start with English. Consider the two instances of permit in the following sentence.

  1. Without a permit, they wouldn't permit me to participate.
English and Spanish use syllable prominence to distinguish words.

Both of these words would be transcribed with the same set of consonant and vowel phonemes: /pərmɪt/. But they are pronounced differently. There is more "effort" expended on the first than the second syllable in the first word and on the second than the first syllable in the second word. The actual difference may involve loudness, length, and pitch: the first syllable in the first permit is probably louder and longer than the second, and the first syllable probably involves a fall from a relatively high to a low pitch while the second syllable is more or less level and low. The reverse is probably true for the second permit.

This suprasegmental dimension of English is called stress. Because there are words like the two permits in English, we can see that this dimension is contrastive in English. English may have as many as three different values (levels) of stress within a word. I will symbolize them with /'/ before a syllable with high ("primary") stress, /,/ before a syllable with medium ("secondary") stress, and nothing before a syllable with weak stress. Thus the two words we have been discussing would be written /'pərmɪt/ and /pər'mɪt/, and the word constitution would be written /,kɑnstə'tušən/. Other English examples in which stress alone distinguishes words are torment (/'tɔrmɛnt/ and /tɔr'mɛnt/) and survey (/'sərve/ and /sər've/). Spanish also has contrastive stress. For example, the words canto 'I sing' and cantó 'he sang' differ only in stress: /'kanto/ and /kan'to/ respectively.
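The test for contrastiveness used here can be made mechanical. The following is a minimal Python sketch, assuming the stress notation just introduced (/'/ for primary stress, /,/ for secondary stress, no mark for weak stress); the function names are hypothetical, chosen for illustration. It strips the stress marks from each transcription and groups together transcriptions that are left with the same consonants and vowels. Any group containing more than one stress pattern is a set of words distinguished by stress alone.

```python
from collections import defaultdict

# Stress marks in the notation used in this section:
# ' = primary stress, , = secondary stress, no mark = weak stress.
STRESS_MARKS = {"'", ","}


def segments(transcription: str) -> str:
    """Remove the stress marks, keeping only the consonant and vowel phonemes."""
    return "".join(ch for ch in transcription if ch not in STRESS_MARKS)


def stress_minimal_pairs(transcriptions: list[str]) -> dict[str, list[str]]:
    """Group transcriptions that share their segments but differ in stress."""
    groups: dict[str, list[str]] = defaultdict(list)
    for t in transcriptions:
        groups[segments(t)].append(t)
    # Only groups with more than one distinct stress pattern show a contrast.
    return {seg: forms for seg, forms in groups.items() if len(set(forms)) > 1}


english = ["'pərmɪt", "pər'mɪt", "'tɔrmɛnt", "tɔr'mɛnt",
           "'sərve", "sər've", ",kɑnstə'tušən"]
spanish = ["'kanto", "kan'to"]

print(stress_minimal_pairs(english))
print(stress_minimal_pairs(spanish))
```

The word constitution drops out of the English result because no other word in the list has the same consonants and vowels, so nothing shows that its stress pattern is contrastive.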

Now let's look at how pitch alone behaves in some languages, for example, Lingala. Consider the following words, in which /´/ over a character indicates a relatively high pitch and no mark over a character indicates a relatively low pitch.

  1. /moto/ 'person', /motó/ 'head'
  2. /ebóló/ 'piece of cloth', /eboló/ 'skull', /ebólo/ 'group'
  3. /moluka/ 'fishing', /molúka/ 'canoe trip', /molúká/ 'river'
Lingala and Japanese use syllable pitch to distinguish words.

Clearly pitch alone is enough to distinguish words in Lingala. That is, pitch is used contrastively in this language. This use of pitch is called tone. In a tone language such as Lingala, Mandarin Chinese, or one of the thousands of other tone languages of Africa, Asia, or the Americas, each syllable has an associated tone, that is, a pitch level or movement. Each tone language has a small set of tone categories, or tonemes, which are used to distinguish words in the language just as phonemes are (and, as we'll see later, in languages like Lingala also to distinguish grammatical forms). In Lingala the basic tonemes are high and low tone; there are also somewhat marginal rising and falling tones. Note that in a tone language, it is relative pitch that matters. "High tone" means high relative to the pitch of the speaker's voice and to the pitch of the rest of the utterance in which the syllable occurs, not a particular pitch or range of pitches.
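Because each syllable in a tone language carries its own toneme, the number of tone patterns available to a word grows quickly with its length. Here is a minimal Python sketch, assuming only the two basic Lingala tonemes (the marginal rising and falling tones are left out) and the transcription convention above, in which an acute accent marks high tone; the helper names are hypothetical. It lists every tone pattern for the three syllables of /moluka/.

```python
import unicodedata
from itertools import product

HIGH = "\u0301"  # combining acute accent: marks high tone, as in /motó/


def tone_patterns(n_syllables: int, tonemes: tuple[str, ...] = ("L", "H")) -> list[tuple[str, ...]]:
    """Every way of assigning one toneme to each of n syllables."""
    return list(product(tonemes, repeat=n_syllables))


def write_with_tones(syllables: list[str], pattern: tuple[str, ...]) -> str:
    """Spell a word in the text's notation: an acute accent over each
    high-toned vowel. Assumes every syllable ends in its vowel."""
    marked = "".join(s + (HIGH if t == "H" else "") for s, t in zip(syllables, pattern))
    return unicodedata.normalize("NFC", marked)  # combine u + accent into ú


syllables = ["mo", "lu", "ka"]
forms = [write_with_tones(syllables, p) for p in tone_patterns(len(syllables))]
print(len(forms))  # 8
print(forms)
```

With two tonemes, a word of n syllables has 2^n possible patterns, so /moluka/ has 8, three of which are the words cited above. The examples do not tell us whether the other five are also Lingala words; the sketch only counts what the tone system makes available.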

Japanese also uses pitch alone to distinguish words, but the system works somewhat differently from that in a tone language like Lingala. In Japanese, for words of a given number of syllables there is a small number of possible pitch patterns. Rather than specify the pitch of every syllable for a given word, we just need to specify (and the learner needs to remember) which of the pitch patterns is used for that word. A language like this is called a pitch accent language. The following examples illustrate the three possible patterns for a two-syllable noun followed by the word wa, which indicates that the meaning of that noun is the "topic" of the sentence. In the transcriptions, the pitch of each of the three syllables (ha, ši, wa) is given after the phrase as H (high) or L (low).

  1. [haši wa] L H H 'edge TOPIC'
  2. [haši wa] L H L 'bridge TOPIC'
  3. [haši wa] H L L 'chopsticks TOPIC'

Because these three phrases are distinguished by pitch alone, we can see that pitch is used contrastively in Japanese, as it is in Lingala.

All languages apparently use pitch and loudness in their grammar.

The suprasegmental dimensions of pitch, loudness, and length also play a somewhat different role in languages. Consider the following English sentences, in which the word in capital letters is emphasized.

  1. LOIS married Clark.
  2. Lois MARRIED Clark.
  3. Lois married CLARK.
  4. LOIS married Clark?
  5. Lois MARRIED Clark?
  6. Lois married CLARK?

Notice how suprasegmentals (loudness, length, and pitch) are used to emphasize different words in the sentences and to indicate whether the sentence is a statement or a question. These uses of suprasegmentals are referred to as intonation. All human languages appear to use intonation.


Consider the following made-up words, each written both how it might be spelled in English and with phonetic symbols.

  1. glooce /glus/
  2. verm /vərm/
  3. binzle /bɪnzəl/
  4. fkotch /fkɑc/
  5. sreep /srip/
  6. noo /nʊ/
  7. taheh /tɑɛ/
  8. lingg /lɪŋg/

Do all of these seem like possible English words to you? If some don't, what about them seems to be impossible in English?

As we have seen, each spoken language has an "alphabet" of form categories — consonant and vowel phonemes — which are combined to form the syllables that make up words. But languages differ not only in the particular vowel and consonant phonemes they have. They also differ with respect to how the vowels and consonants may be combined to form syllables.

Some English consonants (like /ŋ/) and some English vowels (like /ɛ/) are limited in where they can appear.

Let's start with simple English syllables consisting of a consonant followed by a vowel; I'll abbreviate this as "CV". First, can any consonant appear in the "C" position? Taking the vowel as the constant /o/, certainly all of the following are possible syllables in English: /po/, /bo/, /mo/, /vo/, /to/, /co/, /šo/, /ko/, /lo/, /ro/, /wo/, /ho/. But what about /ŋo/? A complete search of the English lexicon reveals that there are no English words that have syllables beginning with the phoneme /ŋ/. Although the other nasal consonants (/m/ and /n/) and the other velar consonants (/k/ and /g/) can appear at the beginnings of syllables, English seems to prohibit syllables from beginning with /ŋ/.

What about the vowels in a CV syllable? Let's be more specific and assume that the syllable is stressed and comes at the end of an English word. Keeping the consonant as the constant /b/, all of the following seem possible: /bi/, /be/, /bu/, /bo/, /bɔ/, /bay/, /baw/, /bɔy/. (For speakers who do not make the distinction between /ɔ/ and /ɑ/, /bɑ/ would also be possible.) But what about the following: /bɪ/, /bɛ/, /bæ/, /bʊ/, /bʌ/, /bɑ/ (for speakers who distinguish /ɑ/ and /ɔ/)? None of these syllables seems possible. Again there is apparently a sort of prohibition on the kinds of phonemes that can appear in English syllables. In this case, the most efficient way to state the prohibition is to say that English forbids lax vowels, other than /ɔ/, from appearing at the ends of syllables (at least stressed syllables at the end of words). Note that /ɔ/ presents a problem for the generalization; this is one of the ways in which this vowel does not quite fit into the lax/tense, short/long distinction.
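The two constraints on CV syllables discussed so far can be stated as a small check. This is a minimal Python sketch with a hypothetical function name. It assumes the notation used here, a speaker who distinguishes /ɑ/ from /ɔ/, and input consisting of one consonant followed by a vowel or diphthong (/ay/, /aw/, /ɔy/). Like the discussion above, it applies only to stressed syllables at the end of a word.

```python
# Lax vowels in the notation of this section (for a speaker who
# distinguishes /ɑ/ and /ɔ/); /ɔ/ is the exception noted in the text.
LAX_VOWELS = {"ɪ", "ɛ", "æ", "ʊ", "ʌ", "ɑ", "ɔ"}
ALLOWED_LAX = {"ɔ"}


def possible_final_cv(syllable: str) -> bool:
    """Check a stressed, word-final CV syllable such as 'bo'.

    Assumes the first character is the consonant and the remaining
    characters are the vowel or diphthong (e.g. 'ay', 'ɔy').
    """
    onset, vowel = syllable[0], syllable[1:]
    if onset == "ŋ":
        return False  # no English syllable begins with /ŋ/
    if vowel in LAX_VOWELS - ALLOWED_LAX:
        return False  # lax vowels other than /ɔ/ cannot end the syllable
    return True


ok = ["bi", "be", "bu", "bo", "bɔ", "bay", "baw", "bɔy"]
bad = ["bɪ", "bɛ", "bæ", "bʊ", "bʌ", "bɑ", "ŋo"]
print(all(possible_final_cv(s) for s in ok))   # True
print(any(possible_final_cv(s) for s in bad))  # False
```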

Thus English has constraints on the structure of syllables. Such constraints are referred to as phonotactics. It's beyond our goals to go into English phonotactics in detail, but let's investigate a bit further what the bounds are on English syllables.

What about syllables with more than one consonant at the beginning? In general, clusters of consonants not separated by vowels are more difficult for speakers to produce than consonants that are separated by vowels. This is because the articulators must move from one consonant position to another without opening up in between (because the opening would be realized as a vowel). And the difficulty of particular combinations varies considerably. Thus we should expect more constraints on what is possible in clusters than for single consonants. An examination of the English lexicon reveals that the following consonant clusters can appear at the beginnings of General American English syllables (my accent) if we count the semivowels /w/ and /y/ as consonants.

  1. /tw/, /dw/, /kw/, /gw/
  2. /by/, /py/, /my/, /fy/, /vy/, /ky/, /hy/
  3. /pl/, /bl/, /fl/, /kl/, /gl/, /sl/, /šl/
  4. /pr/, /br/, /fr/, /θr/, /tr/, /dr/, /kr/, /gr/
  5. /sp/, /st/, /sk/, /sm/, /sn/, /šp/
  6. /spl/, /spr/, /str/, /skl/, /skr/

We can see some patterns in what is possible. /s/ seems to be special. If we set aside the clusters that begin with /s/ (and /šp/), we see that all of the clusters end in one of the sonorant consonants /w/, /y/, /l/, or /r/. Clusters of three consonants must consist of /s/ followed by a voiceless stop followed by either /l/ or /r/. In fact, for this and other reasons, /l/ and /r/ are often treated as forming a category in their own right.
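The generalizations just stated can be checked against the list of clusters above. Below is a minimal Python sketch with hypothetical names; /š/ is grouped with /s/ as an exception here (an assumption), since /šp/ does not end in a sonorant either. Note that the check states a necessary condition, not a sufficient one: /stl/ satisfies it but is not on the list, and /tl/ ends in a sonorant but does not occur at the beginning of English syllables.

```python
# Onset clusters listed above for General American English.
LISTED_ONSETS = [
    "tw", "dw", "kw", "gw",
    "by", "py", "my", "fy", "vy", "ky", "hy",
    "pl", "bl", "fl", "kl", "gl", "sl", "šl",
    "pr", "br", "fr", "θr", "tr", "dr", "kr", "gr",
    "sp", "st", "sk", "sm", "sn", "šp",
    "spl", "spr", "str", "skl", "skr",
]

SONORANTS = {"w", "y", "l", "r"}
VOICELESS_STOPS = {"p", "t", "k"}
SPECIAL_FIRST = {"s", "š"}  # set aside as exceptions (assumption for /š/)


def fits_generalization(cluster: str) -> bool:
    """Return True if an onset cluster matches the pattern described in the text."""
    if len(cluster) == 3:
        # /s/ + voiceless stop + /l/ or /r/
        return (cluster[0] == "s"
                and cluster[1] in VOICELESS_STOPS
                and cluster[2] in {"l", "r"})
    if len(cluster) == 2:
        # Clusters beginning with /s/ or /š/ are exempt;
        # all others must end in a sonorant.
        return cluster[0] in SPECIAL_FIRST or cluster[1] in SONORANTS
    return False  # no onsets longer than three consonants are listed


print([c for c in LISTED_ONSETS if not fits_generalization(c)])  # []
print(fits_generalization("stl"), fits_generalization("mk"))     # True False
```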

A consonant can constrain the vowels that precede it.

English has a range of more detailed constraints when it comes to which vowels can occur before which consonants. Consider syllables ending in a vowel and a consonant. Some syllable-final consonants, for example, /t/, permit any English vowel before them. But before /r/, the possibilities are quite restricted. In my accent, only the following vowels are possible before /r/: /ɪ/ (pier), /ɛ/ (pair), /ʊ/ (poor), /ɔ/ (pour), /ə/ (per). In other words, none of the tense vowels may appear before /r/. Another way to see this is as the neutralization of the lax-tense distinction before /r/; that is, the distinction between tense and lax vowels has disappeared in the context before an /r/. Evidence for this is that the vowel in the word pour is actually somewhere between the usual /ɔ/ and the usual /o/ in this accent, and we could actually represent it with either symbol when we are just representing the phonemes of the dialect.

To some extent, the constraints on English syllable clusters seem to be related to what is easy to do. A cluster such as /mk/ or /lpr/, not possible in English, is quite difficult to produce. But the constraints also seem somewhat arbitrary. For example, there is no reason to believe that /šk/, which is not possible in English, is any more difficult than /sk/, which is possible. And /bw/, which does not occur, seems no more difficult to produce than /tw/, which does.

Languages differ a lot in how many syllable types they allow.

If we examine the phonotactics of other languages, we see these same basic properties. In addition, we see that the degree of complexity which is permitted for syllable structure is specific to the language (and varies considerably between languages). Let's look at Japanese. A Japanese syllable can begin with at most one consonant; no consonant clusters are permitted. A Japanese syllable can end with a vowel, or /n/, or, if the syllable is not at the end of a word, with the consonant that begins the next syllable (but only if that consonant is /p/, /t/, /k/, /s/, /š/, or /c/). Thus the first group below includes possible Japanese words, and the second group includes impossible Japanese words.

  1. /e/, /se/, /te/, /ten/, /kantan/, /henkai/, /nattoo/, /makka/
  2. /nat/, /mak/, /bum/, /nas/, /ste/
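The Japanese pattern described above can be written as a regular expression over the transcription used in this section. This is a minimal Python sketch under stated assumptions: the set of onset consonants is a hypothetical simplification (the text does not list the Japanese consonants), long vowels are written as doubled letters as in /nattoo/, and the function name is illustrative. Each syllable is at most one consonant plus a vowel, optionally closed by /n/ or by a consonant from /p, t, k, s, š, c/ that is identical to the consonant beginning the next syllable.

```python
import re

VOWEL = "[aeiou]"
# Hypothetical, simplified inventory of onset consonants (not from the text).
ONSET = "[pbtdkgszšcjfhmnrwy]"
# Consonants that can close a syllable by doubling into the next one.
GEMINATE = "[ptksšc]"

SYLLABLE = (
    f"{ONSET}?{VOWEL}"          # at most one consonant before the vowel
    f"(?:n(?!{VOWEL})"          # optional final /n/ (not followed by a vowel)
    f"|({GEMINATE})(?=\\1))?"   # or a consonant repeated at the start of the next syllable
)
WORD = re.compile(f"(?:{SYLLABLE})+")


def possible_japanese(word: str) -> bool:
    """True if the word can be divided into syllables of the allowed shapes."""
    return WORD.fullmatch(word) is not None


possible = ["e", "se", "te", "ten", "kantan", "henkai", "nattoo", "makka"]
impossible = ["nat", "mak", "bum", "nas", "ste"]
print([possible_japanese(w) for w in possible])    # all True
print([possible_japanese(w) for w in impossible])  # all False
```

The same logic shows why /nat/ fails: /t/ can close a syllable only when another /t/ follows, and at the end of a word nothing does.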

We can see that Japanese draws the line between what is phonotactically possible and what is not in a very different place than English does.

Spanish fits somewhere between English and Japanese. Spanish permits at most one consonant at the end of a syllable, and this consonant can only be one of the following: /d, s, n, l, r/. (If you know some Spanish and think that /m/ can appear at the end of a syllable, as in comprender, you're right in a way. Later we'll see that this "m" can be seen as a kind of variant of /n/.) Spanish permits consonant clusters at the beginnings of words, but no clusters beginning with /s/. It does permit two-consonant clusters ending in /l/ and /r/, however, much as English does. And because Spanish has a number of diphthongs beginning with /w/ and /y/, it permits more syllable-initial clusters ending in /w/ and /y/ than English does, for example, /fw/ and /sy/. In fact, Spanish permits some three-element clusters consisting of two consonants followed by /w/ or /y/, for example, /prw/, as in the word prueba /prweba/ 'test'.

Not surprisingly, there are languages that are more extreme than English in the complexity they permit in syllables, though none of these is among the eight other spoken languages discussed in this book. Among these languages with more complex syllables are familiar ones like Russian. To take perhaps the most extreme case of all, in the Canadian First Nations language Nuxalk, a word may consist entirely of consonants, with no vowel at all, for example, /sk'st/.

Speaker-oriented and hearer-oriented phonotactics

We have seen that each language has its own idea about what counts as a good syllable; that is, each language has a syllable category. Though we will not have time to go into it, it turns out that syllables are grouped together into higher-level units and that languages also differ in the ways this can be done. Some of the constraints make good sense from the perspective of the Hearer. It turns out that for a hearer, it is easier to distinguish consonants at the beginnings than at the ends of syllables. Thus it should not be surprising that, in many languages, a greater variety of consonants is possible at the beginnings of syllables than at the ends; making distinctions that are hard to hear would serve no function.

What about the constraints that distinguish one language from another? Why do languages seem so different when it comes to phonotactics? These differences again seem to reflect a tension between the Hearer-oriented pressure to make syllables distinctive, so that they are easy to tell apart, and the opposing Speaker-oriented pressure to make words easy to pronounce. The more consonants and consonant clusters a language allows at the beginnings of syllables, the more distinct its syllables are, but also the more difficult they become to produce. Different languages have resolved this conflict in different ways.

So how do languages with relatively constrained syllable structure deal with the need for word forms to be distinct? One strategy is tone: more distinct syllables are possible if each syllable has an associated tone as well as a sequence of consonants and vowels. Mandarin Chinese is a language with relatively simple phonotactics, and hence relatively few possible syllable types, but it compensates for this by having four separate tones.
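The arithmetic behind this strategy can be sketched as follows. It rests on a simplifying assumption not made in the text: that every tone can combine with every segmental syllable (real tone languages typically have gaps). S is a hypothetical number of distinct syllables built from consonants and vowels, T is the number of tones, and N_n is the number of distinct word forms of n syllables.

```latex
\begin{aligned}
N_1 &= S \cdot T, \\
N_n &= (S \cdot T)^{n} = S^{n}\,T^{n}, \\
T = 4:\quad N_1 &= 4S, \qquad N_2 = 16\,S^{2}.
\end{aligned}
```

Under this assumption, four tones give four times as many one-syllable forms, and sixteen times as many two-syllable forms, as the same consonant-vowel syllables would yield without tone.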

Another strategy is words consisting of more than one syllable. Compare these Japanese and English nouns referring to basic body parts: atama, head; kokoro, heart; karada, body; ashi, leg/foot; te, hand; hana, nose; mimi, ear; mune, chest; koshi, hip. Only one of these Japanese words consists of one syllable, while only one of the English words consists of more than one syllable. Japanese has fewer distinct syllable types than English, so it compensates by making words out of longer sequences of syllables than English does.
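A rough calculation shows how word length can make up for a smaller syllable inventory. This is a minimal Python sketch: the inventory sizes and vocabulary size below are purely illustrative, not counts for Japanese or English, the function names are hypothetical, and the sketch assumes (unrealistically) that any syllable can follow any other.

```python
def forms_up_to(syllable_types: int, max_syllables: int) -> int:
    """Number of distinct word forms of 1 to max_syllables syllables,
    assuming any syllable type may occur in any position."""
    return sum(syllable_types ** n for n in range(1, max_syllables + 1))


def length_needed(syllable_types: int, words_needed: int) -> int:
    """Smallest maximum word length that yields at least words_needed forms."""
    if syllable_types < 1:
        raise ValueError("need at least one syllable type")
    length = 1
    while forms_up_to(syllable_types, length) < words_needed:
        length += 1
    return length


# Illustrative numbers only: a small and a larger syllable inventory,
# each needing 50,000 distinct word forms.
for inventory in (100, 1_000):
    print(inventory, length_needed(inventory, 50_000))
# 100 3
# 1000 2
```

With the smaller hypothetical inventory, words of up to three syllables are needed to reach the same number of distinct forms that the larger inventory reaches with words of up to two.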
