2.3 Results and discussion
2.3.1 Results and discussion of the Ningbo normal-length vowels
2.3.1.1 An overview
Figures 2.3a and 2.3b show the F1 and F2 values from 10 female and 10 male native speakers of Ningbo, respectively, plotted in a two-dimensional acoustic space (F2 against F1) with the origin of the axes at the top right of the plot. In the figures, each IPA symbol represents a data point. The data points have been converted to the Bark scale in order to better approximate perceptual distances. The scale on the ordinate is double that on the abscissa in order to give appropriate prominence to F1 and to bring the plots more into accord with auditory judgments of vowels. However, the values along the axes still correspond to the original values in Hertz. Vowel ellipses with radii of two standard deviations were drawn along axes oriented along the principal components of each vowel cluster (Disner, 1983). Figures 2.4a and 2.4b show the same vowel ellipses in the acoustic F1/F2 plane as Figures 2.3a and 2.3b, respectively, with the individual data points removed. The means and standard deviations of the first three formants for each of the Ningbo normal-length vowels are summarized in Table 2.2[2].
Figure 2.3a: Ningbo vowels (data from 10 female speakers).
Figure 2.3b: Ningbo vowels (data from 10 male speakers).
Figure 2.4a: Ningbo vowel ellipses (data from 10 female speakers).
Figure 2.4b: Ningbo vowel ellipses (data from 10 male speakers).
Table 2.2: Means and SDs (in Hz) of the frequency values of the first three formants of the Ningbo normal-length vowels.
1 For the female speakers, the number of samples for F3 is smaller than for the corresponding F1 and F2 for [u] and [o]. This is because, as pointed out by Fant (1956), reliable measurements of F3 are nearly impossible for some sampled tokens, owing to the low energy and the wide bandwidth (sometimes larger than 1,000 Hz).
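The plotting conventions described above (Bark conversion and two-standard-deviation ellipses oriented along the principal components) can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which Hz-to-Bark formula was used, so the Traunmüller (1990) approximation is assumed here, and the function names `hz_to_bark` and `vowel_ellipse` as well as the token values are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Hz to Bark using the Traunmüller (1990) approximation.
    Assumption: the source does not name the conversion formula it used."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def vowel_ellipse(points, n_sd=2.0):
    """Describe the ellipse for one vowel cluster.

    points: list of (x, y) pairs, e.g. (F2, F1) in Bark.
    Returns (centre, (semi_major, semi_minor), angle_of_major_axis_in_radians).
    The axes follow the principal components of the sample covariance matrix,
    and each semi-axis is n_sd standard deviations long.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two tokens")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against rounding below zero
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (n_sd * math.sqrt(lam_major), n_sd * math.sqrt(lam_minor)), angle

# Hypothetical (F1, F2) tokens in Hz for a single vowel, for illustration only.
tokens_hz = [(300, 2700), (320, 2650), (290, 2750), (310, 2600)]
# Plot coordinates are (F2, F1) in Bark; both axes would be drawn reversed
# so that the origin sits at the top right.
tokens_bark = [(hz_to_bark(f2), hz_to_bark(f1)) for f1, f2 in tokens_hz]
centre, semi_axes, angle = vowel_ellipse(tokens_bark)
```

Doubling the ordinate scale and labelling the axes in Hertz are display choices that would be applied at plotting time; they do not change the ellipse parameters computed here.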
It can be seen from the figures that linguistic vowel height and vowel backness correlate well with the F1 and F2 values, respectively, in that the high vowels have a smaller F1 value and the back vowels have a smaller F2 value. The figures also show that the data from the female and male speakers display a similar pattern in the acoustic vowel charts in terms of vowel height and vowel backness. In terms of location in the acoustic vowel space, [i y ʏ u] can be characterized as high, [e ø o] as mid-high, [ɛ ɔ] as mid-low, and [a] as low. Furthermore, [i y ʏ e ø ɛ] are front, [u o ɔ] are back, and [a] is central. This is true for both female and male speakers.
Meanwhile, a comparison of the data for the female and male speakers in the figures and the table shows that there are also obvious differences in formant values and ellipse positions between the two speaker groups. Owing to physiological differences, formant values are usually greater for female speakers than for male speakers. As a result, the vowel ellipses for the female speakers occupy a much larger space than those for the male speakers in the acoustic vowel plane. In other words, the vowel ellipses for the female speakers have a more peripheral distribution. Along the F1 dimension, the ellipses for the non-high vowels are much lower for the female speakers than for the male speakers. For instance, the F1 value of the females’ [e] is 168 Hz larger than that of the males’ [e], and is even larger than that of the males’ [ɛ]. Interestingly, the high vowels show little difference in F1 or vowel height between the females and males. Along the F2 dimension, all the vowel ellipses for the females are located toward the lower left corner of the acoustic vowel plane, since all the vowels have a greater F2 value for the female speakers than for the male speakers. However, it should be pointed out that the F2 difference is greater for the front vowels than for the back vowels; for instance, the F2 difference for [ɛ] between the female and male speakers is 398 Hz, whereas the F2 difference for [ɔ] is merely 150 Hz. A detailed interpretation of the formant value differences between the female and male speakers will be presented in 2.3.4.
Figures 2.5a-2.5t show the data points and ellipses for the 10 normal-length vowels in the F1/F2 plane for each individual Ningbo speaker. As can be seen from the figures, the vowel ellipses of each individual speaker basically display a pattern similar to that in the pooled data, though the differences in the spacing of the vowel ellipses are quite obvious among individual speakers[3].
Figures 2.5a-t: Ningbo vowel ellipses (data from 20 individual speakers). Panels 2.5a-2.5j: Female Speakers 1-10; panels 2.5k-2.5t: Male Speakers 1-10.
2.3.1.2 Vowel height
As introduced in Chapter 1, vowel features such as height and backness are initially articulatory terms (Jones, 1909, 1956). Later, however, a number of researchers found that vowel height and backness do not correspond to the articulatory reality (Meyer, 1910; Russel, 1928; Ladefoged et al., 1972); rather, they are correlated with vowel acoustics (Ladefoged, 1967, 1971, 1975, 1976). The latter view has been widely accepted by phoneticians. In this connection, vowel height and backness are viewed as ‘abstract’ phonological features in this study. They are not equal to tongue height and tongue backness, as defined in traditional terms. One major task of the present study is to review these vowel features with the obtained acoustic and articulatory data.
As schematized in Figure 2.1, the Ningbo vowels are expected to have five levels of vowel height according to the traditional descriptions in past studies by dialectologists in China (Chao, 1928; Tang et al., 1997). However, this is not supported by the vowel formant frequency data presented here. As can be observed from the vertical, i.e., F1 dimension of the acoustic vowel plane in Figures 2.3a, 2.3b, 2.4a, and 2.4b, there are only four phonetically distinguishable levels of acoustic vowel height. The so-called semi-high rounded vowel [ʏ] (Chao, 1928; Tang et al., 1997) does not constitute its own level of vowel height; rather, it belongs to the high level group (see 2.3.1.4 for a discussion of the effect of lip rounding on formants and 4.3.4 for a discussion of the articulation of lip rounding). As shown in Figure 2.5a, the vowel ellipse for [ʏ] is lower than that for the unrounded [i] for Female Speaker 1, but this should not be taken to be a counterexample, since the other rounded high vowel [y] is also as low as [ʏ] for this speaker. It is likely therefore that this is an idiosyncratic characteristic of lip rounding of high vowels for this speaker, namely that the rounded high vowels are relatively lower than their unrounded counterpart.
Phonologically speaking, one may argue that there are only three levels of vowel height distinctions in Ningbo, because the low vowel [a] does not contrast with any other vowel in the high-back dimension (Lindau, 1978). Acoustically, four levels of phonetic vowel height are detected in the F1/F2 plane. The pooled data in Figures 2.3a-b and 2.4a-b show that both the front and back vowels are, by and large, distributed quite equidistantly along the F1 dimension, while the individual data in Figures 2.5a-2.5t show speaker variation in the spacing of vowel ellipses. It can be observed from the figures that the low vowel [a] always keeps a certain distance from its neighboring mid-low vowel [ɛ] or [ɔ] in all speakers, which suggests that the production of the low vowel is relatively stable. But the other vowels, whether front or back, show variation in acoustic vowel spacing among speakers. Leaving the front rounded vowels aside, the relative distances between the non-low vowels in the front and back series are summarized in Table 2.3. It should be pointed out that a tendency for [e] and [ɛ] to merge is observed in the speech of Male Speakers 6 and 9; these, however, are isolated cases and were therefore excluded from the summary.
Table 2.3: Acoustic vowel spacing for the non-low vowels in Ningbo.
The results in Table 2.3 show that only a few speakers exhibit a quasi-equidistant distribution for the non-low vowels along the front or back dimension in the acoustic vowel plane.[4] For the other speakers, the non-low vowels show relative closeness to their neighboring vowels along the F1 dimension. For nine out of ten female speakers and two out of eight male speakers, [e] and [ɛ] exhibit a closer spacing than [e] and [i], while for only three male speakers, [e] and [i] exhibit a closer spacing than [e] and [ɛ] in the acoustic F1/F2 plane. For five female speakers and seven male speakers, [o] and [u] are closer than [o] and [ɔ], while for just five female speakers, [o] and [ɔ] are closer. There is hardly any physiological explanation that accounts for the asymmetry of acoustic vowel spacing. A possible explanation is that, from a diachronic perspective, the vowel system of Ningbo Chinese is at an initial stage of structural change. That is, the vowels [e] and [ɛ] as well as the vowels [o] and [u] have shown a tendency to merge for some speakers [5]. Independent evidence comes from Shanghai Chinese, a dialect with a close genetic affinity to Ningbo. The Shanghai vowel system can be schematized as shown in Figure 2.6.
Figure 2.6: The schematized Shanghai vowel system.
As can be seen from a comparison between Figure 2.6 and Figure 2.1, the Shanghai vowel system looks like a simplified version of the Ningbo system, in that [e ɛ] have merged to become [E]. Moreover, in the new variety of the Shanghai dialect spoken by the young generation under 30 years of age, [o] has merged into [u].
2.3.1.3 Vowel backness
Another primary dimension in the description of variations in vowel quality is vowel backness. Ningbo Chinese, like many of the world’s other languages, makes limited use of the front-back dimension vis-à-vis the height dimension, namely it contrasts only front and back. The low vowel [a] does not make such a distinction. All the back vowels are rounded by default, while the front vowels are not necessarily unrounded. For the mid-low vowels, vowel backness is predictable from lip rounding, as the unrounded vowel is front and the rounded vowel back. The high and mid-high vowels do contrast in vowel backness, since a rounded vowel with the same vowel height can be front or back.
As can be seen from the acoustic F1/F2 vowel plane in the figures, there is a good correlation between vowel backness and the F2 value, where the front vowels have a greater F2 value and the back vowels a smaller F2 value. Meanwhile, the F2 value for the front vowels decreases as the vowel height decreases, whereas the F2 value for the back vowels increases as the vowel height decreases. For the front vowels, the decrease in the F2 value for both the unrounded and rounded vowels is mainly attributed to the jaw-opening effect. As for the back vowels, it is mainly due to the universal relationship between lip rounding and vowel height, namely higher vowels are usually more rounded than lower vowels and thus have a smaller F2.
2.3.1.4 Lip rounding
Although the majority of the world’s languages have a predictable relationship between the phonetic vowel backness and rounding dimensions, namely front vowels are usually unrounded and back vowels rounded (Lindau, 1975, 1978; Maddieson, 1980; Ladefoged & Maddieson 1990), a rounded-unrounded distinction is found in the front vowels in Ningbo Chinese. Acoustically, lip rounding lowers all formant frequencies, especially higher formants such as F2 and F3, under all conditions. This is because lip protrusion lengthens the tube-like articulatory cavity and decreases the size of the lip orifice (Stevens & House, 1955; Fant, 1960). The F2 lowering effect of lip rounding is shown in the F1/F2 planes in the figures in 2.3.1.1, i.e., the vowel ellipses for [y ʏ] and [ø] are located adjacently to the right of their unrounded counterparts [i] and [e], respectively.
As mentioned in 2.3.1.2, [y] and [ʏ] do not contrast in vowel height. They are both high rounded vowels. From the pooled data plotted in the F1/F2 plane in Figures 2.3a-b and 2.4a-b, we can see that the vowel ellipses for [y] and [ʏ] overlap extensively with each other, and the only difference is that the central point (i.e., the mean) of the ellipse for [ʏ] lies somewhat to the left of that for [y]. Even in the individual data shown in Figures 2.5a-2.5t, for most speakers, the vowel ellipses for [y] and [ʏ] also overlap with each other, which is uncommon, because vowel ellipses usually do not overlap for an individual speaker. Nevertheless, for most of the speakers whose ellipses for [y] and [ʏ] overlap, the central point of the vowel ellipse for [ʏ] is located to the left of that for [y]. And for three female speakers, FS1, FS2, and FS8, where the ellipses of [y] and [ʏ] do not overlap, [ʏ] is located to the left of [y] in the acoustic vowel plane. The only counterexample is from Male Speaker 4, where [ʏ] is located to the right of [y]. To summarize, the spectral characteristics indicate that both [y] and [ʏ] are high rounded vowels and that [y] has a smaller F2 value than [ʏ]. Consequential questions arise naturally. Is there a three-way distinction of lip rounding in Ningbo? If so, how? And why does one rounded vowel have a smaller F2 value than the other?
As pointed out by Ladefoged and Maddieson (1990:100), “there are no clear-cut cases of three contrastive types of rounding”, and a controversial case may be found in Swedish. The Swedish vowel [ʉ], transcribed with a central-high rounded IPA symbol in the literature, has given rise to a long debate about whether the vowel is a high front vowel like [i] or [y] with a different lip gesture (Sweet, 1877, 1879; Malmberg, 1956; Fant, 1973). Fant (1973) provided x-ray data of [i y ʉ] and suggested that Swedish may have three degrees of lip rounding. I once asked two native Swedish speakers to produce the Swedish vowels [i y ʉ u] and I concentrated my observation on the lip gestures. My impression is that [ʉ] sounds like [y], whereas the lip gestures for [ʉ] and [u] are similar. It seems that the Swedish [ʉ] is produced with a tongue configuration similar to [y] and a lip gesture similar to [u]. Lindau (1975, 1978) and Ladefoged & Maddieson (1990:101, 102, 120) proposed two lip rounding parameters for vowels, namely vertical lip compression as in the production of the Swedish [ʉ] and [u] and horizontal lip protrusion as in the production of the Swedish [y]. A more detailed cross-linguistic study of lip gestures in vowel production is in Linker (1982). In the case of Swedish rounded high vowels, one articulatory setting of lip gestures, compression vs. protrusion, exactly captures the articulatory feature, while the other articulatory setting, vertical vs. horizontal, is a concomitant feature. Here I would like to propose that the vertical vs. horizontal articulatory setting of lip gestures is responsible for the three-way distinction of lip rounding in Ningbo. My observation is that the lips are protruded horizontally, i.e., along the speaker’s bite plane, in the production of the Ningbo [y], whereas the lips are protruded vertically, i.e., approximately orthogonal to the speaker’s bite plane, in the production of [ʏ][6].
So far the acquired acoustic data are supportive of the proposed hypothesis. Horizontal protrusion as in [y] results in a relatively longer articulatory tube of the vocal tract and a comparatively smaller lip opening than vertical protrusion as in [ʏ]. The acoustic consequence of this articulatory difference is apparent. According to the acoustic theory of lip rounding (e.g. Stevens & House, 1955; Fant, 1960), F2 decreases when (1) the lips are protruded horizontally and consequently the total articulatory cavity is lengthened, or (2) the size of the lip orifice decreases. As far as the Ningbo case is concerned, both horizontal and vertical protrusions lengthen the articulatory tube and thus decrease F2 vis-à-vis the unrounded counterpart [i]; but the F2 lowering effect is not so pronounced in vertical protrusion ([ʏ]) as in horizontal protrusion ([y]), owing to a relatively smaller lengthening of the articulatory tube and a relatively larger lip orifice. The results of a paired t-test show that the F2 difference between [y] and [ʏ] is significant (p < .001 for the pooled female data and p < .05 for the pooled male data).
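The paired comparison reported above can be illustrated with a minimal sketch of the paired t statistic. The text does not state the pairing unit, so pairing per-speaker mean F2 values is an assumption here; the function name `paired_t` and all F2 values are hypothetical and are not the study's data.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b.

    t is the mean of the pairwise differences divided by its standard error.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1

# Hypothetical per-speaker mean F2 values in Hz (assumed pairing unit: speaker).
f2_y = [2000, 2100, 1950, 2050]        # [y]
f2_Y = [2100, 2180, 2060, 2150]        # [ʏ]
t_value, df = paired_t(f2_y, f2_Y)     # a negative t means F2 is lower for [y]
```

The p-value would then be read from the t distribution with `df` degrees of freedom, for example from a statistics package or a printed table.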
The phonemic horizontal vs. vertical lip rounding distinction was found only in the high front vowels in Ningbo. However, phonetic vertical lip rounding was also observed in the production of the rounded apical vowel [ɥ]. A similar acoustic consequence is observed for this vowel, in that the vertical lip protrusion leads to a relatively smaller effect on F2 lowering (see 2.3.3 for details).
2.3.2 Results of the Ningbo short vowels
Ningbo does not make a phonemic long-short vowel distinction. The two phonetic short vowels [a o] are allophones of /a o/, respectively. In Ningbo, short vowels have a restricted distribution and occur only in checked syllables. As a result, short vowels have a markedly shorter duration than their normal-length counterparts. Table 2.4 summarizes the mean durations in milliseconds and the standard deviations (SD) of the short vowels and the corresponding normal-length vowels for ten male and ten female speakers.
In the table we can see that the duration of both short vowels for both male and female speakers is reduced substantially relative to the duration of the same vowel in the (C)V syllables. The short vowels are nearly 50% shorter than the corresponding normal-length vowels.
Table 2.4: Mean durations in ms and SD of the Ningbo short vowels.
As is well known, a reduction in syllable duration may affect vowel formant values owing to vowel target undershoot. Figure 2.7 compares the first two formants of the Ningbo short vowels with those of the corresponding normal-length vowels in an F1/F2 plane. The ellipse for [ɔ] is also drawn in the figure for reference. The vowel symbols for the short vowels are in italics.
Figure 2.7: Vowel ellipses for the Ningbo short and normal-length vowels from 10 female speakers (upper) and 10 male speakers (lower); short vowels in italics.
In the figure we can see that the vowel ellipse for the short [o] is lower and more fronted than that for the normal-length [o]. In other words, the short [o] is mid-centralized in the F1/F2 acoustic vowel plane. In fact, the vowel ellipse for the short [o] is closer to that for [ɔ], especially for the male speakers. The short [a] exhibits a gender difference. In the male speech, the vowel ellipses for the short [a] and the normal-length [a] overlap extensively, whereas in the female speech, the short [a] shows a tendency toward centralization.
So far the pooled data have shown that the short [o] is centralized and occupies a very different position in the F1/F2 plane vis-à-vis the normal-length [o]. The data from individual speakers show similar patterns. See Figures 2.8a-2.8t.
As can be seen from the figures, for both individual female and male speakers, the vowel ellipse for the short [o] occupies a lower and more fronted position than that for the normal-length [o] in the F1/F2 plane. Moreover, the vowel ellipses for the short [o] and the normal-length [o] seldom overlap; rather, the vowel ellipses for the short [o] and the normal-length [ɔ] overlap in some cases.
The location of the other short vowel [a] in the F1/F2 plane differs from individual to individual. In the speech of six female speakers (FS1, FS2, FS3, FS4, FS9, and FS10) and one male speaker (MS8), the vowel ellipse for the short [a] occupies a higher position, i.e., it is centralized, and it does not overlap with that for the normal-length [a]. For three female speakers (FS5, FS6, and FS7) and three male speakers (MS7, MS9, and MS10), the vowel ellipses for the short [a] and normal-length [a] overlap, though the short [a] is not so peripheral as the normal-length [a]. However, there is little difference between the short [a] and normal-length [a] for the rest of the speakers, i.e., Male Speakers 1, 2, 3, 4, 5, and 6 and Female Speaker 8.
Figures 2.8a-2.8t: Vowel ellipses for the Ningbo short vowels (data from 20 individual speakers); short vowels are in italics.
In summary, the data from the individual speakers are consistent with the pooled data presented earlier, i.e., most of the female speakers show a tendency to centralize the short [a], whereas most of the male speakers tend to produce the short [a] similarly to the normal-length [a], i.e., there is a lesser degree of centralization in the short [a] for the male speakers.
The detailed formant frequency results, i.e., the means (in Hz) and standard deviations of the first three formants for the Ningbo short vowels, are summarized in Table 2.5.
Table 2.5: Means in Hz and SD of the values of the first three formants for the two Ningbo short vowels [a] and [o] from female and male speakers.
2.3.3 Acoustic characteristics of the Ningbo apical vowels
In the IPA tradition, apical vowels were treated as syllabic consonants or syllabic approximants (Handbook of IPA, 1999). There are mainly two reasons behind this, one phonetic and the other phonological. Phonetically, apical vowels are approximant sounds, as they are produced with the tongue tip approximated to the alveolar ridge, as in the alveolar apical vowel [ɿ]. Phonologically, an apical vowel is usually preceded by a homorganic obstruent, as in the majority of the Chinese dialects. As a result, apical vowels are usually in complementary distribution with other vowels, such as [i] or [y]. However, there is evidence to support the view that these sounds are better treated as vowels. Phonetically, an apical vowel resembles the vowel [i] in that they are both produced with the approximation of the articulators and are thus both approximants in nature (Catford, 1977, 1988). Also, frication can be heard when an apical vowel or [i] is devoiced. Phonologically, apical vowels do have a wide distribution in a number of northern Jiangsu, Anhui, Shandong and South-western Mandarin dialects (Chen & Li, 1996). For instance, in Hefei Mandarin in Anhui province, the alveolar apical vowel can be preceded by an alveolar or bilabial obstruent, or it can occur without any initial consonant, i.e., the apical vowel itself may constitute a monosyllabic word. A phonological distinction does exist between apical vowels and the other phonemic vowels (Hanyu Fangyin Zihui, 2003). Therefore, the only difference between an apical vowel and a non-apical vowel is the articulators involved: instead of the tongue body, the tongue apex is the main active articulator in the production of apical vowels (refer to 4.3.6 for a discussion of the articulation of apical vowels)[7].
The two Ningbo apical vowels [ɿ] and [ɥ] are plotted in the F1/F2 plane in Figures 2.9 and 2.10. The detailed results for the first three formants, i.e., the means (in Hz) and standard deviations, are summarized in Table 2.6.
Figure 2.9: Vowel ellipses with data points for the Ningbo apical vowels [ɿ] and [ɥ] from 10 female speakers (left) and 10 male speakers (right).
Figure 2.10: Vowel ellipses for the Ningbo apical vowels [ɿ] and [ɥ] from 10 female speakers (left) and 10 male speakers (right).
Table 2.6: Means in Hz and SD of the values of the first three formants for the Ningbo apical vowels [ɿ] and [ɥ].
In the figures we can see that the vowel ellipses for both apical vowels occupy a semi-high central position in the F1/F2 plane. This is consistent with the apical vowels in Beijing Mandarin. As is well documented in the literature, apical vowels in Beijing are also high or semi-high central vowels (Howie, 1976; Wu & Lin, 1989; Zee, 2001). The [+high] nature of apical vowels indicates their affinity with the high vowels from a diachronic perspective. It is typical in the Chinese dialects that apical vowels are historically developed from the high front vowels (Karlgren, 1915-26). As for the Ningbo case, the unrounded apical vowel is developed from [i] (ts-ɿ<*ts-i) and its rounded counterpart from [y] (ts-ɥ<*tʃ-y) (Hu, 2001).
Figures 2.11a-2.11t: Vowel ellipses for the Ningbo apical vowels [ɿ] and [ɥ] from 10 female and 10 male speakers.
A comparison of the formant data of the rounded and unrounded apical vowels shows that F2 for [ɥ] is about 120 Hz lower than that for [ɿ] for the female speakers, and about 60 Hz lower for the male speakers (refer to Table 2.6). The difference in F3 between the rounded and unrounded apical vowels is even smaller. In the F1/F2 plane, the vowel ellipses of the two vowels overlap extensively with each other. Moreover, as shown in Figures 2.11a-2.11t, for half of the individual speakers, both female and male, the vowel ellipses of the rounded and unrounded apical vowels overlap extensively as well. However, based on both the pooled and individual data, it is observed that the vowel ellipse for [ɥ] is located to the right of that for [ɿ], due to a smaller F2 value for [ɥ]. This is supportive of the hypothesis of vertical lip protrusion in 2.3.1.4, which has a relatively smaller effect on F2 and F3 lowering than horizontal lip protrusion.
2.3.4 Vowel normalization and discussion on the sex difference
So far, it has been shown that in general the female and male speakers display a similar pattern of vowel distribution in the F1/F2 vowel plane. However, it is also apparent from the data that there are speaker variations, especially between the female and male speakers’ formant patterns (F-patterns). It is always a fundamental task in linguistic phonetic study to seek invariance through the substantial surface acoustic variations of speech production. Speech variability may result from linguistic context, speech rate, and differences in the physical anatomy, sex, age, and emotional state of the speakers (Ladefoged & Broadbent, 1957; Traunmüller, 1988). The procedure of seeking linguistic invariance by eliminating the nonlinguistic factors in speech data is known as “normalization” (Fant, 1968). Since the vowel formants depend on the length and shape of speakers’ vocal tract, a major source of variation in vowel production has been ascribed to speakers’ vocal tract anatomy (Fant, 1960). This kind of physical difference is most salient between female and male speakers, as manifested in the acoustic data discussed so far. Vowel normalization procedures can be roughly divided into auditorily-based (Syrdal & Gopal, 1986; Miller, 1989) and articulatorily-based (Fant, 1973; Nordström & Lindblom, 1975). The auditory normalizations focus on transforming the acoustic formant data into auditory units by using logarithmic scaling, the mel, the Bark, or the König scale. In the previous sections, the vowel formant data were plotted in an auditorily scaled acoustic F1/F2 plane and the vowel ellipses were drawn for the scattered data points to exhibit the distribution of formant values for each vowel. This accounts for part of the formant variation and gives us a straightforward and clear picture of how these dispersed formant data, with their considerable acoustic variation, fit into patterns that are of linguistic interest. However, the auditory scaling method does not explain the female vs. male difference very well. In this section, I will discuss the articulatorily-based vowel normalizations and interpret the formant difference between female and male speakers.
There are two different proposals for the articulatorily-based normalizations: uniform and nonuniform scaling methods. Uniform scaling is based on the estimate of the length of speakers’ vocal tract (Nordström & Lindblom, 1975). Because the length of a speaker’s vocal tract is inversely proportional to the formant frequencies, the ratio of the length of the average male vocal tract (Lm) to the average female vocal tract (Lf) can be estimated from Equation (2.1):
where F3 female average and F3 male average are the averages of the female and male third formants (F3), respectively, in vowels with F1 greater than 600 Hz. Each female formant frequency can then be rescaled, i.e., normalized, by multiplying it by the scale factor k. Based on the assumption that overall vocal tract length variations affect all formant frequency values, the same scale factor k is applied to all formants, i.e., in a uniform way.
However, Fant (1968, 1973, 1975) pointed out that the scaling of formant values must be nonuniform because it is observed that male speakers have proportionately greater pharynx length and more pronounced laryngeal cavities than female speakers (see also Chiba & Kajiyama, 1941:188-193). That is, in addition to the overall vocal tract length, the complex formant-cavity relations should also be considered when accounting for the difference in vowel formants between female and male speakers. According to his proposal, a different scale factor, kn, is applied to each individual vowel and individual formant category, as denoted by Equation (2.2).
Using Equation (2.1), the vocal tract ratio of female to male in Ningbo was calculated to be 0.89. The result corroborates previous studies: 0.87 for Japanese (Chiba & Kajiyama, 1941), 0.86 for English (Peterson & Barney, 1952), 0.89 for Dutch (van Nierop, Pols & Plomp, 1973; Pols, Tromp & Plomp, 1973), 0.89 for Swedish (Fant, 1975), and 0.87 for Korean (Yang, 1991). However, the uniform scaling method is not adopted in this study, because it is observed that the female to male formant difference in Ningbo is both vowel and formant dependent, i.e., it varies in a nonuniform way.
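The uniform scaling procedure described above can be sketched as follows. This is a minimal illustration, not a reproduction of Equation (2.1): it assumes that the scale factor k is the ratio of the mean male F3 to the mean female F3 over vowels with F1 above 600 Hz, which is the direction consistent with a female to male ratio below 1 and with female formants being multiplied by k. The function names and all formant values are hypothetical.

```python
from statistics import mean

def mean_f3_open(vowels, f1_floor=600.0):
    """Mean F3 over vowels whose F1 exceeds f1_floor (600 Hz in the text).
    vowels: list of (F1, F2, F3) tuples in Hz."""
    f3_values = [f3 for f1, _f2, f3 in vowels if f1 > f1_floor]
    if not f3_values:
        raise ValueError("no vowel with F1 above the floor")
    return mean(f3_values)

def uniform_scale_factor(female_vowels, male_vowels, f1_floor=600.0):
    """Assumed form: k = mean male F3 / mean female F3 (open vowels only)."""
    return mean_f3_open(male_vowels, f1_floor) / mean_f3_open(female_vowels, f1_floor)

def rescale(formants, k):
    """Apply the same factor k to every formant (uniform scaling)."""
    return tuple(k * f for f in formants)

# Hypothetical mean (F1, F2, F3) values in Hz, for illustration only.
female = [(850, 1500, 3000), (300, 2700, 3300)]
male = [(700, 1300, 2700), (290, 2250, 2900)]
k = uniform_scale_factor(female, male)      # only the F1 > 600 Hz vowels count
normalized_female = [rescale(v, k) for v in female]
```

Because a single k is applied to every formant of every vowel, this approach cannot capture differences that depend on the vowel or the formant, which is the limitation noted in the text.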
Figure 2.12: Vowel ellipses for the Ningbo normal-length vowels with male data (in thin lines) superimposed on female data (bold IPA symbols in thick lines).
As can be seen in Figure 2.12, the female to male difference varies with different formants in different vowel groups among the Ningbo normal-length vowels, i.e., the high front vowels show little difference in F1 but a considerable difference in F2, the non-high front vowels and low vowels show considerable differences in both F1 and F2, and the back vowels show a considerable difference in F1 but a relatively small difference in F2. The observation is consistent with Fant’s (1973:84) claim that “the female to male relations are typically different in the three groups of (1) rounded back vowels, (2) very open unrounded vowels and (3) close front vowels.”
To examine the female to male formant difference in quantitative terms, Fant’s sex factors Kn were calculated using Equation (2.2). The percentage differences (K1, K2 & K3) of F1, F2 and F3 for the Ningbo vowels are shown in Table 2.7, and the results are plotted in Figure 2.13.
Table 2.7: The difference in percentage (Diff.) (Kn) between female and male formant frequency values (F1, F2 & F3) for the Ningbo vowels.
Figure 2.13: The female to male formant percentage differences for the Ningbo vowels; K1 (F1 difference, filled circles connected with solid lines), K2 (F2 difference, open circles with dashed lines), and K3 (F3 difference, plus-centered circles with dotted lines); [zw] stands for [ɥ].
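The percentage differences reported in Table 2.7 can be illustrated with a short sketch. It assumes the form in which Fant's sex factor is commonly stated, Kn = (Fn female / Fn male - 1) x 100%, which matches the description of Kn as a percentage difference that turns negative when the male value is greater; the function name and the formant values are hypothetical and are not taken from Table 2.7.

```python
def sex_factor(f_female, f_male):
    """Percentage by which the female formant exceeds the male formant.
    Assumed form: (F_female / F_male - 1) * 100."""
    if f_male <= 0:
        raise ValueError("male formant frequency must be positive")
    return (f_female / f_male - 1.0) * 100.0

# Hypothetical mean (F1, F2, F3) values in Hz for one vowel, for illustration only.
female_means = (300.0, 2700.0, 3300.0)
male_means = (310.0, 2250.0, 2900.0)

# One factor per formant: K1, K2, K3.
k1, k2, k3 = (sex_factor(f, m) for f, m in zip(female_means, male_means))
# k1 is negative here because the hypothetical male F1 exceeds the female F1.
```

Computing one factor per vowel and per formant, rather than one overall factor, is what makes the scaling nonuniform.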
As can be seen from Table 2.7 and Figure 2.13, the mean difference in K1 is 22.8% (SD = 14.1%), in K2 20.1% (SD = 5.6%), and in K3 14.3% (SD = 4.7%). The K values indicate that (i) the first formant has the largest difference between female and male speakers and the greatest vowel-to-vowel variation, as evidenced by the greatest standard deviation; (ii) the second formant also exhibits a large female to male difference and a certain degree of vowel-to-vowel variation; and (iii) the third formant has the least difference and the least vowel-to-vowel variation.
The first formant sex factor K1 displays a maximum of about 39% in the mid-high front vowels [e] and [ø]; the other non-high front vowel [ɛ], the low vowels [a aʔ], and the two apical vowels [ɿ ɥ] also show relatively large values of K1. The high vowels generally show the minimal K1 values: -3.2% for [i] (the minus ‘-’ means that the mean F1 value is greater for male speakers than for female speakers), 4.4% for [y], and 1.2% for [u]. An exception is that the other rounded high front vowel [ʏ] shows a relatively larger K1 value, 17.1%. The K1 values for the three non-high back vowels vary from 15.5% for [oʔ] to 25.1% for [o]. That is, they are, in general, greater than the K1 values for the high vowels, and smaller than the K1 values for the non-high front vowels and low vowel.
The second formant sex factor K2 is considerably smaller for the back vowels than for the front vowels and low vowel. The two apical vowels also display a large K2 value. Though the third formant, as mentioned above, shows the least sex difference and vowel-to-vowel variation, the data for the third formant sex factor K3 still suggest a vowel class dependency, i.e., the front vowels show a relatively larger K3 value than the low vowel, back vowels and apical vowels.