Which way to the dawn of speech?: Reanalyzing half a century of debates and data in light of speech science


Science Advances  11 Dec 2019:
Vol. 5, no. 12, eaaw3916
DOI: 10.1126/sciadv.aaw3916


  • Fig. 1 Source-filter theory.

    Interpreting vowel spectra as the result of the transformation, by the VT, of the glottal source. (A) Source. Top: view from above on the vocal folds; bottom: spectrum of the glottal source signal, with high amplitude in low frequencies. (B) Filter. Top: sagittal section of the VT, with the median line (dotted) where VTL is calculated; middle: VT’s area function; bottom: VT’s acoustic transfer function; black: calculated from the area function; red: extracted by LPC analysis from speech signal. (C) Radiated sound. Top: classic spectrogram of a synthesized vowel, showing formants and their peak frequencies over time [here calculated by Praat (171)]; bottom: vowel’s acoustic spectrum, whose amplitude envelope (dotted line) is imposed on the source signal by the VT transfer function.

  • Fig. 2 Production of the three extreme vowels /i/, /a/, and /u/.

    (Top) Sagittal section (with the median line along which the VTL is measured from glottis to lips) with dots indicating the lips (location of Al) and the vowel’s constriction (with location Xc and area Ac) along the VT. (Middle) Area function, with dots in corresponding locations. (Bottom) Acoustic transfer function, with the lowest formant peaks marked (F1 to F4).

  • Fig. 3 Four-tube MAS, main IPA vowels, and schematic /i a u/ configurations.

    The MAS was set at l = 17.5 cm. In the vowel configuration schemas, the dots show the key points of the VT constriction (for Xc and Ac) and the lip opening (Al). A Helmholtz resonator consists of a body of volume V that is extended by a neck of length L and of area A. Because the frequency of a Helmholtz resonator is proportional to √(A/(LV)), the narrower and longer the neck, and/or the larger the volume, the lower the resonance. The single Helmholtz resonator of /i/ gives it its low F1, and the pair of Helmholtz resonators in /u/ makes both F1 and F2 low. Note that the orientation of the F1 and F2 axes in the MAS is standard in speech research to match the preexisting conventional orientation of the vowel triangle in the IPA, defined by tongue position of a speaker facing left. Note also the color scheme of the IPA vowels, which is used here and below for convenience.
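
The Helmholtz relation in this caption can be illustrated numerically. The following is a minimal sketch, assuming the standard lumped-element formula f = (c/2π)·√(A/(LV)) and a speed of sound of 35,000 cm/s; the function name and the neck and cavity dimensions are hypothetical illustration values, not measurements from the article.

```python
import math


def helmholtz_frequency_hz(neck_area_cm2, neck_length_cm, volume_cm3,
                           c_cm_per_s=35000.0):
    """Resonance (Hz) of an idealized Helmholtz resonator.

    Uses f = (c / 2*pi) * sqrt(A / (L * V)); end corrections and losses
    are ignored, so this is only a first-order estimate.
    """
    ratio = neck_area_cm2 / (neck_length_cm * volume_cm3)
    return (c_cm_per_s / (2.0 * math.pi)) * math.sqrt(ratio)


# Hypothetical dimensions: a narrow constriction (neck) in front of a back cavity.
baseline = helmholtz_frequency_hz(neck_area_cm2=0.5, neck_length_cm=3.0, volume_cm3=50.0)
longer_neck = helmholtz_frequency_hz(neck_area_cm2=0.5, neck_length_cm=6.0, volume_cm3=50.0)
larger_volume = helmholtz_frequency_hz(neck_area_cm2=0.5, neck_length_cm=3.0, volume_cm3=100.0)

print(f"baseline:      {baseline:.0f} Hz")
print(f"longer neck:   {longer_neck:.0f} Hz")
print(f"larger volume: {larger_volume:.0f} Hz")
```

Lengthening the neck or enlarging the cavity lowers the resonance, which is the mechanism the caption invokes for the low F1 of /i/ and the low F1 and F2 of /u/.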

  • Fig. 4 American English vowels displayed within the MASs calculated by the four-tube model.

    Dispersion ellipses are for standard vowel values from Peterson and Barney (12) for young speakers, adult females, and adult males, displayed in a MAS for the appropriate VTL, 12.5, 15, and 17.5 cm, respectively.

  • Fig. 5 Anthropomorphic articulatory model.

    (A) Measurement of the VT in the sagittal section with an analysis grid to sample the VT area in (typically) 30 planes transecting the VT. (B) Articulatory command parameters extracted using a principal components analysis of the sagittal section data. These parameters are interpretable with regard to the articulators: for the lips, (1) opening height and (2) protrusion; for the jaw, (3) the opening; for the tongue, the movements of (4) the dorsum, (5) the body, and (6) the apex; and for the larynx, (7) its height.

  • Fig. 6 MVS for the anthropomorphic articulatory model.

    (Top) Vowels of 11 world languages are projected into the maximal vowel space: Chinese (172), Dutch (173), American English (12, 87, 174), French (175, 176), German (177), Japanese (178), Indian Malay (179), Brazilian and European Portuguese (180), Sardinian (181), and Swedish (182, 183). Vowel labels from the original publications are coded according to the colors in Fig. 3. In addition, the low vowels, /æ a ɑ ɒ/, are grouped in a single macro-class. This F1-F2 space was generated by Maeda’s model, and the model is validated by the correct placement of the different languages’ vowels inside the MVS. (Bottom) Fifty sagittal sections (second row) and 50 area functions (third row) for /i a u/ (left to right) were obtained by acoustic-to-articulatory inversion from (F1, F2) values from French. Both are variations around those presented in Fig. 2, with dots in the sagittal sections and the area functions showing the positions of the labial and lingual constrictions, which thus allows computation of mean values for the crucial variables Xc, Ac, and Al. They illustrate, here for an adult male, prototypical VT forms, as well as the differing sensitivities noted in the “Tubes, cavities, constrictions, and acoustic resonances” section for lip area, Al, and tongue constriction position and area, Xc, and Ac. With this method, one can use formant values to generate plausible sagittal sections for all the vowels of the IPA.

  • Fig. 7 Human VT growth.

    (A) Schematic VT illustrating the three pertinent measures: SVTh and SVTv, and the dashed median line where VTL is measured. (B to D) SVTh (B), SVTv (C), and VTL (D), from around birth to 25 years, females in red and males in blue. Results from Barbier et al. (123, 124), as measured from four longitudinal databases, from 1 month to adulthood, using images from the American Association of Orthodontists (68 subjects, 966 x-rays), plus 12 fetal images from Montpellier University are shown. The data are optimized by two double logistic functions to account for growth in two phases, from birth to puberty and then from puberty to adulthood (26, 184). Radiography did not capture the male glottis beyond 15 years, so the function in that range (dotted line) is an estimate. (E) VTs generated by VLAM (cf. the “VT growth and acoustic normalization” section), from a human newborn through an adult male, along with their corresponding MVSs, with the three point vowels /i a u/, plus schwa /ə/. (F) Color-keyed boundaries of the five MVSs from (E) normalized to 17.5 cm, the mean length for a male at 25 years, with the /i a u/ vowels normalized from: predictions from Goldstein’s model (26) for newborns, the extreme values from an imitation test for infants at 20 weeks (137), infants at 26 weeks (126), infants at 28 weeks (129), infants at 40 weeks (128), and infants at 66 weeks (127) (plus schwa /ə/ for reference).

  • Fig. 8 Spectral properties of uniform tubes.

    (A) Acoustic transfer function of a uniform tube where ℓ = 17.5 cm. Its calculation incorporates effects (lip radiation, wall losses, and viscosity) that slightly modify the theoretical formant values (0.5, 1.5, 2.5 kHz, …) (97, 185). The human VT configured as a uniform tube produces the schwa vowel, /ə/. (B) Wave patterns in the uniform tube. The uniform tube models the VT as a quarter-wave resonator closed at the glottis and open at the lips, so its formants fall at the odd multiples (1, 3, 5, 7, …) of the frequency whose wavelength, λ, equals four times the length of the tube. The top of this panel shows the volume velocity wave along the tube for the first three formants together. Below are the wave shapes for each of the first three formants, plotted individually (14). (C) Formant values for pertinent lengths of uniform tubes. This graph shows the formant values for uniform tubes with lengths across a range of VTLs, calculated (in kHz, with ℓ in cm) as Fi = 35(2i − 1)/(4ℓ), with lines marked at VTL values known for selected species noted in this paper. These are the formant values that should be expected when a VT of that length is configured as a uniform tube.
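
The uniform-tube formula in (C) can be checked numerically. The following is a minimal sketch, assuming the caption's constant of 35 (a speed of sound of 35 cm/ms, so that lengths in cm yield frequencies in kHz) and a lossless tube; the function name `uniform_tube_formants` is illustrative and does not come from the article.

```python
def uniform_tube_formants(vtl_cm, n_formants=3, c_cm_per_ms=35.0):
    """First n formants (kHz) of a lossless uniform tube closed at one end.

    Implements Fi = c * (2i - 1) / (4 * length), the quarter-wave series.
    """
    return [c_cm_per_ms * (2 * i - 1) / (4.0 * vtl_cm)
            for i in range(1, n_formants + 1)]


# VTLs used elsewhere in the captions (cm): macaque reference and the three MAS lengths.
for vtl in (11.4, 12.5, 15.0, 17.5):
    formants = uniform_tube_formants(vtl)
    print(f"VTL {vtl:5.1f} cm:", ", ".join(f"{f:.2f} kHz" for f in formants))
```

For ℓ = 17.5 cm this reproduces the theoretical values of 0.5, 1.5, and 2.5 kHz quoted in (A); the losses mentioned there are not modeled.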

  • Fig. 9 F1-F2 MAS plane for men; dispersion ellipses for /u/, /æ/; and the straight line F2 = 3F1 produced by a uniform tube varying across VTL values.

    MAS set to VTL = 17.5 cm, data from Peterson and Barney (12). Points and means correspond to the values for adult males of the formants of /u/ and /æ/ and to the schwa-like /ə/. This shows that unless the VTL is known, the F2 = 3F1 criterion is insufficient for detecting the schwa-like productions of a uniform tube.

  • Fig. 10 Formant patterns of vocalizations (red lines) diverging from those of uniform tube (green lines).

    The green lines represent calculated estimates of the expected formants from a uniform tube of the appropriate VTL, while the red lines represent either values reported by the authors or means calculated from their figures, adapted here for graphic purposes. VTLs for (A) to (C) were estimated with the Reby and McComb method using five formants for (A) and (B) and six for (C). VTL for (D) was measured from x-rays. (A) Double grunt of Gorilla gorilla beringei [Fig. 1 of (41)]; VTL = 16.4 cm. F1 is much too low for a uniform tube. (B) Grunt of Papio hamadryas adult female [Fig. 2 of (102)]; VTL = 17.3 cm. F2 is much too low for a uniform tube. (C) Roar of Eulemur mongoz [Fig. 1A of (186)] with initial formant transition; VTL = 10.7 cm. Final formant values are compatible with a uniform tube, but the variations of F2, F3, and F4, while F1 stays stable, are not compatible with a uniform tube. (D) Leopard alarm call of male Diana monkey [Fig. 2 of (88)]; VTL = 10 cm. As noted by the authors themselves, neither the F2 values nor the (F1, F2) variations along the trajectory are compatible with a uniform tube.

  • Fig. 11 Phonetic qualities of macaque and baboon articulations within the MAS.

    (Left) Macaque vowel spaces, according to Lieberman et al. (10) (white line) and according to Fitch et al. (142) with the convex hull (black line) enclosing data from vocalizations, facial expressions, and eating. (Right) Vocalizations from Boë et al. (35) with the convex hull (black line) enclosing data from vocalizations. For both sides, all the data were either obtained from or normalized to a reference VT of 11.4 cm [the VTL of the macaque in (142)] and presented along with the dispersion ellipses (color-coded as in Fig. 3) for Peterson and Barney’s data for children (12), also normalized to a VTL of 11.4 cm.

  • Fig. 12 Reframing vintage data in the normalized MAS.

    This figure collects and presents data from a variety of published articles on primate vocalizations. The MAS and all data are normalized to VTL = 11.4 cm [the VTL of the macaque in (142)]. Original (prenormalization) VTLs are determined using the articles’ formant values and Reby and McComb’s method (141) (discussed in the “VTL estimation from formant patterns” section) except VTLs for chacma baboon males: estimated from known VTLs of other baboon species; chacma baboon females: 25% shorter than males, because formants are 25% higher (187); and Diana monkeys: from radiography (88). Vowel dispersion ellipses for /i ɪ æ ɔ ʊ u/ are from Peterson and Barney’s data for children (12). Reframed primate data, starting from /ɪ/ and proceeding clockwise, are as follows: threat calls from rhesus macaque (Macaca mulatta) subjects 73 to 91 (Fig. 4) (43); double grunts of mountain gorillas (G. gorilla beringei) [table II of (41)]; grunts of chacma baboon (Papio hamadryas ursinus) males (larger bold light green ellipse) and females (smaller bold light green ellipse) [Fig. 1B of (88), data collected for (44), first presented by (187)]; grunts of hamadryas baboon (P. hamadryas) subject SI (Fig. 3B) (102); threat calls from rhesus macaque (M. mulatta) subjects 241-95 (Fig. 4) (43); eagle (blue circle) and leopard (blue dot) alarm calls of male Diana monkeys (Cercopithecus diana) [Fig. 1A of (88), data collected for (45)]; single exemplar of rhesus macaque (M. mulatta) girney, with lines connecting formant values affected by lip and jaw modulation (Fig. 1G) (85); single exemplar of mongoose lemur (E. mongoz) alarm long grunt, with lines connecting shifting formant values (Fig. 1A) (186). The convex hull (black line) encloses all the primate vocalizations shown, thus representing the collected vowel space of primates as documented to date, and is subject to enlargement in future studies.
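
The two steps named in this caption, estimating a VTL from formants and then normalizing to a reference VTL, can be sketched as follows. This is a rough illustration under stated assumptions, not the article's implementation: it assumes the commonly described form of Reby and McComb's method (a regression through the origin of the formants on (2i − 1)/2 to obtain the formant spacing ΔF, then VTL = c/(2ΔF)), a speed of sound of 35,000 cm/s, and a normalization in which formants scale inversely with VTL. The function names and the input formants are hypothetical.

```python
def estimate_vtl_cm(formants_hz, c_cm_per_s=35000.0):
    """Estimate VTL (cm) from formants F1..Fn given in Hz.

    Assumed form of the method: fit Fi = delta_f * (2i - 1) / 2 by least
    squares through the origin, then convert the spacing to a tube length.
    """
    xs = [(2 * i - 1) / 2.0 for i in range(1, len(formants_hz) + 1)]
    delta_f = sum(x * f for x, f in zip(xs, formants_hz)) / sum(x * x for x in xs)
    return c_cm_per_s / (2.0 * delta_f)


def normalize_formants(formants_hz, vtl_cm, ref_vtl_cm=11.4):
    """Rescale formants to a reference VTL, assuming F is proportional to 1/VTL."""
    scale = vtl_cm / ref_vtl_cm
    return [f * scale for f in formants_hz]


# Hypothetical input: five formants of an ideal 17.5 cm uniform tube.
formants = [500.0, 1500.0, 2500.0, 3500.0, 4500.0]
vtl = estimate_vtl_cm(formants)
print(f"estimated VTL: {vtl:.1f} cm")
print("normalized to 11.4 cm:", [round(f) for f in normalize_formants(formants, vtl)])
```

Because the regression is forced through the origin, the estimate uses all measured formants at once rather than relying on any single one, which matters when one formant departs from the uniform-tube pattern.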
