Research Article
HUMAN EVOLUTION

Monkey vocal tracts are speech-ready

Science Advances  09 Dec 2016:
Vol. 2, no. 12, e1600723
DOI: 10.1126/sciadv.1600723
  • Fig. 1 Methodology for constructing a single vocal tract configuration.

    (A) We first made x-ray videos of monkeys and extracted still images of various vocal tract configurations (the example shown is a macaque producing a threat call). (B) We then traced the vocal tract outlines. (C) We used custom Matlab scripts to extract the diameter of the vocal tract along the glottis-to-lip midline (medial axis transform), straightened this diameter function, and converted it to a vocal tract area function. (D) Finally, the resultant area function was used to compute the vocal tract transfer functions for the observed vocal tract configuration [using Flanagan’s lossy tube model (39)], and the first three formant frequencies (F1, F2, and F3) were extracted via peak picking. The set of all 99 vocal configurations, each computed in this fashion, was then used to estimate the monkey’s phonetic space.

  • Fig. 2 Attested macaque monkey formant space.

    Formant plot (F1-F2) for all 99 observed monkey vocal tract configurations, enclosed in a convex hull to show total phonetic space, with corresponding tracings of extreme vocal tract configurations. Eating-related outlines (open circles) are all the outlines in which a food item (banana or orange slice, grape or raisin) was involved. Facial expressions (gray dots) are yawns and various lip smacks. Vocalizations (black dots) involved production of sound through coos and grunts.

  • Fig. 3 Formant plot comparisons.

    Macaque monkey (black dotted line) versus human female vowel space [red dashed line; American English from Peterson and Barney (18)]. The left panel shows F1 against F2, and the right panel shows F1 against F3. For comparison, the blue outline shows the previous macaque monkey estimates from Lieberman et al. (6).

  • Fig. 4 Spectral comparison.

    Spectrograms of (A) original speech (the English phrase “Will you marry me?,” spoken by a human female) and (B) a synthesized version of the same phrase that uses only formant values available to a macaque monkey, based on the observed vocal tract configurations.
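
The captions for Figs. 1 and 2 describe two computational steps: going from a vocal tract area function to formant frequencies, and enclosing the resulting (F1, F2) points in a convex hull. The following Python sketch illustrates both under loudly stated assumptions and is not the authors' Matlab code. It swaps Flanagan's lossy tube model for a simpler lossless concatenated-tube model with an ideal open end at the lips, assumes a speed of sound of 350 m/s, and locates resonances as zero crossings of the chain-matrix term rather than by peak picking (in a lossless tube the spectral peaks become poles). The hull uses Andrew's monotone chain algorithm. All function names are hypothetical, and the diameter-to-area conversion from the article is not reproduced.

```python
import math

SPEED_OF_SOUND_CM_S = 35000.0  # assumption: roughly 350 m/s in warm, humid air


def chain_term(areas_cm2, section_len_cm, freq_hz):
    """Return D(f), where U_glottis = D(f) * U_lips for a lossless tube.

    areas_cm2 lists equal-length sections from glottis (first) to lips (last).
    Lip pressure is assumed to be zero (ideal open end, no radiation load).
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND_CM_S
    cos_kl = math.cos(k * section_len_cm)
    sin_kl = math.sin(k * section_len_cm)
    c, d = 0.0, 1.0  # bottom row of the identity matrix (rho*c factored out)
    for area in areas_cm2:
        # Right-multiply the running chain matrix by this section's matrix.
        c, d = c * cos_kl + d * area * sin_kl, -(c / area) * sin_kl + d * cos_kl
    return d


def formants(areas_cm2, section_len_cm, n_formants=3, f_max_hz=8000.0, step_hz=5.0):
    """Scan a frequency grid and return the first resonances (zeros of D)."""
    found = []
    f_prev = step_hz
    d_prev = chain_term(areas_cm2, section_len_cm, f_prev)
    while f_prev < f_max_hz and len(found) < n_formants:
        f_cur = f_prev + step_hz
        d_cur = chain_term(areas_cm2, section_len_cm, f_cur)
        if (d_prev > 0) != (d_cur > 0):
            # Linear interpolation between the two grid points.
            found.append(f_prev + step_hz * d_prev / (d_prev - d_cur))
        f_prev, d_prev = f_cur, d_cur
    return found


def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area enclosed by an ordered polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

As a sanity check, a uniform tube such as `formants([3.0] * 35, 0.5)` (17.5 cm long) should give values close to the quarter-wave resonances of 500, 1500, and 2500 Hz. Passing a list of `(F1, F2)` tuples, one per configuration, to `convex_hull` and then to `polygon_area` gives one rough way to compare the sizes of two formant spaces.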

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/2/12/e1600723/DC1


    This PDF file includes:

    • Legends for audio files S1 and S2
    • fig. S1. Monkey MRIs used to estimate the conversion factor from linear midsagittal diameter measurements to two-dimensional areas.
    • fig. S2. Comparison of human female vowels in Dutch, which has vowels not present in English (red dashed line), with the macaque vocal tract model (gray dotted line).


    Other Supplementary Material for this manuscript includes the following:

    • audio file S1 (.wav format). Audio file of an adult human female saying “Will you marry me?,” resynthesized with a noisy source.
    • audio file S2 (.wav format). Audio file of our macaque vocal model uttering the same phrase “Will you marry me?,” synthesized with the same noisy source.
