Research Article: ECOLOGY

Time-space–displaced responses in the orangutan vocal system


Science Advances  14 Nov 2018:
Vol. 4, no. 11, eaau3401
DOI: 10.1126/sciadv.aau3401


One of the defining features of language is displaced reference—the capacity to transmit information about something that is not present or about a past or future event. It is very rare in nature and has not been shown in any nonhuman primate, confounding, as such, any understanding of its precursors and evolution in the human lineage. Here, we describe a vocal phenomenon in a wild great ape with unparalleled affinities with displaced reference. When exposed to predator models, Sumatran orangutan mothers temporarily suppressed alarm calls up to 20 min until the model was out of sight. Subjects delayed their vocal responses in function of perceived danger for themselves, but four major predictions for stress-based mechanisms were not met. Conversely, vocal delay was also a function of perceived danger for another—an infant—suggesting high-order cognition. Our findings suggest that displaced reference in language is likely to have originally piggybacked on akin behaviors in an ancestral hominid.


Language is uniquely human, but the natural world teems with examples of remarkable communication features, such as semantics (1, 2), syntax (3, 4), cultural transmission of signals (5, 6), signal learnability (7, 8), prevarication and deception (9, 10), arbitrariness (11, 12), and audience effects (13, 14). Considerable research effort has been dedicated to the primate order, a clade that pulls together many of the features that compose language and wherein language ultimately evolved.

A feature that has not been naturally observed among our primate relatives, however, is displaced reference—the capacity to transmit information about “things that are remote in space or time (or both),” as defined by Hockett (15), or in other words, about things spatially or temporally absent at the moment of communication. Displaced reference is a ubiquitous and universal feature across all the world’s languages and one of its fundamental hallmarks (15). Vervet monkeys occasionally produce alarm calls in the absence of predators, but these calls have been interpreted as cases of tactical deception, not of displaced reference (16). Moreover, alarm calls in various monkey species can be produced in response to stimuli other than predators (17), can trigger different responses in receivers (18), and alternately refer to predators or trigger movement not associated with antipredator behavior (19). Given a multitude of interpretations, researchers have been unable to convincingly attribute displaced reference to this and other cases in primate calls (16).

Some cases of insect communication (20), most notably the honeybee waggle dance (i.e., informed foragers at the hive signal the direction and distance to resource patches) (21), do qualify as displaced reference, but they represent a case of functional convergence governed by different cognitive processes than human displaced reference. Unlike monkeys, captive great apes have demonstrated the required faculties for displaced reference (e.g., through referential pointing), but always under human priming (22, 23) and never in the wild between conspecifics. This evidence is, however, imperative to understand how displaced reference could have evolved within our lineage in the absence of full-fledged language. Here, we systematically investigated the response of wild orangutan mothers to predator models. Previous studies using predator models have reported alarm calling (in lemurs, monkeys, and great apes) in the presence of predator models (9, 24–27), but orangutans delayed alarm calling until the predator model had moved out of sight.

Given the lack of primate phylogenetic clues for the emergence of displaced reference in natural conditions, two possible evolutionary scenarios exist: It arose de novo, the result of one key mutational event exclusive to humans, or it arose through exaptation, an emergent property resulting from numerous “lesser” features that converged and interlocked within the hominid lineage into a coherent and ever-more powerful and versatile communication system. Here, we present data on the observation of a unique communication feature akin to displaced reference in a wild great ape—Sumatran orangutans (Pongo abelii)—that may shed new light on its emergence as one of the cornerstones of language.


We presented four predator models (tiger, patterned, spotted, and white) to seven adult female wild orangutans. Each predator model was presented to the female for 2 min and then removed. Two of the orangutans were each presented with two models, whereas the other five were presented with each of the four models (table S1). Of the overall 24 presentations, 12 failed to elicit any vocal response (Fig. 1 and table S1), and 12 resulted in temporarily suppressed reactions (alarm calling), with an average latency to giving the first call being 420 s (SD = 349.5 s) or 7 min (SD = 5.825 min) (Fig. 1). The maximum observed delay was 1189 s (>19.5 min) by one orangutan for one of the models (patterned) (Fig. 1). Deducting the 2 min when the model was visible, this translates to up to 17 min of displacement in the time between the predator’s presence and the first vocal response (alarm call).
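The design counts and unit conversions in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative and not part of the original analysis.

```python
# Arithmetic check of the design and latency figures quoted in the text.
# All numbers come from the paragraph above; names are illustrative only.

presentations = 2 * 2 + 5 * 4        # two females saw 2 models, five saw all 4
mean_latency_s = 420.0               # mean latency to first call (s)
sd_latency_s = 349.5                 # SD of latency (s)
max_delay_s = 1189.0                 # longest observed delay (s)
exposure_s = 2 * 60                  # model was visible for 2 min

mean_latency_min = mean_latency_s / 60
sd_latency_min = sd_latency_s / 60
max_delay_min = max_delay_s / 60
# Displacement: delay that remains once the visible exposure is deducted.
displacement_min = (max_delay_s - exposure_s) / 60

print(presentations, mean_latency_min, round(sd_latency_min, 3))
print(round(max_delay_min, 2), round(displacement_min, 2))
```

The longest delay works out to just under 20 min in total and just under 18 min after deducting exposure, consistent with the rounded-down values ">19.5 min" and "up to 17 min" given in the text.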

Fig. 1 Vocal delay by Sumatran orangutan females.

Survival plot showing the delay of subjects’ vocal onset after presentation of predator models at time zero (A) and correlational plots between alarm probability and (B) subject height up the understory and (C) infant age. Shaded areas represent 95% confidence intervals. Yellow band denotes exposure time to predator model.

Survival analyses indicated a significant positive effect on the probability of a vocal response with experimental height (Cox proportional hazards model, z = 2.09, P = 0.037; Fig. 1B). The closer the predator model was to the subjects at first sight, the lower the probability of a vocal alarm, and by extension the longer the delay of their vocal alarm (Fig. 1B). This relationship showed that stimulus inadequacy was an unlikely cause for the lack of responses in half of our experiments: the absence of response occurred when the predator was the closest and, thus, the most perceptible. Individual analyses confirmed that absence of response was associated with closer distances to the predator model (fig. S1), suggesting that it was the result of vocal suppression in at least five of our seven subjects. At the same time, we observed a significant negative effect with infant age (z = −2.12, P = 0.034; Fig. 1C), indicating that vocal delay was a function of the youngsters’ age, namely, with younger infants eliciting more probable and faster responses (Fig. 1C). We observed no effect of female age (z = 1.54, P = 0.12). The striped tiger model elicited more responses than the remaining models (table S1), but across all experiments that elicited a vocal response, the fictitious patterned and spotted models elicited longer vocal displacement (fig. S2).
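A survival plot such as Fig. 1A treats the first alarm call as the "event" and experiments with no call as censored observations. The following is a minimal sketch of a Kaplan-Meier estimator under that reading. It is not the authors' code (their analysis used Cox models in R), the latencies are hypothetical, and censoring silent trials at 3600 s is an assumption made here for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- latency to first call, or censoring time if no call was given
    events -- True if a call was given at that time, False if censored
    S(t) is the estimated probability of not yet having called by time t.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        calls = 0      # events observed at time t
        removed = 0    # events plus censored observations at time t
        while i < len(pairs) and pairs[i][0] == t:
            calls += int(pairs[i][1])
            removed += 1
            i += 1
        if calls:
            surv *= 1 - calls / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical latencies in seconds; NOT data from the study.
latencies = [100, 300, 300, 900, 3600, 3600]
called = [True, True, True, True, False, False]
curve = kaplan_meier(latencies, called)
```

With these made-up values the curve steps down at 100 s, 300 s, and 900 s, and the two silent trials leave it flat above zero.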

There was no reason for orangutan females to vocalize after a predator model was removed, but they did so nonetheless. One proximate explanation is that subjects were “petrified” by fear. This does not, however, offer an adequate explanation for the observations for at least four reasons. First, if fear was the prevalent factor driving vocal alarm, but was too overwhelming to trigger the vocal response instantaneously, then female age ought to correlate with alarm probability, with older females responding more often and quicker, as they would have presumably more experience with encountering predators and responding to them with an infant under their care. We did not observe this effect. Second, if subjects were “frozen” in fear, then one would expect a delay similar to vocal displacement across other behaviors that could be recruited for a potential response. Data on the movement of subjects show, however, a gradual escape to higher canopy levels immediately after the first sight of the predator (Fig. 2). Although experiments started when subjects were relatively low in the canopy (see Methods), it would be highly unlikely that, across 24 observations of 1 hour, all subjects happened to start an ascent of more than 10 m at time zero, regardless of their absolute height or whether they engaged in vocal alarm. These observations corroborate, thus, that the observed canopy movement was a response after sighting the model. Third, if fear was the predominant force behind the females’ vocal alarms, then one would predict a correlation with the duration of the alarm response after the females vocalized following their immediate shock. However, vocal delay did not affect subsequent vocal engagement (Fig. 3), with several individuals alarming beyond 3000 s (>50 min) after first sighting the predator model and initial delay. 
Average call duration was 1519.2 s (>25 min) after an average delay of 420 s, demonstrating that the visual presence of the predator model was functionally mandatory neither for vocal alarm onset nor for sustained vocal alarm after onset. Fourth, if physiological stress was the main reason for the subjects’ vocal alarm, then a similar pattern should be observed in other physiological behaviors associated with high levels of stress. The incidence of these behaviors, namely, urinating and defecating, did not differ, however, between experiments when females did not alarm call versus when they displaced their alarm calls (Fig. 4). If overpowering stress was the determinant factor driving vocal delay, then completely distinct patterns in physiological stress behaviors should be expected between the two types of responses, but none were apparent (Fig. 4B).

Fig. 2 Graphic representation of subjects’ ascent up the canopy during the 60 min after the first sight of the predator model.

Yellow band denotes exposure time to the predator model.

Fig. 3 Bar plot showing the vocal response delay and duration after presentation of predator models at time zero.

Yellow band denotes exposure time to the predator model.

Fig. 4 Density plots of the incidence of stress behaviors (i.e., urinating and defecating) during the 15 min after the first sight of the predator model.

(A) Count (density*n) of stress behaviors. (B) Scaled density estimate (maximum of 1) of stress behaviors, enabling a better visual comparison between the patterns of physiological stress between the two types of response by the subjects.

To conceptually explain the observed vocal delays seems problematic, therefore, without contemplating the mental capacity to entertain the notion or memory of a(n encounter with a) predator and/or the capacity to wage response timing. The results directly suggest this to be the case. Notably, the significant effect of infant age on vocal alarm probability by the mother (with younger infants eliciting more probable and faster responses) indicates that the decision to call or not to call—even after the cause was long gone—derived, in part, as a measure of perceived danger for others. Because vocal alarms inherently reveal a subject’s presence and position, females appear to delay their response to minimize the perceived possibility of a predator attempting a direct assault, particularly in the presence of an unweaned infant.

Although the alarm calls of some social primates can function to deter predators (“I am seeing you!”) (28), only 1 of 24 of our experiments elicited a vocal response while the predator model was still visible (Fig. 1A). This and other predator harassment behaviors (e.g., mobbing) are dangerous and can result in fatal injuries (29). Primate alarm calls in other species can also be directed to groupmates and kin living in close proximity, as is the case of langurs living in sympatry with orangutans (30). In orangutans, however, there is no need to alarm call for others, as they are mostly solitary, with the obvious exception of orangutan mother-offspring dyads. Although orangutan alarm calls can be heard up to 300 m away, no conspecific approached the caller even when they continued to call for more than an hour after seeing the predator model. The reproductive rate of the orangutan is the slowest of any of the primates (31, 32), and it is, moreover, semisolitary and diurnal. It may be that, when detecting a predator, the best option for a mother with an infant is to let it pass by without drawing its attention instead of potentially incurring whatever risks and fitness costs that would result from calling unwanted attention upon herself and her offspring. This would also help justify why females made a relatively gradual, instead of sudden, ascent in the canopy once they sighted the predator model (Fig. 2).

Why, then, did females alarm call at all? It is possible that orangutan mothers felt safe while seeing the predator model but unsafe once they knew that the predator model was nearby but hidden. Forest felids are typically ambush hunters and may therefore pose more risk when they are out of sight than when visible. Although this modified version of the deterrence hypothesis (“I have seen you, wherever you are!”) could explain the time-space displacement of alarm responses by the nulliparous female, it still cannot explain why mothers with younger offspring were more likely to call than those mothers with older offspring. Sumatran orangutans exhibit the longest interbirth intervals of any primate, with infants staying up to 9 years with their mothers (31). This extended period facilitates and assures the transmission of forest skills for survival (33). In the context of predation, our observations suggest that if mothers fully suppressed their vocal alarm responses, then the infant would be unlikely ever to have the opportunity to learn from safety that such an encounter was dangerous. Vocal displacement therefore seems to be the result of a balance by the mother between minimizing the risk of detection by the predator on the one hand and providing information to her infant about predation on the other hand. Because older infants have putatively accumulated more experience with predator encounters, the mother can change her behavior in favor of risk prevention, further delaying her vocal response. This view implies that the cognitive feat of vocal displacement is as much of the mother as it is of the infant (at least while she is still naïve), as she too must link her mother’s vocal alarm with an absent referent.

Our explanation for mother-offspring information transfer based on displaced reference is, admittedly, merely a working hypothesis requiring further empirical verification. However, making the association between the predator and the displaced alarm call might not be as far-fetched as the classical learning theory (and its law of contiguity) presupposes based on multiple lines of evidence. First, orangutan infants as young as 3 years (the youngest infant in our sample was 5 years old) engage in triadic interactions that include shared perception and goals concerning an outside entity (34). Second, besides the mother’s alarm call, infants probably also use multimodal information about the predation event, including the mother’s gaze direction, muscular tension, body kinetics, and/or posture. Third, the low probability and high unpredictability of encountering mid- to large-sized felid predators, which live at low densities in large ranges, are likely to make an encounter and its corresponding response by the mother an unusual event. Considering that unusual and emotionally laden events are more memorable than usual and neutral events, it is conceivable that orangutan youngsters could make this association (35, 36). Fourth, a growing body of evidence for episodic-like and event memory in nonhuman primates (37) and human infants [who, at the age of 17 months, remember temporally ordered sequences of events and actions after a delay period of 6 weeks (38)] suggests that displaced reference might enable learning in young orangutans.

The lack of evidence for displaced reference in orangutans, and great apes more generally, may not reflect a lack of cognitive capacity but a restricted research focus. For instance, wild flanged orangutan males advertise future travel direction 1 day in advance through long calls that facilitate associations with females (39). Long calls are designed to function as efficient tags of male identity across long distances in the forest (40), making it unlikely that males produce long calls to refer to an outside entity or event. Rival males are nevertheless capable of inferring from another male’s long call his presence at specific locations in the future and of adjusting their behavior accordingly (39). From the receivers’ perspective, this capacity might be analogous to what infant orangutans might do to process time-space–displaced alarm calls produced by their mothers, as reported here. Where infants are required to extract information regarding something in the past, rival adult males do so regarding the future. The cognitive machinery necessary for understanding displaced reference might be present, and in use, in the wild. Because vocal motor control in great apes is not a limiting factor (6, 8, 41, 42), the detection of displaced reference seems largely constrained by the number and type of events that human observers require to detect it, at least more so than by the cognitive and motoric demands of the behavior.

Together, our findings suggest that some form of high-order cognition underpins vocal displacement in Sumatran orangutans, since mother responses are evidently mediated by third-party factors and not simply by a physiological reflex toward a fitness-heavy hazard. In human neurophysiology, involvement of high mental faculties is deduced when reaction times to stimuli are delayed in the order of hundreds of milliseconds (43). Orangutan vocal delay thus operated at an entirely different time scale than reflexive stimuli responses, which run four orders of magnitude faster. Great apes show remarkable memory capacities (44, 45), advanced communication behavior (e.g., vocal and gestural) underpinned by sophisticated socio-cognition (25, 46) and corresponding apt motor control (41, 42), as well as the metaunderstanding of third-party actions (47), with orangutans outperforming other nonhuman primates in social inhibition and behavioral flexibility (48). Together, this cognitive machinery seems to offer a solid cognitive platform to produce vocal responses displaced in time-space by thousands of seconds.

Vocal displacement, as observed here for the first time in a wild great ape, is also related to, but distinct from, common communication features found in the natural world, including nonhuman primates, such as vocal suppression, vocal usage learning, and audience effects. However, none of these features explain why vocalizations did not occur in the presence of the predator model. Postponing behavior in time and space inherently expresses a role of high cognitive processing of the stimulus (43) and general intelligence (49). Our observations, thus, suggest a scenario for language evolution in hominids wherein common communication features transmuted into “higher” forms when combined with advanced cognition capacities. Many such features are present in, and shared with, great apes—long-term memory, intentional communication, fine laryngeal and articulatory motor control, incipient theory of mind—including the capacity of transmitting information about something absent in time-space.


Data collection

Seven females with known reproductive history (one nulliparous, two primiparous, and four multiparous) were tested, composing the total female resident population of the Ketambe forest block (3°41′N, 97°39′E, June 2010 to March 2011, Aceh, Sumatra, Indonesia), all of which were highly habituated to the presence of human observers—a condition unique to this site in the whole island resulting from continuous observations for nearly 40 years. Subjects were presented with single pseudorandomized exposures (to maximize the neutralization of order effects, that is, no two subjects were presented with the same order of models, and any specific model was as equally probable to have been presented first, second, third, or fourth to the maximum extent possible) of four predator models—a human experimenter walking on all fours along the forest floor draped over with a sheet with one of four different types of print: tiger patterned [a natural predator historically known to predate on orangutans at this site (50)], color patterned (abstract pattern), white with multicolored spots, and plain white. Forest felids are silent ambush hunters. Accordingly, during the design of this study, we thought that predator playbacks would not necessarily constitute a more realistic simulation than an actual predator model on the forest floor conspicuously presented in front of the subjects. All subjects were tested individually (see the Supplementary Materials). When the subject was between 5- and 20-m height in the forest understory, feeding, resting, or slowly moving, and having exhibited no vocal behavior or having encountered no conspecific in the previous 5 hours, the model moved past in front of the subject on the forest floor. The model halted for 2 min once the subject viewed it, after which the model moved out of sight. Experiments were never conducted in the same location in the forest to avoid habituation. 
Each individual was exposed only once to each model, and models were always presented to each individual at least 5 days apart. All calls produced by the subjects after the presentation of the predator model were alarm calls typically produced under conditions of moderate to acute danger (51).
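The pseudorandomization constraint described above (no two subjects share an order, and each model appears in each serial position about equally often) can be illustrated in code. The text does not say how orders were actually generated, so the sketch below is one possible scheme and an assumption throughout: it draws distinct permutations repeatedly and keeps the most position-balanced draw. The function names, seed, and number of tries are hypothetical, and it ignores the fact that two subjects received only two models.

```python
import random
from itertools import permutations

MODELS = ("tiger", "patterned", "spotted", "white")


def position_imbalance(orders):
    """Largest gap, over serial positions, between the most and the
    least frequently used model at that position."""
    worst = 0
    for pos in range(len(MODELS)):
        counts = [sum(order[pos] == m for order in orders) for m in MODELS]
        worst = max(worst, max(counts) - min(counts))
    return worst


def assign_orders(n_subjects, tries=2000, seed=0):
    """Pick n_subjects distinct presentation orders, preferring balanced ones."""
    rng = random.Random(seed)
    all_orders = list(permutations(MODELS))   # 24 possible orders
    best = None
    for _ in range(tries):
        # sample() draws without replacement, so orders are distinct.
        candidate = rng.sample(all_orders, n_subjects)
        if best is None or position_imbalance(candidate) < position_imbalance(best):
            best = candidate
    return best


orders = assign_orders(7)
for subject, order in enumerate(orders, start=1):
    print(subject, order)
```

With seven subjects and four models, perfect balance is impossible (7 is not divisible by 4), which matches the qualifier "to the maximum extent possible" in the text.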

Upon sighting the predator model, subjects typically halted the activity in which they were engaged and exhibited signals of distress, such as urinating and defecating, while monitoring the forest floor while the predator model was present and after it had already left. These behaviors indicated that the subjects had effectively seen the model, allowing the experiment to proceed, even when there was no vocal response.

Data analysis

To uncover potential correlates of vocal delay, we conducted survival analyses using survminer (52) and coxme (53) in R (54). We included three independent factors. Subject height at time zero was used as a measure of predator model proximity at the moment of predator model exposure and, thus, threat level. Female age was used as a proxy of number of previous encounters with predators and motherhood experience. The nulliparous female in our sample was the youngest female, the two primiparous females were the second and third youngest, and the four multiparous females were older. Infant age was used as a proxy of infant vulnerability, motoric competence, and number of previous encounters with predators. We z-transformed infant age because infant age for the nulliparous female was zero and not meaningful. Subject ID and model type were inserted as random effects (i.e., each was experimentally tested repeatedly). The corresponding R code was presented as an embedded image, which is not reproduced here.
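The z-transformation of infant age mentioned above centers the variable on its mean and scales it by its standard deviation. A minimal sketch follows, with two assumptions: the sample standard deviation is used (as R's `scale()` does by default), and the ages are hypothetical values, not the study's data.

```python
from statistics import mean, stdev


def z_transform(values):
    """Center on the mean and scale by the sample SD (n - 1 denominator)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical infant ages in years; 0 stands for the nulliparous female.
infant_age_years = [0, 5, 6, 7, 8, 8, 9]
infant_age_z = z_transform(infant_age_years)
print([round(z, 2) for z in infant_age_z])
```

After the transformation the variable has mean 0 and SD 1, so the arbitrary zero assigned to the nulliparous female becomes simply the lowest value on a standardized scale.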


Supplementary material for this article is available at

Table S1. Distribution of the different predator model experiments per subject, with name of respective infant in brackets.

Fig. S1. Distribution of experimental height for each subject per experiment that produced response or no response.

Fig. S2. Vocal displacement per predator model for experiments that elicited vocal response.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We thank the Indonesian Ministry of Research and Technology (RISTEK), the Indonesian Directorate General of Forest Protection and Nature Conservation (PHKA), the Gunung Leuser National Park (TNGL), and the Leuser Ecosystem Management Authority (BPKEL) for authorization to carry out research in Indonesia; the Universitas National (UNAS) for supporting the project and acting as counter-partner; the Sumatran Orangutan Conservation Programme (SOCP)–PanEco Foundation and Serge Wich for logistical support; C. P. A. Hall and M. E. Hardus for their hard work in the field; R. Moro Rios and S. Townsend for useful discussions at the early stages of the manuscript; and two anonymous reviewers and the editor A. Rylands for their comments and suggestions. Funding: The study was funded by the European Union’s Horizon 2020 Research and Innovation Program under the Marie Skłodowska-Curie grant agreement no. 702137 attributed to A.R.L. Ethics statement: Our study was approved by the Indonesian Ministry of Research and Technology (RISTEK), the Indonesian Directorate General of Forest Protection and Nature Conservation (PHKA), the Gunung Leuser National Park (TNGL), and the Leuser Ecosystem Management Authority (BPKEL) and complies with the ethics guidelines of the Sumatran Orangutan Conservation Programme (SOCP) and PanEco Foundation. Author contributions: Study design: A.R.L.; data collection: A.R.L.; data analysis: A.R.L.; responsibility for validity and correctness of the figures: A.R.L.; and writing of the manuscript: A.R.L. and J.C. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.