(C) PLOS One
This story was originally published by PLOS One and is unaltered.
Humans and great apes visually track event roles in similar ways [1]
Authors: Vanessa A. D. Wilson (Department of Comparative Cognition, Institute of Biology, University of Neuchatel, Neuchatel; Department of Comparative Language Science, University of Zurich, Zurich; Center for the Interdisciplinary Study of Language Evolution), Sebastian Sauppe
Date: 2024-12
Abstract
Human language relies on a rich cognitive machinery, partially shared with other animals. One key mechanism, however, decomposing events into causally linked agent–patient roles, has remained elusive with no known animal equivalent. In humans, agent–patient relations in event cognition drive how languages are processed neurally and expressions structured syntactically. We compared visual event tracking between humans and great apes, using stimuli that would elicit causal processing in humans. After accounting for attention to background information, we found similar gaze patterns to agent–patient relations in all species, mostly alternating attention to agents and patients, presumably in order to learn the nature of the event, and occasionally privileging agents under specific conditions. Six-month-old infants, in contrast, did not follow agent–patient relations and attended mostly to background information. These findings raise the possibility that event role tracking, a cognitive foundation of syntax, has evolved long before language but requires time and experience to become ontogenetically available.
Citation: Wilson VAD, Sauppe S, Brocard S, Ringen E, Daum MM, Wermelinger S, et al. (2024) Humans and great apes visually track event roles in similar ways. PLoS Biol 22(11): e3002857. https://doi.org/10.1371/journal.pbio.3002857
Academic Editor: Leyla Isik, Johns Hopkins University, UNITED STATES OF AMERICA
Received: April 18, 2024; Accepted: September 20, 2024; Published: November 26, 2024
Copyright: © 2024 Wilson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data and code are available at https://osf.io/47wap/?view_only=8c2b20667fc441178269291fda5262bf. Interactive figures are also available at https://dataplatform.evolvinglanguage.ch/eventcog-eyetracking/.
Funding:
The National Center for Competence in Research "Evolving Language" (SNSF agreement number 51NF40_180888, B.B., K.Z., M.M.D.): https://evolvinglanguage.ch/
Swiss National Science Foundation (project grant number 310030_185324, K.Z.): https://www.snf.ch/en
Swiss National Science Foundation (100015_182845, B.B.): https://www.snf.ch/en
The National Center for Competence in Research "Evolving Language" Top-Up Grant (grant number N603-18-01, V.A.D.W., K.Z., B.B., M.M.D.): https://evolvinglanguage.ch/
Foundation for Research in Science and the Humanities at the University of Zurich (grant number 20-014, V.A.D.W., S.S., B.B.): https://www.research.uzh.ch/en/funding/researchers/stwf.html
Seed money grant, University Research Priority Program “Evolution in Action”, University of Zurich (S.S.): https://www.evolution.uzh.ch/en.html
Jacobs Foundation (S.W., M.M.D.): https://jacobsfoundation.org/
Swiss National Science Foundation (grant number PZ00P1_208915, S.S.): https://www.snf.ch/en
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Language is considered unique to humans, a distinction which leads to the prevailing question of how it has evolved. An empirical strategy has been to identify the cognitive mechanisms that language relies on and to reconstruct their evolutionary history using comparative research with humans and other animals. One important cognitive mechanism is the propensity for speakers and listeners to decompose events into causally structured agent–patient relations [1]. For example, a sentence like “Alice picked up the caterpillar” has Alice as the agent and the caterpillar as the patient. This distinction is deeply entrenched in meaning, neuroanatomically detectable [2] and responsible for core syntactic phenomena, such as case marking or constituent hierarchies [3], with only very few exceptions across the world’s languages [4]. Furthermore, languages privilege agent over patient roles, preferring the simplest and least specific marking for them (e.g., nominative case) [5], even though this often incurs ambiguity and additional neural activity during sentence planning [6,7]. Correspondingly, agents tend to be named before patients [8,9], a trend only matched in sentence structure by a concurrent preference for placing reference to humans before reference to inanimate things [10].
These biases in linguistic expression build on resilient mechanisms in human event cognition [11]. For example, when apprehending the gist of events from still pictures, people tend to be quicker to identify agents than patients [12] and assign agency almost instantly and with remarkably little variation across cultures and languages [13]. Early agent identification is typically followed by distributed attention between agents and patients in a processing stage known as “relational encoding” during early sentence planning [6]. The same resilience is also apparent in comprehension during sentence processing. When sentences violate expectations about agent and patient roles, for example, when it turns out that a noun referred to a patient instead of an agent, neurophysiological measures indicate that agency is usually assigned as the initial default, even when this goes against usage probabilities and the rules of grammar. This is evidenced by a negativity in event-related potentials (the N400 effect) when ambiguous sentences are resolved towards patient-before-agent order, violating comprehenders’ expectations about the primacy of agents, which accordingly should be expressed first (similar to: The girl … was chased by the dog) [5,14–16]. Together, these findings suggest that language builds on a universal neurocognitive mechanism of event decomposition to make sense of the world and its linguistic representations.
This raises the question of how human event cognition has evolved. We are not aware of any evidence in the animal communication literature that demonstrates that signals can refer to agent–patient interactions, neither in natural communication nor with artificial languages [17]. One hypothesis, therefore, is that nonhuman animals (from here on, “animals”) do not possess the cognitive resources for decomposing events into agents and patients. Certainly, animals can comprehend aspects of physical causality (e.g., that pushing causes falling) [18]. Still, it is unclear whether this is due to perceptions of simple co-occurrences or more complex perceptions of events as agent activities causing patient changes. Related to this, although there is little doubt that animals perceive the participants of events, their attention may be absorbed by the participants’ social attributes, such as their identities, social roles [19], or behavioral intentions [18,20], all of which predict large situational and individual variation in how events are processed.
The alternative hypothesis, to be tested here, is that animals are capable of human-like event decomposition [1], but do not have the motivation or the resources to communicate about agent–patient relations. To explore this, we tested how participants across closely related species of hominids perceived a range of naturalistic events that would elicit causal processing in humans. We compared gaze responses to short video clips between members of the 4 genera of great apes: humans (Homo sapiens), chimpanzees (Pan troglodytes), gorillas (Gorilla gorilla), and orangutans (Pongo abelii) (see S1 Fig for ape setup). We also tested human infants at 6 months old, before they start to actively use language and while still developing linguistic processing abilities. By this age, infants already show an impressive cognitive toolkit: they are sensitive to goal-directed actions and agency [21], track changes in goal-directed behavior [22], and extract key information from video stimuli to understand events [23]. At the same time, young infants struggle to process goal predictions [24], and neural selectivity of third-party social interactions does not begin to emerge until after 9 months [25]. As infants develop, language and event perception become increasingly intertwined, as documented by the way verbs and actions are processed [26].
Currently, our understanding of agent-privileged event cognition in humans rests mainly on paradigms that use static stimuli, which are often artificial or overly simplistic and do not reflect the complexity of real-life interactions. In this study, we used dynamic scenarios across a broad range of natural events to compare overt visual attention to agents and patients as the actions unfold. Scenarios were presented as N = 84 short (2 to 10 s long), silent color video clips, depicting animate agents and both animate and inanimate patients of (unfamiliar) humans, chimpanzees, gorillas, and orangutans engaged in natural interactions. This included scenes such as grooming, play, eating and object manipulation among apes (both in zoos and the wild), and helpful and mildly agonistic interactions, as well as object manipulation, among humans; the latter were all filmed in the same setting with a primarily white backdrop. Further details can be found in S2 Table and in video examples on the OSF repository (https://osf.io/47wap/?view_only=8c2b20667fc441178269291fda5262bf). All participants saw the same stimuli. When creating the videos, we deliberately avoided rigidly controlling for low-level perceptual features, as this would have created sterile footage with low socio-ecological validity [27] and, critically, reduced interest for ape participants. Instead, we presented scenes that sought to capture much greater variation of real life. Possible confounding factors, such as differences in the amount of agent motion or relatively larger sizes of agents or patients between videos, were accounted for in the statistical models (see S1 Methods and S5 and S6 Figs).
We predicted that event roles would be necessary to interpret the event scenarios; that is, the distinction between agent and patient is needed to explain gaze distribution to the individuals depicted. For human adults, we expected to see early and overall agent biases, consistent with previous findings, but with attention patterns mediated by the progression of the action rather than the need to extract agent–patient information rapidly, as in brief exposure studies [13]. We predicted that if event decomposition were a general feature of great ape cognition and present without language, then visual attention should not differ across the 4 species. In particular, we predicted earlier attendance to agents than patients, in line with the privileged status of agents in language and gist apprehension in still pictures. Alternatively, if event decomposition were uniquely human, or dependent on language, we expected to find this pattern only in adult humans, and large and random variation in the nonhuman primates, which would likely depend on low-level features such as color or contrast of the stimuli. Regarding the infants, if event-role decomposition required experience gained through observing third-party interactions, we expected to see differences in how they attended to agents and patients compared with adults.
Discussion
Evolutionary theories of syntax have focused mainly on how formal complexity has emerged [29–31], whereas the underlying cognitive mechanisms have rarely been addressed. Here, we tested a cognitive hypothesis, which proposes that central aspects of human syntax, such as case-marking or constituent hierarchy, build on a prelinguistic cognitive mechanism that decomposes events into causally structured agent–patient relations [1]. To test this, we exposed apes to stimuli that elicit causal processing in humans and compared the gaze patterns between humans and nonhuman apes. Participants across species tracked events in strikingly similar ways, focusing on the action between agents and patients in a manner reminiscent of relational encoding for planning to speak [6]. This finding suggests that apes, like human adults, decompose the causal agent–patient roles depicted. The only noticeable difference was that nonhuman apes showed more visual exploration of background information than human adults, perhaps due to differences in experience with watching and interpreting videos or higher intrinsic interest in scanning the larger environment. This is reflected in S2 and S3 Figs, where human adult attention to agents and patients is more pronounced, because these results do not account for attention to “other” information (S4 Fig). Notably, apes’ looking behavior was more similar to that of human adults than was the looking behavior of human infants. If apes were unable to track agent–patient relations, we would expect attention patterns similar to those seen in infants. These observations are compatible with the interpretation that event decomposition did not emerge as a unique form of human cognition together with language. Rather, our findings constitute another piece of evidence suggesting a cognitive continuum between humans and nonhuman apes, albeit in a novel cognitive domain.
Unexpectedly, across all event categories, in neither humans nor great apes did we find a strong bias towards agents (with the exception of food scenarios). This is in contrast to findings from a large body of previous research using static images, as well as a recent comparative study examining event role preferences [32]. This difference is likely due to the nature of the task. Previous studies asked for rapid decisions between roles [13,32], similar to when listeners have to come up with on-the-fly predictions on roles while processing a sentence [33,34]. It is likely that an agent bias manifests itself primarily under these high-demand conditions, while it is not as relevant when watching an event that unfolds over time [35]. Additionally, unlike some previous studies [12,36] (although see [13]), we controlled for size and movement of the areas of interest, as well as event type; size of the agent or patient, in particular, has a strong effect on gaze probability (S5 Fig).
Curiously, when considering gaze differences between event categories, we found the strongest agent bias in video scenes depicting interactions with food. One possible explanation is that, in social species, attending to agents who have access to food provides a survival advantage. This points to social learning as a possible precursor for semantic role attribution. An intriguing possibility requiring further studies, therefore, is that the agent bias reported elsewhere has its roots in trophic interactions. A more parsimonious explanation could be that the agent bias is strong in these scenes because food items are less interesting as patients compared to objects. Notably, since size of agents and patients was accounted for in the models, size cannot explain the gaze bias that we found towards agents. Further research is needed to differentiate between these possibilities and to more systematically explore different degrees and kinds of cognitive pressure in event cognition and different ecological contexts of events.
In contrast to human adults, the 6-month-old infants attended to agents and patients with very low probability. One explanation may be that while infants this young show sophisticated perception and interpretation skills of causality in social interactions in visually and cognitively uncomplicated material [37,38] and with previous familiarization or habituation (i.e., learning) phases, they may still have trouble detecting causality in complex visual material [39], especially in social interactions and with no prior controlled learning phase. Indeed, our stimuli were more complex than typically used in infant studies. Also, they were not directed towards the infants and did not include any communicative information towards the observing infant [38]. However, the stimuli were chosen to reflect real-life diversity of events. Key ingredients, such as event parsing [40], causal integration across scenes [24], and triadic awareness [41], are known to develop gradually during the first 12 months, suggesting that our content was too challenging and probably too alien to their existing world experience. Another plausible hypothesis is that processing of dynamic natural scenes requires computational resources and oculomotor control not yet sufficiently developed at this age [42]. As a consequence, integration and interpretation of relevant information is more time-consuming [22] or just not yet possible. Given that event categorization relies on neurally constructed models, which are updated with experience [43], it is likely that, at this age, infants are still developing the event models that will guide their attention to scene information.
In sum, our study demonstrates that nonhuman great apes and human adults show similar looking behavior towards agent–patient interactions, consistent with the notion of a shared underlying cognitive mechanism. The fact that infants show a significantly different looking pattern than both human adults and apes suggests that proficiency in language is not driving the observed looking pattern. Earlier explanations that relied on human–animal morphological differences in vocal tracts [44], lack of declarative communication [45], or lack of call composition [46] no longer stand. Our results add to the shared cognitive foundations of language by suggesting that event decomposition, a foundation of syntax, evolved before language, on par with signal combinations [47], theory of mind [48], and joint commitment [49].
What has happened then during human evolutionary history that allowed us to map event roles into verbal expressions? We can think of three probably interlinked evolutionary transitions that may have paved the way from primate-like communication to human language: (a) changes in social cognition; (b) changes in communicative needs; and (c) changes in expressive power. Regarding social cognition, the key step may have been to externalize event cognition through language, by moving from implicit to explicit attributions. For example, compared to chimpanzees, adult humans attend more to an agent’s face following an unexpected action outcome, as if explicitly trying to discern the actor’s mental state [50]. Exploring how apes attend to detailed social scenarios could help to understand the differences that led to this next step in humans. Regarding communicative needs, one argument is that increased levels of cooperation and coordination brought about increased communicative needs, a convergent evolution independent of wider cognitive evolution [51,52]. Comparative studies that delve deeper into the emergence of cooperative behavior could help to further elucidate this hypothesis [53]. Finally, regarding expressive power, modern humans have roughly 3-fold larger brains than chimpanzees with vastly more computational power, allowing for processing of more varied signal structures. Although there are examples of limited compositionality in animal communication, there is no evidence for free variation and creative use [47]. The study of ape communication, however, continues to reveal new insights, and future work should seek to establish the degree of difference in flexible communication compared to humans. Testing these hypotheses may provide further answers in the quest for the origins of language by better understanding why nonhuman apes do not communicate in the same way as humans do, despite a steadily narrowing gap with human cognitive abilities.
Methods
Study methods are described in full in the Supporting information (S1 Methods).
Ethics statement
Ethical approval for the ape research was provided by the Canton of Basel Veterinary Office (approval numbers 2983 and 3077) and by the Animal Welfare Officer at Basel Zoo. All apes participated voluntarily. They were not separated from their group during testing, nor were they food or water deprived, and could leave at any time. They were rewarded for participation with diluted sugar-free syrup, provided in restricted quantities as approved by the zoo’s veterinary team. Ethical approval for the human research was provided by the local ethics committee of the University of Zurich (approval numbers 18.10.9 and 21.9.18), and the research was performed in accordance with the ethical standards of the 1964 Helsinki Declaration and its later amendments. All adult participants or the infants’ caregivers gave informed written consent before data collection. After completing the task, adult participants received CHF 20 and infant participants were rewarded with a certificate and a small present worth approximately CHF 5.
Acknowledgments
For support and assistance at Basel Zoo, we thank Adrian Baumeyer, Fabia Wyss, Raphaela Heesen, Stephan Lopez, Gaby Rindlisbacher, Rene Buob, Roland Kleger, Jonas Schaub, Nicole Fischer, Reto Lehmann, Corinne Zollinger, Markus Beutler, Dominic Hohler, Patrick Wyser, Flurin Baer, Amanda Spillmann, Stephan Argast, and the technician team. We also thank Carla Pascual for help with data collection. For providing footage from apes, we thank Emily Genty, Cat Hobaiter, Jennifer Botting, Erin Stromberg, and Atlanta Zoo, as well as Zurich Zoo for allowing us to film their apes. We thank Sara I. Fabrikant and Tumasch Reichenbacher for providing their eye-tracking laboratory for the human adult data collection and Sina Nägelin, Nina Philipp, and Deborah Lamm for collecting the data from human adults and infants, as well as Marco Bleiker for technical assistance with infant data collection. We thank Sebastien Quigley and Carla Pascual for assistance in data processing. We also thank Shreejata Gupta, Christopher Krupenye, and Josep Call for their advice on establishing the eye-tracking setups for data collection from great apes.
[END]
---
[1] URL: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002857
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.