(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:https://journals.plos.org/plosone/s/licenses-and-copyright

------------



Language statistical learning responds to reinforcement learning principles rooted in the striatum

Joan Orpella (Department of Psychology, New York University, New York, United States of America); Ernest Mas-Herrero (Cognition and Brain Plasticity Unit, IDIBELL, L'Hospitalet de Llobregat)

Date: 2021-09

Abstract Statistical learning (SL) is the ability to extract regularities from the environment. In the domain of language, this ability is fundamental in the learning of words and structural rules. In the absence of reliable online measures, statistical word and rule learning have been primarily investigated using offline (post-familiarization) tests, which give limited insight into the dynamics of SL and its neural basis. Here, we capitalize on a novel task that tracks the online SL of simple syntactic structures combined with computational modeling to show that online SL responds to reinforcement learning principles rooted in striatal function. Specifically, we demonstrate—on 2 different cohorts—that a temporal difference model, which relies on prediction errors, accounts for participants’ online learning behavior. We then show that the trial-by-trial development of predictions through learning strongly correlates with activity in both ventral and dorsal striatum. Our results thus provide a detailed mechanistic account of language-related SL and an explanation for the oft-cited implication of the striatum in SL tasks. This work, therefore, bridges the long-standing gap between language learning and reinforcement learning phenomena.

Citation: Orpella J, Mas-Herrero E, Ripollés P, Marco-Pallarés J, de Diego-Balaguer R (2021) Language statistical learning responds to reinforcement learning principles rooted in the striatum. PLoS Biol 19(9): e3001119. https://doi.org/10.1371/journal.pbio.3001119

Academic Editor: Matthew F. S. Rushworth, Oxford University, UNITED KINGDOM

Received: January 13, 2021; Accepted: August 2, 2021; Published: September 7, 2021

Copyright: © 2021 Orpella et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data can be found in the paper's Supporting Information files and in https://neurovault.org/collections/10421/.

Funding: This work was supported by the European Research Council grant ERC-StG-313841 (TuningLang) (RdD-B) and the BFU2017-87109-P Grant from the Spanish Ministerio de Ciencia e Innovación (RdD-B), which is part of Agencia Estatal de Investigación (AEI) (co-funded by the European Regional Development Fund. ERDF, a way to build Europe). We also thank the CERCA Program / Generalitat de Catalunya for institutional support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: BOLD, blood oxygenation level–dependent; fMRI, functional magnetic resonance imaging; FWE, family-wise error; LLE, log-likelihood estimate; LLR, log-likelihood ratio; MNI, Montreal Neurological Institute; MTL, medial temporal lobe; NAD, nonadjacent dependency; PE, prediction error; RT, reaction time; RW, Rescorla-Wagner; SL, statistical learning; TD, temporal difference; VTA/SNc, ventral tegmental area/substantia nigra pars compacta

Introduction Statistical learning (SL) is the ability to extract regularities from distributional information in the environment. As a concept, SL was popularized by the work of Saffran and colleagues, who first demonstrated infants’ use of the transitional probabilities between syllables to learn both novel word forms [1] and simple grammatical relations (nonadjacent dependencies (NADs)) [2,3]. The idea of a mechanism for SL has since generated considerable interest, and much research has been devoted to mapping the scope of this cognitive feat. This work has been crucial in describing the SL phenomenon as it occurs across sensory modalities (auditory [4–6], visual [7,8], and haptic [9]), domains [8] (temporal and spatial), age groups [10,11], and even species (nonhuman primates [12] and rats [13]). Despite all this research, however, little is yet known about the mechanisms by which SL unfolds and their neural substrates. One of the main reasons for this important gap in the literature is that the vast majority of SL studies have focused on the output of learning, generally assessed via offline post-familiarization tests, rather than on the learning process itself [14,15]. It is only recently that work on SL has started to shift toward the use of online measures of learning. Online measures afford a more detailed representation of the learning dynamics and thus offer the possibility of generating hypotheses about the computations underlying SL. Online measures capitalize on the gradual development of participants’ ability to predict upcoming sensory information (e.g., an upcoming syllable or word) as the regularities of the input are learned (e.g., a statistical word form or a grammatical pattern). Indeed, prediction is often understood as the primary consequence of SL [16,17]. 
Interestingly, however, the status of prediction as the driver of SL, rather than a mere consequence of it, i.e., its causal implication in learning, has not been explicitly investigated. In the current study, we examined the online development of predictions as a fundamental computation for SL. In particular, we used an amply validated algorithm of reinforcement learning—temporal difference (TD) [18,19]—to model participants’ online learning behavior and investigate its neural correlates. Note that, in adopting a model of reinforcement learning, a domain where reward generally plays an important role, we are not assuming (nor discarding) the phenomenological experience of reward (e.g., intrinsic reward [20,21]) during SL. Instead, we assessed whether particular computational principles reflected in TD learning can account for participants’ SL behavior and their brain activity during learning. TD models are based on the succession of predictions and prediction errors (the difference between predicted and actual outcomes) at each time step, by which predictions are gradually tuned. In contrast to the models typically used to explain SL (e.g., [22,23]), TD learning is supported by a vast body of research attesting to its neurobiological plausibility, with findings of neural correlates of predictions and prediction errors using both cellular-level recordings and functional magnetic resonance imaging (fMRI). Several brain areas, notably the striatum, have been implicated in the shaping of predictions over time and the selection of corresponding output behavior [24–29]. Interestingly, activity in the striatum has also been documented in the SL of NADs [30,31] as well as of phonological word forms [32], but the precise role of these subcortical structures in this domain remains unspecified. 
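For illustration, the basic TD computation described above can be sketched in a toy example. This is a generic TD(0) value update, not the specific model used in the study; the element labels, the placement of the outcome at the final element, and the parameter values are invented for the example:

```python
# Sketch: how a temporal difference (TD) update links elements separated in
# time. A generic TD(0) rule: V(s) += alpha * (r + gamma * V(s') - V(s)).
alpha, gamma = 0.5, 0.9   # learning rate and temporal discount (assumed values)
V = {"A": 0.0, "X": 0.0, "C": 0.0}

for _ in range(50):  # repeated exposures to the sequence A -> X -> C
    # Transition A -> X: no outcome yet, but A inherits X's predictive value.
    V["A"] += alpha * (0.0 + gamma * V["X"] - V["A"])
    # Transition X -> C: the outcome (r = 1) arrives at the final element.
    V["X"] += alpha * (1.0 + gamma * 0.0 - V["X"])

# The outcome's value has propagated back to the initial element A,
# even though A and C were never adjacent.
print(round(V["A"], 2), round(V["X"], 2))
```

Because the prediction error at each transition feeds the preceding state's value, credit for the outcome gradually travels backward to the initial element, which is what makes this class of model suitable for nonadjacent dependencies.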
With the aim of clarifying the mechanisms for SL and their neural underpinnings, we combined computational (TD) modeling with fMRI of participants’ brain activity while performing a language learning task. In particular, participants completed an incidental NAD learning paradigm. In natural languages, NADs are abundant and underlie important morphological and syntactic probabilistic rules (e.g., the relationship between un and able in unbelievable). Sensitivity to NADs is therefore important in the early stages of grammar learning, when the relation between phrase elements is tracked at a superficial level and before more abstract representations (syntactic rules) can be created via other mechanisms and brain structures [33]. However, sensitivity to NADs can also be critical for speech segmentation in the early stages of word learning [34], both in prelexical development [2] and beyond (i.e., second language acquisition [35]). The main advantage of this particular SL task over similar tasks (e.g., [2,36]) is that it provides a reliable measure of online learning [37] that we can then model. For modeling, we used a TD algorithm for its greater sensitivity to temporal structure compared to simpler reinforcement learning models (e.g., Rescorla-Wagner (RW) [38]). Note that this is an important prerequisite for NAD learning, since the to-be-associated elements are separated in time. Nonetheless, we additionally compared the adequacy of these simpler algorithms to that of the TD model. We expected the interplay of predictions and prediction errors, as modeled by the TD algorithm, to closely match participants’ online SL behavior. In addition, and in line with the aforementioned research on both reinforcement learning and SL, we expected striatal activity to be associated with the computation of predictions.

Discussion In this study, we provide evidence for the SL of NADs as an instance of reinforcement learning. A TD model of reinforcement learning, which capitalizes on the iteration of predictions and prediction errors, was able to mimic participants’ RT data reflecting gradual SL over trials. This was replicated on 2 independent cohorts, producing similar model fits that were also clearly superior to those of simpler learning models. Functional neuroimaging data of participants’ online learning behavior also allowed us to examine the neural correlates of prediction-based SL. In line with neurocomputational models of TD learning, the trial-by-trial development of predictions from the initial word of the dependencies was strongly related to activity in bilateral striatum. Importantly, striatal activity was unrelated to the overt motor responses required by the task (i.e., button presses) or more general computations, supporting the implication of the striatum specifically in prediction-based SL. Evidence for the adequacy of a TD algorithm in capturing participants’ online NAD learning behavior offers novel insights into the mechanisms for SL. In particular, our results underscore the causal role of predictions for learning, compelling us to reassess the commonly assumed relationship between SL and predictive processing. Indeed, SL not only enables predictions (predictions as a consequence of SL), as generally understood (see, e.g., [14]), but also capitalizes on predictions (predictions as a cause of SL). This new understanding of SL can thus offer interesting reinterpretations of previously reported correlations between SL abilities and predictive processing [16], raising questions about the direction of causality. Moreover, our results make an important contribution to the understanding of the neurobiological basis of SL. While previous research [14,42] has shown a similar behavioral development of online SL (cf. 
Fig 2), brain imaging data and their link to a mechanistic explanation of learning were lacking. Here, we used a measure of online SL behavior in combination with computational modeling and fMRI data to unveil the basic mechanism underlying learning and its brain correlates. A complementary approach to describing online SL, which involves the frequency tagging of participants’ neurophysiological responses over learning [43–45], has recently been used to track the emergence of new representations (in time and neuroanatomical space) as participants learn. We add to these findings by providing a mechanistic account of how these representations (i.e., learning) come to be and a plausible neuroanatomical substrate for its key computations. In particular, we show that the gradual development of predictions for SL is related to robust and widespread activity in bilateral striatum (Fig 4). This finding adds a valuable degree of specificity to the oft-cited implication of these subcortical structures in artificial grammar learning and SL more generally [30–32]. Both the adequacy of a TD model and the involvement of the striatum in prediction-based SL place this cognitive ability squarely in the terrain of reinforcement learning. Indeed, the link between prediction learning and activity in the striatum is one of the most robust findings in the reinforcement learning literature, from intracranial recordings to fMRI studies [25–29,40,46–48]. Activity in the ventral striatum, in particular, has been associated with the delivery and anticipation of rewarding stimuli of different types (i.e., from primary to higher-order rewards) [49]. More specifically, the ventral striatum interacts in complex ways with the dopaminergic system (mainly the ventral tegmental area/substantia nigra pars compacta (VTA/SNc)), with responses consistent with the computation of reward prediction error [24,50–52]. 
In this light, our reported pattern of activity in the ventral striatum is consistent with the gradual transfer over learning of prediction error–related dopaminergic responses from rewarding to predictive stimuli, as found in classical conditioning paradigms [53,54]. That is, a gradual increase in responses to A elements may be expected as their predictive value is learned, since these elements can never be anticipated. Alternatively, activity in the ventral striatum could reflect inhibitory signals aimed at attenuating dopaminergic inputs from the VTA/SNc [55] in response to C elements as these become more predictable. From a theoretical standpoint, it may be necessary to distinguish between the response of the reward system for learning and the phenomenological experience of reward [21,56]. Recent evidence [57–59], nonetheless, supports the notion of language learning as intrinsically rewarding [20] and suggests quantitative over qualitative differences between endogenous and exogenous sources of reward [21]. So far, the adequacy of reinforcement learning algorithms for the learning of intrinsically rewarding tasks has mainly remained theoretical [21,60]. Our results now contribute to this literature by showing their suitability in specific instances of SL. Still within the computations of the TD model, activity in the dorsal striatum (caudate and putamen) responds to the updates at each time step of the outcome value representations associated with each stimulus in ventromedial and orbitofrontal areas [61]. By updating the value associated with a particular behavioral option, the dorsal striatum takes a leading role in the selection of the most appropriate behavior [62,63]. In agreement with our results, recent data [64] suggest that this role also applies to the language domain, with evidence for an implication of the caudate nucleus in the selection of linguistic alternatives from left prefrontal perisylvian regions with which it is connected [65]. 
The caudate could also promote the attentional selection of behaviorally relevant elements from frontal cortical areas [66–68]. In our case, while this attentional selection should initially pertain to C elements (the target of the monitoring task), a shift toward A elements may also be expected as their predictive value increases. This interpretation is consistent with the finding of a gradual increase in an early attentional event-related component (the P2) over the exposure to (the A elements of) NADs but not to similar but unstructured material [69,70]. The appropriate behavioral alternative, in our case pertaining to the identity of the final C element of each phrase, may ultimately be realized as a specific motor articulatory representation of the element, selected by the putamen from speech pre/motor areas and used to predict upcoming auditory input [41]. This selection of a “covert” motor response is consistent with the attenuated (though not eliminated) activity in the putamen when regressing out overt motor responses (button presses; S3 Fig; cf. Fig 4). This cascade of processes, with the final selection of motor articulatory representations by the putamen, may be used to generate the corresponding auditory predictions [41], ultimately translating into increasingly faster RTs for the predicted C elements. In this view, activity in the posterior superior temporal gyrus (Fig 4) would reflect the downstream (i.e., sensory) consequences of this selection [41,71]. It remains unknown in which representational space (e.g., auditory, motor, somatosensory), or by which mechanism, actual prediction testing takes place. However, the present data suggest that prediction-based SL may be fundamentally linked to such motor engagement as part of a learning mechanism orchestrated by the striatum. 
This is consistent with the observation that participants who are better at predicting speech inputs embedded in noise, a situation known to involve the speech motor system [72], are also better statistical learners [16], and agrees with the well-accepted role of these structures in procedural learning [73,74] and the management of motor routines [52,65,75]. We speculate that this prediction mechanism via motor articulatory representations should become of critical importance for learning when putative alternative learning mechanisms (e.g., hippocampus; see below) are weakest, for example, when a temporal separation is imposed between the elements to be associated, as in our NAD learning task. It is interesting that, in contrast to recent findings [76], we did not find an implication of the hippocampus/medial temporal lobe (MTL) in the online SL of NADs. Hippocampal/MTL and basal ganglia activity are often thought to reflect the workings of 2 distinct (complementary or competing) learning systems, traditionally related to declarative or explicit versus procedural or implicit learning, respectively [77]. Although striatal activity in our study is consistent with the incidental nature of our SL task, other SL tasks of incidental learning have also reported MTL/hippocampus engagement [78]. As mentioned previously, the difference may owe to the type of statistical relations present in the material. Specifically, as reflected in biologically inspired computational models (e.g., [78,79]) and in line with recent data [45], the MTL/hippocampus appears to capitalize on the relationship between pairs of adjacent stimuli. This contrasts with the TD model, which learns in part due to the low TPs between adjacent (AX and XC) elements (cf. [80]) and deals explicitly with the NADs through a temporal discounting parameter. 
It is thus unclear how the aforementioned models of the hippocampus would fare in the SL of NADs, where the relationship between adjacent elements is very weak. As with declarative and procedural learning, nevertheless, it is likely that these different learning systems are concurrently engaged in new learning situations. Reports of hippocampal or striatal activity in SL tasks will therefore depend on the nature of the materials employed. Finally, hippocampal activity may also respond to memories of the units of SL [81,82] (e.g., words in our case) as well as to its outputs [83]. In this light, activity in the MTL system should correlate with aggregate measures of SL, as we observe (Fig 5), rather than with online SL. This is also in line with recent data [45] relating hippocampal activity to the encoding of the output units of SL (e.g., 3-syllable words), in contrast to activity in prefrontal regions, as part of the frontostriatal system, which tracked the TPs. The striatum, in contrast, appears to be in charge of probabilistic learning [84] and may therefore be required in situations where uncertainty is present [85], as in our task. From this standpoint, the outputs of each system could thus potentially feed into each other; i.e., while the striatal system might utilize representations stored in the hippocampal system for SL, the latter might also come to store the outputs of that SL. In sum, by combining an online measure of SL, computational modeling, and functional neuroimaging, we provide evidence for SL as a process of gradual prediction learning strongly related to striatal function. This work, therefore, makes a valuable contribution to our understanding of the mechanisms and neurobiology of this cognitive phenomenon and introduces the provocative possibility of language-related SL as an instance of reinforcement learning orchestrated by the striatum.

Materials and methods Participants Two independent cohorts participated in the study. We first collected data from 20 volunteers from the Facultat de Psicologia of the Universitat de Barcelona as the behavioral group. Data from 1 participant were not correctly recorded, so the final cohort comprised 19 participants (15 women, mean age = 21 years, SD = 1.47). We used the partial η2 obtained for the main effect in the NADs block in the behavioral group to compute a sample size analysis for the fMRI group. To ensure 90% power to detect a significant effect in a 2 × 2 repeated measures ANOVA at the 5% significance level based on this measure of effect size, MorePower [86] estimated that we would need a sample size of at least 16 participants. However, considering (i) that we expected participants to perform worse inside the fMRI scanner and (ii) the recommendation that at least 30 participants should be included in an experiment in which the expected effect size is medium to large [87], we finally decided to double the recommended sample size for the fMRI experiment. The fMRI group thus consisted of 31 participants (20 women, mean age = 23 years, SD = 3.62) recruited at the Universidad de Granada. All participants were right-handed native Spanish speakers and self-reported no history of neurological or auditory problems. Participants in the fMRI group were cleared for MRI compatibility. The ERC-StG-313841 (TuningLang) protocol was reviewed and monitored by the European Research Council ethics monitoring office, approved by the ethics committee of the Universitat de Barcelona (IRB 00003099), and conducted in accordance with the Declaration of Helsinki. Participation was remunerated and proceeded with the written informed consent of all participants. Statistical learning paradigm Two different artificial languages were used in the NAD learning task. Each language comprised 28 bisyllabic (consonant-vowel-consonant-vowel) pseudowords (henceforth, words). 
Words were created using the MBROLA speech synthesizer v3.02b by concatenating diphones from the Spanish male database “es1” (http://tcts.fpms.ac.be/synthesis/) at a voice frequency of 16 kHz. The duration of each word was 385 ms. Words were combined to form 3-word phrases with 100 ms of silence inserted between words. Phrase stimuli were presented using the software Presentation (Neurobehavioral Systems) via Sennheiser over-ear headphones (pilot group) and MRI-compatible earphones (Sensimetrics, Malden, MA, USA; fMRI group). The learning phase consisted of a NADs block and a Random block, each employing a different language. The order of blocks was counterbalanced between participants. We also counterbalanced the languages assigned to NADs and Random blocks. The NADs block consisted of 72 structured phrases (phrases with dependencies) whereby the initial word (A) was 100% predictive of the last word (C) of the phrase. We used 2 different dependencies (A1_C1 and A2_C2) presented over 18 different intervening (X) elements to form AXC-type phrases. Twelve of the 18 X elements were common to both dependencies, while the remaining 6 were unique to each dependency. These 36 structured phrases were presented twice over the NADs block, making a total of 72 AXC-type structured phrases issued in pseudorandom order. The probability of transitioning from a given A element to a particular X was therefore 0.056. Phrases in the Random block were made out of the combination of 2 X elements and a final C element (either C1 or C2, occurring with equal probability). Note that, while C elements could be predicted with 100% certainty in the NADs block, they could not be predicted from the previous X elements in the Random block. X elements were combined so that each X word had an equal probability of appearing in first and second position but never twice within the same phrase. 
Forty-eight unstructured phrases were presented twice over the Random block, making a total of 96 pseudorandomized XXC-type unstructured phrases. Each 3-word phrase, in both NADs and Random blocks, was considered a trial for the fMRI analysis. A recognition test was issued at the end of each block to assess offline learning (see S1 Text for further details). To obtain an online measure of incidental learning, participants were instructed to detect, as fast as possible via a button press, the presence or absence of a given target word. The target word for each participant remained constant throughout the block and was always one of the 2 C elements of the language (C1 or C2, counterbalanced). A written version of the participant’s target word was displayed in the middle of the screen for reference throughout the entire learning phase. Importantly, participants were not informed about the presence of dependencies, so this word-monitoring task was in essence orthogonal to SL. Yet, if incidental learning of the dependencies occurred over trials in the NADs block, faster mean RTs should be observed for this block compared to the Random block, where the appearance or nonappearance of the target word could not be anticipated from any of the preceding elements. Fig 1 details a trial in the SL task. The participant’s target word (e.g., RUNI) appeared on the screen above a fixation cross to signal the beginning of each trial and remained on the screen throughout the trial. A 3-word phrase was delivered auditorily (1,460 ms) 300 ms later, followed by a prompt to respond YES/NO to target presence/absence, respectively. A maximum of 1,500 ms was given for participants to indicate their response before the intertrial interval began. Upon response, the target word disappeared into the intertrial interval, which lasted 1,000 ms (behavioral group) or was jittered between 1,500 ms and 3,000 ms (fMRI group). 
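The structured-phrase inventory described above can be sketched as follows. The element labels are placeholders (the actual pseudowords are not reproduced here), but the counts and the transitional probability match those reported in the text:

```python
import random

# Placeholder labels: 2 dependencies, 12 shared X elements, 6 unique per dependency.
A_elements = ["A1", "A2"]
X_shared = [f"Xs{i}" for i in range(12)]
X_unique = {"A1": [f"Xu1_{i}" for i in range(6)],
            "A2": [f"Xu2_{i}" for i in range(6)]}
C_for = {"A1": "C1", "A2": "C2"}  # each initial A word is 100% predictive of its C

# 18 X contexts per dependency -> 36 unique AXC phrases.
phrases = [(a, x, C_for[a]) for a in A_elements for x in X_shared + X_unique[a]]

# Each phrase is presented twice, in pseudorandom order -> 72 structured trials.
block = phrases * 2
random.shuffle(block)

# Transitional probability from a given A element to any particular X: 1/18.
tp_A_to_X = 1 / 18
print(len(phrases), len(block), round(tp_A_to_X, 3))
```

Running the sketch reproduces the design figures from the text: 36 unique phrases, 72 structured trials, and an A-to-X transitional probability of about 0.056.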
fMRI analyses (event onsets; see below) were time locked to the onset of each phrase presentation, and the trial duration was defined as the duration of the phrase. Participants in the behavioral group indicated the detection or nondetection of the target word by pressing the left and right arrow keys of the computer keyboard, respectively. They were required to use their left index finger to press the left arrow key, and the right index finger to press the right arrow key. Participants in the fMRI group responded using the buttons corresponding to thumb and index fingers on an MRI-compatible device held in their right hand. Response buttons were not counterbalanced for either group. The intertrial interval was fixed at 1,000 ms in the behavioral study and jittered (with pseudorandom values between 1,500 and 3,000 ms) for testing during fMRI acquisition. At the end of a given phrase, a maximum of 1,500 ms was allowed for participants to respond. RTs were calculated from the onset of the last word in the phrase until button press. Only trials with correct responses under 1,000 ms were entered into subsequent analyses. Participants’ NAD effect was calculated as the mean RT difference between unstructured (Random block) and structured (NADs block) trials. A repeated measures ANOVA on participants’ RT data with within-participants factors Structure (NADs/Random) and Target (Target/No Target) and Order as a between-participants factor was initially performed to rule out block order effects. A repeated measures ANOVA with factors Structure (NADs/Random) and Target (Target/No Target) was subsequently performed to assess the statistical significance of learning. Linear mixed model analysis In order to assess online SL in the NAD learning task within each experimental group, we used a linear mixed model approach to fit learning slopes reflecting RT gains over trials for the NADs versus Random conditions. 
The use of mixed models to compare the slope between conditions allows the use of RT data for all trials and participants, which results in a more sensitive measure of the online learning process than a single mean value per participant [15,88]. Analyses were performed using the lme4 [89] and lmerTest [90] packages as implemented in the R statistical language (R Core Team, 2012). Our basic model included RT (rt) and trial as continuous variables, condition (Random, NADs) and TNT (target/no target) as 2-level factors, and participant as a factor with as many levels as participants in each group.

(1) rt ~ condition * trial + TNT + (1 | participant)

(2) rt = β0 + β1·condition + β2·trial + β3·TNT + β4·(condition × trial)

As shown in (1), which indicates the specified model, we introduced condition, trial, and their interaction as fixed effects terms. TNT was included as a predictor of no interest. To account for interparticipant variability in basal response speed, we allowed for a different intercept per participant by introducing participant as a random effect. The algebraic expression of the fixed effects part of the model is given in (2). Note that, in this model, β4 (condition*trial) represents an estimate of the difference in learning slopes between the NADs and Random conditions and can therefore be interpreted as a detrended learning slope estimate for the NADs condition. For the sake of clarity, we have referred to this estimate as βdiff. A statistically significant negative βdiff indicates that online NAD learning effectively took place over and beyond any RT gain that may be attributed to practice effects. Temporal difference model We modeled participants’ learning of the dependencies using a TD model [18,19]. Drawing from earlier models of associative learning, such as the RW model [38], the main assumption of TD models is that learning is driven by a measure of the mismatch between predicted and actual outcome [18,19,40,91] (i.e., prediction error (PE)). 
For instance, when an X element is presented in a NAD block’s AXC trial, the PE is computed as:

(3) δ1(t) = V(Xi) − V(Ai)

where δ1 is the PE term at element X and trial t, which amounts to the discrepancy between the action value at that state [V(Xi)] and the prediction driven by the previously visited state [V(Ai)]. Computationally, learning through TD is therefore conceptualized (and modeled) as prediction learning [40], where the action values/predictions of the previously visited state (following the previous example, of element A) are then updated according to:

(4) V(Ai) ← V(Ai) + α·δ1(t)

where α is a free parameter that represents the learning rate of the participant and determines the weight attributed to new events and the PE they generate [18]. Similarly, when hearing element C, a new PE is generated based on the presence or absence of the target word:

(5) δ2(t) = R − V(Xi)

where R is specified as +1 if it is the target element or −1 if not. Note that the sign choice represents a convenient yet arbitrary means to distinguish target and no target outcomes within the same model. This could have been inverted (R (target) = −1, R (no target) = 1) with no difference in the model’s results. Element X is then updated accordingly:

(6) V(Xi) ← V(Xi) + α·δ2(t)

One of the advantages of TD models over simpler models of learning, such as the RW model (see below), is that they account for the sequence of events leading to an outcome, rather than treating each trial as a discrete temporal event. That is, although each trial for the participant (i.e., each 3-word phrase) was equivalently treated as a trial for the TD model, model updates occurred at the presentation of each individual element (see below). TD models are thus sensitive to the precise temporal relationship between the succession of predictions and outcomes that take place in a learning trial [18]. Note that this is particularly valuable in trying to account for the learning of NADs as distinct from adjacent dependencies, making a TD model preferable in such cases. 
This feature is implemented as a temporal discounting factor: an additional free parameter γ that represents the devaluation of predictions that are more distant from the outcome [47,92]. Thus, upon "hearing" the final element of a structured (AXC) phrase, the prediction from the initial element A is also updated according to:

(7) V(Ai)t+1 = V(Ai)t + α · γ · ∂2

The absolute value of V(Ai) reflects its predictive capacity and the associative strength between element A and a particular response, with higher values indicating stronger predictions. Because of this, we have replaced the formal term V(A) by the alternative term p(A) throughout the manuscript, as p(A) may be more intuitively related to predictions from element A by the general reader than V(A). As a behavioral index of participants' predictive capacities, we used RTs, since RTs should be faster when the associations are learnt than when they are not. RTs were first standardized (z scored; zRT) and then normalized between 0 and 1 by the softmax-type function:

(8) xRTt = 1 / (1 + exp(zRTt))

Note that this function outputs larger xRT values for lower input zRT values (and, conversely, smaller xRT values for higher input zRTs), in accord with the idea that better predictive capacities will elicit faster RTs. Importantly, the function also minimizes the effect of extreme RT values. To fit the free model parameters to each participant's responses, we assumed the following function to minimize the difference between the absolute value of V(Ai) and the transformed RT in a given trial t: (9) We then selected the α and γ values that produced the maximum LLE, indicating the best possible fit between the model predictions and the participant's transformed RTs. For this, we used Matlab's (Matlab R2017 by Mathworks) fmincon function, which implements a Nelder–Mead simplex method [93].
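Concretely, the per-trial TD computations in (3) to (7) can be sketched in Python. This is an illustrative re-implementation, not the authors' code; the function and variable names (td_trial_update, V_A, V_X) are our own.

```python
# Sketch of the TD model's per-trial updates, eqs. (3)-(7).
# Illustrative only; all names are ours, not the authors'.

def td_trial_update(V_A, V_X, R, alpha, gamma):
    """One AXC trial of the TD model.

    V_A, V_X -- current prediction values for elements A and X
    R        -- +1 if C is the target word, -1 otherwise
    alpha    -- learning rate
    gamma    -- temporal discounting factor
    Returns the updated (V_A, V_X).
    """
    delta1 = V_X - V_A              # eq. (3): PE on hearing X
    V_A += alpha * delta1           # eq. (4): update the prediction from A
    delta2 = R - V_X                # eq. (5): PE on hearing C (target/no target)
    V_X += alpha * delta2           # eq. (6): update the prediction from X
    V_A += alpha * gamma * delta2   # eq. (7): distant prediction, discounted by gamma
    return V_A, V_X
```

Over repeated target trials, V_X approaches +1 and |V_A| grows, mirroring the development of the prediction values used in the analyses.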
The model was then run for each block (NADs and Random) separately, from which trial-wise prediction values [abs(V(A)t) and abs(V(X)t)] for the different phrase elements A and X (resulting in matrices P(A) and P(X), respectively) were computed. In summary, adopting the alternative terminology (p(A) instead of abs(V(A))), the TD algorithm for each trial was implemented as depicted in Fig 6.
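The parameter fit can likewise be sketched. Because the exact objective in (9) is not reproduced here, this illustration substitutes a squared-error criterion and a simple grid search for the paper's fmincon-based maximum-LLE fit; the reward sequence, grid, and all names are invented for the demo.

```python
# Illustrative parameter recovery for the TD model. Squared error and a
# grid search stand in for the paper's likelihood-based fmincon fit.
import math

def run_td(alpha, gamma, rewards):
    """Trial-wise |V(A)| trajectory for a sequence of target/no-target outcomes."""
    V_A, V_X, traj = 0.0, 0.0, []
    for R in rewards:
        traj.append(abs(V_A))            # prediction entering this trial
        delta1 = V_X - V_A               # eq. (3)
        V_A += alpha * delta1            # eq. (4)
        delta2 = R - V_X                 # eq. (5)
        V_X += alpha * delta2            # eq. (6)
        V_A += alpha * gamma * delta2    # eq. (7)
    return traj

def fit(xrt, rewards, grid):
    """Pick (alpha, gamma) minimizing squared error to the transformed RTs."""
    best, best_err = None, math.inf
    for a in grid:
        for g in grid:
            err = sum((p - x) ** 2 for p, x in zip(run_td(a, g, rewards), xrt))
            if err < best_err:
                best, best_err = (a, g), err
    return best

# Recover parameters from data generated by the model itself
rewards = [1, -1] * 50
target = run_td(0.3, 0.7, rewards)
alpha_hat, gamma_hat = fit(target, rewards, [i / 10 for i in range(11)])
print(alpha_hat, gamma_hat)  # -> 0.3 0.7
```

With real data, the model's |V(A)| trajectory would be compared against each participant's transformed RTs (xRT) rather than model-generated values.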

Fig 6. TD model's computations during a NADs block's trial. The initial A element of the phrase (e.g., jupo) carries a prediction value p(A). A prediction error ∂1, generated when this prediction is not met by the occurrence of the second word X (bade), is used to update the initial prediction p(A), scaled by the learning rate α. A new prediction is issued from this second word p(X), which also generates a prediction error ∂2 on C (runi). ∂2 is then used to update both p(X) and p(A), scaled by the learning rate and (down)scaled by the temporal discounting factor γ in the case of the more distant prediction p(A). NAD, nonadjacent dependency; TD, temporal difference. https://doi.org/10.1371/journal.pbio.3001119.g006

To illustrate the consistency between participants' RTs and model predictions, both of which we assume to be proxies for SL, we plotted the development of P(A) computed by the model (inverted as 1 − P(A)) averaged across participants against the mean RTs of the participants over trials in the NADs block (both z-scored; main text Figs 2 and 3). To assess the fit of the TD model, we computed for each participant the LLR between the TD model's LLE and the LLE produced by a model predicting at chance. To make fit assessment more intuitive, a model fit index was then calculated as 1 − LLR, where higher model fit index values equate to a better fit. The overall fit of the TD model was assessed at the group level by averaging model fit indices across participants.

Rescorla-Wagner model

The RW model was specified as follows: in each trial, the state values V(Ai)t and V(Xi)t were summed to produce a single prediction V(AiXi)t per trial. No PE was computed on the second element of each phrase (e.g., X in NAD phrases). A PE was computed for each C element as ∂ = R − V(AiXi)t. This was then used to update both V(Ai)t and V(Xi)t as V(Ai)t+1 = V(Ai)t + α · ∂ and V(Xi)t+1 = V(Xi)t + α · ∂, respectively.
Note that this is equivalent to the TD model's updates without the γ term.

fMRI acquisition and apparatus

The SL task comprised a single run with 830 volumes. Functional T2*-weighted images were acquired using a Siemens Magnetom TrioTim syngo MR B17 3T scanner and a gradient echo-planar imaging sequence to measure BOLD contrast over the whole brain [repetition time (TR) = 2,000 ms, echo time (TE) = 25 ms; 35 slices acquired in descending order; slice thickness = 3.5 mm; 68 × 68 matrix, in-plane resolution = 3.5 × 3.5 mm; flip angle = 180°]. We also acquired a high-resolution 3D T1 structural volume using a magnetization-prepared rapid-acquisition gradient echo (MPRAGE) sequence [TR = 2,500 ms, TE = 3.69 ms, inversion time (TI) = 1,100 ms, flip angle = 90°, FOV = 256 mm, spatial resolution = 1 mm3/voxel].

fMRI preprocessing and analysis

Data were preprocessed using Statistical Parametric Mapping software (SPM12, Wellcome Trust Centre for Neuroimaging, University College London, UK; www.fil.ion.ucl.ac.uk/spm/). Functional images were realigned, and the mean of the images was coregistered to the T1. The T1 was then segmented into gray and white matter using the Unified Segmentation algorithm [94], and the resulting forward transformation matrix was used to normalize the functional images to standard Montreal Neurological Institute (MNI) space. Functional volumes were resampled to 2 mm3 voxels and spatially smoothed using an 8-mm FWHM kernel. Several event-related design matrices were specified for convolution with the canonical hemodynamic response function. Trial onsets were always defined as the onset of the first word of the phrase. To identify brain regions related to the trial-by-trial development of participants' predictions/PEs, a model with the conditions NADs Target, NADs No Target, Random Target, and Random No Target, and all offline test conditions (see S5 Fig) was specified at the first level.
For each trial of each condition of interest (NADs Target, NADs No Target, Random Target, and Random No Target), this model also included, first, a parametric modulator (a vector) corresponding to the RT (z-scored and inverted) and, second, a parametric modulator (a vector) corresponding to the trial-by-trial prediction/PE (p(A), also z-scored; S5 Fig). Events were time-locked to the onset of the phrase auditorily presented in that trial. In all cases, data were high-pass filtered (to a maximum of 1/90 Hz). Serial autocorrelations were estimated using an autoregressive (AR(1)) model. We additionally included, in all the models described above, the movement parameter estimates for each participant computed during preprocessing to minimize the impact of head movement on the data (S5 Fig). For each participant, the following contrasts were calculated at the first level (S5 Fig): NADs zP(A) versus implicit baseline;

NADs zP(A) versus Random zP(X1);

NADs zP(A) versus NADs zinvRT;

NADs zP(A) versus Random zinvRT; and

NADs versus Random. These were subsequently entered into corresponding one-sample t tests at the second level to arrive at the reported fMRI results. We used the Automated Anatomical Labelling atlas [95] included in the xjView toolbox (http://www.alivelearn.net/xjview8/) to identify anatomical and cytoarchitectonic brain areas. Group results are reported for clusters at a p < 0.001 FWE-corrected threshold at the cluster level (p < 0.001 uncorrected at the voxel level), with a minimum cluster extent of 20 voxels.
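As a minimal illustration of how the two trial-wise parametric modulators described above are prepared (the inverted z-scored RTs and the z-scored p(A) values), the vectors can be computed as follows. The numbers are invented; in practice, these vectors enter the SPM first-level design.

```python
# Sketch of the two parametric modulators: inverted z-scored RTs and
# z-scored p(A). Values are invented for illustration only.
def zscore(v):
    """Population z-score of a list of numbers."""
    n = len(v)
    m = sum(v) / n
    sd = (sum((x - m) ** 2 for x in v) / n) ** 0.5
    return [(x - m) / sd for x in v]

rts = [620.0, 580.0, 555.0, 540.0, 530.0]   # illustrative trial RTs (ms)
p_a = [0.05, 0.20, 0.40, 0.55, 0.65]        # illustrative p(A) values

inv_zrt = [-z for z in zscore(rts)]  # inverted so that higher = faster response
z_pa = zscore(p_a)                   # trial-by-trial prediction modulator
```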

Acknowledgments We thank Sonja A. Kotz and Floris de Lange’s Predictive Brain Lab for helpful comments on this work.


[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001119


