(C) PLOS One
This story was originally published by PLOS One and is unaltered.
Local and global reward learning in the lateral frontal cortex show differential development during human adolescence [1]
Marco K. Wittmann (Department of Experimental Psychology, University of Oxford, Radcliffe Observatory, Oxford, United Kingdom; Wellcome Centre for Integrative Neuroimaging, John Radcliffe Hospital, Headington; University College London)
Date: 2023-03
Reward-guided choice is fundamental for adaptive behaviour and depends on several component processes supported by prefrontal cortex. Here, across three studies, we show that two such component processes, linking reward to specific choices and estimating the global reward state, develop during human adolescence and are linked to the lateral portions of the prefrontal cortex. These processes reflect the assignment of rewards either contingently, to local choices, or noncontingently, to the choices that make up the global reward history. Using matched experimental tasks and analysis platforms, we show that the influence of both mechanisms increases during adolescence (study 1) and that lesions to lateral frontal cortex (that included and/or disconnected both orbitofrontal and insula cortex) in human adult patients (study 2) and macaque monkeys (study 3) impair both local and global reward learning. Developmental effects were distinguishable from the influence of a decision bias on choice behaviour, known to depend on medial prefrontal cortex. Differences in local and global assignments of reward to choices across adolescence, in the context of delayed grey matter maturation of the lateral orbitofrontal and anterior insula cortex, may underlie changes in adaptive behaviour.
The finding that the behavioural mechanisms of local and global reward learning continue to change during adolescence aligns well with the nonhuman primate literature [6, 7, 14, 17] and our knowledge about structural brain maturation in humans: Lateral frontal brain regions, compared to medial frontal regions, appear to continue to mature during adolescence well into adulthood [31, 33, 34], and, hence, we would expect functions that depend on this part of the brain to keep changing during this time period as well. However, only manipulation approaches can provide evidence for a causal reliance of a cognitive function on a neural substrate [16, 48]. Study 2 therefore examined the impact of broad lesions to lateral frontal cortex (lesions that included and/or disconnected both orbitofrontal and insula cortex) on local and global reward learning, using experimental tasks and analysis pipelines tightly matched to study 1 in cohorts of adult patients with medial or lateral frontal lobe lesions. The results indicated that intact lateral frontal cortex is indeed causally important for both local and global reward learning. Finally, in study 3, we reanalysed nonhuman primate data [6, 14] that had initially suggested that these lateral frontal regions are important for local reward learning, again using matched experimental tasks and analysis pipelines. This uncovered that lateral lesions in macaques (that likely disconnected both orbitofrontal and insula cortex) also impaired global reward learning. It also offered new insights into how the GRS guides choices differently in humans and macaques: While humans showed negative GRS effects, macaques showed positive ones. Together, our results suggest that local and global reward learning mature during adolescence (study 1) and that both learning mechanisms causally depend on (subregions within) lateral frontal cortex (studies 2 and 3).
This suggests that the protracted neural maturation in lateral frontal regions [ 31 , 33 , 34 ] is a key driver for the maturation of local and global reward learning during adolescence.
(A) Trial timeline: Participants decided between three choice options (red, green, blue squares; left-hand side) before receiving feedback for 1,500 ms (right-hand side) indicating whether their choice yielded a reward (10 points and smiley face) or no reward (no points and sad face). Both possible outcomes are displayed in this example. (B) Reward probabilities ranged between 0 and 0.9 and drifted throughout the session, with each option being competitive at some time during the session. (C) Age distribution of the final sample with dashed line indexing the age-grouping cutoff at 18 years. Participants younger than 18 are referred to as adolescents; participants 18 years and older are referred to as young adults. Data for B and C are available in S1 Data (Figure1 tab).
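As a toy illustration of a drifting reward schedule like the one in panel B, the sketch below implements a bounded random walk per option. The Gaussian step size and the clipping rule are assumptions for illustration only; the task description specifies just that probabilities drifted between 0 and 0.9.

```python
import random

def simulate_reward_probabilities(n_trials=300, n_options=3, drift_sd=0.05,
                                  lo=0.0, hi=0.9, seed=0):
    """Bounded random walk for each option's reward probability.

    Assumption for illustration: Gaussian drift steps clipped to [lo, hi];
    the actual generating process used in the task is not specified here.
    """
    rng = random.Random(seed)
    # Start each option at a random probability within the allowed range.
    probs = [[rng.uniform(lo, hi) for _ in range(n_options)]]
    for _ in range(n_trials - 1):
        # Each option drifts independently, clipped to stay in [lo, hi].
        probs.append([min(hi, max(lo, p + rng.gauss(0, drift_sd)))
                      for p in probs[-1]])
    return probs

schedule = simulate_reward_probabilities()
```

Because each option drifts independently, different options become the most rewarding at different times, which is what makes every option "competitive at some time during the session."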
Here, we combined behavioural and lesion investigations to suggest an important role for neighbouring subregions of the lateral frontal cortex, specifically orbitofrontal and anterior insula cortex, in the development of local and global reward learning. We used the same multi-option probabilistic learning task originally developed in macaques to dissociate local and global reward learning (Fig 1) [4, 6, 14]. In study 1, we tested a large online sample (overall n = 422) of adolescents (aged 11 to 17 years) and young adults (aged 18 to 35 years) and showed that both local and global reward learning change during human adolescence. We chose this age range in accord with previous work [47] and with particular reference to the protracted maturation profile of lateral prefrontal cortex [30]. Our findings suggest that young adults associated choices more strongly with local rewards and were simultaneously negatively influenced by the GRS. The GRS influence became even more negative with age, meaning that young adults, more than adolescents, contextualised their choices within the longer-term reward context and were less likely to persist with a choice if the alternatives afforded by this context were attractive. By contrast, decision bias mechanisms that depend on the ventromedial portions of the orbitofrontal cortex showed no relationship with age.
Informed by these animal models and the precise functional localisation of these mechanisms, we consider the development of reward-guided learning in the context of the protracted [26–33] and nonuniform [31, 33] structural maturation of the brain. These considerations lead to the hypothesis that specific cognitive abilities, particularly those related to lateral prefrontal cortex, mature later than others, in particular those related to the more medial regions [31, 33, 34]. This temporal mismatch between protracted structural changes in prefrontal cortex and more rapid maturation of subcortical areas has been suggested to account for increased risk-taking behaviour in adolescence [35, 36], and several studies link development of reward-related behaviour to changes in prefrontal–subcortical interactions [37–39]. However, prefrontal cortex has often been treated as a unitary structure, and, consequently, we only have a coarse understanding of the different speeds at which subregions of the frontal cortex and, in parallel, subcomponents of reward learning mature [40]. With tentative evidence that adolescents differ from adults in terms of local reward learning, for instance, in terms of balancing positive and negative feedback [41–46], it becomes critical to understand the development of reward learning mechanisms in combination with developmental maturation of their neural underpinnings.
Here, we focus on the development of component processes of reward learning that have been strongly linked to neighbouring regions of orbitofrontal and anterior insula cortex in studies of nonhuman primates: local and global reward learning. Local reward learning refers to the ability to form contingencies between choice options and outcomes, repeating choices that led to positive outcomes and omitting choices that led to negative outcomes [11–13] (also referred to as "contingent reward learning" or "contingent credit assignment"). By contrast, global reward learning refers to a parallel mechanism whereby reward simultaneously reinforces not only the choice that caused it but also unrelated choices made in close temporal proximity [6, 7, 14–17]. This noncontingent global reward learning involves forming a representation of the global reward state (GRS), i.e., how much reward was received overall in the recent past, independent of the specific choices that caused it [7]. Lesion studies in macaques and human patients have consistently causally linked local reward learning to lateral orbitofrontal cortex [6, 7, 14, 17], and this is even engrained in variations of grey matter volume in these regions [18]. By contrast, global reward learning mechanisms have been associated with BOLD activity in neighbouring anterior insula cortex [7]. Notably, the function of both these regions contrasts with medial orbitofrontal/ventromedial prefrontal cortex, which harbours a variety of value signals linked to value comparison and decision-making processes, as opposed to learning processes [14, 16, 19–25].
A distributed network in the human brain supports learning from reward and making adaptive decisions. This network comprises several regions in lateral and medial prefrontal cortex (PFC), including lateral and medial orbitofrontal/ventromedial prefrontal cortex, as well as other areas such as anterior cingulate cortex, insula cortex, and the amygdala. In concert, they contribute component parts to adaptive behaviour such as contingency learning, value comparison, and value representations [ 1 – 8 ]. However, so far, we only have rudimentary knowledge about the developmental dynamics of this brain network and accompanying behavioural changes during adolescence and early adulthood [ 9 , 10 ].
Study 1: Upper two panels. People must behave adaptively in complex reward environments. They pursue a current choice (banana symbol within circle) that is embedded in the GRS—the global levels of reward afforded by the environment over time (tree symbols on the periphery of the circle). Adolescents switch away from the currently pursued choice if the GRS is high (small arrow pointing outwards). This can be understood as a contrast effect comparing choice and GRS. Adults show such a contrasting effect of the GRS even more strongly. They contextualise the current choice within the set of alternative options. Knowledge that rich alternatives exist makes adults switch away from their current choice more easily. The current choices appear less valuable if the GRS is very high. The increased reliance on the GRS over the course of development coincides with grey matter maturation in lateral frontal lobe regions including the anterior insula and lateral orbitofrontal cortex. Study 2: Lower left panel: Lesions to lateral frontal lobe (affecting multiple subregions) reduce the GRS effect in human adults. Study 3: Lower right panel: Macaques also contextualise current choices within the GRS. However, macaques use the GRS fundamentally differently compared to humans. They show "spread of effect": The GRS positively affects the value of a current choice, and this makes macaques stay more with a choice if the GRS is high. Strikingly, lesions to lateral frontal lobe (again affecting multiple subregions) increase rather than decrease this effect (thick arrow surrounding small arrow indicates stronger GRS effect after lesions to lateral frontal lobe).
(A) In intact monkeys, we found a small but significantly positive effect of the GRS on stay decisions, which was sign-reversed relative to the negative GRS effect we had found in humans. The inset shows the human GRS effect averaged across all ages (11–35 years) in the matched experimental paradigm (see Fig 3, study 1). (B) Comparing the effects of frontal lobe lesions in the macaque monkey revealed that the positive GRS effects after lateral frontal lesions were significantly stronger compared to medial lesion groups. The pattern mirrored the human lesion results but in the opposite direction, with lateral lesions in humans abolishing the negative GRS effect (inset: GRS effects from Fig 4, study 2; "Lat" abbreviates "Lateral," and "Med" abbreviates "Medial") (*p < 0.05). Data for A and B are available in S1 Data (Figure 5 tab).
Controlling for local reward learning, we found a small but significantly positive effect of the GRS on stay decisions in the baseline data (intercept-estimate = 0.007, SE = 0.003; χ2(1) = 5.122, p = 0.024; Fig 5A). Therefore, we indeed found a sign-reversed effect of the GRS in macaques compared to our human participants using matched experimental paradigms and the same "credit assignment GLM." Moreover, comparing the effects of lateral frontal lobe lesions to medial lesions in macaques, we found that lateral lesions, too, significantly impacted global reward learning. This mirrored the findings from our human lesion study. However, strikingly and in contrast to our human sample, the GRS effects we observed after lateral lesions were significantly stronger (rather than weaker) compared to medial lesion groups (estimate-lateral = 0.027, SE = 0.009; χ2(1) = 5.080, p = 0.024; Fig 5B). This finding further strengthens the idea that both species use the GRS qualitatively differently during learning within the context in which these experiments were conducted. While humans rely negatively on the GRS and this capacity is abolished after lateral lesions, positive GRS effects are amplified in macaques after lateral frontal lobe lesions. This supports the contention that the GRS effect reflects a task-adaptive process in humans, which matures during adolescence and is compromised by lesions, whereas in monkeys, the GRS effect may lead to a suboptimal "spread" of reward, which is even increased by lateral frontal lobe lesions (Fig 6).
Finally, we reanalysed the nonhuman primate data that had initially contributed to the suggestion that a subregion of the lateral frontal lobe, i.e., the lateral orbitofrontal cortex, is causally important for credit assignment and local reward learning [6, 14]. We did this for two reasons: first, to follow up on and confirm an intriguing effect in our data using more closely matched tasks: GRS effects were negative in the human sample reported here, while they were positive in previous macaque work [7]. This meant macaques stayed with a choice more when the GRS was high [7], whereas human participants reported here switched more when the GRS was high. The second reason was to examine, for the first time, whether changes in global reward learning were also apparent after lateral frontal lobe lesions in macaques, as in our human lesion data. We combined our "credit assignment GLM" with several previously published data sets of macaque choice behaviour [6, 14]. This allowed our analysis to be optimised towards discovering fine-grained effects of the GRS in a uniquely large data set. We used linear mixed effects (LME) models to account for the fact that multiple sessions belonged to the same individual. We analysed 190 sessions from intact monkeys, 45 sessions from monkeys with lateral prefrontal lesions, and 55 sessions from monkeys with medial prefrontal lesions, from a total of 7 monkeys aged 4 to 10 years. Note, while these lateral lesions targeted orbitofrontal areas 11+13, there is strong reason to believe that they also disconnected lateral area 47/12o and likely other neighbouring regions including anterior insula cortex (see [18, 60] for discussion and refer to Methods for specifics of the lesion sites and nomenclature). In other words, just as in our human lesion study, we must assume that multiple subregions of the lateral frontal lobe that have dissociable functions were affected by the lateral lesions.
We compared lateral frontal lobe patients to a brain-damaged control group of patients with lesions to the medial frontal lobe. We would expect participants with lateral lesions to rely less on local rewards (decreased CxRt effect) and also to exhibit a less negative GRS effect compared to subjects with medial lesions. In other words, we would expect a "lesion site" [lateral, medial] by "reward type" [CxRt, GRS] interaction. This was indeed precisely the effect we found (F1,6 = 7.4; p = 0.035; Fig 4). Lateral frontal lobe lesions caused patients to rely less on local rewards when learning about their choice options and, at the same time, to be less influenced by the GRS. This suggests that lateral frontal cortex is the likely neural substrate that enables intact local and global reward learning.
The finding that the behavioural mechanisms of local and global reward learning continue to change during adolescence aligns well with the nonhuman primate literature [6, 7, 14, 17] and our knowledge about structural brain maturation in humans. Specifically, lateral frontal brain regions, compared to medial frontal regions, appear to continue to mature during adolescence well into adulthood [31, 33, 34]. Hence, we would expect functions that depend on this part of the brain to keep changing during this time period as well. Our own analysis of Human Connectome data [58, 59] in a set of selected reward-sensitive regions of interest (ROIs) confirmed that the greatest age-related differences over our investigated age range existed in lateral and not medial regions of the brain's reward circuitry (Fig E in S1 Text). However, only manipulation approaches can provide evidence for a causal reliance of a cognitive function on a neural substrate. Study 2 therefore examined the impact of broad lesions to lateral frontal cortex on local and global reward learning. In study 2, we reanalysed behavioural data from adult patients with lateral (n = 4) and medial (n = 4) frontal lesions using the same experimental paradigm as study 1 (Fig F in S1 Text [16]) and a matched analysis pipeline. As is often the case in patient lesion studies, the lesions did not adhere to strict anatomical boundaries. The lateral lesions encompassed regions related to local reward learning in lateral orbitofrontal cortex [6, 14, 17] as well as more posterior regions in the anterior insula linked to global reward learning [7]. While this was a convenience sample, as the data already existed, the developmental behavioural task was designed to specifically align with these previously published experimental paradigms. We also used the same "credit assignment GLM" employed in study 1 (Methods).
Note that the effect of local reward learning/contingent reward learning in our GLM is conceptually similar to a learning rate fitted with a reinforcement learning algorithm. Both denote the weight that a new outcome has for updating the value of the corresponding choice [7, 56, 57]. Higher learning rates, just like higher local reward learning effect sizes, indicate that an outcome changes the future value of a choice more strongly. Correspondingly, there is a strong positive relationship between the learning rate fitted from our reinforcement learning model and local reward learning (r = 0.35, p < 0.001; correlation of learning rate with CxRt). By contrast, the GRS effect is conceptually different from a reward learning rate, because it indicates the effect of a longer-term average reward that is not specifically linked to a choice. Hence, reinforcement learning rate and GRS effect size are uncorrelated (r = −0.05, p = 0.340). The GRS effect therefore indexes a qualitatively different process. Also as expected, neither local nor global reward learning was associated with the inverse temperature from the reinforcement learning model, as the latter indexes decision noise rather than the weighting of reward outcomes (inverse temperature versus CxRt: r = 0.07, p = 0.168; inverse temperature versus GRS: r = −0.04, p = 0.494).
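The analogy between the GLM's local reward learning weight and a reinforcement learning rate can be made concrete with a single Rescorla–Wagner update step. This is a minimal sketch; the numerical values are arbitrary illustrations, not fitted parameters from the study.

```python
def rescorla_wagner_update(value, reward, learning_rate):
    # V <- V + alpha * (R - V): the learning rate alpha scales how strongly
    # the latest outcome pulls the chosen option's value toward it,
    # analogous to the CxR_t weight in the credit assignment GLM.
    return value + learning_rate * (reward - value)

# A higher learning rate moves the value further toward the new outcome.
v_slow = rescorla_wagner_update(0.5, 1.0, 0.1)   # moves to 0.55
v_fast = rescorla_wagner_update(0.5, 1.0, 0.6)   # moves to 0.80
```

The GRS regressor has no counterpart in this update rule, which is why it tracks a conceptually distinct quantity: an average over all recent outcomes rather than the update applied to one chosen option.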
Interestingly, negative GRS effects and positive CxR effects were negatively correlated across participants (Pearson correlation; R = −0.16, p = 0.002; Fig C in S1 Text ) and both mechanisms correlated with broad task success. Independent of age, there were significant positive correlations between local reward learning and the total rewards earned on task (r = 0.244, p < 0.001) and proportion of best choices (r = 0.543, p < 0.001). This pattern was mirrored for global reward learning with a negative correlation with total rewards earned that trended towards significance (r = −0.10, p = 0.069) and a significant negative relationship with proportion of best choices (r = −0.17, p = 0.001). This pattern of results indicates that participants who performed particularly well in linking local rewards with the specific choices that caused them also had more negative GRS effects. This suggests that both aspects of reward learning, local assignments of reward and the ability to switch away from unrewarded choices more easily if the reward environment was rich, constituted complementary aspects of task-adaptive behaviour with both processes significantly and simultaneously gaining more influence over behaviour during adolescence. Importantly, our GRS effects of interest also remain stable when varying the history length over which the GLM is calculated (Fig D in S1 Text ).
Notably, such a negative directionality of the GRS effect is in line with theoretical predictions from behavioural ecology [50] and suggests that to maximise rewards over the long run, reward outcomes should be referenced to the background rate of reward available in an environment: Animals should spend longer foraging for reward if alternative options are scarce, whereas they should be quick to abandon a depleting food source if the frequency of high-value alternatives is high. By conceptualising participants' choices as stay/leave decisions, we were able to identify precisely this choice pattern in our human participants in a 3-option bandit task: A negative GRS effect meant that participants switched away from an option more readily when high-value alternative options were available, and they persisted with poor options when the value of the alternatives was low [50–55].
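The foraging intuition above can be caricatured in a few lines. This is a stylized sketch of the leave-when-the-background-is-richer logic, not the paper's statistical model; the simple threshold rule is an assumption for illustration.

```python
def should_leave(local_reward_rate, background_rate):
    """Stylized forager rule: abandon the currently pursued option once
    the environment's background (global) reward rate exceeds the
    reward rate of the current option."""
    return background_rate > local_reward_rate

# Rich environment (high GRS): quick to abandon a mediocre option.
rich = should_leave(local_reward_rate=0.4, background_rate=0.7)   # True
# Poor environment (low GRS): persist with the same mediocre option.
poor = should_leave(local_reward_rate=0.4, background_rate=0.1)   # False
```

A negative GRS weight in the GLM captures a graded version of this rule: the same local outcome supports staying less strongly when the background reward rate is high.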
Our results suggest that the GRS alters the behavioural response to rewards received for a current choice over and above the effect of local rewards. To illustrate the effects of the GRS more directly, we plotted choice residuals as a function of the GRS, the most recent local reward CxRt, and age using a 2 × 2 × 2 ANOVA. We rearranged the data as a function of winStay (staying with a choice after a local reward at time point t) and loseSwitch (switching away from a choice after a negative local outcome at time point t; see Methods). The analysis revealed an interaction of winStay/loseSwitch and GRS independent of age group (winStay/loseSwitch × GRS interaction, F1,380 = 71.69, p < 0.001), illustrating the GRS effect observed before: While participants were more likely to stay after a reward, they did this even more in a low GRS; in a high GRS, they were quicker to switch away from unrewarded choices. However, critically, the winStay/loseSwitch × GRS interaction changed with age group in a manner suggesting that adolescents were relatively less influenced by the GRS in value updating (winStay/loseSwitch × GRS × age: F1,380 = 7.97, p = 0.005). Older participants, by contrast, showed a stronger contrast effect after receiving reward: In low-GRS environments, they were particularly likely to stay with rewarded options and less likely to switch away from unrewarded ones.
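The rearrangement of trials into winStay/loseSwitch cells split by GRS level can be sketched as follows. This is a simplified reconstruction: the median split follows the figure caption, but the variable names and exact cell definitions here are illustrative assumptions, and the paper's analysis operates on choice residuals rather than raw rates.

```python
from statistics import median

def win_stay_lose_switch_cells(rewards, stays, grs):
    """Sort trials into the 2 (winStay/loseSwitch) x 2 (low/high GRS)
    cells and return the mean rate per cell.

    rewards: 1/0 local outcome on trial t.
    stays:   whether the same option was chosen again on trial t+1.
    grs:     global reward state on trial t (median-split into low/high).
    """
    cut = median(grs)
    cells = {("winStay", "low"): [], ("winStay", "high"): [],
             ("loseSwitch", "low"): [], ("loseSwitch", "high"): []}
    for r, s, g in zip(rewards, stays, grs):
        level = "high" if g > cut else "low"
        if r:   # rewarded trial: did the participant stay?
            cells[("winStay", level)].append(1 if s else 0)
        else:   # unrewarded trial: did the participant switch?
            cells[("loseSwitch", level)].append(0 if s else 1)
    return {k: sum(v) / len(v) if v else float("nan")
            for k, v in cells.items()}

cells = win_stay_lose_switch_cells(
    rewards=[1, 0, 1, 0],
    stays=[True, False, True, True],
    grs=[0.2, 0.2, 0.8, 0.8])
```

A winStay/loseSwitch × GRS interaction then corresponds to the cell means differing across the low- and high-GRS columns.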
Note that, in contrast to the developmental changes in local and global reward learning (computations linked to lateral orbitofrontal and anterior insula cortex), we found no evidence for developmental changes in decision variables previously associated with medial orbitofrontal/ventromedial prefrontal cortex. We considered two markers of decision computations: (1) the decision noise as calculated with a reinforcement learning model; and (2) a "bias by irrelevant alternatives" effect. Both have been related primarily to medial orbitofrontal/ventromedial prefrontal cortex functions in the past [5, 7, 14, 16], and neither showed developmental changes across the age range tested, potentially suggesting that these processes have already reached a relatively stable functional maturation point by adolescence (Fig B in S1 Text).
Next, we examined the influence of the GRS on staying with a currently pursued choice using the same GLM reported above: This ensured that any identified GRS effects were dissociated from those of local reward learning. The GRS was calculated by averaging recent rewards irrespective of choice, and nonzero effects indicate that the overall average level of rewards influences decisions to stick with a choice. Previous work has shown that GRS effects are positive in macaque monkeys [7]. However, in our human sample, strikingly, we found a significantly negative effect of the GRS (one-sample t test on all participants, t352 = 7.00, p < 0.001). The effects were significantly negative in both the adolescent (one-sample t test, t154 = −2.78, p = 0.006) and the young adult sample (one-sample t test, t197 = −6.72, p < 0.001; Fig 3D). This indicates that, irrespective of directly reinforced choices, if participants had observed many rewards in the recent past (high GRS), then they were more likely to switch away from the current choice. By contrast, if the GRS was low, indicating the absence of better alternatives in the past, then participants were more likely to continue pursuing their choice even in the absence of local reward. Importantly, we predicted that if GRS effects are indeed mediated by late-maturing regions of cortex, they would change during adolescence. In accordance with our prediction, we found that the GRS effect was more negative in young adults compared to adolescents (independent samples t test, t351 = −2.89; p = 0.004; Fig 3D) and correlated negatively with age (Pearson correlation, R = −0.14, p = 0.011; Fig 3E). Again, follow-up model fit analyses suggested that this relationship was best characterised by a linear function rather than a quadratic one (Table A in S1 Text).
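Operationally, the GRS for each trial can be computed as a running average of recent reward outcomes regardless of which option produced them. The sketch below assumes a fixed 10-trial window for illustration; the results are reported to be stable across different history lengths (Fig D in S1 Text).

```python
def global_reward_state(rewards, history=10):
    """GRS at trial t: mean of the previous `history` reward outcomes
    (1 = reward, 0 = no reward), irrespective of which option was chosen.

    The window length is an assumption for illustration; before any
    outcomes exist, the GRS defaults to 0.0 here.
    """
    grs = []
    for t in range(len(rewards)):
        window = rewards[max(0, t - history):t]
        grs.append(sum(window) / len(window) if window else 0.0)
    return grs
```

For example, `global_reward_state([1, 1, 0, 0], history=2)` yields `[0.0, 1.0, 1.0, 0.5]`: a high value means the environment has recently been rich in reward, which, given the negative GRS effect, predicts more switching.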
First, we examined the effects of local reward learning (CxR-history) across our entire sample (regardless of age). As expected, the effects of local reward on choices differed by time point (1-way ANOVA: F 3,1035 = 76.42, p < 0.001) with the most recent local reward at time point t (CxR t ) having a significantly larger effect than the previous ones, even after Bonferroni correction (for all pairwise comparisons of CxR t using paired t tests: t > 9.095, p < 0.001). When a chosen option was rewarded, then there was an increased tendency to stay with the option and choose it again (one-sample t test; t 352 = 10.92, p < 0.001). Comparing the effect sizes of CxR t between adolescents and young adults showed that the size of this effect was bigger in young adults (independent samples t test, t 351 = 4.34, p < 0.001; Fig 3B ) suggesting increasing associability between rewards and local choices. Correlation analyses showed a significant positive relationship between age and CxR t (Pearson correlation, R = 0.22, p < 0.001; Fig 3C ), which follow-up model fits suggested was best characterised by a linear function rather than a quadratic one (Table A in S1 Text ). By contrast, we found no developmental changes in reward-unrelated C-history effects (Fig A in S1 Text ).
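The Bonferroni step used for the pairwise comparisons above amounts to tightening the significance threshold by the number of tests performed. A minimal helper, for illustration only:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Bonferroni correction: with m comparisons, a single comparison is
    significant only if its p-value is below alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# With three comparisons the per-test threshold drops to 0.05 / 3.
flags = bonferroni_significant([0.001, 0.02, 0.04])
```

Here only the first p-value survives correction, since 0.02 and 0.04 exceed the adjusted threshold of roughly 0.0167.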
(A) In our "credit assignment GLM," we reframed the 3-choice decision problem as a foraging-style decision between staying and switching away from a currently pursued choice C. For every trial, we considered the chosen option C and analysed whether participants would stay with C on the next trial. We analysed this decision as a function of three sets of regressors: previous local (i.e., contingent) rewards for C (CxR-history), the pure choice history (C-history), and the global reward state (GRS). The right-hand illustration indicates the quantities that encapsulate these three effects: the reward outcome (schematized by a smiley face) contingent on a choice (i.e., the probability of reward given C), the repetition of a choice per se (i.e., the probability of choosing C independent of reward), and the overall recent reward probability irrespective of choice (i.e., the probability of reward independent of C). Panels B, C, D, and E show effect sizes for component parts of this GLM. (B) Considering the effect of the most recent outcome on the tendency to repeat a choice (CxRt), we found that young adults had a significantly stronger tendency to repeat rewarded choices compared to adolescents. (C) The effect size linearly increased with age. (D) Independent of and in addition to local reward learning, the GRS had a negative effect on staying with an option: Participants tended to stick more with a choice if it was encountered in the context of a low overall GRS. Such negative GRS effects were found in both adolescents and young adults with a significant difference between them. This indicates that young adults, even more than adolescents, had the tendency to contextualise rewards by the GRS. (E) This was mirrored by a linear decrease of the GRS effect with age. (F) Plot shows residual probability of staying after a win and switching after a loss (i.e., a no-reward) separated by low and high GRS (median split) for adolescents. (G) The same is shown for young adults.
Note that in this visualisation, the GRS main effect from panels D and E is expressed as an interaction with winStay/loseSwitch strategy in panels F and G. The interaction effect increased for older participants: Participants were even more likely to repeat rewarded choices when encountered in a low GRS (darker bars) and, simultaneously, more likely to switch away from losing choices if encountered in a high GRS (lighter bars). ("x"s indicate individual participants; plots show mean −/+ SEM; solid lines in the right plots indicate the linear trend. Dashed lines represent 95% confidence intervals. *p < 0.05). Data for B-G are available in S1 Data (Figure3 tab).
To dissociate local and global reward learning, we used an established general linear model (GLM) approach (Methods) originally developed for the study of nonhuman primates [7]. The analysis captured the temporal dynamics of learning by analysing participants' choices in a reference frame of "stay" versus "leave" behaviour. For each trial t, we observed participants' choice C and quantified their tendency to either stay with or switch away from that choice C on trial t + 1. In this "credit assignment GLM," we simultaneously accounted for several factors driving choice (Fig 3A). This allowed us to discern whether the observed changes in general task performance were driven by specific subcomponents of learning: the previous local rewards that were delivered for choosing C (CxR-history or local reward learning), the pure choice history (C-history) reflecting a tendency to repeat choices irrespective of reward receipt, and, importantly, the GRS, which reflects the overall previous reward history irrespective of choice.
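The structure of such a design matrix can be sketched as below. This is a reconstruction from the description, not the authors' code: the number of lags, the GRS window length, and the convention that lag 0 of the CxR-history is the most recent outcome (CxRt) are illustrative assumptions.

```python
def build_credit_assignment_design(choices, rewards, n_lags=4, grs_window=10):
    """One design-matrix row per trial t for the stay/leave GLM sketch.

    Outcome: stay[t] = 1 if the trial-t option is chosen again at t+1.
    Regressors, per row:
      CxR-history: past rewards received specifically for the trial-t option
                   (0 at lags where a different option was chosen),
      C-history:   whether the trial-t option was chosen at each past lag,
      GRS:         mean recent reward irrespective of choice.
    """
    rows, stay = [], []
    for t in range(len(choices) - 1):
        c = choices[t]
        cxr, ch = [], []
        for lag in range(n_lags):
            k = t - lag
            chose_c = 1 if k >= 0 and choices[k] == c else 0
            ch.append(chose_c)
            cxr.append(rewards[k] if chose_c else 0)
        window = rewards[max(0, t - grs_window):t]
        grs = sum(window) / len(window) if window else 0.0
        rows.append(cxr + ch + [grs])
        stay.append(1 if choices[t + 1] == c else 0)
    return rows, stay
```

Fitting a logistic regression of `stay` on these rows would then give separate weights for local reward learning (CxR-history), choice perseveration (C-history), and the GRS, which is what lets the analysis dissociate the three influences on a single stay/leave decision.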
We first assessed developmental differences in broad measures of task performance. We found that overall task performance, as measured by total rewards acquired, increased across age. Young adults earned more total rewards than adolescents (independent samples t test, t386 = 3.47, p = 0.001; Fig 2A). This age-dependent difference was confirmed by a linear correlation between total rewards and age between 11 to 35 years (Pearson correlation, R = 0.16, p < 0.001; Fig 2B). In accord with better overall performance, the frequency with which the highest value option (as defined by value estimates from a Rescorla–Wagner-based reinforcement learning model, see S1 Text) was chosen was significantly higher in young adults compared to adolescents (independent samples t test, t386 = 7.89, p < 0.001; Fig 2C) and correlated with age (Pearson correlation, R = 0.29, p < 0.001; Fig 2D). Follow-up model fits suggest that the relationships of age with total rewards and percentage best choices were best characterised by a quadratic function (Table A in S1 Text).
Discussion
We investigated the development of component processes of reward learning that have been linked to neighbouring regions of orbitofrontal and anterior insula cortex in studies of nonhuman primates: local reward learning (or "contingent credit assignment") and noncontingent global reward learning based on the GRS [6–8,14,17,18]. These reward-related brain regions have a particularly protracted maturation profile and continue to change well into adulthood [31,33,34] (Fig E in S1 Text). Therefore, we tested whether cognitive functions that are likely to depend on these regions also keep changing during this time period. Indeed, we have shown that both local and global reward learning matured across development. Participants' decision to switch or stay with the current choice was positively influenced by local reward that was received for making a specific choice and negatively influenced by the GRS, and both mechanisms increased in their respective influence across adolescent development (study 1; Fig 3). In contrast, we found that reward-guided decision mechanisms linked to more medial frontal lobe regions did not show developmental differences over the same age range (Figs A and B in S1 Text). However, only manipulation experiments such as lesion studies can reveal a causal relationship between a neural substrate and a cognitive process. Therefore, we conducted two lesion studies, one in humans (study 2) and one in macaques (study 3), that assessed the impact of lesions to broad parts of lateral frontal cortex (likely affecting both the anterior insula and lateral orbitofrontal cortex; see below) on local and global reward learning. The experimental paradigm was a 3-armed bandit task (Fig 1) and was closely matched across studies. We used the same "credit assignment GLM" [7] in all studies to ensure that all three studies measured local and global reward learning in the same way.
Both lesion studies showed that lateral prefrontal cortex is indeed causally necessary for intact local and global reward learning in both species (Figs 4 and 5). This suggests that structural changes in lateral parts of prefrontal cortex underlie the developmental changes we observed in behaviour. Strikingly, humans and macaques differed in the way they were guided by the GRS. Humans used the GRS to "contrast" it with the current choice and were likely to switch away from a choice if the GRS was high [51,55,61,62]; macaques showed "spread of effect" [6,11,15] and were more likely to stay with their choices if the GRS was high. Lesions to lateral frontal cortex altered the GRS effect in both species. However, while lesions abolished the negative GRS effect in humans, they increased the positive GRS effect in macaques (Fig 6).
These results suggest that over development humans are increasingly influenced by local and global reward states in their decision to switch or stay with their current choice (Fig 3). The increased influence of the local reward learning mechanism over development is particularly interesting in the context of its proposed evolutionarily adaptive role in reducing costly errors in uncertain and changeable environments, compared to competing striatal reinforcement-learning systems [63]. Consistent with previous work showing reduced contingency learning abilities in young children [64] and impaired updating of stimulus–reward associations from probabilistic feedback [65], here we demonstrate that these mechanisms continue to develop into early adulthood. Critically, we also observed differences in how humans at different ages use the GRS to contrast new rewards with the baseline level of rewards encountered in the past. More broadly, such a process can support adaptive choice switching and exploration [52,53]. Consistent with this idea, and highlighting the utility of a negative influence of the GRS, we found that participants who were strongly influenced by the GRS were also more influenced by local reward learning (Fig C in S1 Text) and performed better, further suggesting complementary neural substrates that contribute to cognitive and behavioural flexibility. These findings may provide additional avenues towards a mechanistic understanding of developmental changes in attitudes towards exploration, risk, and uncertainty [66–68], which have previously been interpreted as differences in feedback monitoring, inhibitory and cognitive control, and risk-taking. Indeed, this may help explain mixed developmental findings in which some studies report increases in risk tolerance between adolescence and adulthood, while others find no differences [69–73].
For instance, our results indicate that adolescents may persist more strongly with unrewarded options when the GRS is high. By contrast, young adults may more readily switch away from an unrewarded choice, as a high GRS discourages exploring new choice options and incentivises switching back to previously rewarded ones. This weaker reliance on the GRS in adolescents may translate into increased persistence with bad choice options and risk-taking, and may help explain adolescents' greater tolerance of uncertainty [71,74–76]. However, note that learning processes in adolescents, compared to older people, differ in style and not only in terms of optimality [42,77,78]. For example, there is a shift from model-free mechanisms to model-based and counterfactual learning strategies across adolescence [45,79]. Importantly, global reward learning differs from model-based learning mechanisms [80,81] in that no knowledge about state relationships is needed, and its anatomical substrates appear distinctly tied to anterior insula [7,82]. However, in a similar manner to the shift towards model-based strategies [80], the benefits of negative GRS effects, just like those of increased local reward learning in our older participants, might turn out to be adaptive only in environments where exploration is relatively discouraged. In such instances, choices should be directed towards options with high values at the expense of sampling more uncertain options that might nonetheless prove more beneficial in the long run [23,83,84]. Indeed, the GRS may be dynamic and dependent on the structure of the reward environment. In the current experiment, reward schedules across all three studies were probabilistic and variable. In more blocked designs, the GRS may be less informative than in quickly changing environments, and so less influential on the current choice.
Our results also contribute to the debate about the development of reinforcement learning [9,44,85]. Studies indicate that, overall, the learning rate, i.e., the speed of updating the value of a choice, increases during adolescence [41–46]. We find the same in our study 1 (Fig B in S1 Text). Indeed, the increase in local reward learning in our "credit assignment GLM" could be interpreted along similar lines: as an increase in the weight that an outcome has on changing an option's value. The strong positive correlation between the learning rate from our reinforcement learning model and the local reward learning effect size is a further indication of this. However, studies have begun to examine increases in reward learning rate in more detail, and, as highlighted above, the particular task context plays a major role in whether increased learning rates are observed and whether they are desirable for optimising rewards [42]. Another consideration is that learning from outcomes might differ depending on whether an outcome is positive or negative, although these effects, again, appear context-dependent [43,45,86,87]. Our finding that the GRS exerts an increasingly negative effect during development adds to these ideas and highlights influences on reward learning that go beyond changes in a unitary reward learning rate. GRS effects were unrelated to a simple reward learning rate in previous work [7] and also in our current data set. Instead, they contextualise a current choice based on the global reward environment. This mechanism can add to the changes in value observed for a choice and can, in effect, lead to different effective learning rates for positive and negative outcomes [7,52,53]. The negative GRS effects observed here predict higher learning rates for positive outcomes when the GRS is low, and higher learning rates for negative outcomes when the GRS is high.
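The prediction in the last sentence can be illustrated with a toy stay/leave propensity in which the latest outcome and the GRS contribute separate weighted terms. The weights below are hypothetical, chosen only to match the sign pattern reported in the text (a positive weight on the local outcome, a negative weight on the GRS).

```python
def stay_logit(outcome, grs, b_outcome=2.0, b_grs=-1.5):
    """Toy log-odds of repeating the previous choice.

    outcome: 1.0 if the last choice was rewarded, 0.0 otherwise
    grs:     recent average reward irrespective of choice, in [0, 1]
    Weights are hypothetical and chosen for illustration only.
    """
    return b_outcome * outcome + b_grs * grs

# A rewarded outcome pushes harder towards staying when the GRS is low,
# mimicking a higher effective learning rate for positive outcomes:
assert stay_logit(1.0, grs=0.2) > stay_logit(1.0, grs=0.8)
# An unrewarded outcome pushes harder towards leaving when the GRS is
# high (more negative log-odds of staying):
assert stay_logit(0.0, grs=0.8) < stay_logit(0.0, grs=0.2)
```

The same reward thus has a larger net effect on staying in a lean environment than in a rich one, which is the asymmetry the negative GRS effect predicts without invoking two separate learning rates.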
A promising avenue for future research would be to follow up on these predictions and conduct a more formal modelling analysis of the developmental GRS effects. It might help explain diverging results by suggesting that analyses of reward learning rates should take into account the global reward levels present in the experiments. Neurally, our results suggest that subregions within lateral frontal cortex, specifically anterior insula and orbitofrontal cortex, are particularly promising target regions to look for neural correlates of the development of reinforcement learning mechanisms. These subregions have been shown to integrate rewards with different time constants in adults [7,61,82], and, in adolescents, higher learning rates for negative outcomes are linked to greater activity in the anterior insula [43].
Our hypothesis that local and global reward learning would increase during adolescence was very much guided by studies of human brain maturation, suggesting that lateral parts of prefrontal cortex mature later than medial ones [31–33]. Indeed, a selective analysis of complementary HCP imaging data indicated that lateral area 47/12o, as well as the anterior insula, showed a longer developmental maturation profile compared to the medial orbitofrontal/ventromedial prefrontal cortex, anterior cingulate cortex, and amygdala. These regions showed a significant decrease in grey matter across adolescence that continued well into young adulthood (Fig E in S1 Text). Lateral orbitofrontal cortex and the anterior insula cortex are strong potential candidates to underlie the behavioural differences in local and global reward learning seen across the same period of development. While this conjecture is indirect, it is well known that localised regional grey matter volume correlates with motor, cognitive, and social skills [18,27,88,89]. Indeed, we recently demonstrated in macaques that grey matter around the principal sulcus is causally altered by extended training in discrimination reversal learning, with grey matter variation in this region related to individual variation in training speed [18]. Longitudinal studies examining grey matter maturation and the development of reward learning in the same sample are needed to provide more direct evidence about this link between cognitive and neural development.
However, even longitudinal studies are usually correlative and as such can only provide limited evidence about causal relationships. Study 2 and study 3 therefore used closely matched experimental paradigms and analysis techniques to directly assess the causal importance of lateral frontal regions in local and global reward learning. In humans, we show that the GRS effect is reduced (i.e., closer to zero) after lateral prefrontal lesions, compared to medial orbitofrontal lesions, in combination with decreased local reward learning (Fig 4). Note that the direction of change in both local and global reward learning after lesions is highly compatible with the correlation between those two variables in the sample from study 1 (Fig C in S1 Text). This suggests that both learning processes are at least partly supported by neural mechanisms in lateral frontal cortex. In study 3, we confirm in macaque monkeys that the GRS effect is altered after lateral lesions (Fig 5). However, the macaque lesion effects appear qualitatively different. First, as we know from the same data in past work [6,14], local reward learning is reduced after lateral lesions. This means that both humans and macaques show a decrease of contingent/local reward learning after broad lesions to lateral prefrontal cortex. However, rather than decreasing the negative GRS effect as in humans, lateral lesions increased the positive GRS effect in macaques (Fig 6). One interpretation is that the negative GRS effect, like local reward learning, reflects a task-adaptive process in humans that matures during adolescence and is compromised by lesions. By contrast, in monkeys, the GRS effect could reflect a suboptimal "spread" of reward, which is further increased by lateral frontal lobe lesions.
However, an alternative account could argue that the overall positive shift in GRS influence on choice after lesions in both humans and monkeys reflects a general role for the lateral frontal cortex in contextualising the GRS to avoid or suppress the influence of spread of effect mechanisms. In humans, this suppression is strong enough to produce a negative GRS effect, but it is less influential in macaque choices.
In all macaque data sets analysed here, and consistent with previous work on the GRS [7] and credit assignment [6,17], the effect of the GRS on macaque choice was positive. This contrasted with the negative GRS effect in human participants (study 1 and study 2). This potential species difference is striking, particularly considering that we used a variant of a widely used probabilistic learning task that was matched across studies. However, species comparisons are inherently difficult to interpret. For example, despite the matched tasks, clear differences persisted in the way subjects were introduced to the study (verbal instructions versus weeks of training) and the setting in which the experiments were conducted. Nevertheless, one interpretation of the observed GRS differences is that human behaviour was more in line with ideas from optimal foraging theory, which suggest that a choice's value should be contrasted with the background reward rate of the environment [50,51,55,61,90]. This can promote optimal choice switching and exploration [52,53,55]. However, this view assumes that participants treated the trials in the experiment as discrete, unrelated instances. In contrast, the positive GRS effect in macaques might suggest that nonhuman primates do not perceive the task as a series of discrete and unrelated trials. Instead, they might expect intertrial contingencies. For example, macaques might assume that an action on trial n has an influence on the outcome that is received on trial n+1 (which is not the case; only the action on trial n+1 determines the outcome of trial n+1). A positive GRS effect indicates that, in line with these considerations, reward on trial n can reinforce the unrelated choice made one trial later, on trial n+1. Such positive GRS effects are therefore not optimal for this task. However, they can be beneficial in environments that do have such dependencies across trials.
Often natural environments are structured in multistep action sequences [91], and in such a setting, positive GRS effects might be adaptive.
However, it is also important to acknowledge that our lesion results are spatially limited in the precision with which they can pinpoint the functional roles of the anterior insula, as the lesions in both species were either relatively unspecific (in the human patients) or likely disconnected fibres of passage (in the macaques). Despite this, there are several reasons to believe the GRS effects localise to the anterior insula. First, macaque anterior agranular insula BOLD signals encode the GRS strongly and bilaterally [7,82], and human anterior insula also carries similar reward signals [61]. Furthermore, it is the bilateral agranular insula that undergoes the most profound grey matter volume changes when training macaques in reversal learning tasks such as ours [18]. Therefore, anterior insula cortex and lateral orbitofrontal cortex are likely to harbour complementary reward learning computations that jointly mature during adolescence as they gain influence over learning and choice.
Lateral prefrontal regions beyond orbitofrontal cortex also mature late across adolescence [33,92]. The more dorsolateral prefrontal regions are associated with intelligence, fluid cognition, working memory, and attentional control [93–95]. A critical direction for future work will be to examine the interactions between these developing cognitive abilities, brain network dynamics, and the learning mechanisms described here. Recent advances in network neuroscience offer exciting methods to characterise individual differences in complex cognition as a function of local and global brain network topology and community structure [96–98]. Characterising these interactions could ultimately improve predictions of transdiagnostic features of neurodevelopmental and behavioural trajectories.
In summary, our multimodal approach suggests that lateral frontal cortex is a particularly dynamic locus of neural maturation driving cognitive changes in both local and global reward learning during adolescence and into young adulthood. Evidence of heterogeneity across the developmental profiles of the reward-guided component processes and the underlying neural network highlights the importance of understanding and quantifying the development of the whole prefrontal cortex at a functionally meaningful resolution. Future longitudinal studies should examine multimodal changes in lateral orbitofrontal and anterior insula cortex and the respective parallel changes in the adaptive influence of local and global reward learning. Understanding how and why reward learning mechanisms develop across adolescence could not only begin to explain the frustrations of parents and carers of teenagers who perpetually remind adolescents to consider the consequences of their choices, but also impact their ability to adaptively learn from feedback in social, health, and educational contexts.
---
[1] Url:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002010