Neural responses in macaque prefrontal cortex are linked to strategic exploration [1]
Caroline I. Jahn — Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom; Motivation, Brain and Behavior Team, Institut du Cerveau et de la Moelle Épinière
Date: 2023-02
Humans have been shown to strategically explore. They can identify situations in which gathering information about distant and uncertain options is beneficial for the future. Because primates rely on scarce resources when they forage, they are also thought to strategically explore, but whether they use the same strategies as humans, and the neural bases of strategic exploration in monkeys, are largely unknown. We designed a sequential choice task to investigate whether monkeys mobilize strategic exploration based on whether information can improve subsequent choices, and also to ask the novel question of whether monkeys adjust their exploratory choices based on the contingency between choice and information, by sometimes providing counterfactual feedback about the unchosen option. We show that monkeys decreased their reliance on expected value when exploration could be beneficial, but this was not mediated by changes in the effect of uncertainty on choices. We found strategic exploratory signals in anterior and mid-cingulate cortex (ACC/MCC) and dorsolateral prefrontal cortex (dlPFC). This network was most active when a low-value option was chosen, which suggests a role in counteracting expected value signals when exploration away from value should be considered. Such strategic exploration was abolished when counterfactual feedback was available. Learning from counterfactual outcomes was associated with the recruitment of a different circuit centered on the medial orbitofrontal cortex (OFC), where we showed that monkeys represent chosen and unchosen reward prediction errors. Overall, our study shows how ACC/MCC-dlPFC and OFC circuits together could support exploiting available information to the fullest and driving behavior towards finding more information through exploration when it is beneficial.
Funding: This research was supported by the Université Paris Descartes (doctoral and mobility grants to CIJ), the Medical Research Council UK (MR/K501256/1 and MR/N013468/1 to JG), St John’s College, Oxford (JG), the Wellcome Trust (096587/Z/11/Z to SC, 090051/Z/09/Z and 202831/Z/16/Z to MEW, WT1005651MA to JS and the Wellcome Centre for Integrative Neuroimaging: 203139/Z/16/Z), the BBSRC (AFL Fellowship: BB/R01803/1 to NK), as well as the LabEx CORTEX of the Université de Lyon (ANR-11-LABX-0042 to JS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
We found that rhesus monkeys engaged in strategic exploration by decreasing their reliance on expected values (random exploration) when it was useful for the future (long horizon) and when active sampling was the only way to obtain information (partial feedback). Neurally, we found prefrontal strategic exploration signals in the anterior and mid-cingulate cortex (ACC/MCC) and dorsolateral prefrontal cortex (dlPFC). However, we did not find that the horizon or feedback type significantly modulated the effect of uncertainty on choices (directed exploration). When making choices in a sequence (long horizon), we found evidence that macaques used counterfactual feedback to guide their choices. Complementing this activity at the time of decision, in the complete feedback condition, we found overlapping chosen and unchosen outcome prediction error signals in the orbitofrontal cortex (OFC) at the time of receiving the outcome. The counterfactual prediction errors in the OFC are particularly interesting as they point to the neural system that allowed the macaques to forgo making exploratory choices in the complete feedback condition, which could also change how the MCC-dlPFC network represented the value of the chosen option.
As in Wilson and colleagues’ original study, we manipulated whether the information could be used for future choices by changing the choice horizon [12]. By comparing exploration in both conditions, we could test whether the animals reduced their reliance on value estimates (random exploration) and increased their preference for more uncertain options (directed exploration) when gathering information was useful for future choices in the long horizon [12]. In addition, we manipulated the contingency between the choice and the information by varying the type of feedback that monkeys received. In the complete feedback condition, information was freely available, and we could probe whether monkeys decreased their exploration compared to the classic partial feedback condition. In humans, providing complete feedback decreases decision noise [19] and improves learning [20–23], both of which are consistent with reduced exploration. A strategic explorer would only actively explore—and forgo immediate rewards—when it is useful for the future (long horizon) and when it is the only way to obtain information (partial feedback). In addition to behavioral data, neural data were collected using fMRI to probe the neural substrates of strategic exploration. Our analysis was focused on regions previously identified in fMRI studies on reward valuation and cognitive control in monkeys [24–29]. Finally, we took advantage of the different feedback conditions to explore how monkeys update their expectations based on new information. Specifically, we investigated the behavioral and neural consequences of feedback about the outcome of their choice and—in the complete feedback condition—of counterfactual feedback about the alternative.
Humans have been shown to strategically explore [ 12 – 15 ], but there is little evidence in other species. Inspired by Wilson and colleagues’ “horizon” exploration task [ 12 ], we developed a task to investigate whether monkeys mobilize strategic exploration based on whether that information can improve subsequent choice. Importantly, non-human primate models provide insights into the evolutionary history of cognitive abilities, and of the neuronal architecture supporting them [ 16 ]. Given the rhesus monkeys’ ecology (including feeding), they should also be able to use strategic exploration, but the extent to which they can mobilize strategic exploration might be different from that of humans. Based on the similarities in circuits supporting cognitive control and decision-making processes in humans and macaques [ 17 , 18 ], one could further hypothesize that the same neurocognitive processes (the same computational model) might be recruited but not to the same extent (different weights).
In many species, most behaviors, including foraging, can be accounted for by simple mechanisms—approach or avoidance of an observed and immediately available source of food—that require no mental representations. Exploration is, by definition, a non-value-maximizing strategy [1, 2], so in those models, exploration is often reduced to a random process, where noise in behavior can lead animals to change behavior by chance [1, 3–6]. However, in species relying upon spatially and temporally scattered resources, such as fruits, exploration is thought to be aimed at gathering information about the environment in order to form a mental representation of the world. Work in monkeys and humans has shown that primates are sensitive to the novelty of an option when deciding to explore [7–9]. They sample novel options until they have formed a representation of their values relative to the available options. This work clearly showed that monkeys have a representation of uncertainty and actively explore to reduce it. Similar results have been shown in humans, whose exploration is driven by the uncertainty about the options [10, 11]. However, it is still unknown whether monkeys have a specific representation of potential future actions and outcomes that enables them to organize their behavior over longer temporal or spatial scales. We designed a novel paradigm in monkeys—based on work in humans—to assess whether and how monkeys engage in strategic exploration, that is, exploring only when it is useful for the future. Strategic exploration enables an animal to adapt to a specific context and is essential to maximize rewards over longer temporal and spatial scales. For frugivorous animals such as primates, it might be critical for survival.
Results
Probing strategic exploration in monkeys

Three monkeys performed a sequential choice task inspired by Wilson and colleagues [12]. In this paradigm, called the horizon task, monkeys were presented with one choice (short horizon) or a sequence of four choices (long horizon) between two options (Fig 1A). Each option belonged to one side of the screen and had a corresponding touch pad located under the screen (see Materials and methods for details). Both types of choice sequence (long and short horizon) started with an “observation phase” during which monkeys saw four pieces of information randomly drawn from both options and reflecting the outcome distribution of each option. They received at least one piece of information per option (Fig 1B). Each piece of information was presented exactly like subsequent choice outcomes, as a bar length (equivalent to 0 to 10 drops of juice) drawn from each option’s outcome distribution. The animals had been trained that the length of the orange bar on a yellow background indicated the number of drops of juice associated with that specific option on a given trial (Fig 1B). One option was associated with a larger reward (more drops of juice) on average than the other. The means of the distributions were fixed within a sequence but unknown to the monkey. Monkeys only received the reward associated with the option they chose at the end of each choice.
Fig 1. Task and model. (A) During the task, we manipulated whether the information could be used in the future by including both long and short horizon sequences. In both trial types, monkeys initially received four samples (“observations”) from the unknown underlying reward distributions. In short horizon trials, they then made a one-off decision between the two options presented on screen (“choice”). In long horizon trials, they could make four consecutive choices between the two options (fixed reward distributions). On the first choice (highlighted), the information content was equivalent between short and long horizon trials (same number of observations), whereas the information context was different (learning and updating is only beneficial in the long horizon trials). (B) Example short and long horizon trials. The monkeys first received some information about the reward distributions associated with choosing the left and right option. The length of the orange bar indicates the number of drops of juice they could have received (0–10 drops). The horizon length of the trial is indicated by the size of the grey area below the four initial samples. The monkeys then make one (short horizon) or four (long horizon) subsequent choices. As monkeys progressed through the four choices, more information about the distributions was revealed. Displayed here is a partial information trial where only information about the chosen option is revealed. (C) Ideal model observer for the options of the example trial shown in B (color code corresponds to the side of the option). The distributions correspond to the probabilities of observing the next outcome for each option. The expected value corresponds to the peak of the distribution and the uncertainty to the variance. Thick lines correspond to post-outcome estimates and thin lines to pre-outcome estimates (from the previous trial). (D) We also modulated the contingency between choice and information by including different feedback conditions. In the partial feedback condition, monkeys only receive feedback for the chosen option. In contrast, in the complete feedback condition, they receive feedback about both options after active choices (not in the observation phase). (E) Example partial and complete feedback trials (both short horizon). Here, the observation phase shown in (B) is broken up into the components the monkeys see on screen during the experiment. Initially, the samples were displayed on screen, but a red circle in the center indicates that the monkeys could not yet respond. After a delay, the circle disappears, and the monkeys could choose an option. After they responded, the chosen side was highlighted (red outline). After another delay, the outcome was revealed. In the partial feedback condition (top), only the outcome for the chosen option was revealed. In contrast, in the complete feedback condition (bottom), both outcomes were revealed. After another delay, the reward for the chosen option was delivered in both conditions.
https://doi.org/10.1371/journal.pbio.3001985.g001

First, we manipulated whether the information gathered during the first choice could be useful in the future. During a session, we varied the number of times monkeys could choose between the options (horizon length). The horizon length was visually cued (Fig 1A and 1B). On short horizon trials, the information provided by the outcome of the choice could only be used for the current choice and was then worthless going forward. On long horizon trials, it could be used to guide a sequence of four choices. Second, we manipulated the contingency between choice and information by varying the type of feedback monkeys received after their active choices (the observation phase was identical for partial and complete feedback conditions). In the partial feedback condition, they only saw the outcome for the option they chose. In the complete feedback condition, they saw the outcome of both the option they chose and the alternative option (Fig 1D and 1E). In the latter case, information about the options could be learned from the counterfactual outcomes—the outcomes that would have been obtained had a different choice been made. This type of feedback is sometimes referred to as “hypothetical” [30] or “fictive” feedback [31]. The feedback condition was not cued but was fixed during a session. To assess monkeys’ sensitivity to the expected value and the uncertainty about the options, we set up an ideal observer Bayesian model (see Materials and methods for model details), which estimates the probability of observing the next outcome given the current information (Fig 1C). This model uses only the visual information available on the screen to infer the true underlying mean value of each option; it uses neither the horizon nor the feedback type, as those were irrelevant for this inference. We extracted the expected value (peak of the probability distribution of the next observation, i.e., the most likely next outcome) and the uncertainty (variance) of the options from the model to evaluate monkeys’ sensitivity to these variables. If monkeys did not engage in strategic exploration, the effect of expected value should be unaffected by the manipulations of horizon and feedback, as was the case for the model.
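As a concrete illustration, a minimal conjugate-Gaussian version of such an ideal observer could look like the sketch below. The Gaussian likelihood, prior, and noise parameters here are illustrative assumptions, not the exact specification given in the paper's Materials and methods.

```python
import numpy as np

def posterior_predictive(samples, prior_mean=5.0, prior_var=25.0, obs_var=4.0):
    """Return (expected_value, uncertainty) of the next outcome for one option.

    samples    : outcomes observed so far for this option (drops of juice, 0-10)
    prior_mean : assumed prior mean over the option's underlying value
    prior_var  : assumed prior variance over the option's underlying value
    obs_var    : assumed variance of outcomes around the option's true mean
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    # Conjugate Gaussian update of the belief about the option's mean.
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + samples.sum() / obs_var)
    # Posterior predictive distribution over the next outcome: its peak is the
    # expected value; its variance is the uncertainty used in the analyses.
    return post_mean, post_var + obs_var

# Example observation phase: three samples from one option, one from the other.
ev_left, unc_left = posterior_predictive([6, 7, 5])
ev_right, unc_right = posterior_predictive([4])
# The option sampled only once (right) carries the larger uncertainty,
# mirroring the information imbalance manipulated in the task.
```

Note that, like the model described above, this sketch takes no input about horizon or feedback type: those manipulations are irrelevant to the inference itself.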
The horizon length and the type of feedback modulate monkeys’ exploration

We first focused our analysis on the first choice of the trial, as the information about the reward probability of the two options was identical across horizons and feedback conditions, such that choices should only be affected by the contextual manipulations (horizon and feedback type). If monkeys were sensitive to whether the information could be used in the future, they would explore more in the long compared to the short horizon. This is because information obtained early in a trial can only be beneficial for subsequent choices in long horizon trials. Moreover, exploration should only occur when obtaining information is contingent on choice, i.e., in the partial feedback condition (Fig 2A).
Fig 2. First choice. (A) In our experimental design, on the first choice of a horizon, directed exploration is only sensible in long horizon trials in the partial feedback condition. This is because in short horizon trials, the information gained by exploring is of no use for subsequent choices, so a rational decision-maker would only choose based on the expected value of the options. Moreover, in the complete feedback condition, all information is obtained regardless of which option is chosen, so an ideal observer would again always choose the option with the highest expected value. (B) The proportion of trials in which the monkeys chose the option with the higher expected value is above chance level (0.5) across both feedback conditions and horizons. Mean across sessions (partial feedback: 41 sessions, complete feedback: 40 sessions). (C) Monkeys’ choices are sensitive to nuanced differences in expected value. Mean across all sessions (81 sessions). (D) According to the logistic regression model predicting monkeys’ first choices in a horizon (see main text and methods for details), monkeys’ first choices are less driven by expected value in the partial than in the complete feedback condition. Within the partial feedback condition, they are less driven by expected value in long than in short horizon trials. No such difference was found in the complete feedback condition. This is evidence that monkeys deliberately modulate their exploration behavior to explore more on partial feedback long horizon trials, where exploration is sensible (see (A)). Error bars indicate standard error of the mean in B and C and standard deviation in D. Data and code to reproduce the figure can be found at
https://doi.org/10.5281/zenodo.7464572.
https://doi.org/10.1371/journal.pbio.3001985.g002

We first ensured that monkeys’ choices were influenced by the expected value computed by the Bayesian model. We looked at the accuracy (defined as choosing the option with the highest expected value according to the model) during the first choice. For the two horizon lengths and in both feedback conditions, accuracy was above chance level (t test against a mean of 0.5; partial feedback short horizon: t(40) = 10, p < 0.001, partial feedback long horizon: t(40) = 8.930, p < 0.001, complete feedback short horizon: t(39) = 7.9, p < 0.001, complete feedback long horizon: t(39) = 8.963, p < 0.001; Fig 2B). Therefore, monkeys used the information provided by the informative observations on each trial to guide their choices. Monkeys also adjusted their choices to variations in expected value, as can be seen when pooling together both feedback conditions and horizon lengths (Fig 2C; see statistical significance in Fig 2D). Although choices were guided by the expected value of the options above chance level, monkeys still sometimes chose the less valuable option in both conditions and horizons (Fig 2B and 2C).

We examined whether monkeys were less driven by expected value on partial feedback long horizon trials, as exploration is only a relevant strategy on these trials (Fig 2A). To test this hypothesis, we ran a single logistic regression predicting responses during first choices in the partial and complete conditions with the following regressors: the expected value according to our Bayesian model, the uncertainty according to our Bayesian model, the horizon (short/long), and the interactions of expected value and uncertainty with horizon. In the same model, we added two potential biases: a side bias and a tendency to repeat the same action. We allowed regressors to vary by condition (partial or complete feedback) and by monkey, and modelled sessions as random effects for each monkey, with all regressors included as random slopes. We confirmed that in both feedback conditions, monkeys tended to choose the option with the highest expected value (p < 0.000001 in the partial condition and p < 0.000001 in the complete; one-sided tests, based on samples drawn from the Bayesian posterior; see Materials and methods). We identified that monkeys relied more on the difference in expected value in the complete than in the partial feedback condition (p = 0.0024; one-sided test), and more in the short than in the long horizon in the partial condition only (p = 0.0163 in the partial condition and p = 0.6598 in the complete; one-sided tests). Thus, animals engaged in strategic exploration by reducing their reliance on expected value. In other words, animals strategically modulated the degree to which they used random exploration depending on both the horizon length and the feedback type (S3A Fig).

We next looked at the effect of uncertainty. Exploratory behaviors should be sensitive to how much they can reduce uncertainty; i.e., the animals should optimally pick the most uncertain option when they explore [12]. We found that monkeys were sensitive to uncertainty overall, avoiding options that were more uncertain in both the partial and the complete feedback conditions (p = 0.0081 in the partial condition and p = 0.00025 in the complete; one-sided tests) (see S1 Fig for the full model fit and the posteriors for each individual subject).
This risk aversion was driven by the difference in the number of observations presented: when we restricted our analysis to trials where monkeys received two observations about each option, they showed a small preference for the more uncertain option (p = 0.077 in the partial condition and p = 0.066 in the complete; p = 0.02 when combined; one-sided tests) (see S2 Fig for the full model fit and the posteriors for each individual subject). However, we found no statistically reliable difference in the sensitivity to uncertainty across the experimental conditions. We also ran a second model that used the number of available observations (which is mathematically equivalent to the model used by Wilson and colleagues [12]) rather than uncertainty and found identical results, both for the effects on expected value and the absence of an effect on the sensitivity to the number of available observations (S3B and S4 Figs for full model fits and the posteriors for each individual subject). Therefore, uncertainty did not play a key role in strategic exploration in our task. This indicates that our macaques did not use directed exploration to strategically guide their choices (S3 Fig). Finally, we checked whether the decision variables were stable over trials and across sessions. First, in the above regression model, we added interaction terms with the trial number. We found no significant interaction with our regressors of interest (expected value, expected value interaction with horizon, uncertainty, and uncertainty interaction with horizon). Second, we fitted each session separately and looked for a linear trend across session numbers. We found no linear or other clear trend in the regressors over sessions. Overall, we found no evidence that monkeys’ decision variables changed throughout the recording.
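To make the structure of this analysis explicit, here is a minimal, non-hierarchical sketch of the first-choice regression. The paper's actual model is a hierarchical Bayesian logistic regression with per-monkey and per-session random effects and condition-specific weights; the file name, column names, and coding below are illustrative assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per first choice. Hypothetical columns:
#   chose_right : 1 if the right option was chosen, 0 otherwise
#   dv          : difference in model expected value (right minus left)
#   du          : difference in model uncertainty (right minus left)
#   horizon     : effect-coded horizon (+0.5 long, -0.5 short)
#   prev_right  : effect-coded previous choice side (repeat-bias regressor)
df = pd.read_csv("first_choices.csv")  # hypothetical data file

# The intercept absorbs the side bias; interactions with horizon test whether
# reliance on expected value (dv) and uncertainty (du) changes with horizon.
model = smf.logit(
    "chose_right ~ dv + du + horizon + dv:horizon + du:horizon + prev_right",
    data=df,
).fit()
print(model.summary())
# Under this coding, a negative dv:horizon coefficient would indicate reduced
# reliance on expected value in the long horizon, i.e., more random exploration.
```

In this schematic version, one such model would be fitted per feedback condition and the dv coefficients compared across conditions, mirroring the partial-vs-complete contrasts reported above.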
Monkeys learn from chosen and counterfactual feedback

We next assessed whether monkeys used the information they collected during their previous choices to update subsequent choices, and how the nature of the feedback affected this process. To this end, we focused our analysis on choices from long horizon trials. On such trials, monkeys’ accuracy (defined as choosing the option with the highest expected value according to the model) was always above chance level (t test against a mean of 0.5; all p < 10^−10) and increased as they progressed through the sequence (t test against a mean of 0 of the distribution of regression coefficients of the trial number onto the accuracy (both z-scored) for each session; partial feedback condition: t(40) = 11.3653, p < 0.001, complete feedback condition: t(39) = 5.6590, p < 0.0001) (Fig 3A). We inferred that this improvement was due to the use of the information collected during the choices. To examine this, we isolated the change in expected value compared to the initial “observation phase” (see Materials and methods). We found that monkeys were sensitive to the change in expected value both for the chosen option (in the partial and complete feedback conditions) and the unchosen option (counterfactual feedback, in the complete feedback condition only) (Fig 3B and 3C; see statistical significance in Fig 3E). Monkeys displayed a significant tendency to choose the same option (t test against a mean of 0.5; all p < 10^−6), which sharply increased after the first trial (paired t test between the first choice and the subsequent choices; all p < 10^−10) and kept increasing after the first choice (t test against a mean of 0 of the distribution of regression coefficients of the trial number onto the probability of choosing the same option (both z-scored) for each session; partial feedback condition: t(40) = 5.3026, p < 0.001, complete feedback condition: t(39) = 3.1265, p = 0.0033) (Fig 3D).
Fig 3. Behavioral update. (A) As monkeys progressed through the long horizon, they were more likely to choose the option with the higher expected reward in both the partial and complete feedback conditions. Mean across sessions (partial feedback: 41 sessions, complete feedback: 40 sessions). (B) Monkeys were sensitive to changes in the expected value compared to the baseline expected value they experienced during the observation phase, both for the chosen option (mean across all sessions (81 sessions)) and (C) the unchosen option (mean across all complete feedback sessions (40 sessions)). (D) Monkeys were also more likely to repeat their choice as they progressed through the long horizon. Mean across sessions (partial feedback: 41 sessions, complete feedback: 40 sessions). (E) Results of the single logistic regression model predicting second, third, and fourth choices in the long horizon. In both the partial and complete feedback conditions, monkeys were sensitive to the expected value at observation, but more so in the complete than the partial feedback condition (left). Monkeys tended to repeat previous choices in both conditions, but more so in the partial than in the complete feedback condition (center left). In both conditions, monkeys were sensitive to the change in expected value compared to the observation phase, with no significant difference between conditions (center right). In the complete feedback condition, monkeys were also sensitive to the change compared to baseline of the additional information they received. Error bars represent standard error of the mean in A–D and standard deviation in E. *p < 0.05, **p < 0.01, and ***p < 0.001. Data and code to reproduce the figure can be found at
https://doi.org/10.5281/zenodo.7464572.
https://doi.org/10.1371/journal.pbio.3001985.g003

We investigated the determinants of these effects by performing a single logistic regression for all non-first choices with the following regressors: the expected value and uncertainty during the observation phase (which served as baselines for subsequent choices), and the change relative to these baselines as new information was revealed over the course of the horizon. We also added to the same model three potential biases in choices: a side bias, a tendency to repeat the same action, and a bias for choosing the option most often chosen (see S5 Fig for the full model fit and the posteriors for each individual subject). Just as with the previous regression model for first choices, we again allowed regressors to vary by condition and monkey and modelled sessions as random effects. We confirmed that monkeys remained sensitive to the difference in expected value during the observation phase that had guided the first choice (p < 0.000001 in the partial condition and p < 0.000001 in the complete; one-sided tests). Consistent with the choice behavior on the first choice, monkeys relied more on this difference in the complete than in the partial feedback condition in subsequent choices (p = 0.0192, one-sided test; Fig 3E). Monkeys were biased towards repeating the same choice (p < 0.000001 in the partial condition and p < 0.000001 in the complete; one-sided tests), but this bias was more pronounced in the partial feedback condition (p = 0.0018, one-sided test; Fig 3E), as can already be seen in Fig 3D. Monkeys also preferred to choose the option they had chosen most often (p < 0.000001 in the partial condition and p < 0.000001 in the complete; one-sided tests), which explained the increase in repetition bias over time, but this was not affected by the feedback type (partial > complete: p = 0.309) (S5 Fig). Monkeys were sensitive to the change in expected value when the information was related to the chosen option (p < 0.000001 in the partial condition and p < 0.000001 in the complete; one-sided tests), with no statistical difference between the partial and complete feedback conditions (partial > complete: p = 0.6913). Finally, in the complete feedback condition, monkeys were sensitive to the change in expected values obtained from the counterfactual feedback (p < 0.000001; one-sided test; Fig 3E). Overall, we found that, on top of being more sensitive to the expected value difference during the initial evaluation, monkeys were less biased towards repeating the same action when they had counterfactual feedback to further guide their choices in the complete feedback condition. They were able to learn about the options using both the chosen and the counterfactual feedback when it was available.
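As a rough sketch of how the change-in-expected-value regressors could be derived, the snippet below reuses the posterior_predictive helper from the ideal-observer sketch above; all sample values and names are hypothetical.

```python
def delta_ev(obs_samples, new_samples):
    """Expected value change relative to the observation-phase baseline."""
    baseline_ev, _ = posterior_predictive(obs_samples)
    updated_ev, _ = posterior_predictive(list(obs_samples) + list(new_samples))
    return updated_ev - baseline_ev

# Hypothetical long horizon trial in the complete feedback condition:
# observation-phase samples, then outcomes revealed on the first two choices.
delta_chosen = delta_ev([6, 7, 5], [8, 6])    # feedback for the chosen option
delta_unchosen = delta_ev([4], [3, 2])        # counterfactual feedback
# In the partial feedback condition only delta_chosen exists; both deltas enter
# the regression alongside the baseline expected value, uncertainty, and the
# side, repetition, and choice-frequency bias terms.
```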
Strategic exploration signals in ACC/MCC and dlPFC

To identify brain areas associated with strategic exploration, we ran a two-level multiple regression analysis using a general linear model (GLM). For each individual session, we used a fixed-effects model. To combine sessions and monkeys, we used random effects as implemented in the FMRIB’s Local Analysis of Mixed Effects (FLAME) 1 + 2 procedure from the FMRIB Software Library (FSL). We focused our analysis on regions previously identified in fMRI studies on reward valuation and cognitive control in monkeys [24–29]. Thus, to only look at the regions we were interested in and to increase the statistical power of our analysis, we only analyzed data in a volume of interest (VOI) covering the frontal cortex and striatum (previously used by Grohn and colleagues [29]). We used data from 75 (41 partial feedback and 34 complete feedback) of the 81 (41 partial feedback and 40 complete feedback) sessions we had acquired (fMRI data from 6 sessions were corrupted and unrecoverable). Details of all regressors included in the model can be found in the Materials and methods section. In addition to the analysis in the VOI, we examined the activity in functionally and anatomically defined regions of interest (ROIs). These ROIs were not chosen a priori but were selected based on the activity in the VOI. The goal of these analyses was either (i) to examine the effect of a different variable than the one used to define the ROI in our VOI, which is an independent test, so we could test the statistical significance of this different variable on the activity in the ROI, or (ii) to illustrate an effect revealed in the VOI, which is not an independent test, so we did not perform any statistical analysis.

To examine how monkeys use the initial information displayed during the observation phase of the task differently depending on the horizon and the feedback condition, we examined the brain activity when the stimuli were presented on the first choice (“wait” period; Fig 1D). Crucially, there was no difference in the visual inputs between the partial and the complete feedback conditions, as the nature of the feedback was not cued and was fixed for blocks of sessions, and monkeys only received the counterfactual feedback after an active choice (not in the observation phase). We first investigated the main effects of our two manipulations: the overall effect of the horizon and of the feedback type on brain activity. We combined all sessions and looked for evidence of different activations in the long and short horizon. We found significantly greater activity for the long horizon in 3 clusters (cluster p < 0.05, cluster-forming threshold of z > 2.3; Fig 4A; see S1 Table for coordinates of cluster peaks). One cluster was centered on the pregenual anterior cingulate cortex (pgACC) and the striatum, and two clusters, one in each hemisphere, were centered on the dlPFC and extended into the lateral orbitofrontal cortex (lOFC, area 47/12o; see Materials and methods for more details about OFC subdivisions). In an independent test, we placed ROIs by calculating the functional and anatomical overlap for Brodmann areas 24, 46, and 47/12o and extracted the t-statistics of the regressor to examine the effect of the contingency between choice and information (feedback condition).
We observed no effect of the feedback type in ACC (p = 0.19) and lOFC (p = 0.53), but we found a main effect of feedback type in the dlPFC (two-way ANOVA, F(144, 147) = 4.86, p = 0.029) and no interaction anywhere (ACC: p = 0.29, dlPFC: p = 0.9, and lOFC: p = 0.78). This revealed that a subpart of the pgACC and the lOFC were sensitive to the horizon length, while the dlPFC showed an additive sensitivity to the horizon length and the feedback type, such that it was most activated in the long horizon and partial feedback condition, when exploration is beneficial.
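A schematic version of this ROI follow-up test might look as follows, assuming one extracted t-statistic per session and horizon contrast, with feedback type and horizon as crossed factors; the file name, column names, and factor coding are illustrative assumptions rather than the paper's exact analysis.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical file: one row per session-level observation, with columns
#   tstat    : t-statistic extracted from the dlPFC ROI for that session
#   feedback : "partial" or "complete"
#   horizon  : "short" or "long"
roi = pd.read_csv("dlpfc_roi_tstats.csv")

# Two-way ANOVA: main effects of feedback and horizon, plus their interaction.
fit = smf.ols("tstat ~ C(feedback) * C(horizon)", data=roi).fit()
print(sm.stats.anova_lm(fit, typ=2))
```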
Fig 4. First choice neural results. (A) When combining partial and complete feedback sessions, we found clusters with greater activity in the long than in the short horizon in the pgACC, the dlPFC, and the lateral OFC. Cluster p < 0.05, cluster-forming threshold of z > 2.3. (B) We placed ROIs (in yellow) in the overlap of the functional cluster and anatomical region and extracted t-statistics for the difference between long and short horizon. Mean across sessions (partial feedback: 40 sessions, complete feedback: 34 sessions). (C) We looked for differences in how the contingency between choice and information (complete vs. partial feedback) modulates the initial information that was presented before first choices. Within our VOI, we found clusters of activity in MCC both for the main effect of feedback type and for a greater sensitivity to expected value in the complete feedback condition. We also found a cluster of activity in dlPFC for a greater sensitivity to expected value in the complete feedback condition. (D) We placed an ROI (in yellow) in the part of MCC that is activated by the main effect of feedback type and extracted the t-statistics of the regressor for every session. We found that the effect we observe in the VOI is driven by increased activity in the complete feedback condition, whereas there is no activity in the partial feedback condition. Mean across sessions (partial feedback: 40 sessions, complete feedback: 34 sessions). (E) We also placed ROIs (in yellow) in the parts of MCC and dlPFC where we found significant clusters in the VOI for the interaction of feedback type and expected value and extracted the t-statistics for the expected value regressor of every session. Plotting these regressors separately by feedback type reveals that both MCC and dlPFC were more active when an option with high expected value was chosen in the complete feedback condition, whereas they were more active when an option with low expected value was chosen in the partial feedback condition. Mean across sessions (partial feedback: 40 sessions, complete feedback: 34 sessions). Error bars represent standard error of the mean. *p < 0.05. Data and code to reproduce the figure can be found at
https://doi.org/10.5281/zenodo.7464572. dlPFC, dorsolateral prefrontal cortex; MCC, mid-cingulate cortex; OFC, orbitofrontal cortex; pgACC, pregenual anterior cingulate cortex; ROI, region of interest; VOI, volume of interest.
https://doi.org/10.1371/journal.pbio.3001985.g004

We next examined the effect of the feedback in our VOI. We found one cluster around the MCC that was significantly modulated by the difference between the activity during the complete and partial feedback conditions during stimuli presentation on the first choice (Fig 4C, yellow contrast; see S1 Table for coordinates of cluster peaks). To examine this effect further, and although it is not an unbiased test, we defined an ROI by taking the overlap between our functionally defined cluster and Brodmann area 24′. Extracting the t-statistics of each session for this regressor from the ROI revealed that the MCC is more active at the time of choice in the complete feedback condition but not in the partial feedback condition (Fig 4D). We found no interaction between the horizon length and the feedback type in our VOI. Thus, a subpart of the MCC distinct from the one sensitive to the horizon length was sensitive to the type of feedback.

Behaviorally, we observed that strategic exploration was implemented by decreasing the influence of expected value on the choice. We therefore next looked for evidence of stronger expected value signals in the complete feedback condition compared to the partial feedback condition. We tested the expected value of the chosen option, the unchosen option, and the difference in expected values between the chosen and unchosen options. We only found activity related to the expected value of the chosen option. We found two clusters of activity bilaterally in the MCC (area 24′) and in the left dlPFC (area 46) that were modulated by the contingency between choice and information (Fig 4C; see S1 Table for coordinates of cluster peaks). We again placed two ROIs by calculating the functional and anatomical overlap for Brodmann areas 24′ and 46 and extracted the t-statistics of the regressor. Although this is not an unbiased test, we can see that in the partial feedback condition, the MCC and dlPFC seemed to be active when an option with a low expected value was chosen, whereas in the complete feedback condition, they were more active when an option with a high expected value was chosen (Fig 4E for illustration). We found, however, no difference in the strength of this sensitivity between short and long horizons. Thus, we found that the availability of the counterfactual feedback in the complete feedback condition decreased—and potentially even inverted—the sensitivity of the MCC and dlPFC to the expected value of the chosen option. We conducted additional exploratory brain–behavior correlations but found no significant relationships to behavioral sensitivity (see “Author Response” file within the Peer Review History tab for additional details).

Finally, we looked for signals that were related to the expected outcome of the chosen option and that were common to both feedback conditions. Consistent with previous studies [32–35], when we combined the partial and complete feedback condition sessions and took all trials in the “wait” period, we found a large activation related to the expected value of the chosen option (which is the same as the chosen action in our task) spanning the motor/somatosensory cortex, the dlPFC, the OFC, and the striatum, as well as an inverted signal in visual areas in the whole brain (without mask; S6A Fig).
We also found a clear representation of the uncertainty about the chosen option on the first choice (when the magnitude of the uncertainty about the chosen option is equivalent in the partial and complete feedback conditions, as no counterfactual feedback has yet been provided) in the right medial prefrontal cortex (24c and 9m), extending bilaterally into the frontal pole (10mr) (S6B Fig). We conducted additional exploratory brain–behavior correlations but found no significant relationships to behavioral sensitivity (see “Author Response” file within the Peer Review History tab for additional details). Overall, we found that the pgACC and MCC reflected the horizon length and the type of feedback, respectively. The dlPFC was modulated additively by both, with the strongest activation in the long horizon and partial feedback condition, when exploration is beneficial. Additionally, the feedback type modulated the effect of the chosen expected value on the activity of the MCC and the dlPFC, such that they were more active for low value choices only when obtaining information was contingent on choosing an option.
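To illustrate how a trial-wise variable such as the chosen option's expected value enters these session-level GLMs as a parametric regressor, here is a minimal sketch; the double-gamma haemodynamic response, repetition time, and variable names are illustrative assumptions and not the parameters of the actual FSL pipeline described above.

```python
import numpy as np
from scipy.stats import gamma

def parametric_regressor(onsets_s, modulator, n_scans, tr=2.0):
    """Stick function at event onsets, scaled by a mean-centered modulator,
    convolved with a simple double-gamma haemodynamic response."""
    t = np.arange(0, 30, tr)
    hrf = gamma.pdf(t, a=6) - 0.35 * gamma.pdf(t, a=16)  # illustrative HRF
    hrf /= hrf.sum()
    stick = np.zeros(n_scans)
    mod = np.asarray(modulator, dtype=float)
    mod -= mod.mean()  # mean-center the modulator across trials
    for onset, m in zip(onsets_s, mod):
        idx = int(round(onset / tr))
        if idx < n_scans:
            stick[idx] += m
    return np.convolve(stick, hrf)[:n_scans]

# e.g., the model's expected value of the chosen option at each "wait" onset:
# reg_ev = parametric_regressor(wait_onsets, chosen_ev_per_trial, n_scans)
```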
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001985
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.