


Anterior cingulate cortex causally supports flexible learning under motivationally challenging and cognitively demanding conditions [1]

Authors: Kianoush Banaie Boroujeni, Department of Psychology, Vanderbilt University, Nashville, Tennessee, United States of America; Michelle K. Sigona, Vanderbilt University Institute of Imaging Science, Department of Biomedical Engineering; Robert Louie Treuting

Date: 2022-09

Abstract

Anterior cingulate cortex (ACC) and striatum (STR) contain neurons encoding not only the expected values of actions, but also the value of stimulus features irrespective of actions. Values about stimulus features in ACC or STR might contribute to adaptive behavior by guiding fixational information sampling and biasing choices toward relevant objects, but they might also have indirect motivational functions by enabling subjects to estimate the value of putting effort into choosing objects. Here, we tested these possibilities by modulating neuronal activity in ACC and STR of nonhuman primates using transcranial ultrasound stimulation while subjects learned the relevance of objects in situations with varying motivational and cognitive demands. Motivational demand was indexed by varying gains and losses during learning, while cognitive demand was varied by increasing the uncertainty about which object features could be relevant during learning. We found that ultrasound stimulation of the ACC, but not the STR, reduced learning efficiency and prolonged information sampling when the task required averting losses and motivational demands were high. Reduced learning efficiency was particularly evident at high cognitive demands and immediately after subjects experienced a loss of already attained tokens. These results suggest that the ACC supports flexible learning of feature values when loss experiences impose a motivational challenge and when uncertainty about the relevance of objects is high. Taken together, these findings provide causal evidence that the ACC facilitates resource allocation and improves visual information sampling during adaptive behavior.

Citation: Banaie Boroujeni K, Sigona MK, Treuting RL, Manuel TJ, Caskey CF, Womelsdorf T (2022) Anterior cingulate cortex causally supports flexible learning under motivationally challenging and cognitively demanding conditions. PLoS Biol 20(9): e3001785. https://doi.org/10.1371/journal.pbio.3001785 Academic Editor: Ben Seymour, University of Cambridge, UNITED KINGDOM Received: January 7, 2022; Accepted: August 9, 2022; Published: September 6, 2022 Copyright: © 2022 Banaie Boroujeni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: All data supporting this study and its findings can be found at: https://figshare.com/projects/TUS_PlosBiology/144330. Custom MATLAB code generated for the analyses is available from GitHub: https://github.com/banaiek/TUS_PlosBiology_2022.git. Funding: This work was supported by the National Institute of Mental Health of the National Institutes of Health under Award Numbers R01MH123687 (TW) and 1UF1NS107666 (CC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. Abbreviations: ACC, anterior cingulate cortex; GLM, general linear model; GTI, gross token income; LME, linear mixed effect; MI, mechanical index; STR, striatum; tDCS, transcranial direct current stimulation; TMS, transcranial magnetic stimulation; TUS, transcranial ultrasound stimulation

Introduction

It is well established that the anterior cingulate cortex (ACC) and the anterior striatum contribute to flexible learning [1–4]. Widespread lesions of either structure can lead to nonadaptive behavior. When tasks require subjects to adjust choice strategies, lesions in the ACC cause subjects to shift away from a rewarding strategy even after having obtained reward for a choice [5] and reduce the ability to use error feedback to improve behavior [6–8]. Lesions in the striatum likewise reduce the ability to use negative error feedback to adjust choice strategies, which leads to perseveration of non-rewarded choices [9] or inconsistent switching to alternative options after errors [10].

A common interpretation of these lesion effects is that both structures are necessary for integrating the outcome of recent choices to update the expected values of possible choice options. According to this view, the ACC and striatum keep track of reward outcomes of available choice options in a given task environment. However, it has remained elusive what type of reward outcome information is tracked in these structures and whether outcome information in ACC or the striatum affects learning even when it is not associated with specific actions. In many learning tasks used to study ACC or striatum functions, subjects learned to associate reward outcomes with the direction of a saccadic eye movement or the direction of a manual movement [11–14]. Succeeding in these tasks requires computing probabilities of action-reward associations. However, recent studies have documented neurons in ACC and striatum that not only tracked action-reward association probabilities, but also the expected reward value of specific features of chosen objects [11,15–19]. Neuronal information about the value of object features emerged slightly earlier in ACC than in striatum [15,16,19] and rapidly synchronized between both structures, suggesting that this information is available across the frontostriatal network at similar times [20]. These findings suggest that the ACC or striatum may be functionally important for learning the expected values of objects' specific visual features and thereby mediate information-seeking behavior and visual attention [17,21–25].

There are at least 3 possible ways in which feature-specific value information in these areas may support flexible learning. A first possibility is that either of these brain areas uses feature-specific value information to assign credit for reward outcomes to goal-relevant object features. Such a credit assignment process is necessary during learning to reduce uncertainty about which features are most relevant in a given environment. Support for this suggestion comes from studies reporting neurons in ACC that show stronger encoding of task variables in situations with higher uncertainty [26], respond to cues that reduce uncertainty about outcomes [27], and form subpopulations encoding uncertain outcomes [16]. These prior studies predict that ACC or striatum will be important for learning values of object features when there is high uncertainty about the reward value of object features.

A second possibility is that the ACC or striatum uses feature-specific value information to determine whether subjects should continue with a current choice strategy or switch to a new strategy [28,29]. According to this framework, the ACC's major role is to compute, track, and compare the value of continuing with similar choices against the value of switching to alternative choice options.
This view predicts that when errors accumulate during learning, the estimated choice value for continuing similar choices is reduced relative to alternatives, and ACC or striatum will activate to either directly modify choice behavior [28,29] or indirectly affect choices by guiding attention and information sampling away from recently chosen objects and toward other, potentially more rewarding objects [25].

A third possible route for value information in ACC and striatum to affect behavior assumes that information about the value of object features is not used to compute a choice value, but rather a motivational value for the subject that indicates whether it is worth putting continued effort into finding the most valuable object in a given environment [4,30]. According to such an effort-control framework, value signals across the ACC-striatum axis are used to compute the value of controlling task performance [31]. Similar to the choice-value framework, this motivational value framework predicts that ACC and striatum should become more important for learning when the number of conflicting features increases. While in the choice-value framework an increase in conflicting features will evoke more ACC activity by increasing the number of comparisons between an ongoing choice value and the value of the other available options, in the effort-control framework more feature conflict with the same motivational payoff requires more ACC activity to decide whether it is worth putting effort into the task at all [29,30].

Key motivational factors determining whether subjects put more effort into learning even difficult problems are the amount of reward that can be gained and the amount of punishment or loss that can be avoided by putting effort into the task. Support for an important role of the ACC in mediating how incentives and disincentives affect learning comes from studies showing that ACC neurons respond vigorously to negative outcomes such as errors [32], respond to negatively valenced stimuli or events irrespective of error outcomes [33–35], and fire more strongly when the subject anticipates aversive events [27]. These insights suggest that the ACC, and possibly the striatum, will be relevant for learning particularly when it involves averting punishment or loss.

Here, we set up an experiment to test the relative cognitive and motivational contributions of the ACC and striatum to the flexible learning of feature values. We adopted a transcranial ultrasound stimulation (TUS) protocol to temporarily modulate neuronal activity in the ACC or the striatum of rhesus monkeys while they learned values of object features through trial-and-error. The task independently varied cognitive load, by varying the number of unrewarded features of objects, and motivational context, by varying the amount of gains or losses subjects could receive for successful and erroneous task performance (Fig 1A and 1B). Subjects learned a feature-reward rule by choosing 1 of 3 objects that varied in features of only 1 dimension (low cognitive load, e.g., different shapes), or in features of 2 or 3 dimensions (high cognitive load, e.g., varying shapes, surface patterns, and arm types) (Fig 1D). Independent of cognitive load, we varied how motivationally challenging task performance was by altering whether the learning context was a pure gain-only context or a mixed gain-loss context.
In the gain-only contexts, subjects received 3 tokens for correct choices, while in the gain-loss contexts, subjects received 2 tokens for correct choices and lost 1 already attained token when choosing objects with non-rewarded features (Fig 1A–1C). Such a loss experience has been reported in previous studies to impose a motivational conflict [36], inferred also from vigilance responses triggered by experiencing losses [37]. The task required monkeys to collect 5 visual tokens before they were cashed out for fluid rewards.


Fig 1. Task paradigm and TUS protocol. (A) A trial started with central gaze fixation and appearance of 3 objects. Monkeys could then explore objects and choose 1 object by fixating it for 700 ms. A correct choice triggered visual feedback (a yellow halo around the chosen object) and the appearance of green circles (tokens for reward) above the chosen object. The tokens were then animated and traveled to a token bar at the top of the screen. Incorrect choices triggered a blue halo, and in the gain-loss condition 1 blue token was shown that traveled to the token bar, where one already attained token was removed. When ≥5 tokens were collected in the token bar, fluid reward was delivered, and the token bar reset to zero. (B) In successive learning blocks, different visual features were associated with reward, and blocks alternated randomly between the gain-only (G) and gain-loss (GL) conditions. (C) In the gain-only motivational context, monkeys gained 3 tokens for each correct choice and incurred no penalty for incorrect choices, whereas in the gain-loss context, they gained 2 tokens for each correct and lost 1 token for each incorrect response. The axes (right) show the 3 orthogonal independent variables of the task design (cognitive load, motivational context, and TUS conditions). (D) Cognitive load varied by increasing the number of object features from 1 to 3 from block to block. (E) In each sonication or sham session, the experiment was paused after 6 learning blocks. There were 4 experimental conditions: TUS of the ACC (ACC-TUS; red), TUS of the anterior striatum (STR-TUS; green), sham ACC (ACC-Sham; dimmed red), and sham anterior striatum (STR-Sham; dimmed green). 30-ms bursts of TUS were delivered every 100 ms over a duration of 80 s (40 s per hemisphere). (F) Proportion of correct choices over trials since the beginning of a block for different cognitive loads (left panel; 1–3D, light to dark gray) and motivational contexts (gain-only, blue; gain-loss, red). (G) The average fixation duration on objects prior to choosing an object (information sampling), in the same format as in (F). The lines show the mean and the shaded areas the SE. Data associated with this plot can be found at: https://figshare.com/projects/TUS_PlosBiology/144330. ACC, anterior cingulate cortex; STR, striatum; TUS, transcranial ultrasound stimulation. https://doi.org/10.1371/journal.pbio.3001785.g001

With this design, we found that sonication of the ACC, but not the anterior striatum, with TUS led to a learning deficit when subjects experienced losses. This loss-triggered deficit was accompanied by inefficient information sampling and was most pronounced at high cognitive load.

Discussion

We found that sonicating the ACC, but not the anterior striatum (head of the caudate), slowed down learning, prolonged visual information sampling of objects prior to choosing an object, and reduced overall performance when learning took place in a context with gains and losses and with high cognitive load. These behavioral impairments were specific to the ACC-TUS condition when comparing behavior to baseline performance prior to TUS and to sessions with sham-controlled TUS and STR-TUS. Moreover, the changes in behavioral adjustment were evident when comparing average performance between individual sessions (S10 Fig), in performance variations across blocks of different sessions (Fig 2), and at the trial-by-trial level as impaired behavior when the loss of tokens accumulated over trials (Fig 3). Taken together, these findings provide evidence that primate ACC supports the guidance of attention and information sampling in contexts that are motivationally challenging and cognitively demanding.

Prior to discussing the implications of these findings, it should be noted that the effects of sonication on neuronal activity in vivo are not well investigated. The sonication protocol we applied may involve changes in the excitability of neural tissue within the hot spot of the sonicated area [41] or effects on fibers of passage through the sonicated areas. Our discussion assumes a putative disruptive effect of TUS on neural activity within ACC. This assumption is based on a prior study that used the same TUS protocol and reported reduced functional connectivity of the sonicated brain area [38]. There is also the caveat that the sonication effects could indirectly follow altered activity in other areas through modulating connectivity with those areas.

Extending existing functional accounts of ACC functions

Our main findings suggest extensions to theoretical accounts of the overarching function of the ACC for adaptive behavior. First, the finding of prolonged information sampling after ACC-TUS supports recent electrophysiological and imaging studies showing that activity in the ACC predicts information sampling of visual objects and attention to maximize reward outcomes [17,18,25,42]. However, we found a functional deficit of information sampling only in the gain-loss learning context in which subjects expected lower payoff, suggesting that the ACC contributes to information sampling particularly in motivationally challenging conditions. Second, our core finding of compromised learning in motivationally challenging conditions partly supports and extends the view that the ACC is essential for controlling effort [30]. The functional impairment in the gain-loss motivational context was most pronounced when subjects faced high cognitive load, i.e., in the condition that was overall the most difficult. While this result pattern suggests that overall difficulty may be the primary driver of the ACC-TUS effects, we outline below that the specific impairments after loss experiences (irrespective of cognitive load) and the overall reduced plateau performance with ACC-TUS in the gain-loss context indicate a predominant role of motivational over cognitive processes in driving the behavioral ACC-TUS effects.
At a psychological level, the pronounced deficit in the loss context at high cognitive load is consistent with a neuroeconomic view of ACC function based on prospect theory, which suggests that losses induce a particularly strong internal demand for adjusting actions, a demand that is intensified when adjusting the action itself becomes more demanding through an increase in cognitive load [37,43,44]. A particular role of the ACC in mediating the motivational consequences of negative loss experiences is consistent with a wealth of studies about affective processing, mostly from human imaging studies, that show systematic ACC activation in the face of aversive or threatening experiences [33,34,45]. Our finding that experiencing losses was necessary to observe behavioral deficits from ACC-TUS shows that the presence of a cognitive conflict due to an increased number of features in the high cognitive load condition was not sufficient to alter performance. This result might seem at odds with literature implicating the ACC as particularly relevant to resolving conflict [46,47]. However, increasing cognitive load in our task entailed increased perceptual interference from object features; it did not entail a conflict of sensory-motor mappings, which is the hallmark of the conflict-inducing flanker, Stroop, or Simon tasks used to document a role of the ACC in mediating the resolution of conflict [26,46,48]. Therefore, our results do not oppose studies using those paradigms, which gave rise to the view that neuronal signaling in ACC contributes to resolving sensorimotor conflicts [2]. Our results instead point to the importance of the ACC in contributing to overcoming situations in which motivational challenges require enhanced encoding of task-relevant variables.

Taken together, our observed result pattern supports theories suggesting that the ACC contributes to information sampling as well as to controlling motivational effort during adaptive behavior. The results suggest that these functions of the ACC are recruited when the task requires overcoming motivational challenges from anticipating losses and when learning takes place in a cognitively demanding situation. We discuss the specific rationale for this conclusion in the following.

ACC modulates the efficiency of information sampling

We found that ACC-TUS increased the duration of fixating objects prior to choosing an object in the loss contexts. Longer fixation durations are typically considered to reflect longer sampling of information from the foveated object. This has been inferred from longer fixational sampling of objects that are associated with higher outcome uncertainty [49] or that explicitly carry information reducing uncertainty about outcomes [50,51]. We therefore consider fixational sampling of visual objects to index information sampling. Consistent with this view, we found that ACC-TUS prolonged information sampling at high cognitive load, when objects varied in more features from trial to trial and hence carried higher uncertainty about which feature was linked to reward (Figs 2D, 2E and S7). These results suggest that TUS might disrupt or modulate the activity of neuronal circuitry in ACC that would have responded to the demand for information about feature values by controlling the duration of gaze fixations on available objects. Such activity has been documented in ACC [16–19].
These studies found that neurons in the ACC encoded the pre-learned value of objects when they were fixated [18] or peripherally attended [16,19]. Moreover, subpopulations of ACC neurons also encode the value of objects that are not yet fixated but are possible future targets for fixations [17]. Disrupting these signals with TUS in our study is therefore a plausible cause of uncertainty about the value of objects and of the prolonged information sampling durations. In our study, ACC-TUS prolonged information sampling most prominently in the gain-loss context, when subjects were uncertain not only about the possible gains following correct choices, but additionally about the possible losses following incorrect choices (Fig 3D and 3E). This is consistent with recent findings of neurons in the intact ACC that fire more strongly when subjects fixate on cues that reduce uncertainty about anticipated losses [27]. In that study, trial-by-trial variations of ACC activity during the fixation of punishment-predicting cues correlated with trial-by-trial variations in seeking information about how aversive a pending outcome would be [27]. In our study, disrupting this type of activity with transcranial ultrasound may thus have reduced the efficacy of acquiring information about the loss association of features while fixating objects. According to this interpretation, the TUS-induced prolongation of fixation durations indicates that an intact ACC critically enhances the efficiency of visual information sampling to reduce uncertainty about aversive outcomes.

ACC mediates feature-specific credit assignment for aversive outcomes

The altered information sampling after ACC-TUS was only evident in the gain-loss context and depended on the experience of losing tokens. We found that losing 1, 2, or 3 tokens significantly impaired performance in ACC-TUS compared to sham or striatum sonication (Fig 3F). This valence-specific finding might be linked to the relevance of the ACC for processing negative events and adjusting behavior after negative experiences. Imaging studies have consistently shown ACC activation in response to negatively valenced stimuli or events [33–35]. Behaviorally, the experience of loss outcomes is known to trigger autonomous vigilance responses that reorient attention and increase explorative behaviors [37,52,53]. Such loss-induced exploration can help avoid threatening stimuli, but it comes at the cost of reducing the depth of processing of the loss-inducing stimulus [36]. Studies in humans have shown that stimuli associated with loss of monetary rewards or aversive outcomes (aversive images, odors, or electrical shocks) are less precisely memorized than stimulus features associated with positive outcomes [54–56]. Such a reduced influence of loss-associated stimuli on future behavior may partly be due to loss experiences curtailing the engagement with those stimuli and reducing the evaluation of their role for future choices, as demonstrated, e.g., in a sequential decision-making task [57]. One behavioral consequence of a reduced evaluation or memorization of aversive or loss-associated stimuli is an over-generalization of aversive outcomes to stimuli that only share some resemblance with the specific stimulus that caused the loss experience. Such over-generalization is evolutionarily meaningful because it can support the faster recognition of similar stimuli as potentially threatening even if these stimuli are not a precise instance of a previously encountered, loss-inducing stimulus [36,58].
For our task, such a precise recognition of the features of a chosen object was pivotal for learning from feature-specific outcomes. Our finding of reduced learning from loss outcomes therefore indicates that ACC-TUS exacerbated the difficulty of assigning negative outcomes to features of the chosen object. Neurophysiological support for this interpretation, that the ACC mediates negative credit assignment, comes from studies showing that a substantial proportion of neurons in ACC encode feature-specific prediction errors [15]. These neurons responded to an error outcome most strongly when it was unexpected and when the chosen object contained a specific visual feature. This is precisely the information that is needed to learn which visual features should be avoided in future trials. Consistent with this importance for learning, the study also reported that feature-specific prediction error signals in the ACC predict when neurons update value expectations about specific features [15,59]. Thus, neurons in ACC signal feature-specific negative prediction errors and the updating of feature-specific value predictions. It seems likely that disrupting these signals with TUS will lead to impaired feature-specific credit assignment in our task. This scenario is supported by the finding that ACC-TUS impaired flexible learning most prominently at high load, i.e., when there was a high degree of uncertainty about which stimulus feature was associated with gains and which features were associated with losses (Fig 2C). This finding suggests that outcome processes such as credit assignment in an intact ACC critically contribute to learning about aversive distractors.

A role of the ACC in determining learning rates for aversive outcomes

So far, we have discussed that ACC-TUS likely reduced the efficiency of information sampling and that this reduction could originate from the disruption of feature-specific credit assignment processes. These phenomena were exclusively observed in the loss context, and ACC-TUS did not change learning or information sampling in the gain-only context. Based on this finding, we propose that the ACC plays an important role in determining the learning rate for negative or aversive outcomes and thereby controls how fast subjects learn which object features in their environments have aversive consequences. Such learning from negative outcomes was particularly important for our task at higher cognitive load [60]. At high load, a single loss outcome provided unequivocal information that all features of the chosen object were non-rewarded features in the ongoing learning block. This was different for positive outcomes. Receiving a positive outcome was linked to only 1 feature out of the 2 or 3 features of the chosen object, making it more difficult to discern the precise cause of the outcome. The higher informativeness of negative than positive outcomes can explain why ACC-TUS caused a selective learning impairment in the loss context, assuming that TUS introduced uncertainty in the neural information about the cause of the outcome. This interpretation is consistent with the established role of the ACC in encoding different types of errors [32,61,62] and with computational evidence that learning from errors and other negative outcomes depends on a dedicated learning mechanism, separate from learning from positive outcomes [60,63–65].
The suggested role of the ACC in determining the rate of learning from negative outcomes describes a mechanism for establishing which visual objects are distractors and should be actively avoided and suppressed in a given learning context [66]. When subjects experience aversive outcomes, the ACC may use these experiences to bias attention and information sampling away from the negatively associated stimuli. This suggestion is consistent with the fast neural responses in ACC to attention cues that trigger inhibition of distracting and enhancement of target information [16,19]. In these studies, the fast cue-onset activity reflects a push-pull attention effect that occurred during covert attentional orienting and was independent of actual motor actions. This observation supports the general notion that ACC circuits are critical for guiding covert and overt information sampling during adaptive behaviors [18,25].

Versatility of the focused-ultrasound protocol for transcranial neuromodulation

Our conclusions were made possible by a TUS protocol that was developed to interfere with local neural activity in deep brain structures and that likely imposes a temporary functional disconnection of the sonicated area from its network [38,40]. We implemented an enhanced protocol that entailed quantifying the anatomical targeting precision (S3A and S3B Fig) and confirmed by computer simulations that sonication power reached the targets (S3C–S3E Fig). We also showed that the TUS pressure in ACC was comparable to the maximum pressure in the brain and noticeably higher than in other nearby brain structures such as the orbitofrontal cortex (see Materials and methods and S3F Fig). We also documented that the main behavioral effect (impaired learning) of ACC-TUS was evident relative to a within-task pre-sonication baseline and throughout the experimental behavioral session (S9 Fig) and that the main TUS-ACC effects were qualitatively consistent across monkeys (for monkey-specific results, see S1–S10 Figs; there were no significant random effects of the factor monkey in the LME models). Importantly, the main behavioral effects of ACC-TUS in our task are consistent with the effects of more widespread, invasive lesions of the ACC in nonhuman primates. Widespread ACC lesions reduce affective responses to harmful stimuli [67], increase response perseveration [68], cause failures to use reward history to guide choices [7,8], and reduce the control needed to inhibit prevalent motor programs [69,70]. Our study therefore illustrates the versatility of the TUS approach for modulating deeper structures such as the ACC that so far have been out of reach for noninvasive neuromodulation techniques such as transcranial magnetic stimulation (TMS) or transcranial direct current stimulation (tDCS) [71]. Despite these advances, it should be made explicit that there is so far a scarcity of insights about the specific effects of the TUS protocol on neural circuits. Therefore, future studies will need to investigate the neural mechanisms underlying the immediate and longer-lasting TUS effects on ACC neural circuits.

In summary, our results suggest that the ACC multiplexes motivational effort control and attentional control functions by tracking the costs of incorrect performance, optimizing feature-specific credit assignment for aversive outcomes, and actively guiding information sampling to visual objects during adaptive behaviors.

Materials and methods

Ethics statement

All procedures were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and the Society for Neuroscience Guidelines and Policies, and were approved by the Vanderbilt University Institutional Animal Care and Use Committee (M1700198-01).

Experimental procedures

Two male macaque monkeys (monkey I, 13.6 kg, and monkey W, 14.6 kg; 8 to 9 years of age) contributed to the experiments. They sat in a sound-proof booth in primate chairs with their head position fixed, facing a 21-inch LCD screen at a distance of 63 cm between their eyes and the screen center. Behavior, visual display, stimulus timing, and reward delivery were controlled by the Unified Suite for Experiments (USE), which integrates an IO-controller board with a Unity3D video-engine-based control for displaying visual stimuli, controlling behavioral responses, and triggering reward delivery [72]. Prior to the ultrasound experiment, the animals were trained on the feature learning task in a kiosk training station [73]. Monkeys first learned to choose objects to receive an immediate fluid reward before being introduced to a token system in which green circles, earned for correct choices, symbolized tokens that were later cashed out for fluid reward. Tokens were presented above the chosen object and traveled to a token bar where they accumulated with successive correct performance. When 5 tokens were collected, the token bar blinked red/white, fluid was delivered through a sipper tube, and the token bar reset to 5 empty token placeholders (Fig 1A). The animals effortlessly adopted the token reward system, as documented in [36]. Here, separate blocks of 35 to 50 trials used either a "gains-only" condition (3 tokens for correct choices, no penalties) or a "gains-and-losses" condition (2 tokens for correct choices and 1 token lost, i.e., removed from the token bar, for incorrect choices). The introduction of gains-and-losses effectively changed behavior: animals learned more slowly and showed reduced plateau accuracy, enhanced exploratory sampling, and more frequent checking of the token bar (S2B–S2H Fig).

Task paradigm

The task required monkeys to learn feature-reward rules in blocks of 35 to 60 trials through trial-and-error by choosing 1 of 3 objects. Objects were composed of multiple features, but only 1 feature was associated with reward (Fig 1A–1C). A trial started with the appearance of a black circle with a diameter of 1 cm (0.5° radius) on a uniform gray background. Monkeys fixated the black circle for 150 ms to start a trial. Within 500 ms after registration of central gaze fixation, 3 objects appeared on the screen randomly at 3 out of 4 possible locations at an equal distance from the screen center (10.5 cm, 5° eccentricity). Each stimulus had a diameter of 3 cm (approximately 1.5° radius). To choose an object, monkeys had to maintain fixation on the object for at least 700 ms. Monkeys had 5 s to choose 1 of the 3 objects or the trial was aborted. Choosing the correct object was followed by a yellow halo around the stimulus as visual feedback (500 ms), an auditory tone, and either 2 or 3 tokens (green circles) added to the token bar (Fig 1A). Choosing an object without the rewarded target feature was followed by a blue halo around the selected object, low-pitched auditory feedback, and, in the loss conditions, the presentation of a gray "loss" token that traveled to the token bar where one already attained token was removed.
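To make the token accounting concrete, the following is a minimal sketch of the gain-only and gain-loss rules described above (our illustration, not the task code from USE; the example trial sequence and the assumption that the token bar cannot go below zero are ours):

```matlab
% Minimal sketch of the token economy (illustrative only; not the actual task code).
outcomes = [1 0 1 1 0 1 1];     % hypothetical trial sequence: 1 = correct, 0 = incorrect
context  = 'gain-loss';         % 'gain-only': +3 / 0;  'gain-loss': +2 / -1
tokens   = 0;                   % current fill level of the token bar
nRewards = 0;                   % fluid rewards delivered

for isCorrect = outcomes
    if strcmp(context, 'gain-only')
        tokens = tokens + 3 * isCorrect;     % +3 for correct, no penalty for errors
    elseif isCorrect
        tokens = tokens + 2;                 % gain-loss: +2 for a correct choice
    else
        tokens = max(tokens - 1, 0);         % gain-loss: -1 token (assumed not to go below 0)
    end
    if tokens >= 5                           % cash out: fluid reward, token bar resets
        nRewards = nRewards + 1;
        tokens   = 0;
    end
end
fprintf('%d fluid reward(s) delivered, %d token(s) remaining\n', nRewards, tokens);
```

In this gain-loss example, a correct choice followed by an error yields a net gain of only 1 token, so more correct trials are needed to reach the 5-token cash-out than in the gain-only context, which is what makes the gain-loss blocks motivationally more demanding.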
The timing of the feedback was identical for all types of feedback. In each session, monkeys were presented with up to 36 separate learning blocks, each with a unique feature-reward rule. Across all 48 experimental sessions, monkeys completed on average 23.6 (±4 SE) learning blocks per session (monkey W: 20.5 ± 4, monkey I: 26.5 ± 4). For each experimental session, a unique set of objects was defined by randomly selecting 3 dimensions and 3 feature values per dimension (e.g., 3 body shapes: oblong, pyramidal, and ellipsoid; 3 arm types: upward pointy, straight blunt, downward flared; 3 patterns: checkerboard, horizontal striped, vertical sinusoidal; Watson and colleagues, 2019 [74] have documented the complete library of features). From this feature set, 3 different task conditions were defined: One task condition contained objects that varied in only 1 feature dimension ("1D" condition) while all other features were identical, i.e., the object body shapes were oblong, pyramidal, and ellipsoid, but all objects had blunt straight arms and uniform gray color. A second task condition defined objects that varied in 2 feature dimensions ("2D" condition), and a third task condition defined objects that varied in 3 feature dimensions ("3D" condition). Learning is systematically more demanding with an increasing number of feature dimensions that could contain the rewarded feature (for a computational analysis of the task, see Womelsdorf and colleagues, 2021). We refer to the variation of object feature dimensionality as cognitive load because it corresponds to the size of the feature space subjects have to search to find the rewarded feature (Figs 1E and S1). The 1D, 2D, and 3D conditions varied randomly across blocks.

Experimental design

Each session randomly varied 2 motivational conditions (gain/loss and gains-only) and 3 cognitive load conditions (1D, 2D, and 3D) across blocks. Randomization ensured that all 6 combinations of conditions were present in the first 6 blocks prior to sonication and that all combinations of conditions were shown equally often in the 24 blocks after the first 6 blocks. After monkeys completed the first 6 learning blocks (on average 12 min after starting the experiment), the task was paused to apply TUS. After bilateral placement of the transducer and sonication (or sham sonication) of the brain areas, the task resumed for 18 to 24 more blocks (Fig 1D). Experimental sessions lasted 90 to 120 min.

Transcranial ultrasound stimulation (TUS)

For transcranial stimulation, we used a single-element transducer with a curvature of 63.2 mm and an active diameter of 64 mm (H115MR, Sonic Concepts, Bothell, Washington, United States of America). The transducer was attached to a cone with a custom-made trackable arm. Before each session, we filled the transducer cone with warm water and sealed the cone with a latex membrane. Conductive ultrasound gel was used for coupling between the transducer cone and the monkey's shaved head. A digital function generator (Keysight 33500B series, Santa Rosa, California, USA) was used to generate periodic 30-ms bursts at a resonant frequency of 250 kHz with a 100-ms inter-burst interval, over a total duration of 40 s and at 1.2 MPa pressure (similar to [39,75]) (Fig 1D). The function generator was connected to a 150-watt amplifier (E&I, Rochester, New York, USA) with a gain of 55 dB in the continuous range to deliver the input voltage to the transducer.
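To make the stimulation timing explicit, the burst count, duty cycle, and acoustic cycles per burst implied by the parameters above can be computed directly. The following is a back-of-the-envelope sketch (our illustration; the variable names are not from the study's code):

```matlab
% Derived timing quantities for the TUS protocol described above (illustrative sketch only).
f0       = 250e3;     % fundamental (resonant) frequency in Hz
burstDur = 30e-3;     % duration of one burst in s
burstIPI = 100e-3;    % burst repetition interval in s
trainDur = 40;        % sonication duration per hemisphere in s

nBursts     = trainDur / burstIPI;   % 400 bursts per hemisphere (800 across both)
dutyCycle   = burstDur / burstIPI;   % 0.30, i.e., a 30% duty cycle
cyclesBurst = f0 * burstDur;         % 7,500 acoustic cycles per burst
onTime      = nBursts * burstDur;    % 12 s of ultrasound "on" time per hemisphere

fprintf('%d bursts, %.0f%% duty cycle, %d cycles per burst, %.0f s on-time\n', ...
    nBursts, 100 * dutyCycle, cyclesBurst, onTime);
```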
We measured the transducer output in water using a calibrated ceramic needle hydrophone (HNC 0400, Onda Corp., Sunnyvale, California, USA) and derived a linear relationship between the input voltage and the peak pressure. To avoid hydrophone damage, only pressures below a mechanical index (MI) of 1.0 were measured, and amplitudes above this level were extrapolated. We have previously measured transducer output at MI > 1.0 with a calibrated optical hydrophone (Precision Acoustics, Dorchester, United Kingdom) to validate the linearity of this relationship at higher MI, but this calibrated device was not available during these studies. During stimulation, the feedforward and feedback voltages were monitored through a bi-directional coupler (ZABDC50-150HP+, Mini Circuits, Brooklyn, New York, USA) and logged using a Picoscope 5000 series oscilloscope (A-API; Pico Technology, Tyler, Texas, USA) and a custom-written Python script. Four different sonication conditions were pseudo-randomly assigned to the experimental days of each week over a 12-week experimental protocol per monkey. These 4 conditions consisted of high-energy TUS in the anterior striatum (TUS-STR), high-energy TUS in the anterior cingulate cortex (TUS-ACC), sham anterior striatum (Sham-STR), and sham anterior cingulate cortex (Sham-ACC) (Fig 1D). We sequentially targeted an area in both hemispheres (each hemisphere for a 40-s duration) with real-time monitoring of the distance of the transducer to the targeted area (S3A Fig) and monitoring of the feedforward power (S3B Fig). Sham conditions were identical to TUS conditions, except that no power was transmitted to the transducer.

Data analysis

Trial-level statistical analysis. We tested TUS effects on behavior at the trial level using linear mixed effects (LME) models [10] with the following main factors: cognitive load (CogLoad) with 3 levels (1D, 2D, and 3D distractor feature dimensions; ratio scale with values 1, 2, and 3); trial in block (TIB); previous trial outcome (PrevOutc), defined as the number of tokens gained or lost in the previous trial; the motivational token condition, which we call the motivational gain/loss context (MCtxGain/Loss), with 2 levels (1 for the loss condition and 2 for the gain condition; nominal variable); TUS condition (TUSCnd) with 4 levels (Sham-ACC, TUS-ACC, Sham-STR, and TUS-STR); and time relative to stimulation (T2Stim) with 2 levels (before versus after stimulation). We used 3 other factors as random effects: the target feature dimension (Feat) with 4 levels (color, pattern, arm, and shape); the weekday of the experiment (Day) with 4 levels (Tuesday, Wednesday, Thursday, and Friday); and the factor monkey (Monkey) with 2 levels (W and I). We used these factors to predict 3 metrics (Metric): accuracy (Accuracy), reaction time (RT), and information sampling (SampleExplr). The LME is formalized as in Eq 1:

Metric ~ 1 + CogLoad + TIB + PrevOutc + MCtxGain/Loss + TUSCnd + T2Stim + (1 | Feat) + (1 | Day) + (1 | Monkey) (1)

Block-level analysis of behavioral metrics. We used LME models to analyze across blocks how learning speed (indexed as the "learning trial" (LT) at which criterion performance was reached), post-learning accuracy (Accuracy), and metrics for choice and information sampling were affected by TUS. In addition to the fixed and random effects factors used for the LME in Eq 1, we also included the factor block switching condition (SwitchCnd) with 2 levels (intra- and extra-dimensional switch).
The block-level LME had the form of Eq 2:

Metric ~ 1 + CogLoad + TIB + PrevOutc + MCtxGain/Loss + TUSCnd + T2Stim + SwitchCnd + (1 | Feat) + (1 | Day) + (1 | Monkey) (2)

We extended this model to test for interactions of TUSCnd, MCtxGain/Loss, and T2Stim by adding the interaction term TUSCnd × MCtxGain/Loss × T2Stim (Eq 3), for interactions of T2Stim, CogLoad, and TUSCnd by adding T2Stim × CogLoad × TUSCnd (Eq 4), and for interactions of T2Stim, CogLoad, MCtxGain/Loss, and TUSCnd by adding T2Stim × CogLoad × MCtxGain/Loss × TUSCnd (Eq 5).
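As an illustration of how LMEs of this form can be estimated, the following is a minimal sketch in MATLAB using fitlme from the Statistics and Machine Learning Toolbox. It is our illustration rather than the study's analysis code, and the file, table, and variable names are assumptions:

```matlab
% Minimal sketch of fitting the trial-level LME of Eq 1 (illustrative only).
% Assumes a hypothetical table with one row per trial and the columns named below.
T = readtable('trial_level_behavior.csv');

% Code categorical factors; CogLoad, TIB, and PrevOutc stay numeric.
catVars = {'MCtx', 'TUSCnd', 'T2Stim', 'Feat', 'Day', 'Monkey'};
for v = catVars
    T.(v{1}) = categorical(T.(v{1}));
end

% Eq 1: fixed effects plus random intercepts for Feat, Day, and Monkey.
lmeMain = fitlme(T, ['Accuracy ~ CogLoad + TIB + PrevOutc + MCtx + TUSCnd + T2Stim' ...
                     ' + (1|Feat) + (1|Day) + (1|Monkey)']);

% Interaction model in the spirit of Eq 3 (TUSCnd x MCtx x T2Stim).
lmeIntx = fitlme(T, ['Accuracy ~ CogLoad + TIB + PrevOutc + TUSCnd*MCtx*T2Stim' ...
                     ' + (1|Feat) + (1|Day) + (1|Monkey)']);

disp(lmeMain);   % fixed-effect estimates, SEs, t-statistics, and p-values
disp(lmeIntx);
```

The same formula string can be reused with the RT or SampleExplr metrics as the response, and compare(lmeMain, lmeIntx) provides a likelihood-ratio test of whether the interaction terms improve the fit.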

Acknowledgments

The authors thank Dr. Marcus Watson for help with the experimental software, Huiwen Luo for assistance with calibrating optically tracked devices using MRI, and Adrienne Hawkes for technical assistance with voltage monitoring. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001785

