(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:
https://journals.plos.org/plosone/s/licenses-and-copyright
------------
Context coding in the mouse nucleus accumbens modulates motivationally relevant information
Jimmie M. Gmaz, Matthijs A. A. van der Meer (Department of Psychological and Brain Sciences, Dartmouth College, Hanover, United States of America)
Date: 2022-05
Neural activity in the nucleus accumbens (NAc) is thought to track fundamentally value-centric quantities linked to reward and effort. However, the NAc also contributes to flexible behavior in ways that are difficult to explain based on value signals alone, raising the question of whether and how nonvalue signals are encoded in NAc. We recorded NAc neural ensembles while head-fixed mice performed an odor-based biconditional discrimination task in which an initial discrete cue modulated the behavioral significance of a subsequently presented reward-predictive cue. We extracted single-unit and population-level correlates related to the cues and found value-independent coding for the initial, context-setting cue. This context signal occupied a population-level coding space orthogonal to outcome-related representations and was predictive of subsequent behaviorally relevant responses to the reward-predictive cues. Together, these findings support a gating model for how the NAc contributes to behavioral flexibility and provide a novel population-level perspective from which to view NAc computations.
Fig 1. (A) Mice were trained in a head-fixed biconditional discrimination task in which they learned to discriminate between different pairings of “context” and “target” cues. Mice were first presented with the context cue (1 s), followed by a delay (2 s), then the target cue (1 s), and an additional response period (1 s). Whether a target cue was rewarded depended on the identity of the preceding context cue. For instance, licking in response to O3 was rewarded when O3 was preceded by O1 but not O2, whereas licking in response to O4 was rewarded when O4 was preceded by O2 but not O1. Note that, by design, each context cue was therefore rewarded on half of the trials on which it was presented. (B) Trial structure of the task. Purple arrows indicate trials with context cue 1 (O1); orange arrows indicate trials with context cue 2 (O2). Dark arrows after context cue presentation indicate rewarded trials; light arrows indicate unrewarded trials. This color scheme is used throughout the text. (C) Example learning curve showing the proportion of trials with a lick over the course of full-task training. Data are shown for each trial type in 20-trial blocks, with gaps separating individual training sessions. This learning curve shows that by day 6 (approximately trial 400), there was a clear difference in responding to rewarded versus unrewarded trials. (D) Number of training sessions before each mouse reached the criterion of 3 consecutive sessions with >80% correct responding, after which recordings began. (E) Behavioral performance during recording sessions, showing the average proportion of trials with a licking response during the ITI, context cue period, delay period, and target cue period for each trial type for each mouse, demonstrating that mice discriminated between rewarded (green) and unrewarded (red) trial types. Asterisks denote significant differences (p < 0.05 based on a bootstrap with trial labels shuffled; see Methods). Data: https://gin.g-node.org/jgmaz/BiconditionalOdor. ITI, intertrial interval.
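The learning curve and training criterion in panels C and D rest on simple bookkeeping: lick proportions in 20-trial blocks, per-session accuracy, and a run of 3 consecutive sessions above 80% correct. The sketch below is a hypothetical illustration of that bookkeeping, not the authors' analysis code. The function names are invented, and two conventions are assumptions: a trial counts as correct if the mouse licked on a rewarded trial or withheld licking on an unrewarded one, and the reported session is the one that completes the criterion run.

```python
from typing import List, Optional, Sequence


def proportion_correct(licked: Sequence[bool], rewarded: Sequence[bool]) -> float:
    """Assumed scoring: a trial is correct if the mouse licked on a rewarded
    trial or withheld licking on an unrewarded trial."""
    n_correct = sum(lick == rew for lick, rew in zip(licked, rewarded))
    return n_correct / len(licked)


def block_proportions(licked: Sequence[bool], block_size: int = 20) -> List[float]:
    """Proportion of trials with a lick in consecutive, non-overlapping blocks
    (20 trials by default, as in the example learning curve). A trailing
    partial block is averaged over the trials it contains."""
    props = []
    for start in range(0, len(licked), block_size):
        block = licked[start:start + block_size]
        props.append(sum(block) / len(block))
    return props


def first_criterion_session(session_accuracy: Sequence[float],
                            threshold: float = 0.80,
                            n_consecutive: int = 3) -> Optional[int]:
    """Return the 1-based session number that completes the first run of
    `n_consecutive` sessions with accuracy above `threshold`, or None if
    the criterion is never reached."""
    run = 0
    for session, acc in enumerate(session_accuracy, start=1):
        # Extend the run on a criterion session; any miss resets it.
        run = run + 1 if acc > threshold else 0
        if run == n_consecutive:
            return session
    return None
```

For example, with per-session accuracies of 0.55, 0.70, 0.82, 0.85, 0.78, 0.81, 0.84, and 0.90, `first_criterion_session` returns 8, because the dip to 0.78 in session 5 resets the run.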
The nucleus accumbens (NAc) is an important contributor to the motivational control of behavior, acting directly through output pathways involving brainstem motor nuclei (“limbic-motor interface”) [1–3] and indirectly through return projections within cortico-striatal loops [4,5]. Accordingly, leading theories of NAc function, and of the mesolimbic dopamine (DA) system with which it is tightly interconnected, tend to focus on the processing of reward (and punishment) and its dual role in energizing and directing ongoing actions and in learning from feedback [1,6–9]. These proposals attribute to the NAc a role in motivational and reward-related quantities such as incentive salience, value of work, expected future reward, economic value, risk, and reward prediction error. In more formal reinforcement learning models, the NAc-dopamine system is typically cast as an “evaluator” or “critic,” tracking state values that are useful both for setting the value of work and as the basis of a teaching signal in the form of reward prediction errors [10–14]. Although the specifics are the subject of vigorous debate, these prominent theories all share a fundamentally value-centric focus: notwithstanding substantial heterogeneity in NAc cell types and circuitry [6,15,16], this brain structure as a whole is typically cast as tracking a relatively low-dimensional quantity, a value signal that at its simplest is just a single number reflecting how good or bad the current situation is.
These value-centric accounts are supported by a vast literature demonstrating that NAc manipulations can exert bidirectional control over motivated behaviors, such as conditioned responding to reward-predictive cues, and can regulate how much effort is exerted [7,17–21], as well as by the observation that electrical or optogenetic stimulation of the NAc itself, or of dopaminergic terminals in the NAc, is sufficient to induce behavioral preferences [16,22–27]. Similarly, unit recording studies in rodents and fMRI work in humans consistently report widespread, sizable value signals in NAc single units, populations, and the NAc blood-oxygen-level-dependent (BOLD) signal [28–38]. Thus, there seems to be widespread agreement that the major dimension (principal component) of NAc activity is some form of value signal.
However, in complex, dynamic behavioral tasks, lesions or inactivations of the NAc lead to deficits that are not straightforward to explain from a purely value-centric perspective [6,39], such as impairments in implementing conditional rules [40] or in switching to a novel behavioral strategy [41]. In addition, prominent inputs from brain regions such as orbitofrontal cortex (OFC), medial prefrontal cortex (mPFC), and hippocampus [42–44] suggest that the NAc has access to nonvalue signals that would be expected not only to inform its function but also to help shape its neural activity. Indeed, a study in primates suggests that elements of task structure that are orthogonal to value, but nonetheless crucial for successful performance of the task, are represented in NAc [45]. In rodents, there have been hints of task structure too, but this has been hard to show conclusively because of the difficulty of cleanly dissociating task structure from value [46,47] (see also related work on dopamine neuron and OFC activity representing task structure [42,48]). The distinct states that make up the structure of these and other tasks are often referred to as “rules” or “contexts” that are learned from experience and require the inference and/or maintenance of information not currently presented as a sensory cue. Thus, it is currently unknown whether, and how, task structure is encoded in rodent NAc and, if it is, how such a signal relates to the subsequent processing of motivationally relevant information.
To address this issue, we trained mice to perform a biconditional discrimination task in which we model context as a task state signaled by one of 2 discrete odor cues. Specifically, animals were presented with 2 different “context” cues that determined whether a subsequent “target” cue would be rewarded (Fig 1A). Thus, in context O1, O3 but not O4 is rewarded, whereas in context O2, O4 but not O3 is rewarded. We recorded ensembles of NAc neurons and tested whether the (equally valenced) context cues are encoded at the single-cell and population levels. Next, we used contemporary population analysis tools to test whether this context signal can be used to inform subsequent behaviorally relevant processing of target cues.
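In the reinforcement learning framing above, a critic learns state values V(s) from the reward prediction error delta = r + gamma * V(s') - V(s). The toy sketch below is a hypothetical illustration, not a model from this study: it applies a tabular TD(0) critic to a two-step abstraction of the biconditional task. The state names, the learning rate `alpha`, the discount `gamma`, and the assumption that the target state carries the context identity (perfect maintenance over the delay) are all illustrative choices. The point is that the two context cues converge to the same value, so a purely value-based readout cannot tell them apart, even though the values of the target states depend on the context.

```python
import random
from typing import Dict

# Biconditional rule from the task description: O3 is rewarded after O1,
# O4 is rewarded after O2; the other two pairings are unrewarded.
REWARDED_PAIRS = {("O1", "O3"), ("O2", "O4")}


def train_td_critic(n_trials: int = 20000, alpha: float = 0.02,
                    gamma: float = 1.0, seed: int = 0) -> Dict[str, float]:
    """Tabular TD(0) critic on a toy, two-step version of the task.

    Hypothetical simplification: each trial is context state -> target state
    -> terminal, and the target state is assumed to carry the context
    identity (the context is perfectly maintained over the delay).
    """
    rng = random.Random(seed)
    V: Dict[str, float] = {}
    for _ in range(n_trials):
        ctx = rng.choice(["O1", "O2"])
        tgt = rng.choice(["O3", "O4"])
        s_ctx, s_tgt = f"ctx:{ctx}", f"tgt:{ctx}-{tgt}"
        r = 1.0 if (ctx, tgt) in REWARDED_PAIRS else 0.0

        # Context -> target transition: no reward is delivered yet.
        delta = 0.0 + gamma * V.get(s_tgt, 0.0) - V.get(s_ctx, 0.0)
        V[s_ctx] = V.get(s_ctx, 0.0) + alpha * delta

        # Target -> terminal transition: reward delivered, V(terminal) = 0.
        delta = r - V.get(s_tgt, 0.0)
        V[s_tgt] = V.get(s_tgt, 0.0) + alpha * delta
    return V
```

After training, `V["ctx:O1"]` and `V["ctx:O2"]` both hover around 0.5 (each context is rewarded on half its trials), whereas `V["tgt:O1-O3"]` approaches 1 and `V["tgt:O1-O4"]` stays near 0. Any difference between the context cues in NAc activity would therefore have to reflect something other than this scalar value.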
Results
Mice learn to perform a biconditional discrimination task using odor cues
We sought to test whether NAc encodes information about task structure that is independent of reward. To do this, we used a biconditional discrimination task in which the identity of a “context” cue determines whether a subsequent “target” cue is rewarded [49–51]. We use the term “context” here to mean a cue that modifies the meaning of a subsequently presented target cue (i.e., whether that target cue predicts reward or not; see Discussion). Briefly, a trial began with presentation of one of the 2 context cues for 1 s, followed by a 2-s delay, presentation of one of the 2 target cues for 1 s, and an additional 1-s response period (Fig 1). On rewarded cue pairings, animals had to lick either during presentation of the target cue or during the subsequent response period to obtain a sucrose reward. For example, following context cue O1, target cue O3 but not O4 is rewarded, whereas following context cue O2, O4 but not O3 is rewarded. Thus, by design, the rewarded trial types O1 to O3 and O2 to O4 had the same outcome value (future expected reward), as did the unrewarded trial types O2 to O3 and O1 to O4, with the specific odor associations counterbalanced across mice. Importantly, a 2-s delay separated the 2 odor cues in a trial, so mice had to maintain a representation of the context cue while waiting for the target cue. Mice (n = 4) completed between 7 and 28 training sessions to reach criterion before recording sessions began (see Fig 1C for an example learning curve and Fig 1D for the number of training sessions for each mouse).
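The design logic above, that neither cue alone predicts reward and only the context-target conjunction does, can be checked by enumerating the 4 pairings. The sketch below is illustrative only: the epoch table follows the trial timing given in the text, but the names (`EPOCHS`, `is_rewarded`, `counts_as_response`, `reward_probability_given`), the odor assignment shown (one of the counterbalanced variants), the equal frequency of all 4 pairings, and the half-open scoring window for licks are assumptions.

```python
from itertools import product
from typing import List

# Epoch boundaries in seconds relative to context-cue onset, following the
# trial structure in the text (1 s cue, 2 s delay, 1 s cue, 1 s response).
EPOCHS = {
    "context": (0.0, 1.0),
    "delay": (1.0, 3.0),
    "target": (3.0, 4.0),
    "response": (4.0, 5.0),
}

# One odor assignment; associations were counterbalanced across mice, so the
# roles of O1-O4 would differ between animals.
REWARDED_PAIRS = {("O1", "O3"), ("O2", "O4")}
CONTEXTS, TARGETS = ("O1", "O2"), ("O3", "O4")


def is_rewarded(context: str, target: str) -> bool:
    """Biconditional rule: reward depends on the context-target conjunction."""
    return (context, target) in REWARDED_PAIRS


def counts_as_response(lick_times: List[float]) -> bool:
    """Assumed scoring: any lick in [3 s, 5 s), i.e. during the target cue or
    the response period, counts as a response."""
    start, end = EPOCHS["target"][0], EPOCHS["response"][1]
    return any(start <= t < end for t in lick_times)


def reward_probability_given(cue: str) -> float:
    """P(reward | cue), assuming all 4 context-target pairings are equally likely."""
    pairs = [p for p in product(CONTEXTS, TARGETS) if cue in p]
    return sum(is_rewarded(*p) for p in pairs) / len(pairs)
```

Every single odor gives `reward_probability_given(cue) == 0.5`, while `is_rewarded("O1", "O3")` is true and `is_rewarded("O2", "O3")` is false. This is the sense in which the context cues are equally valenced yet behaviorally decisive.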
During recording sessions, mice licked on a significantly larger proportion of rewarded trials than unrewarded trials (Fig 1E; proportion of rewarded trials with a lick response: 0.82 ± 0.08 SD; proportion of unrewarded trials with a lick response: 0.26 ± 0.08 SD; z-score across mice and sessions: 11.05; p < 0.001), but licked similarly across context cues for both rewarded trial types (O1 to O3: 0.83 ± 0.11 SD; O2 to O4: 0.81 ± 0.06 SD) and unrewarded trial types (O1 to O4: 0.27 ± 0.08 SD; O2 to O3: 0.25 ± 0.09 SD; z-score across mice and sessions: 0.51; p = 0.61). Furthermore, individual mice showed similar levels of correct responding to the target cues during recording sessions (M040: 70% ± 9% SD; M111: 78% ± 4% SD; M142: 80% ± 5% SD; M146: 83% ± 8% SD), and minimal licking to the context cues themselves (Fig 1E, S1 Fig; proportion of trials with a lick response during context cue presentation: M040: 10% ± 12% SD; M111: 7% ± 4% SD; M142: 3% ± 1% SD; M146: 6% ± 2% SD). Therefore, mice learned the appropriate context-target cue associations in the task.
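The Fig 1 caption refers to a bootstrap with shuffled trial labels, and the text reports z-scores and p-values, but the Methods are not part of this section. The following is therefore only one plausible sketch of such a test, not the authors' procedure: it pools trials into a single list (the reported statistics were computed across mice and sessions), builds a null distribution by permuting the rewarded/unrewarded labels, z-scores the observed difference in lick rates against that null, and converts the z-score to a two-sided p-value with the normal approximation. The function names, `n_shuffles`, and the seed are hypothetical, and both trial types must contain a mix of lick and no-lick trials for the null to have nonzero spread.

```python
import random
import statistics
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def lick_rate_difference(licked: Sequence[bool], labels: Sequence[bool]) -> float:
    """P(lick | label True) - P(lick | label False)."""
    pos = [lick for lick, lab in zip(licked, labels) if lab]
    neg = [lick for lick, lab in zip(licked, labels) if not lab]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def shuffle_test(licked: Sequence[bool], rewarded: Sequence[bool],
                 n_shuffles: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Z-score the observed rewarded-minus-unrewarded lick-rate difference
    against a null built by shuffling trial labels, and convert it to a
    two-sided p-value with the normal approximation."""
    rng = random.Random(seed)
    observed = lick_rate_difference(licked, rewarded)
    labels: List[bool] = list(rewarded)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # permute labels; the count per label is preserved
        null.append(lick_rate_difference(licked, labels))
    z = (observed - statistics.mean(null)) / statistics.stdev(null)
    p = erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) for a standard normal Z
    return z, p
```

With 100 rewarded trials (80 with a lick) and 100 unrewarded trials (25 with a lick), the observed difference of 0.55 lies far outside the shuffled null and yields a large positive z. Identical lick rates in both trial types yield a z near 0.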
[1] Url:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001338
(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL:
https://creativecommons.org/licenses/by/4.0/