(C) PLOS One
This story was originally published by PLOS One and is unaltered.



Individualistic reward-seeking strategies that predict response to nicotine emerge among isogenic male mice living in a micro-society [1]

Sophie L. Fayad, Sorbonne University, Inserm, CNRS, Neuroscience Paris Seine, Institut de Biologie Paris Seine (NPS-IBPS), Paris; ESPCI Paris, PSL Research University, Brain Plasticity Laboratory

Date: 2024-10

Individual animals differ in their traits and preferences, which shape their social interactions, survival, and susceptibility to disease, including addiction. Nicotine use is highly heterogeneous and has been linked to the expression of personality traits. Although these relationships are well documented, we have limited understanding of the neurophysiological mechanisms that give rise to distinct behavioral profiles and their connection to nicotine susceptibility. To address this question, we conducted a study using a semi-natural and social environment called “Souris-City” to observe the long-term behavior of individual male mice. Souris-City provided both a communal living area and a separate test area where mice engaged in a reward-seeking task isolated from their peers. Mice developed individualistic reward-seeking strategies when choosing between water and sucrose in the test compartment, which, in turn, predicted how they adapted to the introduction of nicotine as a reinforcer. Moreover, the profiles mice developed while isolated in the test area correlated with their behavior within the social environment, linking decision-making strategies to the expression of behavioral traits. Neurophysiological markers of adaptability within the dopamine system were apparent upon nicotine challenge and were associated with specific profiles. Our findings suggest that environmental adaptations influence behavioral traits and sensitivity to nicotine by acting on dopaminergic reactivity in the face of nicotine exposure, potentially contributing to addiction susceptibility. These results further emphasize the importance of understanding interindividual variability in behavior to gain insight into the mechanisms of decision-making and addiction.

Funding: This work was supported by the Centre National de la Recherche Scientifique (CNRS UMR 8246 and 8249), INSERM U1130, the Foundation for Medical Research (FRM https://www.frm.org/fr , Equipe FRM DEQ2013326488 to PF), the French National Cancer Institute ( https://www.e-cancer.fr ) and the French Institute for Public Health Research (IReSP) ( https://iresp.net/en/presentation-english/ ) for grants TABAC-16-01, TABAC-19-020, SPAV1-21-002 and SPAV1-23-005 (to PF), and French state funds managed by the ANR ( https://anr.fr/en/ ) for ANR-19-CE16-0028 Bavar (to PF and NR). LMR was supported by a NIDA–Inserm Postdoctoral Drug Abuse Research Fellowship. SM was supported by a fourth-year PhD fellowship from the Fondation pour la Recherche Médicale (FDT201904008060). CN was supported by a fourth-year PhD fellowship from the Biopsy Labex. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Here, we aim to determine whether the way individuals adapt to their environment is related to the nicotinic modulation of their DA networks, and, consequently, whether this relationship defines their initial sensitivity to nicotine, a critical element that may define susceptibility to nicotine addiction [ 42 , 43 ]. For that purpose, we used a habitat called “Souris-City” that combines a large social environment where mice live together with a modular testing platform where animals individually perform cognitive tests. In this environment, mice have individual access to water by performing a specific task in a T-maze, while social, circadian, and cognitive behaviors are continuously monitored over time using multiple sensors [ 5 ].

Behavioral trait components of decision-making, such as impulsivity, exploration, or novelty seeking, are thought to predict vulnerability to drugs of abuse [ 20 , 21 ]. While these traits have indeed been linked with smoking and addiction to nicotine in humans and animal models, certain traits, like impulsivity or sensation-seeking, have been more strongly associated with initial nicotine sensitivity [ 22 – 24 ], suggesting that they are a measure of vulnerability to nicotine. However, whether the processes leading to nicotine addiction and the mechanisms of decision-making share mechanistic underpinnings remains elusive. Altered dopamine circuit function is a promising mechanistic candidate [ 10 , 25 ], as dopaminergic signaling is implicated in decision-making, in social behaviors, and in nicotine addiction, where the initial stage critically involves activation of mesolimbic dopamine neurons [ 26 – 28 ]. As such, investigating variations in nicotinic control over the DA system represents a particularly promising avenue for linking interindividual differences in decision-making and vulnerability to nicotine. Nicotine initiates reinforcement by increasing the firing rate and bursting activity of DA neurons through direct actions on nicotinic acetylcholine receptors (nAChRs), a family of pentameric ligand-gated ion channels with 12 different types of subunits expressed in the mammalian brain. It has been shown that the transition between tonic and phasic activity of DA neurons induced by nicotine is essential for the reinforcement [ 29 , 30 ], and that the expression of nicotine-sensitive nAChR subtypes in the VTA is necessary for both the cellular and behavioral effects of nicotine [ 29 , 31 – 33 ]. Under nicotine-free conditions, nAChRs in the VTA are also key modulators of DA activity through basal cholinergic signaling, and they regulate specific aspects of reward-seeking behaviors, in particular, exploration and reaction to uncertainty [ 34 , 35 ]. 
Environmental manipulations that alter nAChR-mediated control of DA neurons may therefore lead to changes in downstream behaviors. For example, in mice, nicotine exposure modifies exploratory behavior [ 36 , 37 ] by increasing DA neuron activity and biasing individual strategies toward reduced exploration [ 38 ]. In addition, specific social contexts (e.g., repeated aggression) have been shown to induce a marked remodeling of the dopaminergic and nicotinic systems, leading to increased VTA DA neuron activity [ 39 , 40 ], social aversion, and a modified nicotine response [ 41 ]. This crosstalk at the level of the DA system between responses to drugs and modifications of decision-making could explain the observed correlation between novelty seeking and susceptibility to nicotine.

Capturing and interpreting interindividual variability in animal experiments can be challenging [ 5 , 9 – 12 ]. Longitudinal and complex quantification of individual behaviors is necessary, alongside careful consideration of experimental design. We studied resource foraging, a very important aspect of animal life and a fundamental focus of neuroeconomic studies [ 13 – 15 ]. Our behavioral paradigm utilized a closed-economy setup [ 16 – 18 ], where food and liquids are always present, 24 h a day. Animals live in groups (approximately 10 mice) representing a “micro-society” [ 5 , 9 , 11 ]. Rodents, known for their social nature, exhibit a range of interactive behaviors—such as physical contact, vocal communication, aggression, social recognition—that can be considered as hallmarks of sociability [ 19 ]. Importantly, when mice live in micro-societies within a closed and enriched naturalistic environment, strong and stable interindividual variability in behavior related to foraging, decision-making, and exploration emerges, even among isogenic animals [ 5 , 9 ].

Interindividual behavioral variability refers to consistent differences in behavior between members of a population or group. This variability is observed in both humans and mice [ 1 – 7 ]. It is seen not only in how individuals adapt to their environment but also in their susceptibility to diseases. This is particularly evident in the context of addiction, as not all individuals will develop drug abuse despite equal exposure to a given psychoactive substance [ 8 ].

Results

Souris-City: Continuous tracking of individual mice living within a micro-society

Souris-City is a semi-naturalistic environment composed of a large and complex housing space in which groups of mice (N = 32 groups) live together (5 to 10 male mice per group, mean = 8.8) for extended periods of time (1 to 3 months) and are able to express sophisticated social and non-social behaviors. The environment includes a test area (individual zone), separated from the main environment (social zone) by a gate which selectively controls the passage of mice, one at a time, based on radio-frequency identification (RFID) (Fig 1A). The test area consists of a T-maze leading to 2 drinking areas at the end of the left and right arms, where mice can perform a self-initiated cognitive task individually, spontaneously, and isolated from their cage mates. Thus, Souris-City associates a zone for individual liquid consumption and a social zone (the main cage) where food is always available. The experimental paradigm involves several consecutive periods with modified rules regarding access to and the nature of the liquids (Fig 1B). During a 1-week habituation period, mice explore Souris-City with free access to the T-maze. The gate is always open, so several mice can access the T-maze simultaneously, and water is delivered from both sides. In the second period (WW, mean duration = 9.5 days, n = 281 mice), mice continue to have water on both sides of the T-maze, but access is now restricted by the gate so that mice can only enter the T-maze one at a time. Choice is restricted so that if the animal chooses one side, access to the opposite arm closes. During the subsequent weeks (WS, mean duration = 25.2 days, n = 281 mice), water and a 5% sucrose solution are respectively delivered on each side of the gate-restricted T-maze, thus introducing a choice (choosing left or right) which modifies the reward associated with liquid consumption.
The positions of water and sucrose bottles are swapped twice a week, which allows for the mice to stabilize their choice. Overall, in 32 experiments (or groups of mice) in the water-sucrose (WS) test period, the behavior of 281 mice and more than 100,000 choices were analyzed.

Fig 1. Longitudinal profiling of individual and group behavior among mouse micro-societies within Souris-City. (A) Souris-City is divided into 2 main parts: a social zone and a test zone. The social zone includes a square cage measuring 1 m × 1 m, which is further divided into 4 compartments: the nest (N), the food (F) area, where mice have unrestricted access to food, and the central (C) zone that serves as a hub, connecting the social compartments with a stair (S) leading to the test zone. The test zone is a T-maze, which is separated from the stair by a controlled access gate (G). Mice are tagged with RFID chips and detected using floor-mounted circular or tube-shaped RFID antennae, which connect compartments of SC to capture transitions between zones. Two infrared beams (red dashed lines) are used to detect which arm mice choose in the T-maze. (B) The experimental paradigm involves several consecutive sessions with modified rules regarding access to the maze and the nature of the liquids available at each arm. During the free access period (top), mice are allowed unrestricted access to the T-maze for 1 week. The gate remains open, allowing multiple mice to enter the T-maze simultaneously, and water is delivered from both sides. In a second step (middle), mice choose between water on both sides of the T-maze (WW, mean duration = 9.5 days); however, access to the T-maze is restricted by the gate, and mice may only enter the T-maze one at a time. Choice is restricted so that if the animal chooses one side, access to the opposite arm is closed. Finally, water and 5% sucrose solution (bottom, WS, mean duration = 25.2 days) are respectively delivered at each side of the gate-restricted T-maze, introducing a choice (choosing left or right, choosing water or sucrose). The positions of the water and sucrose bottles are then swapped twice a week.
(C) Overall activity of mice captured from their movement in Souris-City reflects their circadian rhythm. (Top) Tube detection events for 8 consecutive days (n = 20 mice, 2 groups of 10 in parallel). (Bottom) Daily tube detection events per hour averaged for all mice (mean ± SEM, n = 281). (D) Residency time in each sub-compartment can be captured by floor antennae. (Top) Histogram (bin per hour) of the number of residencies in the nest zone longer than 2 h. (Bottom) Density of residency time in each sub-compartment (log-scale, bandwidth = 0.1), with indicated mean value. (E) Tube antennae provide information about the movement of mice between sub-compartments. Flow diagram of all possible transitions between sub-compartments; the density graph above each transition indicates the distribution of conditional transition probability among the n = 281 mice, with indicated median value. (F) (Left) Distribution of mean number of T-maze entries per day for n = 281 mice in the WS session. Vertical dashed line indicates mean value. (Right) Cumulative number of T-maze entries per hour after the beginning of the WS session for n = 55 mice (6 experiments). (G) Estimation of daily consumption on a subpart of the experiment (n = 132 mice, see text) during the WS session: mean daily fluid change per animal distinguishing the chosen side (CS) from the non-selected side (NS) and the difference between the two (Δ) (pairwise comparisons using Wilcoxon rank sum test with continuity correction and Holm p-value adjustment, n = 132 mice). Data can be found here https://zenodo.org/api/records/13374058/draft/files/Fig 1G.csv/content. Data are represented as mean ± SEM. ns p > 0.05, ** p < 0.01, *** p < 0.001.
https://doi.org/10.1371/journal.pbio.3002850.g001

The data obtained in Souris-City are based on the tracking of animals implanted subcutaneously with RFID chips and detected by antennae placed throughout the floor and the tubes connecting the different compartments of the environment (see Methods). Mice have free access to a nest compartment (N), a food compartment (F), a central compartment that provides access to all other compartments (C), the stair (S), and finally the T-maze (T). These subdivisions allow animal trajectories in Souris-City to be represented as a sequence of residency times within a compartment and transitions between sub-compartments [35,44]. The circadian rhythm of the group emerged from the measurement of pooled activity estimated from RFID detection at the level of the transition tubes during the WS period (Fig 1C). As expected, the mice are more active (and therefore more frequently detected moving between sub-compartments) during the dark phase (7 PM to 7 AM) than during the light phase (7 AM to 7 PM). The time spent by mice in a given compartment during the WS period varied from tens of seconds to hours, with the shortest visits, mainly found in the central compartment, corresponding to transition episodes (Fig 1D, bottom). Residency time in the nest sub-compartment shows a bimodal distribution, with the longest occupancies observed in the environment lasting more than 2 h. The distribution of long occupancy episodes, which took place mainly in the nest compartment, shows that they occurred mostly during the light period (Fig 1D, top); they can thus be interpreted as sleeping episodes. The distribution of transition probabilities from one compartment to another (Fig 1E) reveals a preponderance of transitions from the central to the food compartment (median = 38%) over transitions from the central towards the nest compartment (33%, Wilcoxon signed rank test p < 2.2e-16) or towards the stair (29%, Wilcoxon signed rank test p < 2.2e-16).
Furthermore, when animals are in the stair, their probability of entering the T-maze is only 34%. This relatively low rate reflects the fact that animals often enter the stair without succeeding in entering the T-maze, and then return to the main environment. Mice enter the T-maze an average of 14.7 times per day during the WS period (Fig 1F, left); however, the distribution is skewed, with a median of 13.7, a peak at 11 times per day, and a long tail indicating that some mice enter more than 30 times per day. Interindividual variability in the number of entries is also illustrated by the divergence in the cumulative number of trials over time (Fig 1F, right, example of n = 55 individuals), which showed stability and consistency in the temporal frequency of T-maze entries. In restricted access sessions, when a mouse chooses one side, access to the bottle on the other, non-chosen side is closed, so that the mouse has access only to the bottle on the chosen side. The mouse can access the bottle on the non-chosen side only if it leaves the T-maze and returns for another trial, which reopens access to both bottles and allows other mice to enter. In half of the experiments (16/32, corresponding to n = 132 animals), fluid consumption was estimated for each trial (i.e., each passage of a mouse in the T-maze). By comparing the average difference in liquid change between the chosen side and the non-chosen side per day and per animal (Fig 1G, n = 132), we find a significant difference in the amount of liquid dispensed depending on the arm chosen. We thus estimate the consumption of the animals by subtracting the loss of liquid measured by the system on the non-chosen side (resulting from evaporation, noise, etc.) from the change in fluid volume measured from the bottle on the chosen side, which results in an average consumption of approximately 3.9 ml per day per mouse (Fig 1G).
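The baseline-corrected consumption estimate described above can be illustrated with a minimal sketch. The function name and the readings are hypothetical and purely illustrative; only the subtraction rule (chosen-side change minus non-chosen-side loss) comes from the text.

```python
from statistics import mean

def estimate_daily_consumption(chosen_change_ml, non_chosen_change_ml):
    """Per-day consumption: chosen-side fluid change minus non-chosen-side loss."""
    return [c - n for c, n in zip(chosen_change_ml, non_chosen_change_ml)]

# Hypothetical daily readings (ml) for one mouse; not data from the study.
chosen = [5.0, 4.0]        # fluid change measured on the chosen side
non_chosen = [1.0, 0.5]    # loss on the non-chosen side (evaporation, noise)

daily = estimate_daily_consumption(chosen, non_chosen)
mean_daily = mean(daily)
```

The non-chosen side acts as a per-trial baseline for losses that are unrelated to drinking, so the corrected value is a conservative estimate of what the animal actually consumed.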
These first analyses describe a set of average behaviors, accessible from the analysis of events captured by RFID antennae or consumption sensors. They begin to reveal an organization of behaviors with important variations in their expression depending on the individual mouse.
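As an illustration of how the conditional transition probabilities of Fig 1E could be derived from an antenna-based trajectory, here is a minimal sketch. The function name and the example trajectory are hypothetical; the actual processing pipeline is described in the Methods and may differ.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Conditional probability P(next | current) from a sequence of compartment labels."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        if current != nxt:  # repeated labels are residencies, not transitions
            counts[current][nxt] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

# Hypothetical trajectory of one mouse.
# N = nest, C = central, F = food, S = stair, T = T-maze
trajectory = ["N", "C", "F", "C", "S", "T", "S", "C", "N", "C", "F", "C", "S", "C"]
probs = transition_probabilities(trajectory)
```

In this toy trajectory the mouse leaves the central compartment 5 times (twice to food, twice to the stair, once to the nest) and enters the T-maze on 1 of its 3 exits from the stair, which is the kind of per-animal quantity whose distribution across mice is summarized in Fig 1E.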

Multidimensional analysis of reward-seeking behavior reveals that mice adopt idiosyncratic strategies in the T-maze

In the T-maze, mice (n = 281) voluntarily performed a relatively simple decision-making task: whether to make a left or right turn to access a liquid reward. In the WS test sessions, one drinking area at the end of one of the T-maze arms contains water, and the other contains a sucrose solution. Each entrance into the T-maze, and the subsequent choice of which side to access, is considered a trial (Fig 2A, top). The sides of the sucrose and water bottles are swapped every 3 to 4 days, with each swap defining the beginning of a new session. The behavior of the mice in the T-maze was assessed by 5 variables that quantify the animals’ choice across different time scales throughout the entire WS experimental period. The level of global switching is estimated by the variable Switch, which takes all WS sessions into account and gives an overview of the probability of choosing one side compared to the other. This probability is renormalized so that 100% corresponds to an equivalent number of visits to both sides, while 0 corresponds to an individual who visits only one side. The variables SwWat and SwSuc evaluate the choices of the animals at the trial level (i.e., going left or right). They represent the probability of switching sides if the previous choice was water or sucrose, respectively. Finally, the Pref and SideBias variables assess sucrose preference (probability of sucrose choice) and side bias (probability of choices on one side) by comparing the choices between each session (i.e., whether the sucrose is on the left or right). Despite some recurring patterns, there are considerable variations in these parameters between mice (Fig 2A, bottom).
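To make the 5 choice variables more concrete, the following sketch computes them from a sequence of left/right choices. It is an illustration under stated assumptions, not the authors' code: the renormalization used for Switch, the averaging of Pref and SideBias over sessions, and the function and variable names are all assumptions, since the exact formulas are given in the Methods rather than here.

```python
def choice_variables(sessions):
    """Summarise one mouse's T-maze choices.

    sessions: list of (sucrose_side, choices) pairs, where sucrose_side is
    'L' or 'R' and choices is a string such as 'LLRL' (one letter per trial).
    """
    sides, outcomes = [], []
    pref_by_session, left_by_session = [], []
    for sucrose_side, choices in sessions:
        sides.extend(choices)
        outcomes.extend("suc" if c == sucrose_side else "wat" for c in choices)
        pref_by_session.append(sum(c == sucrose_side for c in choices) / len(choices))
        left_by_session.append(choices.count("L") / len(choices))

    # Assumed renormalization: 100 when both sides are visited equally often,
    # 0 when only one side is ever visited.
    p_left = sides.count("L") / len(sides)
    switch = 100.0 * (1.0 - abs(2.0 * p_left - 1.0))

    def p_switch_after(outcome):
        # Trials whose previous trial ended on the given liquid.
        trials = [t for t in range(1, len(sides)) if outcomes[t - 1] == outcome]
        if not trials:
            return float("nan")
        return sum(sides[t] != sides[t - 1] for t in trials) / len(trials)

    return {
        "Switch": switch,
        "SwWat": p_switch_after("wat"),
        "SwSuc": p_switch_after("suc"),
        # Assumed definitions: per-session probabilities averaged over sessions.
        "Pref": sum(pref_by_session) / len(pref_by_session),
        "SideBias": sum(left_by_session) / len(left_by_session),
    }

# Hypothetical mouse that follows the sucrose when it moves from left to right.
example = choice_variables([("L", "LLLRLL"), ("R", "LRRRRR")])
```

For this toy sequence, the mouse visits both sides equally often overall (high Switch) yet almost never leaves the sucrose side within a session (low SwSuc, high SwWat), which shows why the global and trial-level variables carry different information.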

Fig 2. Mice exhibit interindividual differences in choice strategies in the T-maze. (A) Top: one trial is considered to be one choice between the left or right side in the T-maze. Bottom: Value of the 5 parameters that describe each mouse's sequence of choices in the T-maze during WS sessions (n = 281 mice): the level of global switching (Switch), the probability of switching sides if the previous choice was water (SwWat) or sucrose (SwSuc), the preference (Pref) and side bias (SideBias) on each session. Top and bottom values correspond to the maximum (top) and minimum (bottom) value of each parameter. (B) Archetypal analysis of the choice strategies based on the 5-dimensional data space. Top: Visualization of the α coefficients using a ternary plot. Each point represents the projection of an individual (n = 281 mice) onto the plane defined by a triangle where the 3 apices represent the 3 archetypes: Tracker (Tr, purple), Explorer (Ex, blue), and Non-Switcher (NS, green). Points are color-coded according to their proximity to the archetypes. Bottom: Histograms showing the 3 archetypes’ percentiles for each choice parameter. Right: Examples of 3 sequences of choices made by 3 mice close to the archetypes. Sucrose position alternates across sessions between the left (light purple) and the right (light orange) side. Cumulated choices across trials are calculated with a positive (+1) or negative (−1) increment when the left or right side is chosen, respectively. Mice i, j, and k (from top to bottom) correspond to a Tr, Ex, and NS profile, respectively (see their projection in the ternary plot). (C) Number of trials per day (left) and percentage of sucrose side choice (right) for the 3 archetypes (pairwise Wilcoxon tests with Holm correction). (D) Daily sucrose consumption for the 3 archetypes (pairwise Wilcoxon tests with Holm correction).
(E) Distribution of archetypes per experiment, showing that they are not evenly represented in each group (N = 32, red dots indicate mean values, left), and theoretical densities expected for each archetype based on a random draw from mean group sizes (bandwidth = 0.1, n = 10,000, right). Data can be found here https://zenodo.org/uploads/13374058.
https://doi.org/10.1371/journal.pbio.3002850.g002

To better describe this variability, we used archetypal analysis, an unsupervised approach for identifying behavioral clusters [38,45,46]. It depicts individual behavior as a continuum within an archetypal landscape defined by extreme strategies: the archetypes. The five-dimensional data set characterizing individual responses was used to identify 3 archetypal phenotypes. Individual data points are thus represented as linear combinations of extrema (vertices corresponding to archetypal strategies) of the data set, i.e., each mouse is represented by a triplet of α coefficients describing its archetypal composition, which can be visualized with a ternary plot (Fig 2B, top). The 3 archetypes distinguish Trackers (Tr), who track the sucrose position (Fig 2B, top right, see mouse i as an example), from Explorers (Ex), who choose almost randomly between the left and right side on each trial (Fig 2B, middle right, see mouse j), and Non-Switchers (NS), who choose the same side throughout the majority of the sessions (Fig 2B, bottom right, see mouse k). Subsequent analysis highlights that these 3 profiles are distinguished not only by the choice parameters in the T-maze (used in their construction), but also by sucrose consumption and number of entries in the T-maze, reinforcing the definition of the profiles as personality-like categories. Trackers enter the T-maze frequently (i.e., high number of trials per day), whereas Non-Switchers rarely enter it (Fig 2C, left).
In terms of choice, Trackers go most often to the sucrose side (Fig 2C, right) and consume more sucrose than the others (Fig 2D). Finally, the 3 profiles are distributed across the different experimental groups (N = 32) tested (mean = 8.8 mice per group, min = 5, max = 10) with an average proportion in a group of 37.4% for Trackers, 44.6% for Explorers, and 18.0% for Non-Switchers. The observed distribution of proportions within a group (Fig 2E, left) is consistent with a random sampling (Fig 2E, right) of a profile for each animal (with the corresponding probability) within group sizes similar to those obtained experimentally.
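The archetypal composition of a single mouse (its triplet of α coefficients) can be sketched as a constrained least-squares problem: find non-negative weights summing to 1 that best reconstruct the mouse's 5 choice variables from the 3 archetypes. The sketch below assumes the archetypes are already known and uses projected gradient descent, which is a solver chosen here for simplicity; a full archetypal analysis estimates the archetypes and the coefficients jointly. The archetype coordinates are hypothetical placeholders, not values from the study.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_k >= 0, sum(a) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold of the last index that satisfies the test
    return [max(x - theta, 0.0) for x in v]

def archetypal_composition(x, archetypes, steps=2000):
    """Weights alpha minimising ||x - sum_k alpha_k * z_k||^2 on the simplex."""
    k = len(archetypes)
    alpha = [1.0 / k] * k
    # Step size 1 / ||Z||_F^2, a safe bound on the gradient's Lipschitz constant.
    step = 1.0 / sum(z_j * z_j for z in archetypes for z_j in z)
    for _ in range(steps):
        recon = [sum(a * z[j] for a, z in zip(alpha, archetypes)) for j in range(len(x))]
        resid = [r - x_j for r, x_j in zip(recon, x)]
        grad = [sum(r * z_j for r, z_j in zip(resid, z)) for z in archetypes]
        alpha = project_to_simplex([a - step * g for a, g in zip(alpha, grad)])
    return alpha

# Hypothetical archetypes in the (Switch, SwWat, SwSuc, Pref, SideBias) space, scaled to [0, 1].
tracker = [0.9, 0.8, 0.1, 0.9, 0.1]
explorer = [1.0, 0.5, 0.5, 0.5, 0.1]
non_switcher = [0.1, 0.1, 0.1, 0.5, 0.9]
archetypes = [tracker, explorer, non_switcher]

# A synthetic mouse built as 60% Tracker, 30% Explorer, 10% Non-Switcher.
mouse = [0.6 * t + 0.3 * e + 0.1 * n for t, e, n in zip(tracker, explorer, non_switcher)]
alpha = archetypal_composition(mouse, archetypes)
```

Because the synthetic mouse lies inside the triangle spanned by the archetypes, the recovered coefficients match the mixture used to build it; these are the coordinates that a ternary plot such as Fig 2B displays.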

Reward-seeking strategy in isolation correlates with behavioral trait variation in the social compartment

Behavior in the T-maze is fundamentally different from behavior in the main environment. In the T-maze, animals are isolated from any direct influence from other animals and are left to make their own decisions. This is not the case for behavior in the main environment, where all behaviors are potentially subject to the consequences of social interactions. Because of these strong differences in context, we wondered whether the differences in strategy observed in the T-maze would also correspond to behavioral differences in the main environment, suggesting that mouse strategies can serve as a marker of individual profiles across multiple levels of analysis. The analysis of mouse behavior in the main environment is based on their detection by antennae located on the 3 transition tubes between the compartments. Compared to mice with other archetypally defined profiles, Trackers showed an increase in the average number of tube antenna detections per day (NbD, Fig 3A). They also have a reduced probability of transitioning from Nest to Food compartments (%NtoF, Fig 3B). A strong inverse correlation was observed between NbD and the %NtoF across all mice, regardless of their archetypal profile (Fig 3C). This suggests a more nuanced relationship between these variables than what can be captured by simple group statistics and reinforces the idea that mice can be individually defined by their behavioral repertoires, indicative of a profile for each mouse. Because the archetypal framework defines each individual as a linear combination of the 3 possible profiles, this analysis can be further refined by introducing the notion of distance from the archetype.
The archetypal composition (i.e., given by α_k, with k the archetype, Fig 3D) reflects this distance: its value is between 0 and 1, equal to 1 if the mouse lies exactly on the archetype and 0 if it lies on the opposite side of the archetypal space. We found that NbD increases across mice as their composition approaches the pure Tracker archetype, while their %NtoF decreases (Fig 3E, top). However, these 2 relationships are reversed for the Explorer archetype composition, such that NbD decreases and %NtoF increases as the composition of the mice approaches the pure Explorer archetype (Fig 3E, bottom). These correlations reflect both profile differences and environmental constraints (i.e., the structure of the settings) on behavioral expression.

Fig 3. Archetypes defined by individual choices capture variation in the social cage behavior. (A) Activity in the main environment, estimated by the number of transitions between compartments (NbD), for the 3 archetypes (pairwise Wilcoxon tests with Holm correction, 3 points above 500 were not plotted). (B) Probability of nest to food transition (NtoF) for the 3 archetypes (pairwise Wilcoxon tests with Holm correction). Data can be found here https://zenodo.org/api/records/13374058/draft/files/Fig 3A-B.csv/content. (C) Correlation between NtoF and NbD. (D) Principle of archetypal composition measurement: the archetypal composition (i.e., given by α_k, with k the archetype) would be equal to 1 if the mouse is exactly at the point of the archetype, and 0 if it is on the opposite side. (E) Correlation (linear regression, a indicating the slope estimate, R2 the adjusted R-squared, and p the p-value) between Tracker (Tr) composition and pNtoF (left) and NbD (right), respectively (top), and between Explorer (Ex) composition and pNtoF (left) and NbD (right), respectively (bottom). (F) Left: Correlation matrix (Pearson correlation coefficient) of main environment variables and archetypal profile. Right: p-value for correlations. Green: p < 0.05, Black: p > 0.05. Variables: Activity Levels: Number of Detections (NbD), Entropy (EnA); Probability of Transitions: Stair to T-maze (StoT), Nest to Food (NtoF), Center to Food (CtoF), Center to Nest (CtoN), Food to Nest (FtoN), Nest to Stair (NtoS), Food to Stair (FtoS), Center to Stair (CtoS), Stair to Nest (StoN), Stair to Food (StoF); Occupancy: percent time in Food compartment (%F), percent time in Nest compartment (%N), percent time in Center compartment (%C), percent time in T-Maze compartment (%T), percent time in Stair compartment (%S); Archetypes: Explorer (Ex), Non-Switcher (NS), Tracker (Tr). * and ° indicate the correlations between Tr and Ex composition and pNtoF and NbD shown in (E).
X indicates the correlation shown in (C).
https://doi.org/10.1371/journal.pbio.3002850.g003

We next systematically analyzed the linear correlation between an individual’s archetypal composition and specific behaviors in the main environment (Fig 3F, right). Three categories of variables were used to describe activity levels (Fig 1C), compartment occupancy (Fig 1D), and transitions, respectively (Fig 1E). Robust correlations were found between the variables describing these categories and the archetypal compositions, with the pattern of these correlations also discriminating between archetypal profiles (Fig 3F, left). Interestingly, Explorer and Tracker archetypes often exhibit correlations with the same behaviors; however, these correlations are inversely related. For example, Explorer composition was positively correlated with the transition from the central to the food or the nest compartment, and negatively correlated with the transition from the central to the stair compartment. In contrast, Tracker composition was positively correlated with the transition from the central compartment to the stairs, and negatively correlated with the transition between the central and nest or food compartments. This suggests that Explorer and Tracker could be construed as contrasting profiles within the primary environment. Non-Switchers display a profile that is markedly distinct from the other 2 profiles (Fig 3F, right). They are characterized by a preference for the nest compartment (S1A Fig). Comparing the archetypes reveals that Trackers are the least likely to go to the food compartment, either directly from the central compartment (S1B Fig) or by passing through it from another compartment (S1C Fig) where they spend less time than the other archetypes (S1A Fig). Explorers are the most likely to transition from the Stair to the T-maze, although the time spent in the stair (%S) is not different, suggesting they wait less in the Stair to enter the T-Maze (S1A and S1B Fig).
Non-Switchers, in particular, showed a propensity to transition from the Stair to the Nest or Food compartments (via the Central compartment), indicating that these mice are the least likely to re-enter the Stair to engage in the task (S1D Fig). Archetypal groups did not differ in their entropy, despite differences in the number of tube detections, suggesting that while Trackers move more between compartments, all mice share roughly the same territory within Souris-City (S1D Fig). Overall, these analyses suggest that individual mouse profiles extend beyond variations solely within reward seeking strategies in the T-maze to also encompass differences in activity within the main compartment. These relationships between individualistic strategy development and trait expression can be considered as a foundation of mice “personalities,” which emerge as adaptive responses to their complex environment.

Distinct reward-seeking profiles are defined by individual differences in learning rate and sensitivity to value

Because decision-making strategy is a good marker for individual profiles of mice in Souris-City, we next aimed to decompose the latent variables that individual mice use to define their strategy. The decision-making process of an individual mouse in the T-maze can be seen as a series of binary choices between going left or right in an unpredictable environment. It is assumed that the animal learns the value assigned to each option (left or right) and that it adapts to the change (every 3 to 4 days) in the position of the rewards (sucrose or water). In this context, we fitted each individual’s choice data with a standard reinforcement learning model [47], which uses the sequence of choices to estimate the expected value of each option for each trial. The reward value was set to 1 for the water option and to 3 for the sucrose option, for a ΔV of 2. The value of the chosen option (V_L or V_R for the left and right side, respectively) was updated after each trial using a reward prediction error rule (see Methods) and a learning rate α that sets how rapidly the estimate of expected value is updated on each trial. Given the expected values of both options, the probability of choosing the right option, P_R(t), is computed using a SoftMax rule with 2 parameters: the inverse temperature parameter β, which represents the sensitivity to the difference of values, and a choice perseveration parameter χ, which captures short-term tendencies (based on the previous choice) to perseverate or alternate (when positive or negative, respectively; Fig 4A, left). This propensity to alternate is independent of the reward history [48], and thus does not depend on ΔV. We fitted the choice data of each mouse with this model and obtained a triplet of latent variable values (α, β, and χ; Fig 4A, right) for each individual (see Methods).
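
The model described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact update and choice equations are given in the article's Methods, which are not reproduced here, so the delta-rule update, the logistic form of the SoftMax, the coding of the previous choice as +1 (right), -1 (left), or 0 (none), and the initial values of 0 are all assumptions. The function names are hypothetical.

```python
import math

def p_right(v_left, v_right, beta, chi, prev_choice):
    """Probability of choosing right (assumed two-option SoftMax form).

    beta scales the value difference; chi weights the previous choice
    (prev_choice is +1 for right, -1 for left, 0 if there is none), so a
    positive chi favors repeating and a negative chi favors alternating.
    """
    z = beta * (v_right - v_left) + chi * prev_choice
    return 1.0 / (1.0 + math.exp(-z))

def update_value(value, reward, alpha):
    """Reward prediction error update of the chosen option's value."""
    return value + alpha * (reward - value)

def negative_log_likelihood(choices, rewards, alpha, beta, chi, v0=0.0):
    """Negative log-likelihood of a choice sequence ('L'/'R') under the model."""
    values = {"L": v0, "R": v0}
    prev = 0
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        pr = p_right(values["L"], values["R"], beta, chi, prev)
        p_choice = pr if choice == "R" else 1.0 - pr
        nll -= math.log(max(p_choice, 1e-12))  # guard against log(0)
        values[choice] = update_value(values[choice], reward, alpha)
        prev = 1 if choice == "R" else -1
    return nll
```

Minimizing `negative_log_likelihood` over (α, β, χ) for one mouse's choices, with any general-purpose optimizer, would yield a triplet of the kind described in the text; the optimizer and any parameter bounds used in the study are not specified in this section.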
The 3 archetypes extracted from the sequence of choices corresponded to different combinations of α, β, and χ (Fig 4B). These parameters also correlate with the number of trials in the T-maze (S2A Fig, α: p = 0.002, R² = 0.03; β: p = 7e-8, R² = 0.1; χ: p = 2e-5, R² = 0.06), indicating that these latent variables capture information that is not directly linked to the decision process. Finally, when we use this model to simulate data under the constraints of experimental trial sequences (same number of trials) and rewards (water/sucrose, right or left), we can differentiate the same 3 types of profiles that we find in the experimental data (Fig 4C). One question, however, is whether the estimated latent variables (Fig 4A) or the dynamics of the choices (Fig 4C) simply reflect a difference in the number of trials, as individuals who entered the T-maze the least were indeed less likely to find the sucrose. To test this hypothesis and to decorrelate our results from a possible effect of variation in the number of trials, we modeled the behavioral profiles (n = 281 mice) from the latent variables (α, β, and χ, 1 triplet per mouse) with 6 alternating sessions (sucrose/water left or right) of 50 trials each. Our model with 3 latent variables explained the phenotypic variables (Switch, SwWat, SwSuc, Pref, and Side Bias) very well, regardless of the number of trials (S2B Fig).
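
The fixed-length simulation described above (6 alternating sessions of 50 trials) can be sketched as below. This is an illustration under stated assumptions rather than the study's code: the update and SoftMax forms, the starting values of 0, the choice to place sucrose on the right in the first session, and the definitions used here for the switch rate (fraction of trials on which the choice differs from the previous one) and sucrose preference (fraction of trials on the sucrose side) are all assumptions, and `simulate_agent` is a hypothetical name.

```python
import math
import random

def simulate_agent(alpha, beta, chi, n_sessions=6, n_trials=50,
                   r_water=1.0, r_sucrose=3.0, seed=0):
    """Simulate one agent over alternating sessions and summarize its choices."""
    rng = random.Random(seed)
    values = {"L": 0.0, "R": 0.0}   # assumed initial values
    prev = 0                         # +1 right, -1 left, 0 no previous choice
    choices, on_sucrose = [], []
    for session in range(n_sessions):
        # Sucrose side alternates between sessions (assumed to start on the right).
        sucrose_side = "R" if session % 2 == 0 else "L"
        for _ in range(n_trials):
            z = beta * (values["R"] - values["L"]) + chi * prev
            p_r = 1.0 / (1.0 + math.exp(-z))
            choice = "R" if rng.random() < p_r else "L"
            reward = r_sucrose if choice == sucrose_side else r_water
            values[choice] += alpha * (reward - values[choice])
            prev = 1 if choice == "R" else -1
            choices.append(choice)
            on_sucrose.append(choice == sucrose_side)
    n_switch = sum(a != b for a, b in zip(choices, choices[1:]))
    return {
        "switch": n_switch / (len(choices) - 1),
        "pref": sum(on_sucrose) / len(on_sucrose),
        "n_trials": len(choices),
    }
```

Because every simulated agent performs the same 300 trials, any difference between the summaries of two agents can only come from their (α, β, χ) triplets, which is the point of the control described in the text.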

Fig 4. Computational modeling suggests that decision and learning parameters differ between the 3 archetypes. (A) Left: Principle of the reinforcement learning and SoftMax model, with 3 latent variables α (the learning rate), β (inverse temperature or sensitivity to the difference of values ΔV), and χ (the choice perseveration). The theoretical values of water and sucrose are set to 1 and 3, respectively. Right: Estimated values of α, β, and χ for n = 281 mice. (B) Latent variables according to the Tr, Ex, and NS archetype (left symbol: mean ± SEM, Wilcoxon tests with Holm correction, right: individual values per mouse). Data can be found here https://zenodo.org/api/records/13374058/draft/files/Fig4A-B.csv/content. (C) The model recapitulates the profiles drawn from experimental data (same example as in Fig 2B) when fitted with the individual triplet values of the latent variables for each individual of a specific archetype. (D) Comparison of the mean of 5 variables (Switch, SwWat, SwSuc, Pref, SideBias) for the Tr, Ex, and NS archetypes obtained for 5 differences in the value (ΔV, one side has a value of 1) associated with the choice (6 sessions of 50 choices, simulated with fitted values of α, β, and χ, n = 281). https://doi.org/10.1371/journal.pbio.3002850.g004

Having demonstrated the validity of our model, we next questioned the relative importance of each latent variable in explaining the observed results by designing an attribution study in which each of the latent variables (α, β, or χ) is manipulated independently of the others. We then simulated the data and compared the results obtained using (i) the latent variable values estimated for each mouse associated with each profile; (ii) latent variable triplets picked randomly from the set of estimated values; and (iii) 2 estimated values for each individual, with the third picked randomly from the set of estimated values (S2C Fig).
This analysis reveals that the χ parameter plays a minimal role in the differentiation of the profiles. In other words, the choice perseveration parameter does not contribute significantly to the observed variations between the profiles. Finally, when we simulate data with randomized parameters (α, β, and χ) while keeping the number of trials from each individual’s experimental data, we do not replicate the difference between profiles. This simulation supports the conclusion that the behavioral profile cannot be attributed to variations in the number of trials (S2C Fig, gray points). Overall, the Tracker archetype is associated with a high α value and low β, consistent with individuals who are able to quickly update their value representation and thus favor sucrose tracking. The Explorer archetype is characterized by an intermediate α and β, and thus a substantial level of switching from one trial to the next. In contrast, the Non-Switchers are associated with low α and high β, a combination that favors profiles in which the animals remain mainly on one side, in particular the side with the highest initial value representation (this representation is updated slowly due to a low α). Indeed, the majority of Non-Switchers showed a bias for the side on which they first encountered the sucrose (33/51) or the side where they found sucrose for the majority of their first choices (2/51) (S2D Fig). The other Non-Switcher mice (16/51) chose the side corresponding to their initial preference during the WW session.
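
The third condition of the attribution study (keeping 2 fitted values per individual and drawing the third at random from the pool of fitted values) can be sketched as below. The function name `resample_one_parameter` is hypothetical, the example triplets are made up, and sampling with replacement is an assumption: the text says only that the third value is picked randomly from the set of values obtained.

```python
import random

def resample_one_parameter(triplets, index, seed=0):
    """Return new (alpha, beta, chi) triplets with one parameter randomized.

    For each individual, the parameter at position `index` (0 = alpha,
    1 = beta, 2 = chi) is replaced by a value drawn at random, with
    replacement, from the pool of that parameter across all individuals.
    The other two parameters are left untouched.
    """
    rng = random.Random(seed)
    pool = [t[index] for t in triplets]
    resampled = []
    for t in triplets:
        new = list(t)
        new[index] = rng.choice(pool)
        resampled.append(tuple(new))
    return resampled

# Made-up illustrative triplets (NOT fitted values from the article).
fitted = [(0.8, 0.5, 0.1), (0.4, 1.5, -0.2), (0.1, 4.0, 0.3)]
chi_randomized = resample_one_parameter(fitted, index=2, seed=1)
```

If simulated profiles remain distinguishable after randomizing one parameter, that parameter contributes little to the differences between profiles, which is the reasoning behind the conclusion about χ.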

[END]
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002850

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.

via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/