(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.

Licensed under Creative Commons Attribution (CC BY) license.
url:https://journals.plos.org/plosone/s/licenses-and-copyright

------------

Ergodicity-breaking reveals time optimal decision making in humans

['David Meder', 'Danish Research Centre For Magnetic Resonance', 'Copenhagen University Hospital Amager', 'Hvidovre', 'Copenhagen', 'Finn Rabe', 'Neural Control Of Movement Lab', 'Eth Zurich', 'Zurich', 'Tobias Morville']

Date: 2021-10

Ergodicity describes an equivalence between the expectation value and the time average of observables. Applied to human behaviour, ergodic theories of decision-making reveal how individuals should tolerate risk in different environments. To optimize wealth over time, agents should adapt their utility function according to the dynamical setting they face. Linear utility is optimal for additive dynamics, whereas logarithmic utility is optimal for multiplicative dynamics. Whether humans approximate time optimal behavior across different dynamics is unknown. Here we compare the effects of additive versus multiplicative gamble dynamics on risky choice. We show that utility functions are modulated by gamble dynamics in ways not explained by prevailing decision theories. Instead, as predicted by time optimality, risk aversion increases under multiplicative dynamics, distributing close to the values that maximize the time average growth of in-game wealth. We suggest that our findings motivate a need for explicitly grounding theories of decision-making on ergodic considerations.

How people take risks is central to our understanding of how they make decisions. Theories of decision making commonly assume that preferences for risk are like personality traits, being both idiosyncratic to individuals and stable over time. A new theory based on the thermodynamic concept of ergodicity predicts that risk preferences should be determined by the dynamical settings that people make decisions in. We show that a simple manipulation of the dynamics of a gambling game exerts a strong and systematic effect on people’s willingness to take risks. The level of risk taking and how this changed with different dynamics was quantitatively predicted from first principles within ergodic theory. We show that existing theories of decision making cannot adequately account for these changes in risk preference. This work is relevant across the behavioral sciences insofar as it challenges the validity of one of the most widespread assumptions in modern decision theory.

These examples highlight the fact that time optimal behavior relies on agents adapting their utility functions according to the dynamics of their environments. Time optimality here refers to the optimality of a behavioral strategy in maximising the time average growth rate of wealth. A strategy or utility function that affords maximising the time average growth rate of wealth is thus said to be time optimal (this is distinct from notions of optimality which refer only to the consistency of choices). In contrast to time optimality, prevailing formulations of utility theory, including expected utility theory[5, 6] and prospect theory[7, 8], are not premised on the dynamics of the environment. In treating all possible dynamics as the same, these formulations imply that utility functions are indifferent to the dynamics. Since standard decision theories assume stable but idiosyncratic utility functions, whereas time optimality prescribes specific utility functions for specific dynamics, the two classes of theory make substantively different predictions. Here we manipulated the ergodic properties of a simple gambling environment by switching between gambling for additive increments of money and gambling for multiplicative growth factors, and evaluated the effect this has on the utility functions that best account for choices. We found evidence that gamble dynamics impose a consistent effect on utility functions, with participants shifting from approximately linear utility functions in the additive session to approximately logarithmic utility functions in the multiplicative session. This effect was better predicted by a time optimal model than by mainstream utility models.

However, not all dynamics that individuals face are additive; some are multiplicative. Examples of multiplicative dynamics include stock market investments, compound interest on savings, and the spread of infectious diseases. The time average growth rate of wealth under a multiplicative dynamic is calculated as the exponential growth of wealth per unit time (Eq 5 in Methods). Settings with multiplicative wealth dynamics have non-ergodic wealth changes, which means that the expectation value of changes in wealth no longer reflects the time average growth. Indeed, there are gambles in which changes in wealth have a positive expectation value but a negative time average growth rate[4]. A simple example is a fair coin gamble: heads to gain 50% of one’s current wealth, tails to lose 40% of one’s current wealth. Counterintuitively, whilst this gamble has a positive expectation value (wealth grows by a factor of 1.05 per trial: [1.5 + 0.6]/2), it has a negative time average growth rate (wealth grows by a factor of ~0.95 per trial: sqrt[1.5*0.6]). Whilst a full explanation of this discrepancy requires invoking the behavior of different types of limit, a more intuitive explanation is that at any one time, the majority of players will have experienced negative time average growth, but a minority of players will have experienced such extreme wealth growth that this dominates the expectation value. For this gamble, however, maximizing expected value eventually leads to ruin for all players. In such multiplicative settings a logarithmic utility function is time optimal, since maximizing changes in expected utility per unit time then maximizes the time average growth rate of wealth[3] (Fig 1G). The reason is trivial in the sense that the time average growth rate is calculated as the average change in logarithmic wealth per unit time.
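The arithmetic of the coin gamble can be checked numerically. The sketch below is illustrative only: the growth factors 1.5 and 0.6 come from the text, whereas the number of simulated players, the number of trials, and the random seed are arbitrary assumptions. It computes the expectation value of the per-trial growth factor, the time average growth factor (the geometric mean), and the fraction of simulated players who end below their starting wealth.

```python
import math
import random

UP, DOWN = 1.5, 0.6  # growth factors from the text: +50% on heads, -40% on tails

# Expectation value of the per-trial growth factor (average over outcomes).
expected_factor = 0.5 * UP + 0.5 * DOWN

# Time average growth factor per trial: the geometric mean of the two factors,
# equivalently exp of the mean change in log wealth.
time_avg_factor = math.sqrt(UP * DOWN)


def simulate_final_wealth(n_players=10_000, n_trials=100, seed=0):
    """Final wealth of independent players who each start at 1.0.

    The population size, trial count, and seed are hypothetical choices.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_players):
        wealth = 1.0
        for _ in range(n_trials):
            wealth *= UP if rng.random() < 0.5 else DOWN
        finals.append(wealth)
    return finals


finals = simulate_final_wealth()
frac_below_start = sum(w < 1.0 for w in finals) / len(finals)

print(f"expectation value factor per trial: {expected_factor:.2f}")
print(f"time average factor per trial:      {time_avg_factor:.4f}")
print(f"fraction of players below starting wealth: {frac_below_start:.2f}")
```

Under these assumptions most simulated players lose money even though the expectation value factor exceeds one, which is the discrepancy the paragraph describes.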

A , two-sheets (blue and pink) summarise the repeated protocol for both days, which only differ in the dynamics of wealth changes. Numbers indicate durations in minutes. Although three stimuli are shown for illustration, a total of 9 stimuli were used in each session. B , single trial from a passive session, where durations are in seconds and ranges depict a uniformly distributed temporal jitter. C , single trial from an active session. D , wealth trajectories in real time over the course of each passive session. The trajectory for Passive × is plotted on a log scale, appropriate to the multiplicative dynamics. Eight randomly selected trajectories are plotted. Dotted line shows initial endowment level of 1000DKK. E , discrepant trials are a subset of trials, where agents with linear and logarithmic utility functions would be predicted to make different choices. In the example here, an agent with linear utility would choose the left-hand gamble whereas an agent with logarithmic utility would choose the right-hand gamble. F, wealth trajectories of synthetic agents with different utility functions (prospect theory and isoelastic) repeatedly playing the set of additive gambles over one week (For details see S2 Text ). The agent with linear utility has the highest time average growth rate (green). G, equivalent simulations for multiplicative gambles. The agent with log utility has the highest time average growth rate (green). The time optimal agent is an agent with linear utility for additive dynamics, and log utility for multiplicative dynamics, and thus also experiences both wealth trajectories depicted in green (in F and G). We note that the time optimality of an agent’s behavior is independent of the timescale, even if the consequences of time optimality may require a particular timescale to become visible in noisy plots. In F and G, the time optimal utility function is visibly advantageous on the timescale of hours.

In the behavioral sciences, decision making is studied predominantly using experiments with additive dynamics, where choice outcomes exert additive effects on wealth. An agent might gamble on a coin toss for a gain of $1 each time they win, they might score a point each time they correctly execute a motor action, and so on. In these examples, changes in wealth are ergodic, and in such settings a linear utility function is optimal for maximising the growth of wealth over time[3, 4]. In other words, for this utility function, when changes in expected utility are maximized per unit time, this maximizes the time average growth rate of wealth (Fig 1F). The time average growth rate of wealth under an additive setting is calculated simply as the additive change in wealth per unit time (Eq 4 in Methods), over either a finite or an infinite time horizon.

Ergodicity is a foundational concept in models of physical systems that include elements of randomness[1, 2]. A physical observable is said to be ergodic if the average over its possible states is the same as its average over time. For instance, the velocity of gas molecules in a chamber is ergodic if averaging over all molecules at a fixed time (an expectation value) yields the same value as averaging a single molecule over an extended period (a time average). In other words, ergodicity ensures an equality between the time average and the expectation value. The relevance of ergodicity to human behavior is that it provides important constraints for thinking about how agents should compute averages when making decisions[3].

Methods

Ethics statement Informed written consent was obtained from all subjects as approved by the Regional Ethics Committee of Region Hovedstaden (protocol H-17006970) and in accordance with the declaration of Helsinki.

Methods summary We asked whether switching between additive and multiplicative gamble dynamics systematically influences decision making under risk. Specifically, our objective was to investigate how existing utility models (primarily prospect theory and isoelastic utility[9]) perform in comparison to a null model of time optimality in explaining choice behaviour under different dynamics. In an experiment that spanned two separate days, each subject engaged in a gambling paradigm with either additive or multiplicative dynamics in their in-game wealth (hereafter, dynamics). At the start of each day, participants were endowed with an initial wealth of 1000DKK / ~$155 (Fig 1A), after which they took part in a passive session during which they had an opportunity to learn, via observation, the deterministic effect of image stimuli on their endowed wealth (Fig 1B). On the additive day (Day+) the stimuli caused additive changes in wealth, whereas on the multiplicative day (Day×) the stimuli caused multiplicative changes to their wealth (Eqs 1–5, S1 Fig in S1 Text). Different stimuli were used for the two different days and the association between the stimuli and the change in wealth was randomized between subjects. Having repeatedly observed these contingencies between the stimuli and the changes in wealth, subjects subsequently engaged in an active session during which they repeatedly chose between two gambles composed of pairs drawn from the same set of stimuli (Fig 1C, Eqs 6–9). Upon choosing a gamble, each of the two stimuli had a 50% probability of being the outcome of the gamble. Subjects understood that the gamble outcomes were not revealed during the game, and that 10 of the outcomes of the chosen gambles would be randomly realised at the end of each day and applied to their current in-game wealth. The dynamics exist in the active session insofar as the stimuli impose dynamical changes in wealth when they are realised at the end of the experiment. There were four sessions in total per subject: Passive× and Active× occurring on Day×, and Passive+ and Active+ occurring on Day+. We adopted three complementary analysis strategies. The first was model-independent in the sense that we tested whether choice frequencies change according to different dynamics. The second and third approaches were model-dependent insofar as we formally compared theoretical models of utility in terms of their parameter estimates, and in terms of the predictive adequacy of each utility model.

Subjects and power This paper focuses on the behavioral data obtained from a neuroimaging study on the neural encoding of utility. The criteria for inclusion were being aged 18–50 and fluent in English. The criteria for exclusion were: a history of psychiatric or neurological disorder, credit problems (operationalized via bad pay status on www.dininfo.dk), or expertise in a quantitative or cognitive domain (finance, banking, accountancy, economics, mathematical sciences, computer science, engineering, physics, psychology, neuroscience). Neuroimaging specific exclusion criteria were also applied, including implanted metallic or electronic objects, heart or brain surgery, severe claustrophobia, or inability to fit into the scanner (weight limit of ~150kg, bore diameter of 60 cm). Except for the latter, all such information was self-reported. The intended sample size was 20, however due to post-hoc exclusion (1 participant fell asleep, 1 failed to learn the stimuli) the achieved sample size was 18 (6 females, age: M = 25.79, SD = 4.69, range 20–38). Subjects were recruited as a convenience sample, via the subject recruitment website www.forsøgsperson.dk. The sample number was based on general guidelines for the minimal number of subjects required for medium effect sizes in neuroimaging datasets[10]. The number, timing, and jittering (randomized timing) of events within each session was based on prior efficiency simulations for similar neuroimaging paradigms. As such, no a priori design analyses were performed for the behavioral data only. No stopping rule or interim analyses were performed. Data collection ran from the 10/06/2017 to 30/07/2017. All data was acquired at the Danish Research Centre for Magnetic Resonance. Independent of their payouts in the gambling paradigm, all subjects were compensated 1020 DKK / ~$160 for a grand total of 6 hours of participation over the two days. A forthcoming paper will focus primarily on the neuroimaging data.

Experimental procedure After changing into hospital gowns, subjects were read the instruction sheet (see below and S6 Text). To précis, subjects were truthfully informed that the aim of the experiment was to study how the brain reacts to changes in wealth, that all the money involved is real, and that the total accumulated wealth will be paid out as the sum of that accumulated over the two days (Fig 1A). They then played ~20 demo trials of the paradigm in the scanner control room, including both active and passive sessions (~5mins), with no financial consequence. The experimenter demonstrated what happened if buttons were not pressed in time (Fig 1B and 1C). Subjects were instructed that each day lasts 3 hours in total, with ~60mins for the passive session (inc. time for localiser scan and shim), a short break, then ~60-75mins for the active session (inc. localiser, shim, anatomical scans), with short breaks within the session (Fig 1A). Each subject entered the scanner and was set up with a respiratory belt to monitor breathing and with a pulse meter on the middle or index finger of the non-responding hand. All stimuli were projected under dark conditions onto a screen located within the bore of the scanner (Siemens, MAGNETOM Prisma), and viewed via mirrors mounted to the head coil. Subjects were instructed to always fixate the central fixation cross (Fig 1B and 1C) and to choose via button box. The paradigm was presented via the Psychopy2 toolbox (v1.84.2) running on Python (2.7.11).

Experimental design The experiment is a fully crossed randomized controlled trial in which the wealth dynamic is the primary independent variable, and choice in the active task is the primary observable. The wealth dynamic, as well as the deterministic association between stimuli and outcomes, was controlled via a computer programme and thus double blinded. Further, since payouts at the end of the test day were subject to being randomly realized from each subject’s choices as well as being statistically balanced between conditions, payout was also effectively double-blinded. Subjects were neither informed of any explicit details concerning dynamics or differences between test days, nor given any reason to expect that the test days were different. The instructions, procedures, and setup were otherwise identical for both test days. The order in which multiplicative and additive test days were conducted was randomized and counterbalanced across the group. Subjects were not able to make notes or use a calculator due to their location inside the brain scanner. Measures collected but not included in this report include all functional and structural neuroimaging modalities, physiological noise measurements (pulse rate and breathing), and reaction times. To ensure good quality model estimation, we recorded many decisions (312 in total per active session) spanning a large subspace (144) of the possible unique gamble combinations. To avoid the problems associated with gambling for "peanuts"[11], the outcomes of decisions are for large quantities of money on each trial (mean possible change in wealth Day× = 413.07 DKK per decision, SD = 249.78, range = -422.87 to 946.71; mean possible change in wealth Day+ = 267.76 DKK per decision, SD = 119.20, range = -428 to 428). Subjects were thus strongly incentivized to pay attention to all the stimuli and to optimize their decision-making throughout the active sessions.

Pre-registration and deviations The experimental protocol was preregistered at www.osf.io/9yhau. There was one deviation from the protocol: The preregistration stated that in the Passive+ session, the final additional stimulus applied to their wealth after having returned to 1000DKK (see section “Passive session stimulus sequences” below) would exclude the most extreme stimuli. Those were however included in the paradigm.

Passive session instructions Subjects were instructed in English as follows: "For the passive phase, you will see a number in the middle of the screen, this is your current wealth for the day in kr. When you see a white box around the number, you are to press the button within 1s. (If you do not, you will be instructed to “press button earlier”). Shortly after pressing the button you will see an image in the background, and this will cause your wealth to change. You are instructed to attend to any relationship between the images and the effect this has on your wealth, since in the active phase that follows you will be given the opportunity to choose images to influence your wealth. Learning these relationships can make a large difference to your earnings in the active phase." These instructions were identical on both days to avoid biasing the subject toward any particular strategy. Full subject instructions are provided in S6 Text.

Passive session dynamics Formally the passive session can be described as follows: At the start of each test day, subjects were endowed with an initial wealth x(t_0) of 1000DKK, which defined their wealth at the first timepoint, which we denote as t_0. Independently for each subject, 9 stimuli were randomly assigned (from a fixed set of 18) for Day+, with the remaining 9 assigned to Day×. Each stimulus, viewed at time t, was programmed to have a deterministic effect on the subject’s wealth x(t), with the sequence of stimuli causing stochastic fluctuations in wealth (Fig 1D). The sequence of stimuli deterministically caused dynamics in their wealth which can be expressed as:

$$x(t + \delta t) = x(t) \circledast s(t) \tag{1}$$

where δt is the duration of one trial and ⊛ is a wildcard operator, which on Day+ is the addition operator +, and on Day× is the multiplication operator ×. s(t) is a random outcome variable drawn from set S× on Day×, and from set S+ on Day+ (see S1A Fig in S1 Text). This means that the type of wealth dynamic that the stimuli caused was determined by the test day. On Day×, under multiplicative dynamics, the outcome s(t) is the realisation of a random multiplier (growth factor) that can range from ~doubling at one extreme to ~halving at the other (equally spaced on a logarithmic scale). On Day+, under additive dynamics, the outcome s(t) is the realisation of a random increment, ranging from +428 to -428DKK (equally spaced on a linear scale). Though the dynamics are qualitatively different, to constrain the differences in wealth changes between conditions, we set the bounds of the random increments for Passive+ to the central 85th percentile interval of the absolute wealth changes on Day×.
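As a rough illustration of the two outcome sets and of the wildcard update rule, the following sketch builds nine equally spaced outcomes for each day and applies a single outcome to a wealth level. The additive bounds of ±428 DKK are taken from the text. The multiplicative bounds of exactly 0.5 and 2.0 are an assumption, since the text says only "~halving" and "~doubling", and the names S_ADD, S_MUL and update_wealth are hypothetical.

```python
import math

N_STIMULI = 9

# Day+ increments in DKK, equally spaced on a linear scale between the stated bounds.
S_ADD = [-428 + 107 * i for i in range(N_STIMULI)]

# Day x growth factors, equally spaced on a logarithmic scale.
# Assumption: bounds of exactly 0.5 and 2.0 (the text gives only approximate bounds).
S_MUL = [2.0 ** (-1 + i / 4) for i in range(N_STIMULI)]


def update_wealth(x, s, dynamic):
    """One step of the wealth dynamic: x + s on Day+, x * s on Day x."""
    if dynamic == "additive":
        return x + s
    if dynamic == "multiplicative":
        return x * s
    raise ValueError(f"unknown dynamic: {dynamic!r}")


print(S_ADD)
print([round(m, 3) for m in S_MUL])
print(update_wealth(1000, S_ADD[-1], "additive"))
print(update_wealth(1000, S_MUL[-1], "multiplicative"))
```

With these assumed values the increments sum to zero and the growth factors multiply to one, so presenting every stimulus equally often leaves wealth unchanged under either dynamic.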

Passive session stimulus sequences The stimulus sequence was randomized such that wealth levels were constrained to lie in the interval (0 kr, 5000 kr) at all times. This was achieved by presenting each of the 9 stimuli 37 times (and the ensuing effect on wealth), thus generating a set of 333 stimuli. The sequence order was randomized without replacement. Any sequence that resulted in a partial sum larger than 5000 or lower than 0DKK, would be rejected and another random sequence generated. This was necessary to render the experiment subjectively plausible, and to avoid debts which for ethical reasons could not be realised. Since each stimulus was presented with equal frequency, at the end of these 333 trials in the additive condition, the finite time average additive growth rate was zero kr per unit time. Equivalently, at the end of the 333 trials in the multiplicative condition, the finite time average multiplicative growth rate amounted to a growth factor of one per unit time. Thus, at the end of these 333 trials, in both conditions subjects had returned to their initial endowed wealth of 1000 DKK. One additional stimulus was then shown and applied to their wealth, meaning that all subjects had a randomly determined wealth level, as they had been informed (Fig 1D).
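The rejection procedure described above can be sketched as follows. This is a minimal illustration and not the authors' code: the stimulus values are assumed (linear spacing between ±428 DKK, and log spacing between factors of 0.5 and 2.0), and the function names, the retry limit, and the seeds are hypothetical. A candidate order is accepted only if wealth never exceeds 5000 or falls below 0.

```python
import random

# Assumed stimulus sets; the exact values are not given in this section.
S_ADD = [-428 + 107 * i for i in range(9)]       # DKK increments, linear spacing
S_MUL = [2.0 ** (-1 + i / 4) for i in range(9)]  # growth factors, log spacing


def wealth_path(sequence, dynamic, x0=1000.0):
    """Wealth after each stimulus; any dynamic other than "additive" multiplies."""
    path, x = [], x0
    for s in sequence:
        x = x + s if dynamic == "additive" else x * s
        path.append(x)
    return path


def make_sequence(stimuli, dynamic, reps=37, lo=0.0, hi=5000.0,
                  rng=None, max_tries=100_000):
    """Shuffle 9 x 37 = 333 stimuli until wealth stays within [lo, hi]."""
    rng = rng or random.Random()
    sequence = list(stimuli) * reps
    for _ in range(max_tries):
        rng.shuffle(sequence)  # random order without replacement
        if all(lo <= x <= hi for x in wealth_path(sequence, dynamic)):
            return list(sequence)
    raise RuntimeError("no admissible sequence found")


seq_add = make_sequence(S_ADD, "additive", rng=random.Random(1))
seq_mul = make_sequence(S_MUL, "multiplicative", rng=random.Random(2))
print(len(seq_add), wealth_path(seq_add, "additive")[-1])
print(len(seq_mul), round(wealth_path(seq_mul, "multiplicative")[-1], 6))
```

Because every stimulus appears equally often, any accepted sequence ends at the initial endowment under these assumed values; only the path taken in between differs across sequences.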

Passive session wealth trajectories and growth The wealth at the end of the Passive+ session can be calculated as:

$$x(t_0 + T\delta t) = x(t_0) + \sum_{\tau=1}^{T} s(\tau) \tag{2}$$

and for the Passive× session as:

$$x(t_0 + T\delta t) = x(t_0) \prod_{\tau=1}^{T} s(\tau) \tag{3}$$

where, in both equations, s(τ) is the random outcome variable in round τ, and T is the total number of trials in the passive sessions. The finite time average growth of wealth on Day+ can be calculated as:

$$\bar{g}_{+} = \frac{\Delta x}{\Delta t} \tag{4}$$

where Δx = x(t_0 + Tδt) − x(t_0), and Δt = Tδt. On Day× this is calculated as:

$$\bar{g}_{\times} = \frac{\Delta \ln x}{\Delta t} \tag{5}$$

where Δ ln x = ln x(t_0 + Tδt) − ln x(t_0). This design ensured substantial opportunity for subjects to learn the causal effects of each stimulus, whilst also not accumulating extremely high or low wealth levels.
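The two finite time average growth rates can be computed directly from the start and end wealth. The sketch below follows the verbal definitions in the paper (the additive change in wealth per unit time, and the change in logarithmic wealth per unit time); the function names and the default trial duration of one time unit are assumptions made for illustration.

```python
import math


def additive_growth_rate(x_start, x_end, n_trials, dt=1.0):
    """Finite time average additive growth rate: change in wealth per unit time."""
    return (x_end - x_start) / (n_trials * dt)


def multiplicative_growth_rate(x_start, x_end, n_trials, dt=1.0):
    """Finite time average multiplicative growth rate: change in log wealth per unit time."""
    return (math.log(x_end) - math.log(x_start)) / (n_trials * dt)


# A passive session that returns to the 1000 DKK endowment after 333 trials.
g_add = additive_growth_rate(1000.0, 1000.0, 333)
g_mul = multiplicative_growth_rate(1000.0, 1000.0, 333)
print(g_add, g_mul, math.exp(g_mul))

# One win (x1.5) followed by one loss (x0.6) in the coin gamble from the introduction.
g_coin = multiplicative_growth_rate(1.0, 1.5 * 0.6, 2)
print(math.exp(g_coin))
```

Exponentiating the multiplicative rate recovers the per-trial growth factor, which is why a rate of zero corresponds to the growth factor of one mentioned in the text.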

Active session instructions After the passive session, the subjects had a short break of ~5mins outside of the scanner before returning to engage in an active choice task in which they repeatedly decided between two different gambles composed of the stimuli they had just learnt about (Fig 1A). Subjects were instructed as follows: "With the money accumulated in the passive phase, you will play gambles composed of the same images. In each trial, you will be presented with two of the images that you have learned about in the passive phase. By pressing the buttons in the scanner to move a cursor, you now have the option to choose to either: a) Accept gamble one, in which case you will be assigned one of the two images, each with 50% probability (not shown), or… b) Accept gamble two, in which case you will be assigned one of the two images, each with 50% probability (again not shown). The outcomes of your gambles will be hidden from you, and only 10 of them will be randomly chosen and applied to your current wealth. You will be informed of your new wealth at the end of the active phase. You can keep any money accumulated after the active phase. If you do not choose in time, then we will give you one of the worst images, it is recommended that you always choose in time." These instructions were identical on both days to avoid biasing the subject toward any particular strategy.

Active session gambles As shown in Fig 1C, within a trial, subjects first saw the first gamble of a pair of gambles. This gamble is composed of two stimuli on the left-hand side of the screen, each of which they knew has a 50% chance of being applied to their wealth should this gamble be chosen and realised at the end of the day. We refer to this as the left gamble, Q(Left). 1.5–3 seconds later (uniformly distributed), on the right they saw another two stimuli, here comprising the right gamble Q(Right). In a two alternative forced choice, on each trial, subjects chose via button press between gamble Q(Left) and Q(Right). Formally the gambles are:

$$Q(\mathrm{Left}) = \{(q_1, 1/2), (q_2, 1/2)\} \tag{6}$$

$$Q(\mathrm{Right}) = \{(q_3, 1/2), (q_4, 1/2)\} \tag{7}$$

where q_1 to q_4 denote the outcomes associated with the four stimuli shown on that trial, each paired with its probability. Choosing between two gambles eliminates any confounds caused by potential preferences for or against gambling. Note that all probabilities are equal and correspond to a fair coin, such that these are easily communicated and control for any probability distortion effects. The outcome of each gamble was hidden from subjects to avoid subjects being "conditioned" to prefer stimuli as a function of the stochastic pattern of previous outcomes. This also prevents mental accounting, where subjects keep track of what they have earnt, which introduces idiosyncratic path dependencies between subjects.

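Active session growth rates For any gamble we can calculate its time average growth rate. The time average additive growth rate for the left-hand gamble is:

$$\bar{g}_{+}(\mathrm{Left}) = \frac{\langle \delta x \rangle}{\delta t} \tag{8}$$

and equivalently for the right-hand gamble. The time average multiplicative growth rate for the left-hand gamble is:

$$\bar{g}_{\times}(\mathrm{Left}) = \frac{\langle \delta \ln x \rangle}{\delta t} \tag{9}$$

and equivalently for the right-hand gamble. Here δx and δ ln x are the changes in wealth and in logarithmic wealth produced by the gamble outcome over one trial of duration δt. Note that the angled brackets indicate the expectation value operator. Also note that since there were no numerical or symbolic cues at this point, any decision could only be based on their memory of each stimulus (Fig 1C).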
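To make the growth rate calculation concrete, the sketch below evaluates both rates for a gamble with two equiprobable outcomes, and then compares what an agent with linear utility and an agent with logarithmic utility would choose on a multiplicative trial. The growth factors used for the left and right gambles are hypothetical values chosen to produce a discrepant trial of the kind described in the Fig 1 caption; they are not the stimulus values used in the experiment, and all function names are assumptions.

```python
import math


def additive_rate(gamble, dt=1.0):
    """Expected change in wealth per unit time for two equiprobable increments."""
    a, b = gamble
    return 0.5 * (a + b) / dt


def multiplicative_rate(gamble, dt=1.0):
    """Expected change in log wealth per unit time for two equiprobable growth factors."""
    a, b = gamble
    return 0.5 * (math.log(a) + math.log(b)) / dt


def expected_factor(gamble):
    """Expectation value of the growth factor, which a linear-utility agent
    maximises when outcomes multiply current wealth."""
    a, b = gamble
    return 0.5 * (a + b)


# Hypothetical growth factors, not the experiment's actual stimulus values.
left = (2.0, 0.5)
right = (2.0 ** 0.5, 2.0 ** -0.25)

linear_choice = "left" if expected_factor(left) > expected_factor(right) else "right"
log_choice = "left" if multiplicative_rate(left) > multiplicative_rate(right) else "right"

print("linear utility chooses:", linear_choice)
print("log utility chooses:   ", log_choice)
print("additive rate of a +428/-428 DKK gamble:", additive_rate((428, -428)))
```

In this example the left gamble has the larger expectation value but a time average multiplicative growth rate of zero, so the two agents disagree, which is what makes such trials informative about the utility function.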

Active session gamble space On any test day, for any one gamble, there are 81 possible combinations of stimuli (9^2, see S1B Fig in S1 Text), and 6561 possible pairs of gambles (81^2). This gamble-choice space is too large to exhaustively sample, and contains many gambles that do not discriminate between our hypotheses. We therefore imposed the following constraints: all gambles should be mixed (composed of a gain and a loss), and no two stimuli presented in one trial should be the same. This reduces the gamble choice space to 144 unique non-dominated choices between gambles: each of the 16 mixed gambles (red text cells, in S1B Fig in S1 Text) can be paired with 9 other mixed gambles with unique stimuli, giving 16 × 9 = 144 possible gamble pairs. Each of these choices was presented twice, resulting in 288 in total. This restriction of the gamble space thus provides a more efficient means of testing the competing hypotheses of this experiment. Subjects were also presented with 24 No-brainer choices, in which both gambles shared an identical stimulus, but differed in a second. These are otherwise known as statewise dominated choices. In these No-brainer choices, the subject should choose whichever gamble includes the better unique stimulus. This offers a direct means of testing whether subjects could accurately rank the stimuli. One participant (#5) failed to choose statewise dominated gambles with a probability > 0.5 and was excluded from further analysis (S4E Fig in S1 Text). All choices were presented in a random order without replacement.
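The counts quoted above can be reproduced by enumeration. The sketch below assumes that the nine stimuli comprise four losses, one neutral stimulus, and four gains, which is what the figure of 16 mixed gambles implies but is not stated explicitly in this section. Stimuli are represented by hypothetical rank indices 0 to 8, and gamble pairs are counted as ordered (left, right) pairs.

```python
from itertools import product

stimuli = list(range(9))  # hypothetical rank indices, worst (0) to best (8)
losses = [0, 1, 2, 3]     # assumed loss stimuli
gains = [5, 6, 7, 8]      # assumed gain stimuli; index 4 is assumed neutral

# Every ordered combination of two stimuli that could form one gamble.
all_gambles = list(product(stimuli, repeat=2))
n_all_pairs = len(all_gambles) ** 2

# Mixed gambles: one gain and one loss.
mixed = [(g, l) for g in gains for l in losses]

# Ordered (left, right) pairs of mixed gambles that share no stimulus.
pairs = [(a, b) for a in mixed for b in mixed if not set(a) & set(b)]

N_NO_BRAINERS = 24  # number of statewise dominated choices, as stated in the text
n_trials = 2 * len(pairs) + N_NO_BRAINERS

print(len(all_gambles), n_all_pairs, len(mixed), len(pairs), n_trials)
```

Under these assumptions the enumeration gives 81 gambles, 6561 gamble pairs, 16 mixed gambles and 144 admissible pairs, and presenting each pair twice plus the 24 No-brainer choices gives the 312 decisions per active session reported in the experimental design.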


[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009217

(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL: https://creativecommons.org/licenses/by/4.0/
