(C) PLOS One
This story was originally published by PLOS One and is unaltered.
What does the mean mean? A simple test for neuroscience [1]
Alejandro Tlaie (Ernst Strüngmann Institute for Neuroscience, Frankfurt am Main; Laboratory for Clinical Neuroscience, Centre for Biomedical Technology, Technical University of Madrid, Madrid); Katharine Shapcott; Thijs L. van der Plas (Department of Physiology)
Date: 2024-07
Trial-averaged metrics, e.g. tuning curves or population response vectors, are a ubiquitous way of characterizing neuronal activity. But how relevant are such trial-averaged responses to neuronal computation itself? Here we present a simple test to estimate whether average responses reflect aspects of neuronal activity that contribute to neuronal processing. The test probes two assumptions implicitly made whenever average metrics are treated as meaningful representations of neuronal activity: (1) Reliability: neuronal responses repeat consistently enough across stimulus repetitions that the average remains recognizable in individual trials; and (2) Behavioural relevance: neuronal responses that are more similar to the average are more likely to evoke correct behaviour.
We apply this test to two data sets: (1) Two-photon recordings in primary and secondary somatosensory cortices (S1 and S2) of mice trained to detect optogenetic stimulation in S1; and (2) Electrophysiological recordings from 71 brain areas in mice performing a contrast discrimination task. Under the highly controlled settings of Data set 1, both assumptions were largely fulfilled. In contrast, the less restrictive paradigm of Data set 2 met neither assumption. Simulations predict that the larger diversity of neuronal response preferences, rather than higher cross-trial reliability, drives the better performance of Data set 1. We conclude that when behaviour is less tightly restricted, average responses do not seem particularly relevant to neuronal computation, potentially because information is encoded more dynamically. Most importantly, we encourage researchers to apply this simple test of computational relevance whenever using trial-averaged neuronal metrics, in order to gauge how representative cross-trial averages are in a given context.
Neuronal activity is highly dynamic—our brain never responds to the same situation in exactly the same way. How do we extract information from such dynamic signals? The classical answer is: averaging neuronal activity across repetitions of the same stimulus to detect its consistent aspects. This logic is widespread—it is hard to find a neuroscience study that does not contain averages. But how well do averages represent the computations that happen in the brain moment by moment? We developed a simple test that probes two assumptions implicit in averaging: Reliability: neuronal responses repeat consistently enough across stimulus repetitions that the average remains recognizable. Behavioural relevance: neuronal responses that are more similar to the average are more likely to evoke correct behaviour. We apply this test to two example data sets featuring population recordings in mice performing perceptual tasks. We show that both assumptions were largely fulfilled in the first data set, but not in the second, suggesting that the relevance of averaging varies across contexts, e.g. due to experimental control levels and neuronal diversity. Most importantly, we encourage neuroscientists to use our test to gauge whether averages reflect informative aspects of neuronal activity in their data.
Funding: A.T. received funding (for salary) from the Margarita Salas Fellowship (NextGenerationEU) and from the Joachim Herz Stiftung. The salaries of J.M.R., R.M.L., and A.M.P. were funded by the Wellcome Trust (204651/Z/16/Z). T.L.v.d.P. acknowledges support from the Biotechnology and Biological Sciences Research Council (grant number BB/M011224/1), in the form of salary. The salaries of K.S., M.N.H. and M.L.S. were funded by the Max Planck Society. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Brain dynamics are commonly studied by recording neuronal activity over many stimulus repetitions (trials) and subsequently averaging them across time. Trial-averaging has been applied to single neurons, describing their average response preferences [1–6], and, more recently, to neural populations [7–9]. Implicit in the practice of trial averaging is the notion that deviations from the average response represent ‘noise’ of one form or another. The exact interpretation of such neuronal noise has been debated [10], ranging from truly random and meaningless activity [11–14], to neuronal processes that are meaningful but irrelevant for the neuronal computation at hand [15–17], to an intrinsic ingredient of efficient neuronal coding [18–20]. Nevertheless, in all of these cases a clear distinction is being made between neuronal activity that is directly related to the cognitive process under study (e.g. perceiving a specific stimulus)—which is approximated by a trial-averaged neuronal response—and ‘the rest’.
While this framework has undoubtedly been useful for characterizing the general response dynamics of neuronal networks, there is a sizable explanatory gap between the general neuronal response preferences reflected in trial-averaged metrics, and the way in which neurons transmit information moment by moment. As such, using trial-averaged data as a proxy to infer principles of one-shot, moment-by-moment neuronal processing is potentially problematic—an issue that has repeatedly been discussed in the field (see for instance [21–24]). However, neuroscience as a field has so far been reluctant to draw practical consequences. A vast majority of neuroscience studies present trial-averaged metrics like receptive fields, response preferences or peri-stimulus time histograms. These metrics rely on the implicit assumption that trial-averaged neuronal activity is fundamentally meaningful to our understanding of neuronal processing. For instance, upon finding that with repeated stimulus exposure, trial-averaged population responses become more sensitive to behaviourally relevant stimuli (e.g. [3, 4]), it is implicitly assumed that this average neuronal shift will improve an animal’s ability to perceive these stimuli correctly. In other words, neuroscience as a field seems to suffer from a disconnect between the limitations of cross-trial averaging that we acknowledge explicitly, and the implicit assumptions that we allow ourselves to make when we use cross-trial averages in our work.
One potential reason that this disconnect has not been tackled more actively is that the evidence regarding the functional relevance of trial-averaged responses is quite split. On the one hand, studies highlighting the large inter-trial variability of neuronal responses [16, 17, 25–27] suggest that average responses fail to accurately capture ongoing neuronal dynamics. Then there is the simple fact that outside the lab, stimuli generally do not repeat, which renders pooled responses across stimulus repetitions a poor basis for neuronal coding. On the other hand, the fact that perceptual decisions can be altered by shifting neuronal activity away from the average response [28–31] indicates that at least in typical lab experiments [32], average population responses do matter [33]. Such widely diverging evidence suggests that cross-trial averages may be more relevant to neuronal computation in some contexts (and brain areas) than in others. This calls for a way to move the debate on their computational relevance beyond the realm of opinion and theory, and instead test this question concretely and practically across different experimental contexts.
In the present study, we provide a simple and widely applicable statistical test to explicitly determine whether cross-trial averages computed in a specific experiment are likely to be meaningful to neuronal information processing, or whether they are more likely to arise as an epiphenomenon with no clear computational function. To this end, our approach formalizes two implicit assumptions inherent in the computation of average neuronal responses, and tests directly whether they hold in a given experimental context (Fig 1). Importantly, these two testable assumptions are not based on our own or other researchers’ views of how neuronal processing might actually work. Rather, they summarize how neuronal activity would need to behave if cross-trial averages reflect information that down-stream brain areas rely on to process information.
Results
We started by examining our two assumptions in a data set that was acquired under tightly controlled experimental settings. Data set 1 consists of two-photon calcium imaging recordings in primary and secondary somatosensory cortex (S1 and S2) as mice detected a low-intensity optogenetic stimulus in S1 [34] (Fig 2A). Mice were trained to lick for reward in response to the optogenetic activation of 5 to 150 randomly selected S1 neurons (‘stimulus present’ condition). On 33% of trials, a sham stimulus was presented during which no optogenetic stimulation was given (‘stimulus absent’ condition). Simultaneously, using GCaMP6s, 250–631 neurons were imaged in S1 and 45–288 in S2. Notably, in S1, the stimulus directly drives the neuronal response, skipping upstream neuronal relays.
Fig 2. Single-trial responses are stimulus-specific for Data set 1. A) Animals report whether they perceived the optogenetic stimulation of somatosensory neurons (S1) through licking to receive reward. This panel was originally published in [34], under an open-access CC-BY license by the copyright holder. B) Trial-average population responses (‘templates’) for S1 (orange) and S2 (brown), under optogenetic stimulation (top) or no stimulation (bottom) conditions. Neurons are sorted the same under both conditions. C) Distribution of the correlations between single-trial responses and the matching (left) and non-matching (middle) trial-averaged response templates. Box: 25th and 75th percentile. Center line: median. Whiskers: 10th and 90th percentile. Dotted lines: median of surrogate data, which were generated by randomly sampling based on neurons’ trial-averaged response probabilities for the correct template. The difference between the correlations to the matching and non-matching templates gives the Specificity Index (right).
https://doi.org/10.1371/journal.pcbi.1012000.g002
To probe the computational role of averages within this tightly controlled setting, we first computed average population responses for the two experimental conditions. Since individual stimulation intensities were often only presented in a small number of trials, we pooled all stimulation intensities into the ‘stimulus present’ condition (the high correlations between the average responses to different stimulation intensities are shown in S2 Fig).
Average response templates were computed as the mean fluorescence (ΔF/F) of each neuron in a time window of 0.5 s following the stimulation offset (Fig 2B). We next quantified how well single-trial responses matched the corresponding average template (stimulation present or absent) (Fig 2C, left; see also [35]). To this end, we computed linear correlations between the single-trial and trial-averaged population responses. While in principle, single-trial responses could reflect the corresponding average template in a multitude of ways, including multi-dimensional and/or non-linear relations, linear correlations are the correct way to capture their match if one accepts the assumptions underlying cross-trial averaging. Averages are based on linear computations (sums and rescaling), which implicitly assumes that the single-trial responses subsumed in the average differ from each other linearly and along one dimension—otherwise, pooling them in a linear average would not be a suitable approach.
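As a minimal illustration of this template-matching step, the sketch below computes condition-specific templates and single-trial Pearson correlations from a trials × neurons response matrix. The variable names (responses, condition_labels) are placeholders, and the leave-one-trial-out construction of the template is a common precaution rather than necessarily the authors' exact procedure.

```python
import numpy as np
from scipy.stats import pearsonr

def template_correlations(responses, condition_labels):
    """Correlate each single-trial population vector with its condition's
    trial-averaged template (leaving the trial itself out of its own template
    to avoid a trivial self-correlation).

    responses        : array, shape (n_trials, n_neurons), e.g. mean dF/F per neuron
    condition_labels : array, shape (n_trials,), e.g. 'stim' / 'no_stim'
    """
    responses = np.asarray(responses, dtype=float)
    condition_labels = np.asarray(condition_labels)
    corrs = np.full(responses.shape[0], np.nan)
    for cond in np.unique(condition_labels):
        idx = np.where(condition_labels == cond)[0]
        for i in idx:
            # template from all other trials of the same condition
            template = responses[np.setdiff1d(idx, i)].mean(axis=0)
            corrs[i], _ = pearsonr(responses[i], template)
    return corrs
```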
Single-trial correlations were mostly positive in both S1 and S2 (Fig 2C) (n = 1795 trials; p < 0.001), suggesting that single-trial responses represented the average template quite faithfully. To assess if single-trial responses can be regarded as ‘noisy’ versions of the average template, we computed bootstrapped surrogate responses for each trial based on the neurons’ average response preferences. Specifically, we created surrogate data for each trial by drawing a fluorescence value per neuron from its overall distribution of fluorescence values across all trials of one stimulus condition (for details, see S4 Fig and section Surrogate Models in Methods). The shuffled surrogate data correlated equally well to the template as the original data in both S1 and S2 (Fig 2C, left), suggesting that in Data set 1, single-trial responses can be interpreted as mostly faithful random samples of the respective average template.
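A sketch of how such shuffled surrogates could be generated, assuming a trials × neurons matrix for a single stimulus condition; this illustrates the resampling idea described above, not the authors' exact implementation.

```python
import numpy as np

def resampled_surrogates(cond_responses, n_surrogates=100, rng=None):
    """For each surrogate trial, draw each neuron's value independently from
    that neuron's empirical distribution across all trials of the same
    condition (preserving each neuron's marginal statistics while destroying
    within-trial co-fluctuations).

    cond_responses : array, shape (n_trials, n_neurons)
    returns        : array, shape (n_surrogates, n_trials, n_neurons)
    """
    rng = np.random.default_rng(rng)
    cond_responses = np.asarray(cond_responses, dtype=float)
    n_trials, n_neurons = cond_responses.shape
    # independent draw of a source trial per neuron and per surrogate trial
    idx = rng.integers(0, n_trials, size=(n_surrogates, n_trials, n_neurons))
    return cond_responses[idx, np.arange(n_neurons)]
```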
Correlations between single-trial responses and the population template may partially stem from neurons’ basic firing properties, which would not be task-related. To estimate the stimulus specificity of the correlations we observed, we also computed single-trial correlations to the incorrect template (e.g. ‘stimulus absent’ for a trial featuring optogenetic stimulation). Correlations to the incorrect template were significantly lower than to the correct one (Fig 2C, middle; Mann-Whitney U-test, p = 5.98e-169 and p = 4.86e-51 for S1 and S2, respectively). To quantify this difference directly, we defined the Specificity Index, which measures, on a single-trial basis, the excess correlation to the correct template compared to the incorrect template. Thus, the Specificity Index quantifies to what extent neuronal activity in an individual trial relates to the average response of the relevant experimental condition, compared to the average responses for other experimental conditions. Since it subtracts two correlation coefficients from each other, it is bounded between -2 and 2.
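Following the definition above, the Specificity Index is simply the correlation to the matching template minus the correlation to the non-matching one. A minimal sketch (variable names are illustrative):

```python
from scipy.stats import pearsonr

def specificity_index(trial_response, matching_template, nonmatching_template):
    """Excess correlation to the correct condition template.
    Since both terms are Pearson correlations, the result lies in [-2, 2]."""
    r_match, _ = pearsonr(trial_response, matching_template)
    r_nonmatch, _ = pearsonr(trial_response, nonmatching_template)
    return r_match - r_nonmatch
```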
Note that while single-trial correlations to the average template scale directly with the amount of inter-trial variability in a data set as well as the baseline firing rate in an individual trial, the Specificity Index is largely independent of these variables, because it reflects the differential match of single-trial responses to the correct versus incorrect template (see S6 Fig). This also means that the Specificity Index can be applied to trials that may not correlate well to cross-trial averages simply due to low spike counts. Since the correlations to two different cross-trial averages are compared to each other, higher or lower baseline correlations should not contribute to this metric.
For Data set 1, the Specificity Indices of single-trial responses indicate clear stimulus-specificity (Fig 2C, right). In addition, the observed Specificity Indices are highly similar to those reached by the corresponding surrogate data. This suggests that in Data set 1, single-trial responses can be seen as a somewhat noisy representation of the respective cross-trial average. Together, these results indicate that single-trial responses in Data set 1 were strongly and selectively correlated to the corresponding average template, largely fulfilling Assumption 1.
Next, we set out to test if the correlation between single-trial responses and average templates predicted the animal’s licking behaviour (see Fig 3A, left, for an example session). To this end, we separately examined the single-trial correlations in trials that resulted in hits, misses, correct rejections (CRs) and false positives (FPs) (Fig 3A, right). For the trials where optogenetic stimulation was present, single-trial correlations in S1 were significantly higher in hit trials than in miss trials, suggesting that a better match to the average template did indeed produce hit trials more often (Fig 3B; Mann-Whitney U-test, p = 4.96e-68). Similarly, while single-trial correlations were overall lower in the absence of optogenetic stimulation, correct rejections nevertheless featured significantly higher correlations than false positives (Fig 3B; Mann-Whitney U-test, p = 2.82e-17). The same pattern held true for S2, though overall correlations were marginally smaller and the difference between correct and incorrect trials was somewhat less pronounced (Fig 3B; p = 2.02e-50 and p = 1.37e-11 for the hit/miss and CR/FP comparisons, respectively). To quantify directly to what extent single-trial correlations predicted behaviour, we computed the Behavioural Relevance Index (Ω) as Ω = max(A, 1 − A), where A is the Vargha-Delaney effect size [36] (see the section on Specificity and Behavioural Relevance Index in Methods). The Behavioural Relevance Index quantifies whether successful behavioural responses occur preferentially in trials with a higher Specificity Index. It is bounded between 0.5 and 1, with 0.5 indicating complete overlap between the distributions of Specificity Indices for correct and incorrect trials, and 1 indicating no overlap at all. For both the trials with stimulation (hits and misses) and those without stimulation (CRs and FPs), Ω exceeded 0.5 in S1 and S2 (Fig 3C). This suggests that in both areas, single-trial responses that were better matched to the corresponding cross-trial average resulted in more successful behaviour, fulfilling Assumption 2. Together, these results indicate that in Data set 1, cross-trial averages are both reliable and behaviourally relevant enough to be computationally meaningful.
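The Behavioural Relevance Index can be sketched as below, using the standard rank-based form of the Vargha-Delaney A statistic; the exact implementation in the paper may differ in details such as tie handling, and the argument names are illustrative.

```python
import numpy as np
from scipy.stats import rankdata

def vargha_delaney_A(x, y):
    """Probability that a random draw from x exceeds a random draw from y
    (ties counted as 0.5), computed from ranks of the pooled sample."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n_x, n_y = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))
    rank_sum_x = ranks[:n_x].sum()
    return (rank_sum_x / n_x - (n_x + 1) / 2) / n_y

def behavioural_relevance(spec_correct_trials, spec_incorrect_trials):
    """Omega = max(A, 1 - A): 0.5 means full overlap of the two Specificity
    Index distributions, 1 means no overlap."""
    A = vargha_delaney_A(spec_correct_trials, spec_incorrect_trials)
    return max(A, 1 - A)
```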
Fig 3. Better template-matching predicts better behaviour. A) Licking times for an example session (left) and for all sessions (right). The stimulation window is shown in blue, the analysis window in pink. This panel was originally published in [34], under an open-access CC-BY license by the copyright holder. B) Reliability of single-trial responses, as quantified by the Specificity Index, split out by hits, misses, correct rejections and false positives. C) Behavioural Relevance indices for these categories.
https://doi.org/10.1371/journal.pcbi.1012000.g003
Building on these results, we set out to determine how computationally meaningful cross-trial averages might be within the less restrictive experimental paradigm of Data set 2. Data set 2 contains high-density electrophysiological (Neuropixels) recordings across 71 brain regions (Fig 4A, right) in mice performing a two-choice contrast discrimination task [31]. Mice were presented with two gratings of varying contrast (0, 25, 50 or 100%) appearing in their left and right hemifields. To receive reward, animals turned a small steering wheel to bring the higher-contrast grating into the centre, or refrained from moving the wheel if no grating appeared on either side (Fig 4A, left). When both stimulus contrasts were equal, animals were randomly rewarded for turning right or left. Those trials were discarded from our analysis, since there is no ‘correct’ behavioural response.
Fig 4. Single-trial responses are hardly stimulus-specific for Data set 2. A) Graphic representation of the paradigm used in Data set 2. Animals move a steering wheel to move the higher-contrast grating of two alternative grating stimuli towards the centre (left), while being recorded from 71 brain areas (right). Note that the grating depicted here does not accurately represent the grating stimuli used. B) Stimulus and target choice information decoded by a multinomial GLM decoder (Methods) from the neuronal activity in all recorded brain areas. Each point represents the median (dot location) and standard deviation across sessions (dot size) of one brain area (see in-figure labels). Colours (blue, red, purple) represent those areas where (stimulus, choice, both) information was above an elbow criterion. C) We repeated the decoding with other models (see labels) and then performed a hierarchical clustering of the total mutual information of the ranked brain areas (rows). The 14 areas we found with the GLM (see B) are consistently found with other decoders. D) Specificity Index of the selected areas, defined as the difference in the correlations between single-trial responses and the matching (cartoon, left) and non-matching (cartoon, right) trial-averaged response templates. Box: 25th and 75th percentile. Center line: median. Whiskers: 10th and 90th percentile. Shaded areas: 5th and 95th percentiles of bootstrapped data. Dotted lines: median Specificity Index for the bootstrapped surrogate data, which were generated for each recorded area using Poissonian sampling of the trial-averaged response templates.
https://doi.org/10.1371/journal.pcbi.1012000.g004
Since this data set contains neuronal recordings from 71 brain areas, not all of which may be directly involved in the perceptual task at hand, we used a data-driven approach to identify to what extent neuronal population activity predicted the presented stimulus and/or the animal’s target choice. We trained a decoder (multinomial GLM; see section Decoders in Methods) on single-trial population vectors to identify either the target choice (left/right/no turn) or the stimulus condition (higher contrast on left/right, or zero contrast on both). For the neuronal response vectors, we considered neuronal activity 0–200 ms post-stimulus onset (S5 Fig). We then computed the mutual information between the decoder predictions and the real outcomes (Fig 4B; see section Decoders in Methods).
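As a rough illustration of this decoding step, one could fit a multinomial logistic regression to single-trial population vectors and score it with the mutual information between cross-validated predictions and the true labels. The sketch below uses scikit-learn; the cross-validation scheme and solver settings are placeholder choices, not necessarily those used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mutual_info_score

def decode_information(population_vectors, labels, n_folds=5):
    """Mutual information (in bits) between cross-validated decoder
    predictions and the true trial labels (stimulus side or choice).

    population_vectors : array, shape (n_trials, n_neurons)
    labels             : array, shape (n_trials,)
    """
    clf = LogisticRegression(multi_class="multinomial", max_iter=5000)
    predictions = cross_val_predict(clf, population_vectors, labels, cv=n_folds)
    # mutual_info_score returns nats; convert to bits
    return mutual_info_score(labels, predictions) / np.log(2)
```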
Many brain areas appeared to contain little task-relevant information (shown in black in Fig 4B). We therefore used a standard elbow criterion (see section Decoders in Methods) to determine a threshold (in bits) for selecting brain areas that provided the highest information on either the stimulus (blue areas), the choice (red areas), or both (i.e. both thresholds exceeded; purple areas). These areas seem largely congruent with the literature. For instance, primary visual cortex (VISp) is expected to reflect the visual stimulus, while choice information is conveyed e.g. by the ventral anterior-lateral complex of the thalamus (VAL), known to be a central integrative centre for motor control [37]. As an example of an area informative about both choice and stimulus, we see the caudoputamen (CP), which is involved in goal-directed behaviours [38], spatial learning [39], and orienting saccadic eye movements based on reward expectancy [40].
In principle, the selection of relevant brain areas might be dependent e.g. on the specific cut-off thresholds introduced by the elbow criterion we applied. To test the validity of this selection process, we repeated the analysis with five other decoders (Fig 4C). We then ranked the total amount of Mutual Information per area (stimulus + choice information) for each of these models. Finally, we performed a hierarchical clustering to determine areas that were consistently classed as highly informative across decoders. Interestingly, the brain areas identified via the elbow criterion in Fig 4B coincided exactly with the top performing cluster in our multi-decoder analysis (Fig 4C). As such, both analyses converged on a group of 14 brain areas that conveyed significant task information regardless of decoder approach.
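A sketch of this cross-decoder consistency check, assuming a (brain areas × decoders) matrix of total mutual information values; scipy's hierarchical clustering is used here as a stand-in for the authors' exact procedure, and the function and argument names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import rankdata

def consistent_top_areas(mi_matrix, area_names, n_clusters=2):
    """Cluster brain areas by their information rank across decoders and
    return the areas in the cluster with the highest mean rank.

    mi_matrix  : array, shape (n_areas, n_decoders), total MI per area and decoder
    area_names : list of length n_areas
    """
    mi_matrix = np.asarray(mi_matrix, dtype=float)
    # rank areas within each decoder (higher MI -> higher rank)
    ranks = np.apply_along_axis(rankdata, 0, mi_matrix)
    Z = linkage(ranks, method="ward")
    cluster_ids = fcluster(Z, t=n_clusters, criterion="maxclust")
    mean_rank = [ranks[cluster_ids == c].mean() for c in np.unique(cluster_ids)]
    best = np.unique(cluster_ids)[int(np.argmax(mean_rank))]
    return [a for a, c in zip(area_names, cluster_ids) if c == best]
```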
Having identified task-relevant brain areas, we used neuronal recordings from those informative areas to test the two assumptions set out in the Introduction. First, we computed the average population response templates for different experimental conditions. To avoid working with trial numbers as low as n = 2 for specific contrast combinations, we pooled several contrast levels requiring the same behavioural response (e.g. 50% right vs. 0% left, and 100% right vs. 50% left) into two conditions: target stimulus on the left or on the right. Average responses to the individual contrast levels were very comparable (S2 Fig).
To test the first assumption, as we did for Data set 1, we quantified how well single-trial responses correlated with the average template for a given stimulus (S3 Fig (A); see also [35]). Median correlations ranged from r = 0.56 to 0.89 across all brain areas (n = 89 to 3560 trials per brain area; all p < 0.001), suggesting that single-trial responses clearly resembled the average template. As a control, we computed 100 bootstrapped response vectors for each trial. Specifically, for each trial we generated the same number of spikes as recorded in the original trial, and randomly assigned these spikes to the various neurons in the population based on the probabilities given by their average firing rates (S3(B) Fig; see section Surrogate Models in Methods). Surrogate data created in this way should converge towards the original data if trial-averaged firing rates represent the true response, which is sampled discretely via a Poisson process in individual trials. These Poissonian surrogate data uniformly correlated better to the template than the original data (S3(A) Fig). In other words, single-trial responses in Data set 2 exhibited more variation than explained simply by (Poissonian) down-sampling of the firing preferences represented by the average template into a specific number of discrete spikes in an individual trial. As in Data set 1, single-trial correlations scaled with the amount of inter-trial variability, reflected for instance by the Fano Factor (S6 Fig). Note that the Fano Factors for all analysed brain areas in Data set 2 fell within the range of previously reported results [41, 42] (S6(B) Fig), suggesting that Data set 2 provides a representative example for neuronal activity in these brain areas.
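The spike-count surrogates described here can be sketched as a multinomial redistribution of each trial's total spike count across neurons, with probabilities proportional to the neurons' trial-averaged rates for that condition. The variable names below are illustrative, and the exact construction in the paper may differ in detail.

```python
import numpy as np

def poissonian_surrogate(trial_counts, template_rates, rng=None):
    """Redistribute the observed number of spikes in one trial across neurons
    according to the trial-averaged rate template of the same condition.

    trial_counts   : array, shape (n_neurons,), spike counts in one trial
    template_rates : array, shape (n_neurons,), condition-averaged rates/counts
    """
    rng = np.random.default_rng(rng)
    total_spikes = int(np.asarray(trial_counts).sum())
    p = np.asarray(template_rates, float)
    p = p / p.sum()
    return rng.multinomial(total_spikes, p)
```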
Next, we estimated the stimulus specificity of the observed correlations by computing single-trial correlations to the incorrect template (e.g. ‘target right’ for the left target; see Fig 4D, top). These were broadly distributed, but on average marginally lower than the single-trial correlations to the correct template (S3(B) Fig). Consequently, Specificity Indices across all 14 brain areas were mostly positive but rarely exceeded 0.1 (Fig 4D, bottom; note that the Specificity Index is bounded by −2 and 2). In other words, correlations between single-trial responses and template were largely stimulus-independent. This lack of response specificity was not directly predicted by the amount of inter-trial variability, because the Specificity Index reflects the differential link to correct versus incorrect template rather than the overall reproducibility of neuronal responses (see S5(B) Fig).
These results tally with recent work demonstrating how strongly non-task-related factors drive neuronal responses, even in primary sensory areas like visual cortex [16, 17, 43–47]. However, despite these factors, the animal still needs to arrive at a coherent perceptual choice (e.g. steering right or left, see Fig 5A)—and indeed succeeds in doing so in most trials. To test if trial-averaged templates are relevant to this perceptual decision, we compared single-trial correlations for hit trials (correct target choice) and miss trials (incorrect target or no response). Single-trial correlations were marginally lower in miss trials than in hit trials across most brain areas (Fig 5B). However, their difference was small, leading to Behavioural Relevance Indices between 0.51 and 0.66 (where the Behavioural Relevance Index is bounded between 0.5 and 1). According to Vargha and Delaney’s effect-size conventions, such values would be considered largely negligible, indicating that single-trial correlations are not a reliable way to predict subsequent behaviour in Data set 2 (Fig 5C; see also [36]).
Fig 5. Single-trial responses are barely behaviourally relevant for Data set 2. A) Graphic representation of task structure in Data set 2. Note that gratings depicted here are not accurate representations of the grating stimuli used. Mice move the higher contrast grating stimulus towards the centre by steering a wheel, or refrain from moving the wheel when no stimulus is present (left, middle). Animals accomplish this task with high proficiency (right). We show representations of the stimuli instead of the actual gratings. B) Specificity Index for the selected areas, split by hits and misses. C) Behavioural Relevance for selected brain areas.
https://doi.org/10.1371/journal.pcbi.1012000.g005
Together, these results suggest that in Data set 2, the relation between single-trial responses and trial-averaged firing rate templates was only marginally stimulus-specific, and did not appear to substantially inform subsequent behavioural choices. However, this estimate may present a lower bound for several reasons. First, while the task information conveyed by cross-trial averages seemed to be limited in the recorded population of neurons, it might be sufficient to generate accurate behaviour when scaled up to a larger population. To explore this possibility, we sub-sampled the population of recorded neurons in each brain area from N/10 to N. We then extrapolated how Specificity and Behavioural Relevance would evolve with a growing number of neurons. These extrapolations indicated that taking into account larger neuronal populations seemed to be at least somewhat beneficial for the Specificity and Behavioural Relevance of cross-trial averages in Data set 2, though improvements were rather moderate (Fig 6A, right). This did not seem to be an inherent feature of our extrapolation approach: for Data set 1, the Specificity Index appeared to remain largely stable with growing n, but Ω rose steeply, indicating that with more neurons, single-trial correlations to the average template would more robustly predict behaviour (Fig 6A, left).
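A sketch of this sub-sampling analysis, assuming a generic metric function (e.g. one returning the median Specificity Index or Ω for a given neuron subset); the function names, step counts and repeat numbers are placeholders rather than the authors' exact settings.

```python
import numpy as np

def metric_vs_population_size(responses, compute_metric, n_steps=10, n_repeats=20, rng=None):
    """Recompute a population metric on random neuron subsets of increasing
    size, from roughly N/10 up to the full population N.

    responses      : array, shape (n_trials, n_neurons)
    compute_metric : callable taking a (n_trials, n_subset) array -> float
    returns        : (subset sizes, mean metric per size)
    """
    rng = np.random.default_rng(rng)
    n_neurons = responses.shape[1]
    sizes = np.linspace(max(1, n_neurons // n_steps), n_neurons, n_steps, dtype=int)
    curves = np.zeros((n_repeats, n_steps))
    for r in range(n_repeats):
        for s, size in enumerate(sizes):
            subset = rng.choice(n_neurons, size=size, replace=False)
            curves[r, s] = compute_metric(responses[:, subset])
    return sizes, curves.mean(axis=0)
```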
Fig 6. Control analyses for both data sets. A) We subsampled the neuronal populations to check whether we could extrapolate a marked benefit from adding neurons when performing template-matching. Increasing the number of sampled neurons left the Specificity Indices for both data sets largely unchanged (top), and yielded slight increases in Ω (bottom). We then clustered trials based on the similarity in their neural response (B) and pupil size (C). These clusterings had either a negative (Data set 1) or no (Data set 2) effect on the Behavioural Relevance Index, and only slightly increased the Specificity Index for both data sets.
https://doi.org/10.1371/journal.pcbi.1012000.g006
Alternatively, at least in the brain areas most involved in task processing, there might be a group of ‘super-coder’ neurons that reflect relevant task variables more consistently [48, 49]. To test this possibility, we implemented a jackknife procedure, removing one neuron at a time from the data and recomputing all metrics based on the remaining population. This approach generally did not reveal neurons that particularly boosted single-trial correlations or Specificity (S9 Fig). Rather, the contribution of different neurons to the population response’s Specificity was distributed largely symmetrically around zero (as measured by γ; S9 Fig, lower panels). However, a few brain regions in Data set 2 did feature a somewhat right-skewed distribution of jackknifed Specificity Indices (S9 Fig), indicating that for these areas, at least some neurons contributed substantially to single-trial correlations. Intriguingly, such super-coder neurons contributed more heavily to the Specificity Index in hit than in miss trials. This suggests that when super-coder responses were more specific, animals tended to act more successfully. The areas highlighted by this analysis are consistent with the notion that ‘super-coder’ neurons might appear in brain areas most directly involved in task processing. Specifically, the analysis identified the red nucleus (RN, a subcortical hub for motor coordination), the subparafascicular thalamic nucleus (SPF, auditory processing), the primary visual cortex (VISp, visual processing), the substantia nigra pars reticulata (SNr, reward and motor planning) and the midbrain reticular nucleus (MRN, arousal and conscious state).
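A minimal sketch of such a leave-one-neuron-out (jackknife) analysis, again assuming a generic metric function; the skewness of the resulting contribution distribution (γ in the text) can then be computed with scipy.stats.skew. Names and signatures here are illustrative.

```python
import numpy as np

def jackknife_contributions(responses, compute_metric):
    """Leave-one-neuron-out contributions to a population metric.
    A strongly positive value means that removing the neuron lowers the
    metric, i.e. the neuron contributes disproportionately
    ('super-coder' candidate).

    responses      : array, shape (n_trials, n_neurons)
    compute_metric : callable taking a (n_trials, k) array -> float
    """
    full_value = compute_metric(responses)
    n_neurons = responses.shape[1]
    contributions = np.zeros(n_neurons)
    for n in range(n_neurons):
        reduced = np.delete(responses, n, axis=1)
        contributions[n] = full_value - compute_metric(reduced)
    return contributions
```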
In contrast, in Data set 1, we found no evidence of super-coder neurons at all, i.e. γ ≈ 0. This may reflect the fact that task information is distributed evenly throughout the neuronal population, or that Data set 1 did not contain recordings from brain areas that might contain ‘super-coder’ neurons. Together, these results imply that while there were some neurons whose averaged responses reflected task-relevant information more robustly, these neurons were rather rare, and far from perfectly reliable. In other words, in both data sets, any sub-set of neurons could likely convey the average response template to approximately the same extent under most circumstances.
Even if population responses generally did not feature a clearly distinct group of neurons with particularly reliable responses, it is in principle possible that downstream areas only ‘listen to’ the neurons at the most informative tail of the distribution, and ignore responses from less informative neurons. To explore whether in this scenario single-trial responses would clearly reflect the relevant cross-trial averages, we sub-sampled each neuronal population to include only the 10 percent of neurons that had emerged as most and least stimulus-specific, respectively, based on the jackknifing procedure detailed above. As one would expect, the Specificity Index derived from the most reliably stimulus-selective neurons was generally higher than that of the least informative neurons, even though the difference was reasonably small (ΔSpecificity Index < 0.2; see S10 Fig). In contrast, Behavioural Relevance (Ω) decreased in most regions when only a sub-group of neurons was considered—whether most or least informative (S10 Fig). This suggests that cross-trial averages based only on the individually most informative neurons failed to reflect behaviourally relevant information more successfully than those based on a wider range of neuronal responses.
In addition, by pooling stimulus pairs with large and small contrast differences into just two stimulus categories—’target left’ and ’target right’—we may have caused the resulting average templates to appear less distinctive. Specifically, difficult stimulus pairs might ‘blur the boundaries’ between average templates. To estimate the impact of stimulus similarity on the Specificity Index and Behavioural Relevance, we computed them separately for difficult and easy trials. Both Specificity and Relevance increased in a majority of brain areas when only taking into account stimulus pairs with large contrast differences, but plummeted for more subtle contrast differences (see S8 Fig). This suggests that in Data set 2, single-trial population firing rates were both specific and behaviourally relevant when processing coarse stimulus information, but only barely so in finer contrast discrimination. In this context, it is important to note that animals were also highly successful in discriminating difficult stimulus pairs. This suggests that fine contrast discrimination relied on coding modalities other than average firing rates.
Another potential limiting factor of our analysis could be that template-matching may occur in a way that cannot be captured by simple correlations. Even though linear correlations are in principle the correct way to test the assumptions inherent in cross-trial averaging, it is still possible that the linear operation of cross-trial averaging somehow reflects neuronal features of single-trial responses that are better understood in a higher-dimensional space. To explore this scenario, we repeated all previous analyses, but characterized population responses using Principal Component Analysis (PCA) via Singular Value Decomposition (SVD), and quantified their resemblance (normalized distance, see section PCA Analysis in Methods) to the average template in this lower-dimensional space. In both data sets, stimulus specificity increased marginally, but behavioural relevance decreased (S12 Fig and Fig 7). The decrease in behavioural relevance was particularly steep in Data set 1, indicating that raw firing rates were more instructive to the animals’ choices than lower-dimensional features of neuronal activity (Fig 7; for details, see S12 Fig).
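As an illustration of this dimensionality-reduced comparison, the sketch below projects trials onto the leading principal components (obtained via SVD) and measures each trial's distance to its condition template in that space. The normalization and the number of components used here are placeholder choices, not necessarily those described in the paper's Methods.

```python
import numpy as np

def pca_template_distance(responses, condition_labels, n_components=3):
    """Distance between single trials and their condition template in a
    low-dimensional PCA space, normalized by the average spread of all trials.

    responses        : array, shape (n_trials, n_neurons)
    condition_labels : array, shape (n_trials,)
    """
    responses = np.asarray(responses, dtype=float)
    condition_labels = np.asarray(condition_labels)
    X = responses - responses.mean(axis=0)
    # principal axes are the right singular vectors of the centred data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T              # (n_trials, n_components)
    norm = np.linalg.norm(scores - scores.mean(axis=0), axis=1).mean()
    distances = np.full(len(responses), np.nan)
    for cond in np.unique(condition_labels):
        idx = condition_labels == cond
        template = scores[idx].mean(axis=0)
        distances[idx] = np.linalg.norm(scores[idx] - template, axis=1) / norm
    return distances
```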
Finally, neuronal responses may reflect a multi-factorial conjunction of response preferences to a wide range of stimulus and behavioural variables. To test this hypothesis, we quantified whether single-trial correlations to the average would become more specific or behaviourally relevant when additional variables were taken into account. As a first test, we accounted for potential modulating variables in an agnostic way by clustering the neuronal population responses from all trials according to similarity. Such clusters might reflect different spontaneously occurring processing states that the animal enters into for reasons (e.g. locomotion, satiation, learning etc.) that may remain unknown to the experimenter. Based on the Silhouette Index, which measures cluster compactness (S11 Fig), we decided to group trials into two clusters. We repeated all analyses of Specificity and Behavioural Relevance within each of these trial clusters. In Data set 1, the Specificity Index was largely unchanged (Fig 6B; see Fig 7 for a summary). This aligns with the fact that clusters in Data set 1 were less compact and thus trial-grouping did not significantly reduce response spread (S11(A) Fig). At the same time, Behavioural Relevance decreased rather sharply (Fig 6B), indicating that differences between the identified trial clusters were in fact behaviourally informative.
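A sketch of this trial-clustering control, using k-means and the silhouette score from scikit-learn as stand-ins for the clustering and compactness measures described above; the per-cluster Specificity and Behavioural Relevance analyses would then be rerun within each returned trial group.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_trials(responses, max_clusters=5, random_state=0):
    """Group trials by response similarity and pick the number of clusters
    with the best silhouette score (a measure of cluster compactness).

    responses : array, shape (n_trials, n_neurons)
    returns   : (best_k, labels) with one cluster label per trial
    """
    best_k, best_score, best_labels = None, -np.inf, None
    for k in range(2, max_clusters + 1):
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(responses)
        score = silhouette_score(responses, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
```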
In contrast, Specificity rose sharply in Data set 2 after trial clustering (Fig 6B; see Fig 7 for a summary). This suggests a scenario in which the same stimulus can evoke multiple distinct, but self-consistent, response patterns. These patterns are obscured when responses are averaged across all trials indiscriminately, but emerge robustly when averages are computed separately per trial cluster. However, the lack of improvement in Behavioural Relevance suggests that single-trial correlations to the average did not consistently predict behaviour in either response mode. The test presented here can help researchers to reveal and explicitly address such unique neuronal response dynamics.
As a second test, we accounted for spontaneous fluctuations of attentional state as reflected by pupil size [45, 50, 51]. To this end, we grouped trials by pupil size and computed cross-trial averages for these trial groups. If population responses are modulated by attentional state, examining only trials that occurred during similar attentional states should reduce unexplained variability. However, grouping trials by average pupil size only slightly improved Specificity, and did not improve Behavioural Relevance at all in Data set 2 (Fig 6C; Data set 1 did not contain measurements of pupil size).
Together, these analyses suggest that the missing link between single-trial responses and cross-trial averages in Data set 2 is not sufficiently explained by unmeasured confounding factors, non-linear interactions or lack of neurons. Rather, it appears to be an inherent feature of the data set. Fig 7 summarizes the outcomes of different analysis approaches. Across analyses, Data set 1 generally shows better Specificity and Behavioural Relevance than Data set 2. Interestingly, the Specificity in Data set 1 did not increase further with procedures such as Z-scoring neuronal responses over trials to remove baseline firing rates, applying PCAs to extract lower-dimensional response features or clustering trials according to similarity—and Behavioural Relevance was even reduced by these procedures. This suggests that in Data set 1, absolute firing rates were instructive of behaviour on a single-trial level, so that metrics obscuring absolute firing rates (e.g. by dimensionality reduction) impaired their Behavioural Relevance. This finding is particularly interesting given that in this paradigm animals had essentially been trained to detect extra spikes in somatosensory cortex. As such, it seems plausible that the neuronal representation of this task would feature absolute spike counts.
In contrast, in Data set 2, response specificity clearly benefited when baseline firing rates were accounted for—either by z-scoring firing rates, applying PCA, or clustering trials according to similarity. This suggests that here, absolute firing rate fluctuations were not tied to task performance, so that removing such fluctuations helped to uncover task information available at the single-trial level. However, this did not improve Behavioural Relevance. Thus, while baseline firing rates might have obscured the reliability of single-trial responses, removing them did not produce more behaviourally relevant cross-trial averages.
These results raise several questions. What features of Data set 1 make cross-trial averages so much more representative of single-trial processing than in Data set 2? And how representative are those features compared to the range of data generated by neuroscience? To start delineating answers to these questions, we created a simple model to simulate what the Specificity and Behavioural Relevance Indices would be when our tests were applied to neuronal population responses with (1) different distributions of response preferences, (2) different degrees of neuronal single-trial variability and (3) different degrees of variability in the translation from neuronal responses to behavioural choice. Specifically, we simulated a neuronal population of similar size as those recorded in both data sets (n = 200). For these 200 neurons, average response preferences regarding two hypothetical stimuli were drawn from a Beta(β, β) distribution, governed by the parameter β. When β = 1, response preferences were distributed completely uniformly across the spectrum from Stimulus 1 to Stimulus 2. β < 1 indicated a shift towards an increasingly segregated bimodal distribution, with neurons preferring either Stimulus 1 or 2. β > 1 indicated an increasingly tight unimodal distribution (although the Beta distribution is not exactly Gaussian for any finite parameter value, a suitably standardized Beta(α, α) distribution converges to the normal distribution in the limit α → ∞ [52]), with all neurons responding in largely the same way regardless of the presence of Stimulus 1 or 2. Next, single-trial responses of each simulated neuron were the sum of its ‘true’ response preference and a varying level of ‘noise’. Based on these simulated single-trial responses, we modelled a behavioural read-out that acted as a non-linearity, choosing the correct or incorrect behavioural output depending on how similar a given neuronal single-trial response was to the correct versus incorrect average template (see section on Simulations in Methods). This non-linearity could either act in a noise-free manner, translating single-trial responses directly to the most closely corresponding behavioural decision, or it could add some ‘decision variability’ of its own (Fig 8A).
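The simulation can be sketched roughly as follows. The parameter names (beta, snr, decision_noise), the additive-noise model and the exact form of the sigmoidal read-out are simplified placeholders for the fuller model described in the Methods, intended only to illustrate the logic.

```python
import numpy as np

def simulate_trial_metrics(n_neurons=200, n_trials=1000, beta=1.0, snr=1.0,
                           decision_noise=0.1, rng=None):
    """Simulate Beta(beta, beta)-distributed response preferences, additive
    single-trial noise (parametrized by snr), and a noisy sigmoidal read-out
    of the per-trial Specificity Index.

    returns : (per-trial Specificity Index, boolean array of correct choices)
    """
    rng = np.random.default_rng(rng)
    # each neuron's preference for Stimulus 1 vs. Stimulus 2, in [0, 1]
    pref = rng.beta(beta, beta, size=n_neurons)
    template_1, template_2 = pref, 1.0 - pref
    stimulus = rng.integers(1, 3, size=n_trials)        # 1 or 2 on each trial
    spec_idx = np.zeros(n_trials)
    correct = np.zeros(n_trials, dtype=bool)
    for t in range(n_trials):
        true_template = template_1 if stimulus[t] == 1 else template_2
        other_template = template_2 if stimulus[t] == 1 else template_1
        response = true_template + rng.normal(0.0, 1.0 / snr, size=n_neurons)
        r_true = np.corrcoef(response, true_template)[0, 1]
        r_other = np.corrcoef(response, other_template)[0, 1]
        spec_idx[t] = r_true - r_other
        # noisy sigmoidal read-out of the specificity signal
        p_correct = 1.0 / (1.0 + np.exp(-(spec_idx[t] + rng.normal(0, decision_noise))))
        correct[t] = rng.random() < p_correct
    return spec_idx, correct
```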
Fig 8. Template-matching simulation. A) Stimuli are randomly sampled from a bimodal distribution. Then, neural responses are modelled as a baseline firing rate plus a stimulus-related response (with a background noise term, parametrized by an SNR), modulated by the selectivity β. Finally, the choice is made by passing the Specificity Index (difference between the correlation to one stimulus minus the other) through a sigmoid with Gaussian noise. B) Specificity Index as we vary the SNR and the selectivity (β) of the model neurons. We have highlighted the points in the simulation that are compatible with the experimental data sets (colour coded as indicated in the legend). Compatibility is defined by a threshold of |SpecIdx_measured − SpecIdx_model| < 0.05. C) Same as B), but for the Behavioural Relevance Index (Ω). In this case, we varied the noise intensity of the decision-making process, for a fixed intermediate SNR.
https://doi.org/10.1371/journal.pcbi.1012000.g008
These simulations showed that the Specificity Index rises steeply when the distribution of neuronal response preferences is reasonably spread out (β ≤ 1). In contrast, the Specificity Index remains low when neuronal response preferences are not particularly stimulus-specific (β > 1). Interestingly, this overall pattern was only marginally dependent on the amount of single-trial ‘noise’ added to the simulated average response preferences (Fig 8B). In other words, the diversity of neuronal response preferences was much more crucial to the computational utility of cross-trial averages than low cross-trial variability. Comparing the median Specificity Index of Data sets 1 and 2 to these simulations (Fig 8B) suggests that, in order to reach the Specificity values we observed empirically, neuronal responses in Data set 1 should be distributed at least somewhat bimodally between the two stimulus conditions, while those in Data set 2 should be less distinguishable from each other. The real distributions of neuronal response preferences in both data sets confirmed these predictions (S14 Fig). These simulations therefore suggest that the improved Specificity of cross-trial averages in Data set 1 compared to Data set 2 largely hinges on the broader distribution of neuronal response preferences.
We next explored how the Behavioural Relevance Index reflected the interplay between the distribution of neuronal response preferences, and the variability in the decision making process itself. Like the Specificity Index, Behavioural Relevance increased with a broader distribution of response preferences, as well as (unsurprisingly), with smaller variability of decision making (Fig 8C). Based on their median Behavioural Relevance Indices, our two example data sets would appear to occupy distinct areas of the parameter landscape: Based on its measured Behavioural Relevance, Data set 1 seems to operate in a regime with an at least somewhat bimodal distribution of neuronal preferences and a moderate variability of behavioural decision making, meaning that single-trial correlations to the average template drive behavioural decisions quite faithfully. In contrast, Data set 2 would be predicted to feature either a tight unimodal distribution of neuronal response preferences, combined with similarly faithful decision making as Data set 1—or a more spread-out, uniform distribution of neuronal preferences, but with extremely noisy decision making. Given the response distributions shown in (S14 Fig), we assume that both unspecific neuronal response preferences and genuine decision-making variability contribute to the low Behavioural Relevance observed in Data set 2.
Together, these simulations demonstrate three main outcomes. First, the Specificity and Behavioural Relevance Indices are able to correctly pick up the neuro-behavioural features they were designed to reflect—stimulus-specific neuronal response profiles in the case of the Specificity Index, and consistent decision criteria based on these neuronal response profiles in the case of the Behavioural Relevance Index. Second, our two example data sets occupy distinct and complementary spaces within the parameter landscape of neuronal and behavioural variability, with the lower effectiveness of cross-trial averages in Data set 2 being most likely due to a tighter distribution of neuronal preferences and higher variability of decision making. Finally, based on the Specificity and Behavioural Relevance computed in real data, simulations like the one presented here can be used to hypothesize about the neuronal and behavioural mechanisms that boost or diminish the computational relevance of cross-trial averages in a specific experiment. For instance, our simulation highlighted the broad distribution of neuronal response preferences, rather than the magnitude of single-trial variability, as the main factor that makes cross-trial averages more meaningful in Data set 1 than in Data set 2.
[END]
---
[1] Url:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012000
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.