(C) PLOS One
This story was originally published by PLOS One and is unaltered.



Multisensory perceptual and causal inference is largely preserved in medicated post-acute individuals with schizophrenia [1]

Tim Rohe (Department of Psychiatry and Psychotherapy, University of Tübingen, Tübingen; Institute of Psychology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen), Klaus Hesse, Ann-Christine Ehlis

Date: 2024-09

Hallucinations and perceptual abnormalities in psychosis are thought to arise from imbalanced integration of prior information and sensory inputs. We combined psychophysics, Bayesian modeling, and electroencephalography (EEG) to investigate potential changes in perceptual and causal inference in response to audiovisual flash-beep sequences in medicated individuals with schizophrenia who exhibited limited psychotic symptoms. Seventeen participants with schizophrenia and 23 healthy controls reported either the number of flashes or the number of beeps of audiovisual sequences that varied in their audiovisual numeric disparity across trials. Both groups balanced sensory integration and segregation in line with Bayesian causal inference rather than resorting to simpler heuristics. Both also showed comparable weighting of prior information regarding the signals’ causal structure, although the schizophrenia group slightly overweighted prior information about the number of flashes or beeps. At the neural level, both groups computed Bayesian causal inference through dynamic encoding of independent estimates of the flash and beep counts, followed by estimates that flexibly combine audiovisual inputs. Our results demonstrate that the core neurocomputational mechanisms for audiovisual perceptual and causal inference in number estimation tasks are largely preserved in our limited sample of medicated post-acute individuals with schizophrenia. Future research should explore whether these findings generalize to unmedicated patients with acute psychotic symptoms.

Funding: This work was supported by the Deutsche Forschungsgemeinschaft ( https://www.dfg.de/ ; RO 5587/ 1–1 to TR) and the University of Tübingen ( https://uni-tuebingen.de/ ; Fortüne award 2292–0–0 and 2454–0–0 to TR). Publication costs were covered by the Open Access Publishing Fund of the University of Tübingen. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This psychophysics-EEG study investigated whether schizophrenia alters the computational and/or neural mechanisms of multisensory perception in a sound-induced flash illusion paradigm. In an inter-sensory selective attention task, patients with schizophrenia and age-matched healthy controls were presented with sequences of a varying number of flashes and beeps. We first assessed whether schizophrenia altered the computations of how observers combined auditory and visual signals into number estimates by comparing normative and approximate Bayesian causal inference (BCI) models. Next, we combined BCI modeling with multivariate EEG analyses to unravel the underlying neural mechanisms. Our results suggest that the core computational mechanisms underlying perceptual and causal inference are largely preserved in our limited medicated schizophrenia cohort compared to healthy participants at the behavioral, computational, and neural levels.

The intricacies of multisensory perception may explain the inconsistent findings regarding multisensory abnormalities in psychosis [ 40 – 43 ] (for a review, see [ 44 ]). For example, the rate at which participants with schizophrenia experience the McGurk- or sound-induced flash illusions has been shown to be lower [ 40 , 43 , 45 ], equal [ 46 ], or even higher [ 41 ] compared to healthy controls. These inconsistencies may arise from the complex interplay of an individual’s auditory and visual precisions, perceptual and causal priors, and decisional strategies, which may all be altered in psychosis. Bayesian modeling and formal model comparison move beyond previous descriptive approaches by allowing us to dissociate these distinct computational ingredients [ 47 – 49 ]. For example, psychosis may alter how the brain weights prior knowledge and different pieces of sensory information: It may over-rely on prior information or even assign a greater weight to information from a specific sensory modality. Psychosis may also increase observers’ tendency to bind signals across different senses, as quantified by the causal prior. This could facilitate the emergence of percepts that misbind incongruent sensory signals. Finally, psychosis may alter how observers read out their perceptual estimates from complex (e.g., bimodal) posterior distributions that typically arise through Bayesian causal inference [ 28 ]. For example, instead of model averaging that is predominantly observed in healthy individuals [ 27 , 28 , 36 ], patients may apply suboptimal or heuristic strategies that do not optimally take the causal structure of the signals into account [ 30 , 31 ]. This brief overview highlights the powerful insights that may be obtained by combining Bayesian causal inference models with behavioral and neuroimaging data acquired in more complex and challenging multisensory environments.

Models of hierarchical Bayesian causal inference [ 23 – 25 ] account for this causal inference problem in multisensory perception by explicitly modeling the causal structures that could have generated the sensory signals. When signals occur at the same time and place and are semantically (or numerically) congruent, it is likely that they arise from a common source. Hence, in this common-cause case, the brain should fuse the signals, weighted in proportion to their relative sensory precisions, into a single unified perceptual estimate, giving a stronger weight to the more reliable (i.e., less noisy) signal (i.e., fusion estimate). When signals occur at different times or are semantically incongruent, it is likely that they emanate from different sources. In this case of separate sources, the brain should process them independently (i.e., segregation estimates for the unisensory signals). Critically, the brain does not a priori know whether signals come from common or independent sources. Instead, it needs to infer the underlying causal structure from noisy statistical correspondence cues, such as signals occurring at the same time or place, or being numerically or semantically congruent [ 26 – 29 ]. To account for observers’ uncertainty about the signals’ causal structure, the brain should read out a final perceptual estimate by combining the fusion (i.e., common source) and segregation (i.e., separate sources) estimates weighted by the posterior probabilities of each causal structure (i.e., common or independent sources). This decision strategy is referred to as model averaging (for other decisional functions, see [ 30 ]). Bayesian causal inference thereby enables a graceful transition from integration for (near-) congruent auditory and visual signals to segregation for incongruent signals.
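The model-averaging computation described above can be sketched for the standard Gaussian formulation of Bayesian causal inference. This is a minimal illustration, not the paper's fitted model: the parameter values are made up, and treating discrete flash/beep counts as continuous Gaussian variables is itself a simplification.

```python
import math

def bci_estimate(x_a, x_v, sigma_a=0.6, sigma_v=1.2,
                 sigma_p=2.0, mu_p=2.0, p_common=0.5):
    """Model-averaged auditory number estimate (illustrative sketch)."""
    va, vv, vp = sigma_a**2, sigma_v**2, sigma_p**2
    # Likelihood of the noisy inputs under a common cause (C = 1)
    var1 = va*vv + va*vp + vv*vp
    e1 = ((x_a - x_v)**2*vp + (x_a - mu_p)**2*vv + (x_v - mu_p)**2*va) / var1
    like1 = math.exp(-e1 / 2) / (2 * math.pi * math.sqrt(var1))
    # Likelihood under independent causes (C = 2)
    like2 = (math.exp(-0.5 * ((x_a - mu_p)**2 / (va + vp)
                              + (x_v - mu_p)**2 / (vv + vp)))
             / (2 * math.pi * math.sqrt((va + vp) * (vv + vp))))
    # Posterior probability of a common cause, given the causal prior
    post1 = like1 * p_common / (like1 * p_common + like2 * (1 - p_common))
    # Precision-weighted fusion estimate (common cause)
    fusion = (x_a/va + x_v/vv + mu_p/vp) / (1/va + 1/vv + 1/vp)
    # Auditory segregation estimate (independent causes)
    seg_a = (x_a/va + mu_p/vp) / (1/va + 1/vp)
    # Model averaging: weight both estimates by the causal posterior
    return post1 * fusion + (1 - post1) * seg_a, post1
```

With congruent inputs the posterior probability of a common cause is high and the estimate is fused; with a large numeric disparity the posterior drops and the final estimate moves toward the segregation estimate.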
Behaviorally, accumulating research has shown that human observers arbitrate between sensory integration and segregation consistent with models of Bayesian causal inference [ 23 , 27 , 28 , 31 – 36 ]. At small disparities, they combine signals into one coherent percept, which leads to prominent crossmodal biases. At larger disparities, these multisensory interactions and crossmodal biases are attenuated. At the neural level, recent research has shown that the brain accomplishes Bayesian causal inference by dynamically encoding the segregation, fusion, and final perceptual estimates that account for the brain’s causal uncertainty along cortical pathways [ 27 , 33 , 36 – 39 ]. Multisensory perceptual inference is thus governed by 2 sorts of priors: observers’ perceptual priors about environmental properties (e.g., the number of signals) and a causal prior about whether signals come from common or independent sources, which represents observers’ a priori tendency to bind sensory signals. While the former priors influence observers’ perceptual estimates directly, the latter does so indirectly by modulating the strength of cross-sensory interactions.

The weighting of various pieces of information becomes even more complex when the brain is confronted with multiple sensory signals that may come from the same or different causes. In the face of this causal uncertainty, the brain needs to infer whether 2 sensory signals—say the sound of a whispering voice and the sight of articulatory movements—come from a common source (e.g., a single speaker) and should hence be integrated or else be processed independently (e.g., in case of different speakers). Recent research has shown that hallucinations in psychosis are not only unisensory (e.g., hearing voices), but are more often multisensory than previously assumed [ 21 ]. Moreover, multisensory hallucinations are associated with greater conviction and more distress for patients [ 22 ], especially when signals from different sensory modalities are semantically related or occur simultaneously. This pattern suggests that patients perform causal inference and integrate signals from different sensory modalities into unified hallucinatory percepts. The multisensory nature of hallucinations points towards an additional intriguing mechanism for hallucinations: Patients may be more prone to integrate even signals from different sources and thereby attribute greater “reality status” to them. Critically, because multisensory hallucinations rely on causal inference computations, they cannot be mediated solely by local neural mechanisms in early sensory cortices. Instead, they must involve complex perceptual and causal inference processes along cortical hierarchies.

Consistent with this conjecture, overreliance on priors correlated with psychotic symptoms in patients with schizophrenia [ 10 , 11 ] and hallucination-like experiences in hallucination-prone individuals [ 11 – 15 ]. By contrast, direct comparisons between patients with schizophrenia and healthy controls did not consistently reveal overreliance on priors; some studies instead found underweighting of priors [ 16 – 19 ]. This pattern of results suggests that overreliance on priors may be associated with psychotic symptoms in patients or hallucination-like experiences in healthy controls rather than act as a trait-marker for schizophrenia [ 12 ]. However, these diverse findings raise the possibility that psychosis in schizophrenia may either increase or decrease the precision or weight of different types of priors [ 16 , 17 ]. While the weight of priors about simple features (e.g., motion) is thought to decrease in psychosis [ 18 ], the weight of priors about semantic or related information may increase [ 20 ].

Hallucinations—percepts in the absence of sources in the external world—are a hallmark of psychotic disorders such as schizophrenia. Individuals with psychosis may for instance hear voices or see people that are not present in their environment. The computational and neural mechanisms that give rise to these perceptual abnormalities remain unclear. Increasing research is guided by the notion that perception relies on probabilistic inference based on 2 distinct sources of information, observers’ top-down prior beliefs and bottom-up noisy sensory signals [ 1 – 7 ]. According to Bayesian probability theory, the brain should combine these 2 sources of information weighted according to their relative precisions (i.e., inverse of noise or variance), with a greater weight given to more reliable information. Hallucinations may thus arise from abnormal weighting of prior beliefs and incoming sensory evidence [ 8 , 9 ].
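The precision-weighted combination described above can be written out directly: for a Gaussian prior and Gaussian sensory likelihood, the posterior mean is a precision-weighted average of the two. This is a textbook identity, not the paper's code.

```python
def posterior_mean(x, sigma_x, mu_prior, sigma_prior):
    """Combine a noisy sensory input x with a prior belief mu_prior.

    Precision is the inverse variance; the more precise (less noisy)
    source of information receives the greater weight.
    """
    w_x = 1.0 / sigma_x**2        # sensory precision
    w_p = 1.0 / sigma_prior**2    # prior precision
    return (w_x * x + w_p * mu_prior) / (w_x + w_p)
```

When the sensory signal is precise, the percept tracks the input; when it is noisy, the percept is drawn toward the prior, which is the kind of imbalance hypothesized to underlie hallucinations.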

To examine whether SCZ patients and HC may differ more subtly in their multivariate EEG response, we trained a support-vector classifier (SVC) to classify participants [ 72 ] as SCZ or HC based on their EEG activation patterns to auditory, visual, and audiovisual stimuli across poststimulus time windows (see S1 Text and S7 Fig ). The ERP topographies evolved largely similarly in HC and SCZ, so that the EEG-based decoder did not classify participants as HC or SCZ better than chance in one-sided cluster-based corrected randomization tests corrected for multiple comparisons across time. Overall, our ERP and multivariate analyses thus corroborated that the neural mechanisms underlying audiovisual integration were largely comparable in HC and SCZ patients. However, in line with our EEG results from the BCI analysis, we observed a deficit in visual processing in SCZ, as indicated by smaller visual P3 responses.

ERPs (across-participants mean grand averages; n = 40) of HC and SCZ participants elicited by unisensory auditory stimuli (A), unisensory visual stimuli (V), audiovisual congruent conditions (AV congr ) and the difference of these ERPs (i.e., AV congr −(A+V)) indicating multisensory interactions. The ERPs are averaged across occipital electrodes. Color-coded horizontal dotted lines indicate significant clusters (p < 0.05) of ERPs against baseline (i.e., across HC and SCZ, main effect of condition) in one-sample two-sided cluster-based corrected randomization tests. The horizontal solid line indicates a significant cluster of ERP differences between HC and SCZ in two-sample two-sided cluster-based corrected randomization tests. The x-axis shows the stimulus onsets. Source data is provided in S7 Data . ERP, event-related potential; HC, healthy control; SCZ, schizophrenia.

Following previous work, we also analyzed and compared the event-related potentials (ERPs) to unisensory and multisensory stimuli between HC and SCZ [ 46 , 64 – 68 ]. ERPs showed the typical components in response to flashes and beeps ( Fig 7 ), i.e., P1 (approximately 50 ms), N1 (100 ms), P2 (200 ms), N2 (280 ms), and P3 (>300 ms) [ 69 ]. As previously reported [ 70 ], the unisensory visual P3 component was significantly smaller in SCZ than HC (cluster 560 to 705 ms, p = 0.045). To characterize the neural processes of audiovisual integration, we tested for audiovisual interactions (i.e., AV congr versus (A + V)) over occipital electrodes. In line with previous studies [ 69 , 71 ], we observed early audiovisual interactions from 75 to 130 ms (i.e., measured from the onset of the first flash-beep slot; p = 0.049; two-sided one-sample cluster-based corrected randomization test) and later negative audiovisual interactions from 260 to 750 ms after stimulus onset (p < 0.001). Crucially, the ERP interactions did not differ between HC and SCZ (p > 0.05; two-sided two-sample cluster-based corrected randomization test).

(A) Decoding accuracy (Fisher’s z-transformed correlation; across-participants mean) of the SVR decoders as a function of time and group (HC vs. SCZ). Decoding accuracy was computed as the correlation coefficient between the given BCI model’s internal estimates and BCI estimates that were decoded from EEG activity patterns using SVR models trained separately for each numeric estimate. The BCI model’s internal numeric estimates comprise: (i) the unisensory visual (N̂V,C=2) and (ii) the unisensory auditory (N̂A,C=2) estimates under the assumption of independent causes (C = 2), (iii) the forced-fusion estimate (N̂AV,C=1) under the assumption of a common cause (C = 1), and (iv) the final BCI estimate (N̂A or N̂V, depending on the sensory modality that is task-relevant) that averages the task-relevant unisensory and the precision-weighted fusion estimates, weighted by the posterior probability estimate of each causal structure. Color-coded horizontal solid lines (HC) or dashed lines (SCZ) indicate clusters of significant decoding accuracy (p < 0.05; one-sided one-sample cluster-based corrected randomization t test). Color-coded horizontal dotted lines indicate clusters of significant differences in decoding accuracy between the groups (p < 0.05; two-sided two-sample cluster-based corrected randomization t test). Stimulus onsets are shown along the x-axis. (B) Bayes factors for the comparison between the decoding accuracies of HC and SCZ for each BCI estimate (i.e., BF 10 > 3 substantial evidence for, or BF 10 < 1/3 against, group differences). Source data is provided in S6 Data . BCI, Bayesian causal inference; EEG, electroencephalography; HC, healthy control; SCZ, schizophrenia; SVR, support-vector regression.

In both HC and SCZ, the decoders predicted the BCI estimates from EEG patterns significantly better than chance over most periods of the post-stimulus time window (i.e., decoding accuracy r ≈ 0.3; Fig 6A ). Yet, the different perceptual estimates evolved with different time courses that were common to both groups. Initially, the EEG activity encoded mainly the visual segregation estimates, starting at approximately 60 to 100 ms (i.e., significant clusters in one-sided cluster-based corrected randomization t test; see S4 Table ). Slightly later, the auditory segregation and fusion estimates peaked, with the fusion estimates showing a slower decline. The final BCI estimate, which accounts for the signals’ causal structure, rose more slowly and showed a more sustained time course. Moreover, only in HC did the decoding accuracy of the final BCI estimate exceed those of the other estimates at 600 ms. The temporal profiles of the decoding accuracies were largely comparable between HC and SCZ. Bayes factors provided mainly weak evidence for no difference between the groups ( Fig 6B ), except for the visual segregation estimate, which was associated with a significantly lower and more protracted decoding accuracy in SCZ than HC from 120 to 220 ms ( Fig 6A and S4 Table; all further clusters p > 0.05). Because we did not track observers’ eye movements, these lower decoding accuracies in SCZ may potentially be explained by reduced fixation stability in SCZ [ 62 ]. Alternatively, they may result from increases in visual uncertainty in SCZ patients that were reported in previous studies on the computational mechanisms of causal inference. These previous studies manipulated the asynchrony of audiovisual stimuli in simultaneity judgment tasks [ 42 , 63 ] and may therefore have been more sensitive in detecting differences in sensory uncertainty than the current study, which categorically manipulated the number of beeps and flashes.
In summary, combining Bayesian modeling and EEG decoding largely confirmed that SCZ and HC performed audiovisual number estimation according to similar neurocomputational mechanisms. However, SCZ encoded the visual signal with less precision resulting in a lower decoding accuracy of the visual segregation estimate. These results suggested that the visual uncertainty may be increased in SCZ participants as previously reported [ 42 ]. By characterizing sensory uncertainty across time at millisecond resolution, EEG may be able to reveal sensory differences that may not be reflected in observers’ behavioral responses.

To investigate whether SCZ and HC achieve Bayesian causal inference via shared or different neural processes, we combined BCI models with EEG decoding. In particular, we temporally resolved how the brain encodes the auditory (N̂A,C=2) and visual (N̂V,C=2) segregation estimates, the fusion estimate (N̂AV,C=1), and the final BCI perceptual estimates (i.e., N̂A or N̂V) that combine the forced-fusion estimate with the task-relevant unisensory segregation estimates, weighted by the posterior causal probability of common or separate causes [ 27 ]. To track the evolution of these different estimates across time, we trained a linear support-vector regression model (SVR decoder) to decode each of the 4 BCI estimates from EEG patterns of 60 ms time windows. SVR decoders were trained on EEG patterns individually in each participant. We computed the decoding accuracy as the Pearson correlation between the decoded and “true” perceptual estimates to quantify how strongly a perceptual estimate was encoded in EEG patterns for each participant. Decoding performance at chance level was defined as a correlation of zero (i.e., r = 0). At the group level, we then assessed whether the decoded perceptual estimates (as indicated by the correlation coefficients) evolved with different time courses in HC and SCZ.
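The decoding logic can be sketched on synthetic data. Note the hedges: the data below are fabricated for illustration, and ridge regression is used as a simple linear stand-in for the paper's linear SVR decoder (both fit a linear map from channel patterns to the estimate).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 trials x 32 channels for one 60-ms window,
# with a "true" numeric estimate linearly embedded in the channels plus noise
n_trials, n_chan = 200, 32
true_est = rng.uniform(1, 4, n_trials)            # e.g., a BCI numeric estimate
weights = rng.normal(size=n_chan)                 # hypothetical channel loadings
eeg = np.outer(true_est, weights) + rng.normal(scale=2.0, size=(n_trials, n_chan))

# Split into training and held-out test trials
train, test = slice(0, 150), slice(150, None)

# Ridge regression as a linear stand-in for the linear SVR decoder
lam = 1.0
X, y = eeg[train], true_est[train]
Xm, ym = X.mean(0), y.mean()
beta = np.linalg.solve((X - Xm).T @ (X - Xm) + lam * np.eye(n_chan),
                       (X - Xm).T @ (y - ym))
decoded = (eeg[test] - Xm) @ beta + ym

# Decoding accuracy: Pearson correlation between decoded and true estimates
r = np.corrcoef(decoded, true_est[test])[0, 1]
# Fisher z-transform stabilizes the variance for group-level statistics
z = np.arctanh(r)
```

In the study this per-participant accuracy would then be computed for each time window and each of the 4 BCI estimates, and compared across groups with cluster-based randomization tests.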

In summary, our analyses of the behavioral data based on the general linear model (GLM) and formal Bayesian modeling suggest that the medicated SCZ dynamically adapted and combined priors about the signals’ causal structure and the number of signals with new audiovisual inputs comparable to the healthy controls and in line with Bayesian principles.

Next, we investigated the influence of the number of signals on the previous trial on observers’ numeric estimates. As expected, observers’ numeric estimates were biased towards the number of signals on the previous trial (see S6B Fig ). Again, this was also reflected in our Bayesian analysis assessing the updating of observers’ numeric prior (i.e., μ P ) and its variance (i.e., σ P ). As expected for Bayesian learners, both SCZ and HC increased their numeric prior’s mean and variance after exposure to a high number of signals in the task-relevant modality (e.g., beeps for auditory report; Fig 5B and 5C ; significant main effects in Table 4 ). But again, no significant differences were observed between groups (i.e., no significant interaction effects with group), suggesting that HC and SCZ also adjust their perceptual priors at shorter timescales similarly.

To further explore whether SCZ over-relied on prior information, we exploited the fact that observers dynamically adapt their priors in response to previous stimuli. Some previous studies have suggested that this dynamic updating of priors is altered in psychosis ([ 10 ], but see [ 18 ]). Thus, we first examined whether and how SCZ and HC increased their binding tendency (“model-free”) or causal prior (i.e., p common from Bayesian analysis) after exposure to congruent or incongruent flash-beep sequences. As expected based on previous findings [ 27 , 59 – 61 ], both SCZ and HC similarly decreased their binding tendency after a trial with greater numeric disparity. As shown in S6A Fig , the differences between the CMB for visual and auditory report increased with the numeric disparity of the previous trial. In other words, observers were better able to selectively report the numeric estimate of the task-relevant sensory modality after trials with large numeric disparities. This was consistent with the idea that after a large numeric disparity trial, observers decreased their prior binding tendency, which in turn attenuated audiovisual interactions and crossmodal bias on subsequent trials. Likewise, our Bayesian modeling analysis indicated that observers’ causal prior increased after a congruent trial and decreased after a trial with large numeric disparity (i.e., a main effect of previous disparity; Fig 5A and Table 4 ). Importantly, we did not observe any significant differences between groups (i.e., no significant disparity × group interaction).
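The qualitative pattern of causal-prior updating can be caricatured with a leaky update rule. This is purely hypothetical: the paper estimates the causal prior per previous-disparity condition rather than assuming any particular update equation, and the learning rate and mapping below are invented for illustration.

```python
def update_causal_prior(p_common, prev_disparity, learning_rate=0.1,
                        max_disparity=3):
    """Hypothetical leaky update of the causal (binding) prior.

    A congruent previous trial (disparity 0) pulls the prior toward 1
    (stronger binding tendency); a maximally disparate trial pulls it
    toward 0 (weaker binding tendency).
    """
    target = 1.0 - prev_disparity / max_disparity  # assumed linear mapping
    return p_common + learning_rate * (target - p_common)
```

Such a rule reproduces the reported direction of the effect: the prior rises after congruent trials and falls after large-disparity trials.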

These quantitative Bayesian modeling results corroborated our initial GLM-based conclusions that both HC and SCZ combined audiovisual signals consistent with Bayesian causal inference. The numeric prior was slightly more precise in SCZ compared to HC (according to classical statistics), which was in line with previous results in unisensory perception [ 10 , 11 , 13 , 15 ]. Moreover, PANSS positive symptoms correlated positively with visual variance and the LSHS-R correlated negatively with auditory variance. Collectively, these results suggested that patients with psychotic symptoms may have relied more strongly on prior information and possibly auditory information relative to visual inputs for numeric inferences. The positive correlation between PANSS negative symptoms and lapse parameter indicated that at least some SCZ patients may have found it harder to stay attentive throughout the entire experiment.

Further, we observed a significant positive correlation between SCZ’s PANSS positive symptoms and their visual variance (r = 0.625, t 16 = 3.101, p = 0.003, p corr = 0.024, BF 10 = 6.518; randomization test of the correlation). This effect remained significant even after controlling for PANSS negative symptoms and general psychopathology ( S3 Table ). Next, we correlated the BCI model’s parameters with self-reported hallucinatory experiences (i.e., LSHS-R) and paranoid thinking (i.e., PCL) as sensitive measures of psychosis severity that may also capture subclinical psychotic symptoms. LSHS-R and PCL showed greater variability in our sample than the PANSS positive symptoms ( S5 Fig ). Only the auditory variance correlated negatively with hallucinatory experiences, at marginal significance (without multiple comparison correction, r = −0.498, t 16 = −2.223, p = 0.047, p corr = 0.141, BF 10 = 1.439; S3 Table ). In addition, we observed a positive correlation between PANSS negative symptoms and lapse rates in SCZ (r = 0.483, t 16 = 2.134, p = 0.029, p corr = 0.232, BF 10 = 1.248). No other significant group differences were revealed when comparing the BCI model’s parameters between the 2 groups ( Fig 4 and Table 3 ).

Having established that SCZ and HC combined audiovisual signals in line with Bayesian causal inference and read out the final estimate according to model averaging, we examined whether SCZ may overweight their causal prior about the signals’ causal structure (i.e., p common ). Contrary to this conjecture, two-sample randomization tests on p common did not reveal any significant difference between the 2 groups ( Fig 4 and Table 3 ). In the next step, we asked whether SCZ relied more strongly on their priors about the flash/beep number relative to the sensory inputs, as may be hypothesized based on a growing number of studies in unisensory perception [ 10 , 11 , 13 , 15 ]. An over-reliance on prior information would be reflected in a greater precision (i.e., smaller variance; σ P ) of numeric priors (i.e., μ P ) in SCZ relative to HC. As shown in Fig 4 and Table 3 , two-sample randomization tests indeed revealed a significantly smaller variance for the numeric prior in SCZ compared to HC. This more precise numeric prior accounts for the fact that the numeric reports are centered more around the numeric prior mean in SCZ compared to HC ( Table 3 ; cf. Fig 2 ). It is consistent with previous research [ 10 , 11 , 13 , 15 ] and with the general notion that SCZ rely more on prior knowledge than on new incoming audiovisual evidence. However, the effect was small and did not survive correction for multiple comparisons across the model parameters or the inclusion of the 6 additional patients with schizoaffective disorder (cf. S1 Text and S7 Table ).

Comparing all 10 models in this 2 × 5 factorial model space ( Fig 3 and S2 Table ) revealed that the model-averaging model with increasing sensory variances outperformed the other 9 models in both HC and SCZ observers. Furthermore, a between-group Bayesian model comparison provided strong evidence that HC and SCZ individuals relied similarly on the 5 decision strategies. In particular, model averaging emerged as the “winning” strategy equally often in both groups (BF 10 = 0.022). These Bayesian model-comparison results further supported the notion that SCZ, like HC, performed multisensory perceptual and causal inference according to the same computational and decision strategies. S4 Fig also shows the BICs as approximations to the model evidence for each of the 5 models that allowed for scalar variability, for each patient ranked according to their PANSS score. The figure shows that model averaging was the dominant strategy in almost all of the patients, irrespective of their psychosis severity.
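The Bayesian information criterion used here as an approximation to the model evidence follows the standard formula BIC = k*ln(n) - 2*ln(L), where k is the number of free parameters, n the number of observations, and ln(L) the maximized log-likelihood. A minimal helper (with made-up numbers) shows the fit-complexity trade-off:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better.

    -BIC/2 approximates the log model evidence, so BIC differences
    between two models approximate log Bayes factors (up to a factor).
    """
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical example: a 3-parameter model with a slightly worse fit
# beats a 5-parameter model once model complexity is penalized
bic_complex = bic(log_likelihood=-100.0, n_params=5, n_obs=1000)
bic_simple = bic(log_likelihood=-102.0, n_params=3, n_obs=1000)
```

Summing participant-specific BICs relative to the worst model, as in Fig 3B, then yields the group-level relative model evidence.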

In total, we compared the following 5 decisional strategies: (i) Model averaging combines the segregation and fusion estimates weighted by the posterior probabilities of each causal structure. For example, it gives a stronger weight to the fusion estimate when it is likely that the flash and beep sequences are generated by one common cause. (ii) Model selection selects the perceptual estimate from the causal structure with the highest posterior probability. So rather than averaging fusion and segregation estimates, it selectively reports either the fusion or segregation estimates depending on the posterior probabilities of common and independent causes. (iii) Probability matching selects either the fusion or segregation estimates in proportion to their posterior probability. (iv) The fixed-criterion threshold model incorporates the simple heuristic of selecting the segregation estimate when the audiovisual numeric disparity estimate exceeds a fixed threshold. This differs from Bayesian model selection strategies in that observers do not apply the threshold to the posterior probability of a common cause, but directly to the numeric disparity estimate, thereby ignoring observers’ uncertainty about this estimate. (v) Finally, the probabilistic fusion model employs the suboptimal strategy of selecting either the fusion or segregation estimates with a fixed probability that is estimated from observers’ responses (see Materials and methods for details and [ 31 ]).
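The 5 readout strategies above can be summarized in one sketch function. This is a schematic paraphrase, not the fitted models: `threshold` and `p_fusion` stand in for the free parameters that the paper estimates from observers' responses.

```python
import random

def read_out(fusion, seg, post_common, disparity, strategy,
             threshold=1.5, p_fusion=0.5, rng=random.Random(0)):
    """Final perceptual estimate under the 5 decisional strategies."""
    if strategy == "model_averaging":
        # (i) Weight fusion and segregation by the causal posterior
        return post_common * fusion + (1 - post_common) * seg
    if strategy == "model_selection":
        # (ii) Report the estimate of the more probable causal structure
        return fusion if post_common > 0.5 else seg
    if strategy == "probability_matching":
        # (iii) Sample the causal structure in proportion to its posterior
        return fusion if rng.random() < post_common else seg
    if strategy == "fixed_criterion":
        # (iv) Heuristic: threshold the numeric disparity estimate directly,
        # ignoring the observer's uncertainty about that estimate
        return seg if abs(disparity) > threshold else fusion
    if strategy == "probabilistic_fusion":
        # (v) Choose fusion with a fixed, stimulus-independent probability
        return fusion if rng.random() < p_fusion else seg
    raise ValueError(f"unknown strategy: {strategy}")
```

The strategies differ only in how they map the same internal estimates (fusion, segregation, causal posterior) onto a single response, which is what the factorial model comparison exploits.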

Along the factor “decision strategy,” we manipulated how observers combined the estimates formed under fusion (i.e., common source) and segregation (i.e., separate sources) assumptions into a final perceptual estimate. While growing research has shown that HC combine audiovisual signals according to the decision strategy of model averaging [ 27 , 28 , 36 ], we hypothesized that SCZ may resort to suboptimal strategies or even simpler heuristics, such as applying a fixed threshold to the audiovisual numeric disparity [ 31 ]. In other words, rather than arbitrating between sensory integration and segregation according to the posterior probability of a common cause in a Bayesian fashion, SCZ may simply do so based on the audiovisual numeric disparity.

(A) The BCI model assumes that audiovisual stimuli are generated depending on a causal prior (p Common ): In case of a common cause (C = 1), the “true” number of audiovisual stimuli (N AV ) is drawn from a common numeric prior distribution (with mean μ P ), leading to noisy auditory (x A ) and visual (x V ) inputs. In case of independent causes (C = 2), the “true” auditory (N A ) and visual (N V ) numbers of stimuli are drawn independently from the numeric prior distribution. To estimate the number of auditory and visual stimuli given the causal uncertainty, the BCI model estimates the auditory or visual stimulus number (N̂A or N̂V, depending on the sensory modality that needs to be reported). In the model-averaging decision strategy, the BCI model combines the forced-fusion estimate of the auditory and visual stimuli (N̂AV,C=1) with the task-relevant unisensory visual (N̂V,C=2) or auditory (N̂A,C=2) estimates, each weighted by the posterior probability of common (C = 1) or independent (C = 2) causes, respectively (i.e., N̂A or N̂V). (B) The factorial Bayesian model comparison (n = 40) of models with different decision strategies (model averaging, MA; model selection, MS; probability matching, PM; fixed criterion, FC; stochastic fusion, SF) with constant or increasing sensory auditory and visual variances, separately for HC and SCZ. The images show the relative model evidence for each model (i.e., participant-specific Bayesian information criterion of a model relative to the worst model, summed over all participants). A larger model evidence indicates that a model provides a better explanation of our data. The bar plots show the protected exceedance probability (i.e., the probability that a given model is more likely than any other model, beyond differences due to chance) for each model factor. The BOR estimates the probability that factor frequencies purely arose from chance. Source data is provided in S3 Data .
BCI, Bayesian causal inference; BOR, Bayesian omnibus risk; HC, healthy control; SCZ, schizophrenia.
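The model-averaging readout described in the caption can be sketched in a few lines. The following is a minimal illustration in the standard BCI formulation (Körding et al., 2007) with Gaussian approximations of the numeric estimates, not the authors' fitting code; all numeric parameter values in the example are hypothetical.

```python
from math import exp, sqrt, pi

def gauss(x, mu, var):
    """Gaussian density, parameterized by its variance."""
    return exp(-0.5 * (x - mu) ** 2 / var) / sqrt(2 * pi * var)

def bci_model_averaging(x_a, x_v, sigma_a, sigma_v, sigma_p, mu_p, p_common):
    """Single-trial BCI estimates under the model-averaging decision strategy."""
    va, vv, vp = sigma_a ** 2, sigma_v ** 2, sigma_p ** 2

    # Forced-fusion estimate under a common cause (C = 1): precision-weighted
    # average of the auditory input, the visual input, and the numeric prior mean.
    n_av_c1 = (x_a / va + x_v / vv + mu_p / vp) / (1 / va + 1 / vv + 1 / vp)

    # Unisensory (segregation) estimates under independent causes (C = 2).
    n_a_c2 = (x_a / va + mu_p / vp) / (1 / va + 1 / vp)
    n_v_c2 = (x_v / vv + mu_p / vp) / (1 / vv + 1 / vp)

    # Likelihood of the noisy inputs under each causal structure.
    denom = va * vv + va * vp + vv * vp
    like_c1 = exp(-0.5 * ((x_a - x_v) ** 2 * vp + (x_a - mu_p) ** 2 * vv
                          + (x_v - mu_p) ** 2 * va) / denom) / (2 * pi * sqrt(denom))
    like_c2 = gauss(x_a, mu_p, va + vp) * gauss(x_v, mu_p, vv + vp)

    # Posterior probability of a common cause, combining the causal prior.
    post_c1 = (like_c1 * p_common
               / (like_c1 * p_common + like_c2 * (1 - p_common)))

    # Model averaging: weight the fusion and segregation estimates by the
    # posterior probabilities of the two causal structures.
    n_a_hat = post_c1 * n_av_c1 + (1 - post_c1) * n_a_c2
    n_v_hat = post_c1 * n_av_c1 + (1 - post_c1) * n_v_c2
    return n_a_hat, n_v_hat, post_c1
```

For numerically congruent inputs the posterior probability of a common cause is high and both estimates converge on the fusion estimate; at large numeric disparity the posterior drops and each estimate falls back toward its unisensory (segregation) value, reproducing the graceful transition from integration to segregation.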

Overall, these GLM-based analyses of behavioral data (i.e., accuracy, crossmodal bias) suggested that both SCZ and HC combined audiovisual signals into number estimates qualitatively consistent with the principles of BCI. Both groups gave a stronger weight to the more reliable auditory signals, leading to the well-known sound-induced flash illusions. Moreover, they arbitrated between sensory integration and segregation depending on the numeric disparity.

An additional 2 × 3 × 2 mixed-model ANOVA with factors group (SCZ versus HC), audiovisual numeric disparity between beeps and flashes (1, 2, or 3), and task relevance (auditory versus visual report) revealed that these crossmodal biases were significantly decreased at large numeric disparities, when signals most likely originated from different sources and should hence be segregated (i.e., task relevance × numeric disparity interaction, Table 2). At large numeric disparities, HC and SCZ were thus able to selectively report the number of flashes (or beeps) with minimal interference from task-irrelevant beeps (or flashes). This task relevance × numeric disparity interaction is the key response profile qualitatively predicted by BCI: observers integrated signals at small numeric conflicts but segregated them at large conflicts, when it was unlikely that the auditory beep and visual flash sequences were generated by one common underlying source. Again, Bayes factors indicated substantial to strong evidence for comparable performance in SCZ and HC (i.e., BFincl < 1/3 or even < 1/10, Table 2).

Based on Bayesian probability theory, observers should therefore assign a stronger weight to the more precise auditory signal when integrating audiovisual signals into number estimates, resulting in the well-known sound-induced flash illusion [27,56,57]. Consistent with this conjecture, both SCZ and HC were more likely to perceive 2 flashes when a single flash was presented together with 2 sounds (i.e., fission illusion) and a single flash when 2 flashes were presented together with 1 sound (i.e., fusion illusion; see S3 Fig). We quantified these audiovisual interactions using the crossmodal bias (CMB) that ranges from pure visual (CMB = 1) to pure auditory (CMB = 0) influence (Fig 1D; n.b. the crossmodal bias can be computed only for numerically disparate flash-beep sequences). Consistent with the principles of precision-weighted integration [58], HC’s and SCZ’s flash reports were biased towards the number of auditory beeps (i.e., CMB < 1, t(39) = −11.864, p < 0.001, Cohen’s d = −1.876, BF10 > 100), again with no significant differences between the groups (t(38) = 0.434, p = 0.667, d = 0.139, BF10 = 0.336). By contrast, we observed only a small but significant crossmodal bias for auditory reports towards the number of flashes in HC and SCZ (Fig 1D; CMB > 0, t(39) = 8.550, p < 0.001, d = 1.352, BF10 > 100), again with no evidence for group differences (t(38) = 0.448, p = 0.657, d = 0.143, BF10 = 0.337). Thus, both HC and SCZ assigned a greater weight to the temporally more reliable auditory sense.
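To make this precision weighting concrete, the following sketch shows how a crossmodal bias below 1 for flash reports follows from a more precise auditory channel under forced fusion. The CMB normalization used here is one plausible operationalization, and the variance values are invented for illustration; neither should be read as the study's exact formula or data.

```python
def crossmodal_bias(reported, n_aud, n_vis):
    """Normalize the reported count between the true auditory (CMB = 0) and
    visual (CMB = 1) numbers; defined only for disparate trials (n_aud != n_vis)."""
    return (reported - n_aud) / (n_vis - n_aud)

def visual_weight(sigma_a, sigma_v):
    """Expected weight of the visual signal under precision-weighted fusion."""
    return (1 / sigma_v ** 2) / (1 / sigma_a ** 2 + 1 / sigma_v ** 2)

# With a more precise auditory channel (sigma_a < sigma_v), the fused percept
# of 2 flashes + 1 beep is pulled toward the single beep, i.e., CMB < 1.
w_v = visual_weight(sigma_a=0.4, sigma_v=1.0)          # well below 0.5
fused_report = w_v * 2 + (1 - w_v) * 1
cmb = crossmodal_bias(fused_report, n_aud=1, n_vis=2)  # equals w_v here
```

For auditory reports the same logic predicts only a small bias toward the flashes (CMB slightly above 0), matching the asymmetry reported above.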

As expected [54], we observed a significant main effect of task-relevance and a task-relevance × modality interaction on response accuracies. Overall, observers estimated the number of beeps more accurately than the number of flashes. They also estimated the number of beeps better when presented alone than together with flashes, while the reverse was true for observers’ flash counts. Consistent with extensive previous research showing a greater temporal precision for the auditory than the visual sense [55], our results confirm that observers obtained more precise estimates for the number of beeps than for the number of flashes.

Fig 2A shows the reported flash and beep counts in SCZ and HC as a function of the true flash and beep numbers. Both SCZ and HC progressively underestimated increasing numbers of flashes and beeps (S1 Text, S1 Fig, and S1 Table). This logarithmic compression and the increasing variances for greater numbers of flashes/beeps were observed similarly in both groups, in line with the known scalar variability of numerosity estimates [52,53]. Fig 2 also indicates that observers’ reported flash (resp. beep) counts were biased towards the concurrent incongruent beep (resp. flash) number in the ignored sensory modality. We formally compared SCZ and HC in their ability to selectively estimate either the number of flashes or beeps (see Fig 1C and Table 2) using a 2 × 2 × 2 mixed-model ANOVA with factors group (SCZ vs. HC), stimulus modality (unisensory visual/auditory vs. audiovisual congruent), and task relevance (auditory vs. visual report) on response accuracies (n.b. no incongruent conditions were included in this ANOVA). No significant group differences or interactions with group were observed. Instead, Bayes factors provided substantial evidence for comparable accuracies of the flash and beep counts in HC and SCZ (i.e., BFincl < 1/3, Table 2; see S2 Fig for comparable response time results).

(A) Example trial of the flash-beep paradigm (e.g., 2 flashes and 4 beeps are shown) in which participants report either the number of flashes or beeps. (B) The experimental design factorially manipulated the number of beeps (i.e., 1 to 4), the number of flashes (i.e., 1 to 4), and the task relevance of the sensory modality (report number of visual flashes vs. auditory beeps). We reorganized these conditions into a 2 (task relevance: auditory vs. visual report) × 2 (numeric disparity: high vs. low) factorial design for the GLM analyses of the audiovisual crossmodal bias. (C) Response accuracy (across-participants mean ± SEM; n = 40) was computed as the correlation between the experimentally defined task-relevant and the reported signal number. Response accuracy is shown as a function of modality (audiovisual congruent conditions vs. unisensory visual and auditory conditions), task relevance (auditory vs. visual report), and group (HC vs. SCZ). (D) The audiovisual CMB (across-participants mean ± SEM; n = 40) is shown as a function of numeric disparity (1, 2, or 3), task relevance (auditory vs. visual report), and group (HC vs. SCZ). CMB was computed from participants’ behavior (upper panel) and from the prediction of the individually fitted BCI model (lower panel; i.e., model averaging with increasing sensory variances). CMB = 1 for purely visual and CMB = 0 for purely auditory influence. Source data is provided in S1 Data. BCI, Bayesian causal inference; CMB, crossmodal bias; GLM, general linear model; HC, healthy control; SCZ, schizophrenia.
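As a minimal, self-contained illustration of the accuracy measure defined in panel (C), one can correlate true and reported counts directly; the response data below are invented and merely mimic the compression of higher counts described in the text.

```python
def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical reports for true counts 1-4 (two repetitions each); higher
# counts are occasionally underestimated, so accuracy falls below 1.
true_n     = [1, 2, 3, 4, 1, 2, 3, 4]
reported_n = [1, 2, 3, 3, 1, 2, 2, 4]
accuracy = pearson(true_n, reported_n)   # high, but below 1
```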

In a sound-induced flash-illusion (SIFI) paradigm, we presented HC and SCZ with flash-beep sequences and their unisensory counterparts. Across trials, the number of beeps and flashes varied independently according to a 4 (1 to 4 flashes) × 4 (1 to 4 beeps) factorial design (Fig 1A and 1B). Thereby, the paradigm yielded numerically congruent or incongruent flash-beep sequences at 4 levels of audiovisual numeric disparity. In an inter-sensory selective attention task, observers reported either the number of beeps or flashes. The manipulation of numeric disparity across several levels enabled us to characterize how observers gracefully transitioned from integration to segregation as a key feature of Bayesian causal inference [27,35,50,51].

Discussion

This study combined psychophysics, EEG, and Bayesian modeling to investigate whether and how schizophrenia impacts the computational and neural mechanisms of multisensory causal and perceptual inference. A growing number of studies suggests that schizophrenia may alter the brain’s ability to integrate audiovisual signals into coherent percepts (for a review, see [44]). Many of these studies have used qualitative approaches such as evaluating the frequency of perceptual illusions in schizophrenia compared to healthy controls. Yet, the complex dependencies of multisensory illusion rates on sensory precisions and perceptual as well as causal priors may have contributed to the inconsistent findings reported so far [41,43,46]. In this study, we have therefore employed a more sophisticated factorial design with formal Bayesian modeling to disentangle these different computational components.

Our GLM-based and Bayesian modeling analyses offered convergent evidence that SCZ, like their HC counterparts, performed multisensory perceptual and causal inference in line with normative principles of Bayesian causal inference. When numeric disparities were small and signals were likely to originate from common sources, individuals with SCZ and HC integrated auditory and visual signals weighted by their relative precisions, resulting in crossmodal biases. At large numeric disparities, these crossmodal biases and interactions were reduced. Through formal Bayesian model comparison, we further examined whether SCZ computed Bayesian estimates or resorted to simpler heuristics such as segregating signals above a numeric disparity threshold. Our quantitative Bayesian modeling analyses confirmed that SCZ, like their healthy counterparts, combined signals consistent with BCI models and read out final numeric estimates based on the decisional strategy of model averaging.

Next, we examined whether SCZ may overweight prior information about flash/beep number in relation to sensory evidence at long and/or short timescales as previously observed in unisensory perception [10,11,13,15]. In support of an overweighting of priors in psychosis, our Bayesian modeling analysis revealed a significantly more precise numeric prior for SCZ compared to HC. Further, the visual variance correlated positively with the positive symptoms on the PANSS scale. These findings are consistent with previous studies showing overreliance of patients with psychosis on prior knowledge relative to sensory evidence [10–15]. However, the effect on prior variance was very small. It was no longer significant after correcting for multiple comparisons or including schizoaffective patients (cf. S1 Text and S7 Table). Likewise, the influence of the previous trial’s flash/beep number and numeric disparity on current perceptual choices was comparable in SCZ and HC.

Furthermore, we cannot exclude the possibility that SCZ, because they were less vigilant or motivated and therefore failed to count some signals, showed a bias towards an intermediate signal number that could be misinterpreted as overreliance on their numeric prior (see Figs 4 and 5B). Indeed, in line with this notion, the lapse parameter correlated positively with SCZs’ negative symptoms (S3 Table) and negatively with SCZs’ memory performance in the VLMT as well as their ability to inhibit inappropriate responses in the Stroop test (S5 Table). Overall, our analyses thus suggested that the medicated SCZ group of our study weighted and dynamically adapted numeric priors in a largely comparable fashion to their healthy counterparts, with a small trend towards an overreliance on their numeric prior relative to bottom-up audiovisual evidence. The absence of substantial prior overweighting may be explained by our specific sample of medicated patients with chronic rather than acute SCZ and low expression of psychotic symptoms. Thus, prior overweighting may be a state marker associated with psychotic symptoms rather than a trait marker for schizophrenia [12].

Moreover, to our knowledge, this is the first study assessing prior weighting in a multisensory context. The availability of evidence furnished simultaneously by 2 sensory modalities may decrease the overall reliance on prior information and may thus decrease our paradigm’s sensitivity to detect small modulations in prior variance and weighting. Further, it is conceivable that distinct mechanisms across the cortical hierarchy support the integration of prior and sensory evidence depending on whether information is provided in only one or in several sensory modalities. Given the prevalence of multisensory hallucinations in patients with psychosis [21], future work is required to assess perceptual inference in multisensory situations. For example, one may even develop elegant multisensory conditioning paradigms to provoke conditioned audiovisual hallucinations [11].

Crucially, our multisensory paradigm enabled us to assess the weighting not only of prior knowledge regarding environmental properties such as flash/beep number, but also of prior knowledge regarding the world’s causal structure, i.e., whether signals come from common or independent sources, as incorporated in the causal prior. A high causal prior or binding tendency increases crossmodal biases and interactions, while a low causal prior enhances an individual’s ability to selectively report the signal number in the task-relevant sensory modality while ignoring the incongruent number of signals in the irrelevant sensory modality. Changes in the causal prior may thus be closely associated with conflict monitoring, cognitive control, and selective attention mechanisms [51,73–75]. Yet, despite growing evidence for impaired selective attention and cognitive control mechanisms in SCZ [76–78], our Bayesian modeling analyses did not reveal any significant differences in the causal prior between SCZ and HC.

In summary, our GLM-based and Bayesian modeling analyses showed that SCZ and HC combined perceptual and causal priors with sensory evidence in a manner consistent with the principles of normative Bayesian causal inference. Furthermore, Bayesian statistics provided consistent evidence that the weighting and updating of the causal and perceptual priors was largely maintained in the SCZ group. These findings suggest that both groups dynamically adapt to the statistical structure of multisensory stimuli across short time scales (cf. serial dependencies in perception [79]). The absence of notable computational abnormalities in SCZ could be explained by the fact that our patient group was post-acute, medicated, and scored low on the PANSS positive symptom scale. Most importantly, our cohort of SCZ patients also showed near-normal performance on the TMT-B and Stroop tests (cf. Table 1). Both tests measure executive and attentional functions, which are particularly relevant for optimal performance in our inter-sensory selective-attention paradigm. Our findings add to the evidence that fundamental mechanisms of perceptual inference are preserved at least in subgroups of medicated SCZ individuals [16–19]. Future research is required to explore whether schizophrenia patients with attentional and executive deficits and in more severe psychosis states exhibit abnormal causal priors and, thus, altered arbitration between sensory integration and segregation.

Using EEG, we investigated whether SCZ and HC support these computations via similar or different neural mechanisms. For example, in SCZ, the computations may involve compensatory neural systems or exhibit slower dynamics. To explore these questions, we decoded the auditory and visual segregation estimates, the fusion estimate, and the final BCI model’s estimates from scalp EEG data. In HC, the decoding accuracy of the visual segregation estimate peaked earlier and higher compared to the other estimates. By contrast, in the SCZ group, decoding of the visual estimate showed a more protracted time course, initially overlapping with the auditory segregation and fusion estimates. Statistical tests confirmed a significant difference in decoding accuracy between HC and SCZ from approximately 100 to 200 ms. However, apart from this difference in the decoding of the unisensory visual estimate and, similarly, in the visual P3 ERP component, the decoding profiles appeared largely similar across the 2 groups.

To ensure that we did not miss any differences between SCZ and HC that were previously reported in the literature [46,64–68], we also performed standard ERP analyses to test for audiovisual interactions. However, these analyses revealed only an attenuated visual P3 component in SCZ, but no significant differences in audiovisual interactions between the 2 groups. Similarly, we were unable to predict group membership (i.e., HC versus SCZ) from multivariate audiovisual EEG responses significantly better than chance.

To conclude, our behavioral, computational, and neuroimaging results consistently demonstrate that audiovisual perception in a sound-induced flash illusion paradigm is based on comparable computational and neural mechanisms in SCZ and HC. Both SCZ and HC combined audiovisual signals into number estimates in line with the computational principles of Bayesian causal inference. However, our small sample size may have prevented us from detecting more subtle potential alterations in SCZ (see limitations below). Our computational modeling approach moves significantly beyond previous studies assessing audiovisual illusion rates by characterizing patients according to their perceptual inference, decisional strategy, and specific parameters in the winning model. Additional time-resolved EEG decoding revealed that both HC and SCZ performed Bayesian causal inference by dynamically encoding the visual and auditory segregation and fusion estimates, followed by later estimates that flexibly integrate auditory and visual information according to their causal structure. Collectively, our results thus showed that, at least in our limited sample of post-acute medicated SCZ patients, the computations and neural mechanisms of hierarchical Bayesian causal inference in audiovisual numeric perception were largely preserved.

However, several factors limit the scope of these conclusions. First, the sample size was small and only adequate to reveal large effect sizes: our study could detect large differences between SCZ and HC, such as in the numeric prior (Cohen’s d = 0.675), or large correlations between the BCI model parameters and psychopathology, such as between visual variance and participants’ psychotic symptoms (r = 0.625). Yet, the small sample size may have precluded the detection of more subtle effects. Our choice of a small sample size was guided by previous research into behavioral multisensory deficits in SCZ that reported large effect sizes for differences between SCZ and controls in small samples of between 12 and 30 patients [40,80,81]. However, these previous small-sample studies may have overestimated the effect size due to the “winner’s curse” phenomenon, i.e., the tendency for published studies to report effect sizes that exceed the true effect size [82]. In fact, effect sizes of interindividual differences can generally be expected to be much smaller [83]. However, increasing the patient sample by 6 patients with schizoaffective disorder yielded highly similar results for the CMB, BCI modeling, and decoding of BCI estimates, and even turned the overreliance on the numeric prior nonsignificant (see S1 Text, S8–S11 Figs, and S6–S8 Tables). Moreover, Bayes factors frequently supported the absence of group differences. Nevertheless, studies with higher power are needed to corroborate that our study did not miss subtle alterations in multisensory perceptual inference in SCZ. Second, as a result of our recruitment strategy, our sample included mainly chronic patients with high doses of antipsychotics, relatively intact cognitive functions, and low psychosis severity (cf. Table 1 and S5 Fig).
By contrast, previous studies reported that overweighting of perceptual priors arises specifically in individuals with acute hallucinations [14], independently of a diagnosis of schizophrenia [11]. Thus, prior overweighting could be a state marker for psychotic episodes rather than a trait marker for schizophrenia [12]. Our conclusion of relatively intact multisensory perceptual and causal inference may therefore not hold for individuals in prodromal or acute psychotic states of schizophrenia or for patients with persistent post-acute hallucinations. Thus, future studies are needed to assess multisensory perceptual inference in a larger sample of SCZ participants who vary substantially in their psychotic symptoms. It has even been suggested that psychotic symptoms arise on a symptomatic continuum across healthy individuals and patients [84], so that altered weighting of priors in multisensory causal and perceptual inference may be found in nonclinical samples with subclinical psychotic symptoms. In line with this conjecture, a recent study in young nonclinical participants reported that individuals with stronger prodromal psychotic symptoms showed reduced causal priors [50]. Third, our small clinical SCZ sample was still heterogeneous with respect to the duration of illness, comorbidities, the dose and type of antipsychotic medication, and the strength of acute positive and negative symptoms as well as cognitive impairments. This substantial heterogeneity might have obfuscated small alterations in multisensory perceptual and causal inference in SCZ. Fourth, our experimental SIFI paradigm manipulated the number of flashes and beeps to investigate causal inference via numeric disparities. Previous studies have shown a widening of the temporal binding window in SCZ ([42]; see [85] for a review), which may be further investigated in SIFI paradigms with variable audiovisual asynchronies.
Indeed, one previous study [42] has varied the audiovisual asynchrony of audiovisual stimuli to assess Bayesian causal inference in SCZ, individuals with autism spectrum disorder (ASD) and controls. Interestingly, the widening of the temporal binding window which was common to both ASD and SCZ could be attributed to different computational parameters. While it resulted from a stronger causal prior or binding tendency in ASD, it arose from larger sensory uncertainty in SCZ. Manipulating audiovisual asynchrony enables more precise estimation of observers’ sensory uncertainty than categorically manipulating the number of flashes and beeps as in our paradigm. Thus, it is possible that subtle increases in sensory uncertainty in SCZ may have gone undetected in our paradigm.

Future research is needed to determine whether deviations from normative Bayesian principles in multisensory perception may occur in larger samples of unmedicated psychotic patients in prodromal or acute stages or patient groups with more pronounced impairments on their attentional and executive functions. Critically, our study focused on simple artificial flash-beep stimuli. This makes it an important future research direction to investigate multisensory perceptual and causal inference in psychosis in more naturalistic situations such as face-to-face communication.

[END]
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002790
