(C) PLOS One

This story was originally published by PLOS One and is unaltered.

Original speech and its echo are segregated and separately processed in the human brain [1]

Authors: Jiaxin Gao, Honghua Chen, Mingxuan Fang, Nai Ding
Affiliations: Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering, Instrument Sciences, Zhejiang University, Hangzhou; Nanhu Brain-Computer Interface Institute

Date: 2024-02

Echo-related suppression of temporal modulations

The echoic speech was a mixture of 2 speech signals that differed only by a time delay (Fig 1A). In the following, the leading sound is referred to as the direct sound, and the lagging sound is referred to as the echo. We first constructed a challenging echoic condition in which the echo had the same amplitude as the direct sound. The echo delay was either 0.125 s or 0.25 s (see Methods for rationales). A modulation spectrum analysis clearly revealed that an echo could notch out the temporal modulations at some frequencies that were related to the echo delay (Fig 1B and 1C). Fig 1A illustrates why these notches were created. The Fourier analysis decomposed a signal into sinusoids. If the signal was time shifted by T, forming an echo that had the same amplitude as the direct sound, all its Fourier components would also be shifted by T. A sinusoidal Fourier component whose period was 2T therefore had opposite phases in the original signal and the echo, and was canceled when the original signal and the echo were mixed. The same applied to Fourier components whose periods were 2T/3, 2T/5, etc. In the following, the frequencies of the sinusoidal components that were notched out by the echo, i.e., 1/(2T), 3/(2T), 5/(2T), etc., are referred to as the echo-related frequencies. Furthermore, since previous studies showed that cortical responses mainly track the speech envelope below 10 Hz [35], we only analyzed the echo-related frequencies below 10 Hz.
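The cancellation argument above can be checked numerically. The following is a minimal sketch, not the authors' analysis code: it assumes an idealized mixture in which a component at frequency f is scaled by |1 + a·exp(-i2πfT)| when a delayed copy of relative amplitude a is added, and the function names are hypothetical.

```python
import numpy as np

def echo_gain(freq_hz, delay_s, echo_amp=1.0):
    """Gain applied to a Fourier component at freq_hz when a signal is
    mixed with a copy of itself delayed by delay_s and scaled by echo_amp."""
    f = np.asarray(freq_hz, dtype=float)
    return np.abs(1.0 + echo_amp * np.exp(-2j * np.pi * f * delay_s))

def echo_related_frequencies(delay_s, max_hz=10.0):
    """Odd multiples of 1/(2T) that fall below max_hz."""
    base = 1.0 / (2.0 * delay_s)
    freqs = []
    k = 0
    while (2 * k + 1) * base < max_hz:
        freqs.append((2 * k + 1) * base)
        k += 1
    return freqs

for delay in (0.125, 0.25):
    notches = echo_related_frequencies(delay)
    # The gain at each notch is zero up to floating-point error.
    print(delay, notches, echo_gain(notches, delay))
```

With the 2 delays used here, this gives 4 Hz for the 0.125-s echo and 2 and 6 Hz for the 0.25-s echo as the echo-related frequencies below 10 Hz, which are the frequencies examined in the analyses that follow.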

The power spectrum analysis could effectively reveal how the speech envelope was influenced by an echo, but it could not be directly applied to quantify the envelope-tracking neural response in the MEG recording, since the power of the MEG response is dominated by other sources such as spontaneous neural activity. To isolate the envelope-tracking response, we analyzed the phase coherence between the MEG response and the temporal envelope of direct sound [36–38]. The phase coherence spectrum quantified the phase locking between 2 signals at different frequencies. Specifically, the response phase of each signal was extracted in consecutive time windows using the Fourier transform. If the 2 signals were perfectly synchronized, the phase lag between them would be constant across all time windows and the phase coherence would reach its maximum, i.e., 1. In contrast, if the 2 signals were independent of each other, their phase lag would be random and the phase coherence would be at chance level. Our hypothesis, referred to as the envelope restoration hypothesis, was that the auditory system could fully or partially restore the temporal envelope of direct sound, and an alternative hypothesis was that the auditory system faithfully followed the temporal envelope of the echoic speech. We first quantified the prediction of the alternative hypothesis through a simulation, in which the neural response was simply simulated using the envelope of echoic speech. The phase coherence spectrum between the simulated neural response and the envelope of direct sound showed notches at 4 and 12 Hz for speech with a 0.125-s echo and showed notches at 2, 6, 10, and 14 Hz for speech with a 0.25-s echo (Fig 2A). These results demonstrated that if neural activity faithfully tracked the envelope of echoic speech, it would show very low phase coherence (<0.07) with the envelope of the direct sound at echo-related frequencies.
Therefore, if the phase coherence between the neural response to echoic speech and the temporal envelope of direct sound is near 0 at echo-related frequencies, the alternative hypothesis is supported. Otherwise, the envelope restoration hypothesis is supported, and full restoration is suggested if the phase coherence value is comparable to the phase coherence between the neural response to anechoic speech and the envelope of direct sound.
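The phase coherence measure and the prediction of the alternative hypothesis can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' pipeline: the window length, the sampling rate, the use of white noise as a surrogate envelope, and the approximation of the echoic envelope by the sum of the surrogate and its delayed copy are all assumptions made here, and the exact coherence formula used in the paper may differ.

```python
import numpy as np

def phase_coherence(x, y, fs, win_s=2.0):
    """Phase coherence spectrum between signals x and y.

    Both signals are cut into consecutive non-overlapping windows, the
    Fourier phase is taken in each window, and the consistency of the
    phase lag across windows is measured per frequency bin
    (1 = constant lag, near 0 = random lag).
    """
    n = int(round(win_s * fs))
    n_win = min(len(x), len(y)) // n
    X = np.fft.rfft(np.reshape(x[: n_win * n], (n_win, n)), axis=1)
    Y = np.fft.rfft(np.reshape(y[: n_win * n], (n_win, n)), axis=1)
    lag = np.angle(X) - np.angle(Y)
    coh = np.abs(np.mean(np.exp(1j * lag), axis=0))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, coh

# Surrogate data (assumption): white noise stands in for the envelope of
# the direct sound; the "faithfully tracking" response is the surrogate
# plus a copy of itself delayed by 0.125 s.
rng = np.random.default_rng(0)
fs = 200.0
delay_s = 0.125
direct = rng.standard_normal(int(400 * fs))
shift = int(round(delay_s * fs))
echo = np.concatenate([np.zeros(shift), direct[:-shift]])
echoic = direct + echo

freqs, coh_echoic = phase_coherence(echoic, direct, fs)
_, coh_same = phase_coherence(direct, direct, fs)
i4 = int(np.argmin(np.abs(freqs - 4.0)))  # echo-related frequency
i8 = int(np.argmin(np.abs(freqs - 8.0)))  # frequency unaffected by the notch
print(coh_echoic[i4], coh_echoic[i8])
```

In this sketch the coherence with the direct signal dips at 4 Hz and stays high at 8 Hz. The dip does not reach exactly 0 because finite analysis windows leave a small residual at the notch frequency.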

Fig 2. Results of Experiment 1. (A) Simulation of neural activity ideally tracking the envelope of echoic speech. (B) Phase coherence spectrum between the MEG response and the envelope of direct sound, averaged over participants and MEG gradiometers. Shaded areas cover 1 SEM across participants on each side. The dashed black line near the bottom shows chance-level phase coherence. Bars on top denote the frequency bins with significant phase coherence (p < 0.05, permutation test, FDR corrected). (C) Phase coherence at echo-related frequencies. Each dot represents 1 individual and error bars represent 1 SEM. Dashed black lines show chance-level phase coherence. Phase coherence significantly higher than chance level and significant differences between conditions are marked (* p < 0.05, ** p < 0.01, permutation test, FDR corrected). (D) Topography of gradiometers for the phase coherence at echo-related frequencies, normalized by dividing by its maximum. The underlying data can be found at https://zenodo.org/records/10472483. https://doi.org/10.1371/journal.pbio.3002498.g002

In Experiment 1, the participants listened to the anechoic speech, 0.125-s echoic speech, and 0.25-s echoic speech in separate blocks (S1 Audio), while the cortical responses were recorded using MEG. In the anechoic speech condition, listeners heard only the direct sound, with no accompanying echo. In the echoic speech conditions, the magnitude of the echo was the same as that of the direct sound (0-dB echo). Participants were asked to attend to the speech and answer comprehension questions after each block. The average accuracy for question-answering was 95.6% for the 3 conditions. We characterized the neural tracking of the direct sound using the phase coherence spectrum between the MEG responses and the temporal envelope of the direct sound, in the same way as in the simulation analysis in Fig 2A. The phase coherence spectrum was calculated for each gradiometer, and the average over all gradiometers is shown in Fig 2B. In the anechoic condition, i.e., when the stimulus only included the direct sound, the phase coherence between the MEG response and speech envelope was significantly above chance below approximately 10 Hz (Fig 2B, p < 0.05, permutation test, false discovery rate (FDR) corrected). Within this frequency range, i.e., below 10 Hz, we probed whether the MEG activity could track the speech envelope at echo-related frequencies when the participants listened to echoic speech.

For the echoic conditions, the phase coherence was also significantly above chance at the echo-related frequencies (Fig 2B and 2C, p < 0.05 for all echo-related frequencies, permutation test, FDR corrected). The response topography at the echo-related frequencies revealed a bilateral temporal distribution (Fig 2D). Compared with the anechoic condition, the 0.125-s echoic condition did not show a significant reduction in phase coherence at the echo-related frequency, e.g., 4 Hz (Fig 2B, left plot, and 2C, p = 0.4235 at 4 Hz, permutation test, FDR corrected), and the 0.25-s echoic condition did not show a significant reduction in phase coherence at 2 Hz (Fig 2B, right plot, and 2C, p = 0.3941 at 2 Hz, permutation test, FDR corrected), suggesting full restoration. At 6 Hz, however, the phase coherence in the 0.25-s echoic condition was significantly above chance but lower than the phase coherence in the anechoic condition (Fig 2B, right plot, and 2C, p = 0.0168, permutation test, FDR corrected), suggesting partial restoration. Furthermore, the phase coherence around 1 Hz was enhanced in the echoic conditions and this difference was statistically significant (S2 Fig, p < 0.001 in the 0.125-s and p = 0.0016 in the 0.25-s echoic condition, permutation test, FDR corrected), which can possibly be explained by the enhancement of 1-Hz modulation power for echoic speech. Taken together, these results suggested that the human auditory system could effectively restore the speech envelope at the echo-related frequencies.

Experiment 1 showed that the temporal envelope of direct sound could be neurally restored when listening to echoic speech, while the underlying mechanisms remain to be explored. One potential mechanism that might contribute to echo suppression is neural adaptation. To test whether neural adaptation was sufficient to explain the MEG response to the echoic speech, we simulated the adapted neural response [9] and calculated the phase coherence between the simulated neural response and the envelope of direct sound (Figs 3B and S1). Simulation showed that the phase coherence at echo-related frequencies was boosted by neural adaptation for echoic speech, but it remained lower for echoic speech than anechoic speech (Fig 3B), suggesting that neural adaptation could partially restore the envelope of direct sound. Therefore, neural adaptation could possibly explain the neural response at 6 Hz when listening to 0.25-s echoic speech, but could not fully explain the neural responses at 2 and 4 Hz for the 0.25-s and 0.125-s echoic speech conditions.

Fig 3. Results of Experiment 2. (A) Illustration of neural adaptation. The left part shows the echoic stimulus when the echo was 0 dB and 6 dB; the pulse with the blue background is the direct sound, while the pulse with the yellow background is the echo. The right part shows the adapted neural response simulated using a computational model (Mesgarani and colleagues). (B) Phase coherence spectrum between the simulated adapted neural response to echoic speech and the envelope of direct sound. Simulation based on another 2 models is shown in S1 Fig. (C–E) The MEG results of Experiment 2 are shown with the same conventions as in Fig 2B–2D. The underlying data can be found at https://zenodo.org/records/10472483. https://doi.org/10.1371/journal.pbio.3002498.g003

When the echo and the direct sound have equal amplitude, their Fourier components at echo-related frequencies are fully canceled. When the echo amplitude deviates from the amplitude of the direct sound, the echo is less effective at canceling the speech envelope at echo-related frequencies. Neural adaptation breaks the balance between the neural responses to the echo and the direct sound by attenuating the echo response, and therefore recovers the temporal envelope at echo-related frequencies. Attenuating the neural response to the echo, however, does not always have a positive effect on restoring the envelope of direct sound. For example, if the echo is stronger than the direct sound (Fig 3A), attenuating the echo response could make the neural responses to the echo and the direct sound more similar in amplitude, canceling the temporal envelope at echo-related frequencies. Motivated by this idea, we constructed Experiment 2, in which the echo was more intense than the direct sound (S2 Audio), to further probe whether neural restoration of the speech envelope could be well explained by neural adaptation. If neural adaptation played a dominant role in restoring the speech envelope, the phase coherence between the neural response to echoic speech and the temporal envelope of direct sound should be significantly reduced at echo-related frequencies compared with Experiment 1.
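The reasoning behind Experiment 2 can be made concrete with a small calculation. The sketch below is a toy illustration, not the adaptation model used in the simulations: it assumes that adaptation simply scales the echo response by a fixed factor, and both the function name and the value 0.5 for that factor are hypothetical choices made for illustration.

```python
def notch_residual(echo_db, echo_gain):
    """Envelope amplitude remaining at an echo-related frequency,
    relative to the direct sound alone.

    At echo-related frequencies the echo component is in antiphase with
    the direct component, so the two amplitudes subtract.
    echo_db:   echo level relative to the direct sound, in dB.
    echo_gain: hypothetical fraction of the echo response that survives
               adaptation (1.0 means no adaptation).
    """
    echo_amp = 10.0 ** (echo_db / 20.0)
    return abs(1.0 - echo_gain * echo_amp)

for echo_db in (0.0, 6.0):
    for gain in (1.0, 0.5):
        print(echo_db, gain, round(notch_residual(echo_db, gain), 3))
```

Under these assumptions, halving the echo response moves the residual away from 0 for a 0-dB echo, but moves it close to 0 for a 6-dB echo, since a 6-dB echo has roughly twice the amplitude of the direct sound.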

Experiment 2 was the same as Experiment 1 except that the echo was 6 dB more intense than the direct sound. Simulations showed that neural adaptation aggravated, instead of alleviating, the loss of speech envelope at echo-related frequencies in Experiment 2 (Fig 3B). Participants correctly answered 94.4%, 97.2%, and 100% of the questions in the anechoic, 0.125-s echoic, and 0.25-s echoic conditions, suggesting that they had no trouble understanding the echoic speech even when the echo was more intense than the direct sound. Compared with Experiment 1, the phase coherence in Experiment 2 was lower at 4 Hz in the 0.125-s echoic condition (p = 0.003, bootstrap, FDR corrected) and was also lower at 6 Hz in the 0.25-s echoic condition (p = 0.01, bootstrap, FDR corrected). No significant difference across Experiments 1 and 2 was observed at 2 Hz in the 0.25-s echoic condition (p = 0.8031, bootstrap, FDR corrected). Additionally, at 4 Hz in the 0.125-s echoic condition, the phase coherence between the MEG response and the envelope of direct sound showed a significant reduction compared to the anechoic condition (Fig 3C, left plot, and 3D, p = 0.0176, bootstrap, FDR corrected), but was still above chance level (Fig 3C, left plot, and 3D, p < 0.001, permutation test, FDR corrected), suggesting partial restoration in the 0.125-s echoic condition. At 2 Hz in the 0.25-s echoic condition, the phase coherence was above chance level (Fig 3C, right plot, and 3D, p < 0.001, permutation test, FDR corrected) and did not show a significant reduction compared to the anechoic condition (Fig 3C, right plot, and 3D, p = 0.4781, permutation test, FDR corrected), suggesting full restoration. At 6 Hz in the 0.25-s echoic condition, the phase coherence was not significantly higher than the chance level (Fig 3C, right plot, and 3D, p = 0.4781, permutation test, FDR corrected), which was similar to the simulation of neural adaptation.
Taken together, consistent with Experiment 1, neural adaptation could possibly explain the neural response in Experiment 2 at 6 Hz, but could not fully explain the neural response at 2 and 4 Hz, suggesting that additional mechanisms are required to restore the speech envelope in echoic conditions.

---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002498

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.

via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/