


Neural dynamics differentially encode phrases and sentences during spoken language comprehension [1]

Fan Bai, Antje S. Meyer, Andrea E. Martin
Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University

Date: 2022-07

We first calculated the power of activation using DORA’s PO-units, where differences in binding between units first emerge as the structures unfold. The spectrally decomposed activity is shown in Fig 1C. A paired-sample t-test on the frequency-combined response indicated that the power of activation was significantly higher for sentences than for phrases (t(99) = 8.40, p < 3.25e-13, ***). The results are shown in Fig 1D. We then calculated the level of power coupling (connectivity) between PO-units and S-units for both conditions (Fig 1E). A paired-sample t-test indicated that the level of power coupling was significantly higher for sentences than for phrases (t(99) = 251.02, p < 1e-31, ***). The results are shown in Fig 1F. As we were also interested in differences in phase alignment between the two conditions, we calculated phase coherence and phase coupling. Fig 1G shows the phase coherence of the PO-units; a paired-sample t-test on the averaged phase coherence indicated that phase coherence was significantly higher for the sentences than for the phrases (Fig 1H, t(99) = 10.24, p < 1e-20, ***). Fig 1I shows the phase coupling between PO-units and S-units; a paired-sample t-test on the frequency-averaged phase synchronization indicated that the level of phase synchronization was significantly higher for sentences than for phrases (Fig 1J, t(99) = 296.03, p < 1e-39, ***). In sum, under the time-based binding mechanism, the simulation demonstrated that the degree of phase synchronization between layers of nodes varies with the type of syntactic structure being processed, and that intersite connectivity for both power and phase should be higher for sentences than for phrases. Because temporal and spatial aspects are inherently unknowable in simulation (e.g., where in the brain the modeled neural ensembles reside and how that distribution affects the measurement of coupling), we conducted an EEG experiment to test whether empirical measurements of phase coupling can serve as a neural readout that is sensitive to linguistic structure.
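DORA’s internal update equations are given in the Methods and are not reproduced here. As a rough illustration of the two coupling measures reported above, the following Python sketch quantifies power coupling and phase coupling between two hypothetical unit activation time series; po_act and s_act are synthetic stand-ins, not model output.

import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
fs = 100                                   # assumed sampling rate (Hz)
t = np.arange(0, 3, 1 / fs)
po_act = np.sin(2 * np.pi * 2 * t) + 0.1 * rng.standard_normal(t.size)
s_act = np.sin(2 * np.pi * 2 * t + 0.3) + 0.1 * rng.standard_normal(t.size)

# Analytic signals give instantaneous power and phase of each unit.
po_an, s_an = hilbert(po_act), hilbert(s_act)
po_pow, s_pow = np.abs(po_an) ** 2, np.abs(s_an) ** 2

# Power coupling: correlation of the two power envelopes over time.
power_coupling = np.corrcoef(po_pow, s_pow)[0, 1]

# Phase coupling: mean resultant length of the phase difference
# (1 = perfectly locked, 0 = no consistent phase relation).
phase_coupling = np.abs(np.mean(np.exp(1j * (np.angle(po_an) - np.angle(s_an)))))
print(power_coupling, phase_coupling)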

Simulation results based on the time-based binding hypothesis. (a) and (b) The model representation of phrases and sentences, in which P (Proposition units), RB (Role-filler units), PO (Propositional Object units), and S (syllable units) denote the different types of node in DORA. P(DP) indicates that the top-level unit is a DP; P(s) indicates that the top-level unit is a sentence. (c) Simulation results for power, in which the red dotted line and blue solid line show the frequency response of the sentences and the phrases, respectively. The shaded area covers 2 SEM centered on the mean. (d) A paired-sample t-test on the frequency-combined power indicated that power was significantly higher for the sentences than for the phrases (t(99) = 8.40, p < 3.25e-13, ***). (e) Power coupling between PO-units and S-units, where the red and blue squares show the frequency-separated coupling level for the sentences and the phrases, respectively. (f) A paired-sample t-test indicated that the level of power coupling was significantly higher for the sentences than for the phrases (t(99) = 251.02, p < 1e-31, ***). (g) Phase coherence, in which the red dotted line and blue solid line show the phase coherence of the sentences and the phrases, respectively. The shaded area covers 2 SEM centered on the mean. (h) A paired-sample t-test on the frequency-averaged response indicated that the phase coherence of the sentences was significantly higher than that of the phrases (t(99) = 10.24, p < 1e-20, ***). (i) Phase coupling between PO-units and S-units; the red and blue circles show the level of phase synchronization of the sentences and the phrases, respectively. (j) A paired-sample t-test indicated that the level of phase coupling was significantly higher for the sentences than for the phrases (t(99) = 296.03, p < 1e-39, ***). DP, determiner phrase.

To test the predictions based on the time-based binding mechanism, we performed a simulation using the two different types of linguistic structure, namely phrases and sentences (for details, see Methods). The model’s representations of the two syntactic structures are shown in Fig 1A and 1B. Strong evidence for a link between time-based binding and observed data would be the ability to make fine-grained quantitative predictions about the magnitude of a coupling effect and its timing. However, there is a major impediment to that state of affairs: precise timing predictions for coupling cannot be derived from any model without explicit knowledge about interregion connectivity. Another way to describe this constraint is that nodes in the model represent groups of neurons that could reside at different locations in the brain (viz., DORA is a model of aggregate unit activity, where nodes represent population-level responses, not the responses of individual neurons). Given the varied temporal and spectral properties of intersite communication [52,53,78], we cannot make a specific prediction about the frequencies, the brain locations, or the precise temporal distribution of the effect, beyond the time window in which it should occur given how linguistic structure is represented under time-based binding in DORA. In other words, we can simulate the boundary conditions on phase coupling given the representational and mechanistic assumptions made by time-based binding and DORA for representing and processing linguistic structure. The initial binding difference begins at approximately the second S-unit (PO-2 for both conditions), that is, at the onset of the second syllable. We therefore predict that the phase coherence difference should begin within the range of approximately 250 ms to 500 ms. In addition, in line with previous studies, we predict that low-frequency neural oscillations will be strongly involved in the discrimination of the two types of linguistic structure. To avoid biasing the estimation, all information for statistical inference was extracted in a frequency-combined approach (<13.5 Hz). Based on the unfolding of syntactic structure in the computational model, we demonstrate that increased power and phase coherence occur in the model during the time window of 250 to 1,000 ms.

The results indicated that low-frequency phase coherence could reliably distinguish phrases from sentences, consistent with the hypothesis that low-frequency phase coherence represents cortical computations over speech stimuli [9,10,25,27,29,42–45,79]. Our findings therefore suggest that low-frequency phase coherence contributes to the comprehension of syntactic information. Given that the acoustics of the phrase and sentence conditions were statistically indistinguishable (see Methods for acoustic normalization and analysis), the phase coherence difference may instead reflect the formation or processing of syntactic structures via endogenous recruitment of neural ensembles.

Statistical analysis of the phase coherence (ITPC) was conducted using a nonparametric cluster-based permutation test (1,000 permutations) on a 1,200-ms time window starting at audio onset, over frequencies from 2 Hz to 8 Hz. The results indicated that phase coherence was higher for the sentences than for the phrases (p < 1e-4 ***, 2-tailed). (a) The temporal evolution of the cluster that corresponds to the separation effect. The activity was drawn using the ITPC of the phrases minus the ITPC of the sentences. The topographies were plotted in steps of 50 ms. (b) ITPC averaged over all the sensors in the cluster. The upper and lower panels show the ITPC of the phrases and the sentences, respectively. ITPC, intertrial phase coherence.

Fig 2A shows the temporal evolution, in steps of 50 ms, of the discrimination effect, computed as the averaged phase coherence of the phrases minus the averaged phase coherence of the sentences. Fig 2B shows the time–frequency decomposition using all the sensors in the cluster, in which the upper and lower panels show the phrase and sentence conditions, respectively.

To answer our first question, whether low-frequency neural oscillations distinguish phrases and sentences, we calculated phase coherence (for details, see Methods). We then performed a nonparametric cluster-based permutation test (1,000 permutations) on a 1,200-ms time window starting at audio onset, over frequencies from 1 Hz to 8 Hz. The comparison indicated that phase coherence was significantly higher for sentences than for phrases (p < 1e-4 ***, 2-tailed). In the selected latency and frequency range, the effect was most pronounced at central electrodes.
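For reference, ITPC at a given time–frequency bin is the length of the trial-averaged unit phase vector. A minimal sketch, assuming a hypothetical complex time–frequency decomposition tfr (e.g., from Morlet wavelets) rather than the study’s actual preprocessing pipeline:

import numpy as np

def itpc(tfr):
    """ITPC: length of the trial-averaged unit phase vector per bin."""
    phases = tfr / np.abs(tfr)             # unit-length complex phases
    return np.abs(phases.mean(axis=0))     # average over the trial axis

# Random complex data standing in for single-trial spectral estimates:
# (n_trials, n_channels, n_freqs, n_times).
rng = np.random.default_rng(1)
tfr = rng.standard_normal((60, 32, 8, 300)) + 1j * rng.standard_normal((60, 32, 8, 300))
coherence = itpc(tfr)                      # (32, 8, 300), values in [0, 1]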

The analysis indicated that the phase connectivity degree over the sensor space at the low-frequency range (approximately <2 Hz) could reliably separate the two syntactically different stimuli and that the effect was most prominent at the right posterior region.

Since the statistical analysis indicated a difference in the degree of phase connectivity between phrases and sentences, we assessed how this effect was distributed over the sensor space. To do so, we extracted all binarized connectivity matrices corresponding to the time and frequency range of the cluster and averaged them for each condition (for details, see Methods). Fig 3C shows the averaged matrix representation of the sentence condition minus that of the phrase condition; the connectivity difference was mainly localized to the frontal–central area. After extracting the matrix representation, we used all sensors of this cluster as seeds to plot connectivity topographies for both conditions. Fig 3D shows the pattern of the thresholded phase connectivity; the black triangles mark the seed sensors, and the upper and lower panels show the phrase and sentence conditions, respectively. The figure shows how phase connectivity (synchronization) was distributed over the scalp in each condition: the overall degree of phase connectivity was stronger for the sentence condition than for the phrase condition.

Statistical analysis of the phase connectivity degree was conducted using a nonparametric cluster-based permutation test (1,000 permutations) on a 3,500-ms time window starting at audio onset, over frequencies from 1 Hz to 8 Hz. The results indicated that the phase connectivity degree was higher for the sentences than for the phrases (p < 0.01 **, 2-tailed). (a) The temporal evolution of the cluster. The activity was drawn using the averaged connectivity degree of the phrases minus that of the sentences. The topographies were plotted in steps of 100 ms. (b) The time–frequency decomposition of the connectivity degree, averaged over all the sensors in the cluster. The left and right panels show the connectivity degree of the phrase and sentence conditions, respectively. (c) The matrix representation of the phase connectivity differences between the phrases and the sentences, drawn using the averaged connectivity matrix of the sentences minus that of the phrases. (d) All the sensors in this cluster were used as seed sensors to plot the topographical representation of the phase connectivity. The upper and lower panels show the phase connectivity of the phrases and the sentences, respectively.

Fig 3A shows the temporal evolution of the separation effect, represented as the connectivity degree of the phrase condition minus that of the sentence condition (in steps of 100 ms). Fig 3B shows the time–frequency decomposition of the phase connectivity degree, averaged across all sensors in the cluster; the left and right panels show the phrase and sentence conditions, respectively.

We first calculated phase connectivity over the sensor space using ISPC at each time–frequency bin (for details, see Methods). We then applied a statistical threshold to transform each connectivity representation into a super-threshold count at each bin. After baseline correction, we conducted a cluster-based permutation test (1,000 permutations) on a 3,500-ms time window starting at audio onset, over frequencies from 1 Hz to 8 Hz, to compare the degree of phase connectivity between phrases and sentences (for details, see Methods). Phrases and sentences showed a significant difference in connectivity (p < 0.01 **, 2-tailed). The effect corresponded to a cluster extending from approximately 1,800 ms to approximately 2,600 ms after speech stimulus onset and was mainly located at a very low frequency range (approximately <2 Hz). In the selected latency and frequency range, the effect was most pronounced at the right posterior region.
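A minimal sketch of the connectivity-degree computation at one frequency, taking the ISPC of a sensor pair to be the across-trial consistency of its phase difference; the array shapes and the fixed cutoff below are illustrative assumptions standing in for the statistical threshold detailed in the Methods.

import numpy as np

def ispc_degree(phase, cutoff=0.35):
    """Binarize pairwise phase coupling and count connections per sensor."""
    # Trial-wise phase differences for every sensor pair:
    # (n_trials, n_chan, n_chan, n_times).
    diffs = phase[:, :, None, :] - phase[:, None, :, :]
    # ISPC: length of the trial-averaged unit vector of the difference.
    ispc = np.abs(np.exp(1j * diffs).mean(axis=0))
    binary = ispc > cutoff                 # assumed fixed cutoff
    return binary.sum(axis=1)              # degree per sensor over time

rng = np.random.default_rng(2)
phase = rng.uniform(-np.pi, np.pi, (40, 16, 100))   # trials x sensors x time
degree = ispc_degree(phase)                          # (16, 100)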

Fig 4C shows the topographical representation of the PAC-Z, computed as the averaged PAC-Z values over the ROI. The results indicate that when participants listened to the speech stimuli, PAC emerged symmetrically over the central area of both hemispheres, which can be taken as evidence that PAC is present when speech stimuli are processed. However, neither the parametric nor the nonparametric statistical analysis showed a significant difference in PAC-Z between phrases and sentences, meaning we have no evidence that PAC was related to the processing of syntactic information. Our results therefore suggest that PAC may be a generalized neural mechanism for speech perception, rather than a mechanism specifically recruited during the processing of higher-level linguistic structures.

The figure shows a z-score-transformed PAC, the PAC-Z. (a) The PAC-Z for the phrases and the sentences in each hemisphere. Each panel was created by averaging the 8 sensors showing the biggest PAC-Z over the ROI. A z-score transformation with Bonferroni correction was used to test significance, yielding a threshold of 3.73, corresponding to a p-value of 0.05. (b) How sensors were selected in each hemisphere; larger red circles indicate sensors selected more often across participants. (c) The topographical distribution of the PAC-Z, indicating that the PAC was largely localized to the bilateral central areas. PAC, phase-amplitude coupling; ROI, region of interest.

To assess whether phase-amplitude coupling (PAC) distinguished phrases and sentences, we calculated the PAC value at each phase–amplitude bin for each condition and then transformed it into a PAC-Z (for details, see Methods). The grand average (over sensors, conditions, and participants) of the PAC-Z showed strong activation over a region from 4 Hz to 10 Hz for the frequency of phase and from 15 Hz to 40 Hz for the frequency of amplitude. We therefore defined this region as the region of interest (ROI) and used the averaged PAC-Z within it for sensor clustering. For each participant, we first selected the 8 sensors at each hemisphere with the highest condition-averaged PAC-Z. Averaging over sensors was conducted separately for the two conditions (phrase and sentence) and the two hemispheres (see Fig 4A). A Bonferroni correction was applied to address the multiple comparison problem, resulting in a z-score threshold of 3.73 for p = 0.05 (the z-score corresponding to a p-value of 0.05 divided by 11 phase bins × 12 amplitude bins × 4 conditions). The results show strong coupling between low-frequency phase (4 Hz to 10 Hz) and high-frequency amplitude (15 Hz to 40 Hz), indicating that PAC was present when participants listened to the speech stimuli.
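The exact PAC estimator is specified in the Methods; the sketch below uses the common mean-vector-length PAC with circular-shift surrogates to produce a z-score, and the array sizes and surrogate count are illustrative assumptions.

import numpy as np

def pac_z(phase, amp, n_surr=200, seed=0):
    """z-score observed PAC against circularly shifted surrogates."""
    rng = np.random.default_rng(seed)
    pac = np.abs(np.mean(amp * np.exp(1j * phase)))     # mean vector length
    surr = np.empty(n_surr)
    for k in range(n_surr):
        shift = rng.integers(1, phase.size)             # break the coupling
        surr[k] = np.abs(np.mean(np.roll(amp, shift) * np.exp(1j * phase)))
    return (pac - surr.mean()) / surr.std()

rng = np.random.default_rng(3)
phase = rng.uniform(-np.pi, np.pi, 5000)                 # e.g., 4-10 Hz phase
amp = 1 + 0.5 * np.cos(phase) + 0.1 * rng.standard_normal(5000)  # 15-40 Hz envelope
print(pac_z(phase, amp))                                 # large z => coupling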

Statistical analysis of the induced activity was conducted using a nonparametric cluster-based permutation test (1,000 permutations) on a 1,000-ms time window starting at audio onset, over frequencies from 7.5 Hz to 13.5 Hz. The results indicated that power was higher for sentences than for phrases (p < 0.01 **, 2-tailed). (a) The temporal evolution of the cluster that corresponds to the discrimination effect. The activity was drawn using the averaged induced power of the phrases minus that of the sentences. The topographies were extracted in steps of 50 ms. (b) Induced power averaged over all the sensors in this cluster. The upper and lower panels show the induced power of the phrases and the sentences, respectively.

To assess whether neural oscillations in the alpha band reflect the processing of syntactic structure, we calculated the induced power. The grand average (over all participants and conditions) of the induced power showed strong inhibition in the alpha band (approximately 7.5 to 13.5 Hz). We therefore tested whether this alpha-band inhibition could separate the two types of linguistic structure. Statistical analysis was conducted using a nonparametric cluster-based permutation test (1,000 permutations) over the alpha-band frequencies with a 1,000-ms time window starting at audio onset (for details, see Methods). The results indicated that the alpha-band inhibition was stronger for the phrase condition than for the sentence condition (p < 0.01 **, 2-tailed). In the selected time and frequency range, the effect corresponded to a cluster lasting from approximately 350 ms to approximately 1,000 ms after audio onset and was largely localized to the left hemisphere, although right frontal–central sensors were also involved during the temporal evolution of this cluster. Fig 5A shows the temporal evolution of this cluster in steps of 50 ms, using the induced power of the phrase condition minus that of the sentence condition. Fig 5B shows the time–frequency plot of the induced power, averaged over all the sensors in this cluster; the upper and lower panels show the phrase and sentence conditions, respectively. Both figures show that the alpha-band inhibition was stronger for the phrase condition than for the sentence condition. These results show that the processing of phrases and sentences is reflected in the intensity of the induced neural response in the alpha band.
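Induced power is conventionally obtained by removing the evoked (trial-averaged) response before time–frequency decomposition, which is the convention assumed in the sketch below; the Morlet parameters are illustrative, not the study’s settings.

import numpy as np

def induced_power(trials, fs, freq, n_cycles=7.0):
    """Trial-averaged power at `freq` after removing the evoked response."""
    residual = trials - trials.mean(axis=0, keepdims=True)   # drop evoked part
    sigma = n_cycles / (2 * np.pi * freq)                    # wavelet width (s)
    t = np.arange(-3 * sigma, 3 * sigma, 1 / fs)
    wavelet = np.exp(2j * np.pi * freq * t - t ** 2 / (2 * sigma ** 2))
    conv = np.array([np.convolve(tr, wavelet, mode="same") for tr in residual])
    return (np.abs(conv) ** 2).mean(axis=0)

rng = np.random.default_rng(4)
alpha_power = induced_power(rng.standard_normal((60, 512)), fs=256, freq=10)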

Since the degree of power connectivity over the alpha band indicated a separation between phrases and sentences, we also examined how this difference was distributed over the sensor space. To do so, we extracted the binarized power connectivity matrices located in the ROI and averaged them for each condition. Fig 6D shows the difference in power connectivity degree over the sensor space, computed as the average binarized sentence connectivity matrix minus the average binarized phrase connectivity matrix. The results indicate that the inhibition of power connectivity was stronger for phrases than for sentences; in other words, the overall level of power connectivity was higher for sentences than for phrases. Fig 6E shows the topographical representation of the power connectivity, plotted using the binarized power connectivity of the selected sensors; the upper and lower panels show the phrase and sentence conditions, respectively. The difference was largely localized to the bilateral central area and was more pronounced in the left than in the right hemisphere. These results indicate that the network organization indexed by the intensity of alpha-band induced power differed between phrases and sentences.

(a) Power connectivity degree for all conditions. Each plot was clustered over the sensors in each hemisphere showing the biggest inhibition of the grand-averaged power connectivity. (b) The results of a two-way repeated measures ANOVA on the power connectivity with the factors stimulus-type (phrase or sentence) and hemisphere (left or right). The results indicated a significant main effect of stimulus-type; a post hoc comparison showed that the overall inhibition of power connectivity was stronger for the phrases than for the sentences (t(29) = 2.82, p = 0.0085 **, two-sided). (c) How sensors were selected for clustering; larger red circles indicate sensors selected more often across participants. (d) The connectivity differences between the phrases and the sentences for all sensor pairs, drawn as the average binarized connectivity matrix of the sentences minus that of the phrases. The results indicate that the connectivity degree over the sensor space was higher for the sentences than for the phrases. (e) Topographical representation of the binarized connectivity, clustered over the sensors showing the biggest inhibition of the power connectivity. The upper and lower panels show the phrase and sentence conditions, respectively.

Fig 6A shows the power connectivity degree averaged over all participants for each condition. To test whether the power connectivity degree could separate the phrases and the sentences, a Stimulus-Type*Hemisphere two-way repeated measures ANOVA was conducted. The comparison revealed a main effect of Stimulus-Type (F(1, 14) = 5.28, p = 0.033 *). Planned post hoc comparisons using paired-sample t-tests showed that the inhibition of power connectivity was stronger for the phrases than for the sentences (t(29) = 2.82, p = 0.0085 **). Fig 6B shows the power connectivity degree for each extracted condition. Fig 6C shows how sensors were selected; larger red circles indicate sensors selected more often across participants.

We calculated power connectivity for each sensor pair at each time–frequency bin using rank correlation (for details, see Methods). The grand average (over all participants and conditions) of the power connectivity showed strong inhibition in the alpha band from 100 ms to 2,200 ms after audio onset; this region of strong power connectivity inhibition was defined as the ROI. For each participant, we selected the 8 sensors at each hemisphere showing the biggest inhibition of the condition-averaged power connectivity. We then averaged across the selected sensors, yielding 4 conditions for each participant (left phrase, left sentence, right phrase, and right sentence).
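A minimal sketch of this connectivity measure at a single time–frequency bin, assuming the rank correlation is Spearman’s rho computed over single-trial power for every sensor pair; the binarization cutoff is an illustrative assumption (see Methods for the study’s criterion).

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
power = rng.gamma(2.0, size=(60, 32))    # single-trial power: trials x sensors

rho, _ = spearmanr(power)                # (32, 32) rank-correlation matrix
np.fill_diagonal(rho, 0.0)               # ignore self-connections

# Binarize at an assumed cutoff and count super-threshold connections,
# yielding a power connectivity degree per sensor.
degree = (np.abs(rho) > 0.3).sum(axis=1)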

Different encoding states for phrases versus sentences in both temporal and spectral dimensions

Previous research has shown that the low-frequency neural response reliably reflects the phase-locked encoding of the acoustic features of speech [72,73]. We therefore first tested whether the neural response in each canonical frequency band could equally reflect the encoding of the acoustic features. To do so, we fitted the STRF for each condition in each frequency band: Delta (<4 Hz), Theta (4 to 7 Hz), Alpha (8 to 13 Hz), Beta (14 to 30 Hz), and Low Gamma (31 to 50 Hz). We then compared the real performance of the STRFs to their random performance (for details, see Methods). Fig 7A shows the results of this comparison. The blue and red dots represent the real performance of the STRFs; the error bars represent 1 SEM on each side. The small gray dots represent the random performance (1,000 permutations per frequency band per condition); the upper border delineated by these gray dots represents the 97.5th percentile of the random performance. The performance of the STRFs was above chance level only at low frequencies (delta and theta), consistent with previous research [72,73]. Our results confirm that the low-frequency STRF reliably captures the relationship between the acoustic features of speech and the low-frequency neural response.
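The permutation scheme behind the chance-level comparison is detailed in the Methods; the sketch below assumes one common variant, in which model predictions are correlated with trial-shuffled responses to build the null distribution, with the 97.5th percentile serving as the significance border.

import numpy as np

def null_distribution(pred, resp, n_perm=1000, seed=0):
    """Correlations between predictions and trial-shuffled responses."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = resp[rng.permutation(resp.shape[0])]  # mismatch the trials
        null[k] = np.corrcoef(pred.ravel(), shuffled.ravel())[0, 1]
    return null

rng = np.random.default_rng(6)
pred = rng.standard_normal((50, 256))    # model predictions: trials x time
resp = rng.standard_normal((50, 256))    # measured responses: trials x time
real = np.corrcoef(pred.ravel(), resp.ravel())[0, 1]
above_chance = real > np.percentile(null_distribution(pred, resp), 97.5)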

Fig 7. Acoustic features are encoded differently between phrases and sentences in a phase-locked manner. (a) Comparison between the real and random performance of the STRF in each canonical frequency band. The results suggested that only the performance of the STRF in the delta band (<4 Hz) and theta band (4 to 8 Hz) was statistically better than the random performance. The blue and red dots represent the real performance of the STRFs for the phrases and the sentences, respectively. The error bars represent 2 SEM centered on the mean. The gray dots represent the random performance drawn by permutations. (b) The performance of the low-frequency (<8 Hz) STRF averaged across all participants. The solid blue and red dots represent the averaged performance across all testing trials; the error bars represent 2 SEM centered on the mean. The transparent blue and red dots represent the model’s performance on each testing trial for the phrases and the sentences, respectively. The results indicated no performance difference between the kernels for the phrases and the sentences. (c) Comparison between the real neural response (dashed lines) and the model-predicted response (solid blue for the phrase, solid red for the sentence) at a sample sensor, Cz. The results suggest that the STRFs performed equally well for the phrases (r = 0.47, p < 1e-5 ***, n = 1,024) and the sentences (r = 0.41, p < 1e-5 ***, n = 1,024). (d) The clustered STRF using the selected sensors showing the biggest (negative) activity in the ROI. The left and right panels of the upper row show the clustered STRF for the phrases at the left and right hemisphere, respectively; the corresponding panels of the lower row show the clustered kernel for the sentences. (e) How the sensors were selected; larger red circles indicate sensors selected more often across participants. (f) The TRFs decomposed from the STRFs, in which the blue and red lines represent the phrases and the sentences, respectively, and the solid and dashed lines represent the left and right hemisphere, respectively. (g) Comparison of the magnitude of the TRFs. The blue and red bars represent the phrases and the sentences, respectively; the error bars show 1 SEM on each side of the mean. A 3-way repeated measures ANOVA of the peak magnitude was conducted on the factors Stimulus-type (phrase or sentence), Hemisphere (left or right), and Peak-type (approximately 100 ms or approximately 300 ms). The results indicated a main effect of Stimulus-type and a 3-way interaction. The post hoc comparison on the main effect of Stimulus-type suggested that the (negative) amplitude was stronger for the phrase condition than for the sentence condition (t(59) = 4.55, p < 2e-5 ***). To investigate the 3-way Stimulus-type*Peak-type*Hemisphere interaction, two 2-way repeated measures ANOVAs with Bonferroni correction were conducted on the factors Hemisphere and Stimulus-type at each level of Peak-type. The results indicated a main effect of Stimulus-type at the first peak (F(1, 14) = 8.19, p = 0.012 *) and a 2-way Hemisphere*Stimulus-type interaction at the second peak (F(1, 14) = 6.42, p = 0.023 *). At the first peak, a post hoc comparison on the main effect of Stimulus-type using paired-sample t-tests showed that the magnitude of the phrase condition was higher than that of the sentence condition (t(29) = 3.49, p = 0.001 ***). For the 2-way Hemisphere*Stimulus-type interaction at the second peak, paired-sample t-tests with Bonferroni correction compared the magnitude of the phrases and the sentences at each hemisphere. The results indicate that the magnitude at the second peak was stronger for the phrase condition than for the sentence condition in the right hemisphere (t(14) = 3.21, p = 0.006 **), but not the left hemisphere (t(14) = 0.86, p = 0.40). (h) Comparison of the peak latency of the TRFs; the blue and red bars represent the phrases and the sentences, respectively, and the error bars show 1 SEM on each side of the mean. A 3-way repeated measures ANOVA of the peak latency was conducted on the factors Stimulus-type (phrase or sentence), Hemisphere (left or right), and Peak-type (approximately 100 ms or approximately 300 ms). The results indicated a main effect of Peak-type and a 3-way interaction. The post hoc comparison on the main effect of Peak-type suggested that the latency of the first peak was significantly shorter than that of the second peak (t(59) = 38.89, p < 2e-16 ***). The post hoc comparison on the 3-way interaction, with Bonferroni correction, on the factors Hemisphere and Stimulus-type for each level of Peak-type suggested a 2-way Hemisphere*Stimulus-type interaction at the first peak (F(1, 14) = 12.83, p = 0.002 **). The post hoc comparison on this 2-way interaction using paired-sample t-tests with Bonferroni correction indicated that the latency at the first peak was significantly longer for the sentences than for the phrases at the right hemisphere (t(14) = 3.55, p = 0.003 **), but not the left hemisphere (t(14) = 0.58, p = 0.56). (i) The SRFs decomposed from the STRFs, in which the blue and red lines represent the phrases and the sentences, respectively, and the solid and dashed lines represent the left and right hemisphere, respectively. (j) Comparison of the amplitude of the SRFs. The SRF was first separated into 3 bands, low (<0.1 kHz), middle (0.1 to 0.8 kHz), and high (>0.8 kHz), based on the averaged frequency response of the STRF; a 3-way repeated measures ANOVA of the amplitude was then conducted on the factors Stimulus-type (phrase or sentence), Hemisphere (left or right), and Band-type (low, middle, and high). The results indicated a main effect of Band-type (F(2, 28) = 119.67, p < 2e-14 ***) and a 2-way Band-type*Stimulus-type interaction (F(2, 28) = 27.61, p < 3e-7 ***). The post hoc comparison on the main effect of Band-type using paired-sample t-tests with Bonferroni correction showed that the magnitude of the middle frequency band was stronger than that of the low-frequency band (t(59) = 17.9, p < 4e-25 ***) and the high-frequency band (t(59) = 18.7, p < 5e-26 ***). The post hoc comparison on the Band-type*Stimulus-type interaction using paired-sample t-tests with Bonferroni correction showed that the amplitude of the SRF was stronger for the phrases than for the sentences only at the middle frequency band (t(29) = 4.67, p < 6e-5 ***). ROI, region of interest; SRF, spectral response function; STRF, spectral temporal response function; TRF, temporal response function. https://doi.org/10.1371/journal.pbio.3001713.g007

Since only low-frequency neural responses robustly reflected the encoding of the speech stimuli, we fitted the STRF for both conditions using the neural response low-pass filtered at 9 Hz. Leave-one-out cross-validation was used to maximize the performance of the STRFs (for details, see Methods). Fig 7B shows the performance of the STRF for each condition. The transparent dots, blue for phrases and red for sentences, represent the model’s performance on each testing trial; the solid dots represent the model’s performance averaged over all trials, with error bars representing 1 standard error of the mean (SEM) on each side. A paired-sample t-test comparing the performance between the phrase and sentence conditions found no evidence of a difference (t(74) = 1.25, p = 0.21), indicating that the STRFs were fitted equally well for phrases and sentences. Thus, any difference in temporal–spectral features between the STRFs of phrases and sentences cannot be driven by the model’s performance. Fig 7C shows the comparison between the real neural response and the model-predicted response at the sample sensor Cz. The upper and lower panels show the performance of the STRF for phrases (r = 0.47, N = 1,024, p < 1e-5 ***) and sentences (r = 0.41, N = 1,024, p < 1e-5 ***), respectively.
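A minimal sketch of STRF estimation as time-lagged ridge regression from a spectrogram to a single EEG channel, with leave-one-out cross-validation over trials; the lag range, regularization strength, and data shapes are illustrative assumptions rather than the study’s settings.

import numpy as np

def lagged(stim, n_lags):
    """Stack time-lagged copies of the spectrogram: (n_times, n_freqs * n_lags)."""
    n_t, n_f = stim.shape
    X = np.zeros((n_t, n_f * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * n_f:(lag + 1) * n_f] = stim[:n_t - lag]
    return X

def fit_strf(X, y, lam=1.0):
    """Ridge-regression solution for the STRF weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(7)
stims = rng.standard_normal((10, 400, 16))   # trials x time x frequency bands
eeg = rng.standard_normal((10, 400))         # trials x time, one channel

scores = []
for test in range(10):                        # leave-one-out over trials
    train = [i for i in range(10) if i != test]
    X = np.vstack([lagged(stims[i], 40) for i in train])
    y = np.concatenate([eeg[i] for i in train])
    w = fit_strf(X, y)
    pred = lagged(stims[test], 40) @ w
    scores.append(np.corrcoef(pred, eeg[test])[0, 1])
print(np.mean(scores))                        # cross-validated performance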

The grand average of the STRFs was negative from 0 to 400 ms in the time dimension and from 100 to 1,000 Hz in the frequency dimension, and sensor clustering of the STRF was conducted based on the averaged activation in this ROI. Concretely, for each participant we selected the 8 sensors at each hemisphere showing the strongest averaged (negative) magnitude in this region. Fig 7D shows the clustered STRFs averaged across all participants. Fig 7E shows how the sensors were selected; larger red circles indicate sensors selected more often across participants.

To compare the differences of the kernel (STRF) in both the temporal and spectral dimensions, the temporal response function (TRF) and the spectral response function (SRF) were extracted separately for each condition (for details, see Methods).
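The decomposition procedure is described in the Methods; a common approach, assumed here, is to marginalize the fitted kernel over one axis at a time.

import numpy as np

strf = np.random.default_rng(8).standard_normal((16, 40))  # (n_freqs, n_lags)
trf = strf.mean(axis=0)   # temporal response function: response vs. time lag
srf = strf.mean(axis=1)   # spectral response function: response vs. frequency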

Fig 7F shows the TRFs averaged across all participants. The grand average of all TRFs showed 2 peaks, at approximately 100 ms and approximately 300 ms. We therefore defined a first temporal window from 50 to 150 ms (centered at 100 ms) and a second from 250 to 350 ms (centered at 300 ms) for locating the magnitude and latency of these 2 peaks. The latency of each peak was defined as the time at which it occurred, and its magnitude as the average magnitude over a window extending 5 ms on each side of the peak. After extracting the magnitude and latency of these 2 peaks, a Stimulus-type*Peak-type*Hemisphere 3-way repeated measures ANOVA was conducted on each measure.
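A minimal sketch of this peak extraction, assuming a TRF sampled at fs Hz with lag zero at the first sample; the windows match those defined above.

import numpy as np

def peak_in_window(trf, fs, t0, t1, half_win=0.005):
    """Latency and magnitude of the most negative point within [t0, t1]."""
    times = np.arange(trf.size) / fs
    mask = (times >= t0) & (times <= t1)
    idx = np.where(mask)[0][np.argmin(trf[mask])]     # negative-going peak
    latency = times[idx]
    around = (times >= latency - half_win) & (times <= latency + half_win)
    return latency, trf[around].mean()                # mean over 5 ms each side

rng = np.random.default_rng(9)
trf = rng.standard_normal(512)
p1 = peak_in_window(trf, fs=512, t0=0.05, t1=0.15)    # ~100-ms peak
p2 = peak_in_window(trf, fs=512, t0=0.25, t1=0.35)    # ~300-ms peak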

For the magnitude of the TRF (Fig 7G), the statistical comparison showed a significant main effect of Stimulus-type (F(1, 14) = 13.58, p = 0.002 **) and a significant three-way Stimulus-type*Peak-type*Hemisphere interaction (F(1, 14) = 15.25, p = 0.001 ***).

The post hoc comparison on the main effect of Stimulus-type using paired-sample t-tests showed that the magnitude for phrases was significantly larger than that for sentences (t(59) = 4.55, p < 2e-5 ***). This result suggests that the instantaneous neural activity in response to phrases had a stronger phase-locked dependency on the acoustic features than the activity in response to sentences.

To investigate the three-way Stimulus-type*Peak-type*Hemisphere interaction, two two-way repeated measures ANOVAs with Bonferroni correction were conducted on the factors Hemisphere and Stimulus-type at each level of Peak-type. The results indicated a main effect of Stimulus-type at the first peak (F(1, 14) = 8.19, p = 0.012 *) and a two-way Hemisphere*Stimulus-type interaction at the second peak (F(1, 14) = 6.42, p = 0.023 *).

At the first peak, we conducted a post hoc comparison on the main effect of Stimulus-type using paired-sample t-tests, which showed that the magnitude of the phrase condition was higher than that of the sentence condition (t(29) = 3.49, p = 0.001 ***). This result indicates that the instantaneous neural activity was more strongly driven by the acoustic features presented approximately 100 ms earlier when phrases were presented than when sentences were presented.

For the two-way Hemisphere*Stimulus-type interaction at the second peak, paired-sample t-tests with Bonferroni correction compared the magnitude of phrases and sentences at each hemisphere. The results indicated that the magnitude at the second peak was stronger for phrases than for sentences in the right hemisphere (t(14) = 3.21, p = 0.006 **), but not the left hemisphere (t(14) = 0.86, p = 0.40). These findings suggest that, in the right hemisphere, the instantaneous neural activity for phrases was more strongly driven by acoustic features presented approximately 300 ms earlier than it was for sentences.

For the latency of the TRF (Fig 7H), the comparison showed a main effect of Peak-type (F(1, 14) = 1e+3, p < 1e-14 ***) and a three-way Stimulus-type*Peak-type*Hemisphere interaction (F(1, 14) = 8.04, p = 0.013 *).

The post hoc comparison for the main effect of Peak-type with paired-sample t-tests showed, as expected, that the latency of the first peak was significantly shorter than that of the second (t(59) = 38.89, p < 2e-16 ***). This result is trivially expected: however the analysis time windows are defined, the first peak necessarily precedes the second.

To investigate the 3-way Stimulus-type*Peak-type*Hemisphere interaction, two two-way repeated measures ANOVAs with Bonferroni correction were conducted on the factors Hemisphere and Stimulus-type for each level of Peak-type. The comparison suggested a two-way Hemisphere*Stimulus-type interaction at the first peak (F(1, 14) = 12.83, p = 0.002 **). The post hoc comparison on this two-way interaction using paired-sample t-tests with Bonferroni correction indicated that the latency at the first peak was significantly later for sentences than for phrases in the right hemisphere (t(14) = 3.55, p = 0.003 **), but not the left hemisphere (t(14) = 0.58, p = 0.56). These results suggest that, within the first temporal window (approximately 50 to 150 ms) and only in the right hemisphere, the neural response to sentences was predominantly driven by acoustic features occurring earlier in time than those driving the neural response to phrases.

Fig 7I shows the SRFs averaged across all participants. The grand average of the STRFs indicated that the activation of the kernel was most prominent in the frequency range from 0.1 kHz to 0.8 kHz. To compare differences in the neural encoding of acoustic features in the spectral dimension, we separated the SRF into 3 frequency bands: below 0.1 kHz, 0.1 to 0.8 kHz, and above 0.8 kHz. We then averaged the response in each frequency band for each condition. The statistical comparison was conducted using a three-way repeated measures ANOVA on the factors Hemisphere, Stimulus-type, and Band-type. The results (Fig 7J) indicated a main effect of Band-type (F(2, 28) = 119.67, p < 2e-14 ***) and a 2-way Band-type*Stimulus-type interaction (F(2, 28) = 27.61, p < 3e-7 ***).

The post hoc comparison on the main effect of Band-type using paired-sample t-tests with Bonferroni correction showed that the magnitude of the middle frequency band was stronger than that of the low-frequency band (t(59) = 17.9, p < 4e-25 ***) and the high-frequency band (t(59) = 18.7, p < 5e-26 ***). These results indicate that acoustic features from different frequency bands contributed differently to the evoked neural response: for both conditions, the neural response was predominantly driven by the encoding of acoustic features from 0.1 kHz to 0.8 kHz, which corresponds to the spectral–temporal range of the first formant [80–83].

The post hoc comparison on the Band-type*Stimulus-type interaction using paired-sample t-tests with Bonferroni correction showed that the amplitude of the SRF was stronger for the phrase condition than for the sentence condition only at the middle frequency band (t(29) = 4.67, p < 6e-5 ***). These results indicate that, in the middle frequency range, the neural response to phrases was better predicted by modeling the encoding of acoustic features alone than the neural response to sentences was. This pattern suggests that the neural representation of sentences is more abstracted away from the response driven by the physical properties of the stimulus.

---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001713