Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects

Bibliographic Details
Published in: The Journal of neuroscience, 2021-06, Vol. 41 (23), p. 4991-5003
Main Authors: O'Sullivan, Aisling E; Crosse, Michael J; Liberto, Giovanni M Di; de Cheveigné, Alain; Lalor, Edmund C
Format: Article
Language: English
Subjects: see subject terms below
Online Access: Full text
container_end_page 5003
container_issue 23
container_start_page 4991
container_title The Journal of neuroscience
container_volume 41
creator O'Sullivan, Aisling E
Crosse, Michael J
Liberto, Giovanni M Di
de Cheveigné, Alain
Lalor, Edmund C
description Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.

During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
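The analysis described in the abstract can be made concrete with a small sketch. The Python example below is an illustrative reconstruction, not the authors' actual pipeline: it shows one way to quantify stimulus encoding in EEG with canonical correlation analysis (CCA) and to compare the audiovisual (AV) response against the summed audio-plus-visual (A+V) response. The helper names (`lagged`, `cca_score`), the lag window, the number of CCA components, and the synthetic placeholder data are all assumptions made for the sake of the example.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

def lagged(stim, max_lag):
    """Stack time-lagged copies of the stimulus features (samples x features*lags)."""
    n, f = stim.shape
    out = np.zeros((n, f * max_lag))
    for lag in range(max_lag):
        out[lag:, lag * f:(lag + 1) * f] = stim[:n - lag]
    return out

def cca_score(stim, eeg, n_components=4, max_lag=30):
    """Fit CCA on the first half of the data; return mean held-out canonical correlation."""
    X = lagged(stim, max_lag)
    half = len(X) // 2
    cca = CCA(n_components=n_components)
    cca.fit(X[:half], eeg[:half])
    Xc, Yc = cca.transform(X[half:], eeg[half:])
    return np.mean([np.corrcoef(Xc[:, k], Yc[:, k])[0, 1] for k in range(n_components)])

# Placeholder data: a 16-band spectrogram (or a phonetic-feature matrix) and
# 64-channel EEG recorded in three conditions: audiovisual, audio-only, visual-only.
n_samples, n_feat, n_chan = 5000, 16, 64
spec = rng.standard_normal((n_samples, n_feat))
eeg_av = rng.standard_normal((n_samples, n_chan))
eeg_a = rng.standard_normal((n_samples, n_chan))
eeg_v = rng.standard_normal((n_samples, n_chan))

# Additive model: the predicted "no integration" response is the sum of the
# unisensory (audio-only and visual-only) responses to the same stimulus.
eeg_a_plus_v = eeg_a + eeg_v

r_av = cca_score(spec, eeg_av)
r_sum = cca_score(spec, eeg_a_plus_v)
print(f"AV encoding r = {r_av:.3f}, A+V encoding r = {r_sum:.3f}")
```

With real data, the same comparison would be repeated for the spectrogram and for the phonetic representation, and an AV encoding reliably exceeding the A+V encoding (assessed with appropriate statistics across subjects) would indicate multisensory integration at the stage captured by that stimulus representation; with the random placeholder data above, both scores are expected to hover near zero.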
doi_str_mv 10.1523/JNEUROSCI.0906-20.2021
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8197638</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2509606194</sourcerecordid><originalsourceid>FETCH-LOGICAL-c442t-db0e3b6093e88d450a8c3a5ddd376515f34c9d6cf3c647f5869beaa7ce0647043</originalsourceid><addsrcrecordid>eNpdkV9v0zAUxS0EYmXwFaZIvPCScv03yQvSVHWsaGxoY8-Wa9-0ntK42EmlfnscbVTAk2Wfc8_10Y-QCwpzKhn__O12-Xh_97BYzaEBVTKYM2D0FZlltSmZAPqazIBVUCpRiTPyLqUnAKiAVm_JGec1E7SBGYm3OMaw3x6TD13YeGu6YtU7bzEVoS0uR-fDwacxPz_sEe22-BFDFpPvN8U9HjALprj2GE202-M0833sBp-wTyEec9aAm2gGH_pi2bZoh_SevGlNl_DDy3lOHq-WPxfX5c3d19Xi8qa0QrChdGtAvlbQcKxrJySY2nIjnXO8UpLKlgvbOGVbbnPFVtaqWaMxlUXIdxD8nHx5zt2P6x06i_0QTaf30e9MPOpgvP5X6f1Wb8JB17SpFK9zwKeXgBh-jZgGvfPJYteZHsOYNJPQKFC0mXZ9_M_6FMbY53rZJWk1UZHZpZ5dNoaUIranz1DQE1Z9wqonrJqBnrDmwYu_q5zG_nDkvwHa6KH1</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2551715295</pqid></control><display><type>article</type><title>Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects</title><source>MEDLINE</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>O'Sullivan, Aisling E ; Crosse, Michael J ; Liberto, Giovanni M Di ; de Cheveigné, Alain ; Lalor, Edmund C</creator><creatorcontrib>O'Sullivan, Aisling E ; Crosse, Michael J ; Liberto, Giovanni M Di ; de Cheveigné, Alain ; Lalor, Edmund C</creatorcontrib><description>Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy. During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. 
Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.</description><identifier>ISSN: 0270-6474</identifier><identifier>EISSN: 1529-2401</identifier><identifier>DOI: 10.1523/JNEUROSCI.0906-20.2021</identifier><identifier>PMID: 33824190</identifier><language>eng</language><publisher>United States: Society for Neuroscience</publisher><subject>Acoustic Stimulation ; Articulatory phonetics ; Audio visual equipment ; Brain - physiology ; Brain Mapping ; Comprehension - physiology ; Correlation analysis ; Cortex (auditory) ; Cortex (temporal) ; Cues ; EEG ; Electroencephalography ; Female ; Humans ; Linguistic units ; Listening ; Listening comprehension ; Male ; Pattern recognition ; Phonemes ; Phonetic features ; Phonetics ; Photic Stimulation ; Sensory integration ; Spectrograms ; Speech ; Speech perception ; Speech Perception - physiology ; Speech processing ; Syllables ; Visual Perception - physiology ; Visual stimuli</subject><ispartof>The Journal of neuroscience, 2021-06, Vol.41 (23), p.4991-5003</ispartof><rights>Copyright © 2021 the authors.</rights><rights>Copyright Society for Neuroscience Jun 9, 2021</rights><rights>Copyright © 2021 the authors 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c442t-db0e3b6093e88d450a8c3a5ddd376515f34c9d6cf3c647f5869beaa7ce0647043</citedby><orcidid>0000-0002-8282-8973 ; 0000-0002-4725-8864</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8197638/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8197638/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,27922,27923,53789,53791</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33824190$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>O'Sullivan, Aisling E</creatorcontrib><creatorcontrib>Crosse, Michael J</creatorcontrib><creatorcontrib>Liberto, Giovanni M Di</creatorcontrib><creatorcontrib>de Cheveigné, Alain</creatorcontrib><creatorcontrib>Lalor, Edmund C</creatorcontrib><title>Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects</title><title>The Journal of neuroscience</title><addtitle>J Neurosci</addtitle><description>Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. 
Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy. During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.</description><subject>Acoustic Stimulation</subject><subject>Articulatory phonetics</subject><subject>Audio visual equipment</subject><subject>Brain - physiology</subject><subject>Brain Mapping</subject><subject>Comprehension - physiology</subject><subject>Correlation analysis</subject><subject>Cortex (auditory)</subject><subject>Cortex (temporal)</subject><subject>Cues</subject><subject>EEG</subject><subject>Electroencephalography</subject><subject>Female</subject><subject>Humans</subject><subject>Linguistic units</subject><subject>Listening</subject><subject>Listening comprehension</subject><subject>Male</subject><subject>Pattern recognition</subject><subject>Phonemes</subject><subject>Phonetic features</subject><subject>Phonetics</subject><subject>Photic Stimulation</subject><subject>Sensory integration</subject><subject>Spectrograms</subject><subject>Speech</subject><subject>Speech perception</subject><subject>Speech Perception - physiology</subject><subject>Speech processing</subject><subject>Syllables</subject><subject>Visual Perception - physiology</subject><subject>Visual 
stimuli</subject><issn>0270-6474</issn><issn>1529-2401</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpdkV9v0zAUxS0EYmXwFaZIvPCScv03yQvSVHWsaGxoY8-Wa9-0ntK42EmlfnscbVTAk2Wfc8_10Y-QCwpzKhn__O12-Xh_97BYzaEBVTKYM2D0FZlltSmZAPqazIBVUCpRiTPyLqUnAKiAVm_JGec1E7SBGYm3OMaw3x6TD13YeGu6YtU7bzEVoS0uR-fDwacxPz_sEe22-BFDFpPvN8U9HjALprj2GE202-M0833sBp-wTyEec9aAm2gGH_pi2bZoh_SevGlNl_DDy3lOHq-WPxfX5c3d19Xi8qa0QrChdGtAvlbQcKxrJySY2nIjnXO8UpLKlgvbOGVbbnPFVtaqWaMxlUXIdxD8nHx5zt2P6x06i_0QTaf30e9MPOpgvP5X6f1Wb8JB17SpFK9zwKeXgBh-jZgGvfPJYteZHsOYNJPQKFC0mXZ9_M_6FMbY53rZJWk1UZHZpZ5dNoaUIranz1DQE1Z9wqonrJqBnrDmwYu_q5zG_nDkvwHa6KH1</recordid><startdate>20210609</startdate><enddate>20210609</enddate><creator>O'Sullivan, Aisling E</creator><creator>Crosse, Michael J</creator><creator>Liberto, Giovanni M Di</creator><creator>de Cheveigné, Alain</creator><creator>Lalor, Edmund C</creator><general>Society for Neuroscience</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QG</scope><scope>7QR</scope><scope>7T9</scope><scope>7TK</scope><scope>7U7</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>P64</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-8282-8973</orcidid><orcidid>https://orcid.org/0000-0002-4725-8864</orcidid></search><sort><creationdate>20210609</creationdate><title>Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects</title><author>O'Sullivan, Aisling E ; Crosse, Michael J ; Liberto, Giovanni M Di ; de Cheveigné, Alain ; Lalor, Edmund C</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c442t-db0e3b6093e88d450a8c3a5ddd376515f34c9d6cf3c647f5869beaa7ce0647043</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Acoustic Stimulation</topic><topic>Articulatory phonetics</topic><topic>Audio visual equipment</topic><topic>Brain - physiology</topic><topic>Brain Mapping</topic><topic>Comprehension - physiology</topic><topic>Correlation analysis</topic><topic>Cortex (auditory)</topic><topic>Cortex (temporal)</topic><topic>Cues</topic><topic>EEG</topic><topic>Electroencephalography</topic><topic>Female</topic><topic>Humans</topic><topic>Linguistic units</topic><topic>Listening</topic><topic>Listening comprehension</topic><topic>Male</topic><topic>Pattern recognition</topic><topic>Phonemes</topic><topic>Phonetic features</topic><topic>Phonetics</topic><topic>Photic Stimulation</topic><topic>Sensory integration</topic><topic>Spectrograms</topic><topic>Speech</topic><topic>Speech perception</topic><topic>Speech Perception - physiology</topic><topic>Speech processing</topic><topic>Syllables</topic><topic>Visual Perception - physiology</topic><topic>Visual stimuli</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>O'Sullivan, Aisling E</creatorcontrib><creatorcontrib>Crosse, Michael J</creatorcontrib><creatorcontrib>Liberto, Giovanni M Di</creatorcontrib><creatorcontrib>de Cheveigné, Alain</creatorcontrib><creatorcontrib>Lalor, Edmund C</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Animal Behavior Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>Neurosciences Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>The Journal of neuroscience</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>O'Sullivan, Aisling E</au><au>Crosse, Michael J</au><au>Liberto, Giovanni M Di</au><au>de Cheveigné, Alain</au><au>Lalor, Edmund C</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects</atitle><jtitle>The Journal of neuroscience</jtitle><addtitle>J Neurosci</addtitle><date>2021-06-09</date><risdate>2021</risdate><volume>41</volume><issue>23</issue><spage>4991</spage><epage>5003</epage><pages>4991-5003</pages><issn>0270-6474</issn><eissn>1529-2401</eissn><abstract>Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy. During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. 
Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.</abstract><cop>United States</cop><pub>Society for Neuroscience</pub><pmid>33824190</pmid><doi>10.1523/JNEUROSCI.0906-20.2021</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-8282-8973</orcidid><orcidid>https://orcid.org/0000-0002-4725-8864</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0270-6474
ispartof The Journal of neuroscience, 2021-06, Vol.41 (23), p.4991-5003
issn 0270-6474
1529-2401
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8197638
source MEDLINE; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Acoustic Stimulation
Articulatory phonetics
Audio visual equipment
Brain - physiology
Brain Mapping
Comprehension - physiology
Correlation analysis
Cortex (auditory)
Cortex (temporal)
Cues
EEG
Electroencephalography
Female
Humans
Linguistic units
Listening
Listening comprehension
Male
Pattern recognition
Phonemes
Phonetic features
Phonetics
Photic Stimulation
Sensory integration
Spectrograms
Speech
Speech perception
Speech Perception - physiology
Speech processing
Syllables
Visual Perception - physiology
Visual stimuli
title Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T07%3A39%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Neurophysiological%20Indices%20of%20Audiovisual%20Speech%20Processing%20Reveal%20a%20Hierarchy%20of%20Multisensory%20Integration%20Effects&rft.jtitle=The%20Journal%20of%20neuroscience&rft.au=O'Sullivan,%20Aisling%20E&rft.date=2021-06-09&rft.volume=41&rft.issue=23&rft.spage=4991&rft.epage=5003&rft.pages=4991-5003&rft.issn=0270-6474&rft.eissn=1529-2401&rft_id=info:doi/10.1523/JNEUROSCI.0906-20.2021&rft_dat=%3Cproquest_pubme%3E2509606194%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2551715295&rft_id=info:pmid/33824190&rfr_iscdi=true