Multimodal Hierarchical Attention Neural Network: Looking for Candidates Behaviour Which Impact Recruiter's Decision

Automatic analysis of job interviews has gained interest in both academic and industrial research. The particular case of asynchronous video interviews makes it possible to collect large corpora of monologue videos in which candidates answer standardized questions, enabling the use of deep learning algorithms...

Full Description

Bibliographic Details
Published in: IEEE Transactions on Affective Computing, 2023-04, Vol. 14 (2), p. 969-985
Main authors: Hemamou, Leo; Guillon, Arthur; Martin, Jean-Claude; Clavel, Chloe
Format: Article
Language: English
Subjects:
Online access: Order full text
container_end_page 985
container_issue 2
container_start_page 969
container_title IEEE transactions on affective computing
container_volume 14
creator Hemamou, Leo
Guillon, Arthur
Martin, Jean-Claude
Clavel, Chloe
description Automatic analysis of job interviews has gained interest in both academic and industrial research. The particular case of asynchronous video interviews makes it possible to collect large corpora of monologue videos in which candidates answer standardized questions, enabling the use of deep learning algorithms. However, state-of-the-art approaches still face several obstacles, including the fusion of information from multiple modalities and the interpretability of the predictions. We study the task of predicting candidates' performance in asynchronous video interviews using three modalities (verbal content, prosody and facial expressions), independently or simultaneously, with data from real interviews that take place in real conditions. We propose a sequential and multimodal deep neural network model, called Multimodal HireNet. We compare this model to state-of-the-art approaches and show a clear improvement in performance. Moreover, the proposed architecture is based on an attention mechanism, which provides interpretability about which questions, moments and modalities contribute most to the output of the network. While other deep learning systems use attention mechanisms to offer a visualization of moments with attention values, the proposed methodology enables an in-depth interpretation of the predictions through an overall analysis of the features of social signals contained in these moments.
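The abstract describes a hierarchical attention architecture: behavioural features are pooled over moments within an answer, answers are pooled over the questions of the interview, and the resulting attention weights indicate which moments and questions drive the prediction. As a rough illustration only (the actual Multimodal HireNet layers, feature extractors, modality fusion and training setup are specified in the article, not reproduced here), the following PyTorch sketch shows how two-level additive attention over a single modality could look; all class names, dimensions and the random input are placeholders.

import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    # Additive attention pooling: scores each time step, softmax-normalizes the
    # scores, and returns the weighted sum plus the weights for interpretation.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, x):                                   # x: (batch, steps, dim)
        scores = self.context(torch.tanh(self.proj(x)))     # (batch, steps, 1)
        weights = torch.softmax(scores, dim=1)
        return (weights * x).sum(dim=1), weights.squeeze(-1)

class HierarchicalAttentionSketch(nn.Module):
    # Two-level hierarchy for one modality: moments -> answer, answers -> interview.
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.moment_rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.moment_attn = AttentionPooling(2 * hidden)
        self.answer_rnn = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.answer_attn = AttentionPooling(2 * hidden)
        self.classifier = nn.Linear(2 * hidden, 1)          # interview-level score

    def forward(self, x):                                   # x: (batch, questions, moments, feat_dim)
        b, q, m, d = x.shape
        moments, _ = self.moment_rnn(x.reshape(b * q, m, d))
        answer_vec, moment_w = self.moment_attn(moments)     # which moments matter within each answer
        answers, _ = self.answer_rnn(answer_vec.reshape(b, q, -1))
        interview_vec, question_w = self.answer_attn(answers)  # which questions matter overall
        return self.classifier(interview_vec), moment_w.reshape(b, q, m), question_w

# Toy run with random features standing in for, e.g., prosodic descriptors.
model = HierarchicalAttentionSketch()
score, moment_weights, question_weights = model(torch.randn(2, 5, 20, 32))
print(score.shape, moment_weights.shape, question_weights.shape)

In such a design the attention weights at both levels can be read out alongside any modality-level fusion weights, which is the kind of per-question, per-moment and per-modality interpretation the abstract refers to.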
doi_str_mv 10.1109/TAFFC.2021.3113159
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1949-3045
ispartof IEEE transactions on affective computing, 2023-04, Vol.14 (2), p.969-985
issn 1949-3045
1949-3045
language eng
recordid cdi_proquest_journals_2821067426
source IEEE Electronic Library (IEL)
subjects Algorithms
Artificial Intelligence
Artificial neural networks
Computer Science
Decision analysis
Deep learning
employment
Face recognition
Feature extraction
human resources
Impact analysis
Industrial research
interpretability
Interviews
job interviews
Machine learning
multimodal systems
neural nets
Neural networks
Nonverbal signals
Performance prediction
Questions
Video
Visualization
title Multimodal Hierarchical Attention Neural Network: Looking for Candidates Behaviour Which Impact Recruiter's Decision
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T12%3A43%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multimodal%20Hierarchical%20Attention%20Neural%20Network:%20Looking%20for%20Candidates%20Behaviour%20Which%20Impact%20Recruiter's%20Decision&rft.jtitle=IEEE%20transactions%20on%20affective%20computing&rft.au=Hemamou,%20Leo&rft.date=2023-04-01&rft.volume=14&rft.issue=2&rft.spage=969&rft.epage=985&rft.pages=969-985&rft.issn=1949-3045&rft.eissn=1949-3045&rft.coden=ITACBQ&rft_id=info:doi/10.1109/TAFFC.2021.3113159&rft_dat=%3Cproquest_RIE%3E2821067426%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2821067426&rft_id=info:pmid/&rft_ieee_id=9540240&rfr_iscdi=true