Multimodal Hierarchical Attention Neural Network: Looking for Candidates Behaviour Which Impact Recruiter's Decision
Automatic analysis of job interviews has gained interest among academic and industrial researchers. The particular case of asynchronous video interviews makes it possible to collect vast corpora of videos in which candidates answer standardized questions in monologue form, enabling the use of deep learning algorithms.
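The record's description below outlines a hierarchical, multimodal attention architecture: per-modality encoders attend over moments within each answer, a second level attends over questions, and the modalities are fused for the final hirability prediction. The following PyTorch sketch illustrates that general idea only; the layer choices (GRU encoders, additive attention, late fusion), module names and feature sizes are assumptions made for illustration and do not reproduce the paper's actual Multimodal HireNet.

```python
# Illustrative sketch of a hierarchical attention network for asynchronous video
# interviews: attention over moments within each answer, then attention over
# questions, with one encoder per modality fused for the final decision.
# All names, sizes and the fusion scheme are assumptions, not the paper's model.
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Scores each timestep and returns an attention-weighted summary."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # x: (batch, steps, dim)
        weights = torch.softmax(self.score(x), dim=1)   # (batch, steps, 1)
        context = (weights * x).sum(dim=1)              # (batch, dim)
        return context, weights.squeeze(-1)


class HierarchicalModalityEncoder(nn.Module):
    """Encodes one modality: moments -> answer vectors -> interview vector."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.moment_rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.moment_attn = AdditiveAttention(2 * hidden)
        self.question_rnn = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.question_attn = AdditiveAttention(2 * hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, questions, moments, feat_dim)
        b, q, m, d = x.shape
        moments, _ = self.moment_rnn(x.reshape(b * q, m, d))
        answers, _ = self.moment_attn(moments)               # (b*q, 2*hidden)
        questions, _ = self.question_rnn(answers.reshape(b, q, -1))
        interview, _ = self.question_attn(questions)         # (b, 2*hidden)
        return interview


class MultimodalHierarchicalNet(nn.Module):
    """Late fusion of per-modality interview representations."""

    def __init__(self, modality_dims: dict[str, int], hidden: int = 64):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: HierarchicalModalityEncoder(dim, hidden) for name, dim in modality_dims.items()}
        )
        self.classifier = nn.Linear(2 * hidden * len(modality_dims), 1)

    def forward(self, inputs: dict[str, torch.Tensor]) -> torch.Tensor:
        reps = [self.encoders[name](x) for name, x in inputs.items()]
        return self.classifier(torch.cat(reps, dim=-1))      # hirability logit


# Toy usage: 2 interviews, 5 questions, 20 moments per answer, assumed feature sizes.
model = MultimodalHierarchicalNet({"text": 300, "prosody": 28, "face": 17})
batch = {
    "text": torch.randn(2, 5, 20, 300),
    "prosody": torch.randn(2, 5, 20, 28),
    "face": torch.randn(2, 5, 20, 17),
}
print(model(batch).shape)  # torch.Size([2, 1])
```

In such a design, the attention weights returned at each level are what make the per-moment and per-question contributions inspectable, which is the interpretability property the abstract emphasizes.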
Saved in:
Published in: | IEEE transactions on affective computing 2023-04, Vol.14 (2), p.969-985 |
---|---|
Main Authors: | Hemamou, Leo; Guillon, Arthur; Martin, Jean-Claude; Clavel, Chloe |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
container_end_page | 985 |
---|---|
container_issue | 2 |
container_start_page | 969 |
container_title | IEEE transactions on affective computing |
container_volume | 14 |
creator | Hemamou, Leo; Guillon, Arthur; Martin, Jean-Claude; Clavel, Chloe |
description | Automatic analysis of job interviews has gained interest among academic and industrial researchers. The particular case of asynchronous video interviews makes it possible to collect vast corpora of videos in which candidates answer standardized questions in monologue form, enabling the use of deep learning algorithms. However, state-of-the-art approaches still face obstacles, among them the fusion of information from multiple modalities and the interpretability of the predictions. We study the task of predicting candidates' performance in asynchronous video interviews using three modalities (verbal content, prosody and facial expressions), independently or simultaneously, with data from real interviews conducted in real conditions. We propose a sequential and multimodal deep neural network model, called Multimodal HireNet. We compare this model to state-of-the-art approaches and show a clear improvement in performance. Moreover, the proposed architecture is based on an attention mechanism, which provides interpretability about which questions, moments and modalities contribute most to the output of the network. While other deep learning systems use attention mechanisms to offer a visualization of moments with attention values, the proposed methodology enables an in-depth interpretation of the predictions through an overall analysis of the features of the social signals contained in these moments. |
doi_str_mv | 10.1109/TAFFC.2021.3113159 |
format | Article |
publisher | Piscataway: IEEE |
eissn | 1949-3045 |
coden | ITACBQ |
ieee_id | 9540240 |
linktohtml | https://ieeexplore.ieee.org/document/9540240 |
backlink | https://hal.science/hal-04276030 |
orcidid | 0000-0001-8763-9453; 0000-0003-4850-3398; 0000-0002-7157-0727 |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023; Distributed under a Creative Commons Attribution 4.0 International License |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1949-3045 |
ispartof | IEEE transactions on affective computing, 2023-04, Vol.14 (2), p.969-985 |
issn | 1949-3045 1949-3045 |
language | eng |
recordid | cdi_proquest_journals_2821067426 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms; Artificial Intelligence; Artificial neural networks; Computer Science; Decision analysis; Deep learning; employment; Face recognition; Feature extraction; human resources; Impact analysis; Industrial research; interpretability; Interviews; job interviews; Machine learning; multimodal systems; neural nets; Neural networks; Nonverbal signals; Performance prediction; Questions; Video; Visualization |
title | Multimodal Hierarchical Attention Neural Network: Looking for Candidates Behaviour Which Impact Recruiter's Decision |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T12%3A43%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multimodal%20Hierarchical%20Attention%20Neural%20Network:%20Looking%20for%20Candidates%20Behaviour%20Which%20Impact%20Recruiter's%20Decision&rft.jtitle=IEEE%20transactions%20on%20affective%20computing&rft.au=Hemamou,%20Leo&rft.date=2023-04-01&rft.volume=14&rft.issue=2&rft.spage=969&rft.epage=985&rft.pages=969-985&rft.issn=1949-3045&rft.eissn=1949-3045&rft.coden=ITACBQ&rft_id=info:doi/10.1109/TAFFC.2021.3113159&rft_dat=%3Cproquest_RIE%3E2821067426%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2821067426&rft_id=info:pmid/&rft_ieee_id=9540240&rfr_iscdi=true |