Multimodal Hierarchical Attention Neural Network: Looking for Candidates Behaviour Which Impact Recruiter's Decision

Automatic analysis of job interviews has gained interest in both academic and industrial research. The particular case of asynchronous video interviews makes it possible to collect large corpora of monologue videos in which candidates answer standardized questions, enabling the use of deep learning algorithms...

Full Description

Bibliographic Details
Published in: IEEE Transactions on Affective Computing, 2023-04, Vol. 14 (2), p. 969-985
Main authors: Hemamou, Leo; Guillon, Arthur; Martin, Jean-Claude; Clavel, Chloe
Format: Article
Language: English
Subjects:
Online access: Order full text
container_end_page 985
container_issue 2
container_start_page 969
container_title IEEE transactions on affective computing
container_volume 14
creator Hemamou, Leo
Guillon, Arthur
Martin, Jean-Claude
Clavel, Chloe
description Automatic analysis of job interviews has gained interest in both academic and industrial research. The particular case of asynchronous video interviews makes it possible to collect large corpora of monologue videos in which candidates answer standardized questions, enabling the use of deep learning algorithms. However, state-of-the-art approaches still face several obstacles, including the fusion of information from multiple modalities and the interpretability of the predictions. We study the task of predicting candidates' performance in asynchronous video interviews using three modalities (verbal content, prosody and facial expressions), independently or simultaneously, with data from real interviews that take place in real conditions. We propose a sequential and multimodal deep neural network model, called Multimodal HireNet. We compare this model to state-of-the-art approaches and show a clear improvement in performance. Moreover, the proposed architecture is based on an attention mechanism, which provides interpretability about which questions, moments and modalities contribute most to the output of the network. While other deep learning systems use attention mechanisms to offer a visualization of moments with attention values, the proposed methodology enables an in-depth interpretation of the predictions through an overall analysis of the features of social signals contained in these moments.
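The abstract describes a hierarchical attention architecture: behavioural features are pooled over moments within an answer, answers are pooled over the questions of the interview, and the resulting attention weights indicate which moments and questions drive the prediction. As a rough illustration only (the actual Multimodal HireNet layers, feature extractors, modality fusion and training setup are specified in the article, not reproduced here), the following PyTorch sketch shows how two-level additive attention over a single modality could look; all class names, dimensions and the random input are placeholders.

import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    # Additive attention pooling: scores each time step, softmax-normalizes the
    # scores, and returns the weighted sum plus the weights for interpretation.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, x):                                   # x: (batch, steps, dim)
        scores = self.context(torch.tanh(self.proj(x)))     # (batch, steps, 1)
        weights = torch.softmax(scores, dim=1)
        return (weights * x).sum(dim=1), weights.squeeze(-1)

class HierarchicalAttentionSketch(nn.Module):
    # Two-level hierarchy for one modality: moments -> answer, answers -> interview.
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.moment_rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.moment_attn = AttentionPooling(2 * hidden)
        self.answer_rnn = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.answer_attn = AttentionPooling(2 * hidden)
        self.classifier = nn.Linear(2 * hidden, 1)          # interview-level score

    def forward(self, x):                                   # x: (batch, questions, moments, feat_dim)
        b, q, m, d = x.shape
        moments, _ = self.moment_rnn(x.reshape(b * q, m, d))
        answer_vec, moment_w = self.moment_attn(moments)     # which moments matter within each answer
        answers, _ = self.answer_rnn(answer_vec.reshape(b, q, -1))
        interview_vec, question_w = self.answer_attn(answers)  # which questions matter overall
        return self.classifier(interview_vec), moment_w.reshape(b, q, m), question_w

# Toy run with random features standing in for, e.g., prosodic descriptors.
model = HierarchicalAttentionSketch()
score, moment_weights, question_weights = model(torch.randn(2, 5, 20, 32))
print(score.shape, moment_weights.shape, question_weights.shape)

In such a design the attention weights at both levels can be read out alongside any modality-level fusion weights, which is the kind of per-question, per-moment and per-modality interpretation the abstract refers to.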
doi_str_mv 10.1109/TAFFC.2021.3113159
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1949-3045
ispartof IEEE transactions on affective computing, 2023-04, Vol.14 (2), p.969-985
issn 1949-3045
1949-3045
language eng
recordid cdi_proquest_journals_2821067426
source IEEE Electronic Library (IEL)
subjects Algorithms
Artificial Intelligence
Artificial neural networks
Computer Science
Decision analysis
Deep learning
employment
Face recognition
Feature extraction
human resources
Impact analysis
Industrial research
interpretability
Interviews
job interviews
Machine learning
multimodal systems
neural nets
Neural networks
Nonverbal signals
Performance prediction
Questions
Video
Visualization
title Multimodal Hierarchical Attention Neural Network: Looking for Candidates Behaviour Which Impact Recruiter's Decision
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T12%3A43%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multimodal%20Hierarchical%20Attention%20Neural%20Network:%20Looking%20for%20Candidates%20Behaviour%20Which%20Impact%20Recruiter's%20Decision&rft.jtitle=IEEE%20transactions%20on%20affective%20computing&rft.au=Hemamou,%20Leo&rft.date=2023-04-01&rft.volume=14&rft.issue=2&rft.spage=969&rft.epage=985&rft.pages=969-985&rft.issn=1949-3045&rft.eissn=1949-3045&rft.coden=ITACBQ&rft_id=info:doi/10.1109/TAFFC.2021.3113159&rft_dat=%3Cproquest_RIE%3E2821067426%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2821067426&rft_id=info:pmid/&rft_ieee_id=9540240&rfr_iscdi=true