Explaining word embeddings with perfect fidelity: Case study in research impact prediction

Bibliographic details
Main authors: Dvorackova, Lucie; Joachimiak, Marcin P; Cerny, Michal; Kubecova, Adriana; Sklenak, Vilem; Kliegr, Tomas
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language
Online access: Order full text
creator Dvorackova, Lucie; Joachimiak, Marcin P; Cerny, Michal; Kubecova, Adriana; Sklenak, Vilem; Kliegr, Tomas
description Best-performing approaches for scholarly document quality prediction are based on embedding models, which do not allow direct explanation of classifiers, as distinct words no longer correspond to the input features for model training. Although model-agnostic explanation methods such as Local Interpretable Model-agnostic Explanations (LIME) can be applied, these produce results with questionable correspondence to the ML model. We introduce a new feature importance method, Self-model Rated Entities (SMER), for logistic regression-based classification models trained on word embeddings. We show that SMER has theoretically perfect fidelity with the explained model, as its prediction corresponds exactly to the average of predictions for individual words in the text. SMER allows us to reliably determine which words or entities positively contribute to predicting impactful articles. Quantitative and qualitative evaluation is performed through five diverse experiments conducted on 50,000 research papers from the CORD-19 corpus. Through an AOPC curve analysis, we experimentally demonstrate that SMER produces better explanations than LIME for logistic regression.
doi_str_mv 10.48550/arxiv.2409.15912
format Article
identifier DOI: 10.48550/arxiv.2409.15912
language eng
recordid cdi_arxiv_primary_2409_15912
source arXiv.org
subjects Computer Science - Computation and Language
title Explaining word embeddings with perfect fidelity: Case study in research impact prediction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T22%3A26%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Explaining%20word%20embeddings%20with%20perfect%20fidelity:%20Case%20study%20in%20research%20impact%20prediction&rft.au=Dvorackova,%20Lucie&rft.date=2024-09-24&rft_id=info:doi/10.48550/arxiv.2409.15912&rft_dat=%3Carxiv_GOX%3E2409_15912%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true