Interpretable Classification of Wiki-Review Streams

Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect...

Bibliographic details
Published in: IEEE access 2023-01, Vol.11, p.141137-141151
Main authors: Garcia-Mendez, Silvia; Leal, Fatima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 141151
container_issue
container_start_page 141137
container_title IEEE access
container_volume 11
creator Garcia-Mendez, Silvia
Leal, Fatima
Malheiro, Benedita
Burguillo-Rial, Juan Carlos
description Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real-time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90% values for all evaluation metrics (accuracy, precision, recall, and F-measure).
doi_str_mv 10.1109/ACCESS.2023.3342472
format Article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_proquest_journals_2904418270</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10356073</ieee_id><doaj_id>oai_doaj_org_article_0400dab2c00144c1a8922d2ad6ff0d18</doaj_id><sourcerecordid>2904418270</sourcerecordid><originalsourceid>FETCH-LOGICAL-c388t-c88996b7fde88a98807b275bae4dc2eb3a7df0341aea566221c974aa173c8cb13</originalsourceid><addsrcrecordid>eNpNkE9Lw0AQxRdRsNR-Aj0EPKfuv2Q3xxKqFgqCVTwuk81EtqbdupsqfntTU6RzmeEx783wI-Sa0SljtLibleV8tZpyysVUCMml4mdkxFlepCIT-fnJfEkmMa5pX7qXMjUiYrHtMOwCdlC1mJQtxOgaZ6Fzfpv4JnlzHy59xi-H38mqCwibeEUuGmgjTo59TF7v5y_lY7p8eliUs2VqhdZdarUuirxSTY1aQ6E1VRVXWQUoa8uxEqDqhgrJACHLc86ZLZQEYEpYbSsmxmQx5NYe1mYX3AbCj_HgzJ_gw7uB0DnboqGS0hoqbillUloGuuC85lDnTUNrpvus2yFrF_znHmNn1n4ftv37hhdUSqa5ov2WGLZs8DEGbP6vMmoOsM0A2xxgmyPs3nUzuBwinjhEllMlxC9gR3mq</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2904418270</pqid></control><display><type>article</type><title>Interpretable Classification of Wiki-Review Streams</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Garcia-Mendez, Silvia ; Leal, Fatima ; Malheiro, Benedita ; Burguillo-Rial, Juan Carlos</creator><creatorcontrib>Garcia-Mendez, Silvia ; Leal, Fatima ; Malheiro, Benedita ; Burguillo-Rial, Juan Carlos</creatorcontrib><description>Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real-time. 
The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90% values for all evaluation metrics (accuracy, precision, recall, and &lt;inline-formula&gt; &lt;tex-math notation="LaTeX"&gt;{F} &lt;/tex-math&gt;&lt;/inline-formula&gt;-measure).</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2023.3342472</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Classification ; Classification algorithms ; Data reliability and fairness ; data-stream processing and classification ; Electronic publishing ; Encyclopedias ; Feature extraction ; Natural language processing ; Real-time systems ; Streams ; Synthetic data ; transparency ; Vandalism ; wikis</subject><ispartof>IEEE access, 2023-01, Vol.11, p.141137-141151</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c388t-c88996b7fde88a98807b275bae4dc2eb3a7df0341aea566221c974aa173c8cb13</cites><orcidid>0000-0001-9083-4292 ; 0000-0003-0533-1303 ; 0000-0003-4418-2590</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10356073$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2096,27610,27901,27902,54908</link.rule.ids></links><search><creatorcontrib>Garcia-Mendez, Silvia</creatorcontrib><creatorcontrib>Leal, Fatima</creatorcontrib><creatorcontrib>Malheiro, Benedita</creatorcontrib><creatorcontrib>Burguillo-Rial, Juan Carlos</creatorcontrib><title>Interpretable Classification of Wiki-Review Streams</title><title>IEEE access</title><addtitle>Access</addtitle><description>Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real-time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. 
Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90% values for all evaluation metrics (accuracy, precision, recall, and &lt;inline-formula&gt; &lt;tex-math notation="LaTeX"&gt;{F} &lt;/tex-math&gt;&lt;/inline-formula&gt;-measure).</description><subject>Algorithms</subject><subject>Classification</subject><subject>Classification algorithms</subject><subject>Data reliability and fairness</subject><subject>data-stream processing and classification</subject><subject>Electronic publishing</subject><subject>Encyclopedias</subject><subject>Feature extraction</subject><subject>Natural language processing</subject><subject>Real-time systems</subject><subject>Streams</subject><subject>Synthetic data</subject><subject>transparency</subject><subject>Vandalism</subject><subject>wikis</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkE9Lw0AQxRdRsNR-Aj0EPKfuv2Q3xxKqFgqCVTwuk81EtqbdupsqfntTU6RzmeEx783wI-Sa0SljtLibleV8tZpyysVUCMml4mdkxFlepCIT-fnJfEkmMa5pX7qXMjUiYrHtMOwCdlC1mJQtxOgaZ6Fzfpv4JnlzHy59xi-H38mqCwibeEUuGmgjTo59TF7v5y_lY7p8eliUs2VqhdZdarUuirxSTY1aQ6E1VRVXWQUoa8uxEqDqhgrJACHLc86ZLZQEYEpYbSsmxmQx5NYe1mYX3AbCj_HgzJ_gw7uB0DnboqGS0hoqbillUloGuuC85lDnTUNrpvus2yFrF_znHmNn1n4ftv37hhdUSqa5ov2WGLZs8DEGbP6vMmoOsM0A2xxgmyPs3nUzuBwinjhEllMlxC9gR3mq</recordid><startdate>20230101</startdate><enddate>20230101</enddate><creator>Garcia-Mendez, 
Silvia</creator><creator>Leal, Fatima</creator><creator>Malheiro, Benedita</creator><creator>Burguillo-Rial, Juan Carlos</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-9083-4292</orcidid><orcidid>https://orcid.org/0000-0003-0533-1303</orcidid><orcidid>https://orcid.org/0000-0003-4418-2590</orcidid></search><sort><creationdate>20230101</creationdate><title>Interpretable Classification of Wiki-Review Streams</title><author>Garcia-Mendez, Silvia ; Leal, Fatima ; Malheiro, Benedita ; Burguillo-Rial, Juan Carlos</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c388t-c88996b7fde88a98807b275bae4dc2eb3a7df0341aea566221c974aa173c8cb13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Classification</topic><topic>Classification algorithms</topic><topic>Data reliability and fairness</topic><topic>data-stream processing and classification</topic><topic>Electronic publishing</topic><topic>Encyclopedias</topic><topic>Feature extraction</topic><topic>Natural language processing</topic><topic>Real-time systems</topic><topic>Streams</topic><topic>Synthetic data</topic><topic>transparency</topic><topic>Vandalism</topic><topic>wikis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Garcia-Mendez, Silvia</creatorcontrib><creatorcontrib>Leal, Fatima</creatorcontrib><creatorcontrib>Malheiro, Benedita</creatorcontrib><creatorcontrib>Burguillo-Rial, Juan Carlos</creatorcontrib><collection>IEEE 
All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Garcia-Mendez, Silvia</au><au>Leal, Fatima</au><au>Malheiro, Benedita</au><au>Burguillo-Rial, Juan Carlos</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Interpretable Classification of Wiki-Review Streams</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2023-01-01</date><risdate>2023</risdate><volume>11</volume><spage>141137</spage><epage>141151</epage><pages>141137-141151</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. 
To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real-time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90% values for all evaluation metrics (accuracy, precision, recall, and &lt;inline-formula&gt; &lt;tex-math notation="LaTeX"&gt;{F} &lt;/tex-math&gt;&lt;/inline-formula&gt;-measure).</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2023.3342472</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0001-9083-4292</orcidid><orcidid>https://orcid.org/0000-0003-0533-1303</orcidid><orcidid>https://orcid.org/0000-0003-4418-2590</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2023-01, Vol.11, p.141137-141151
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2904418270
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Algorithms
Classification
Classification algorithms
Data reliability and fairness
data-stream processing and classification
Electronic publishing
Encyclopedias
Feature extraction
Natural language processing
Real-time systems
Streams
Synthetic data
transparency
Vandalism
wikis
title Interpretable Classification of Wiki-Review Streams
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T11%3A40%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Interpretable%20Classification%20of%20Wiki-Review%20Streams&rft.jtitle=IEEE%20access&rft.au=Garcia-Mendez,%20Silvia&rft.date=2023-01-01&rft.volume=11&rft.spage=141137&rft.epage=141151&rft.pages=141137-141151&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3342472&rft_dat=%3Cproquest_doaj_%3E2904418270%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2904418270&rft_id=info:pmid/&rft_ieee_id=10356073&rft_doaj_id=oai_doaj_org_article_0400dab2c00144c1a8922d2ad6ff0d18&rfr_iscdi=true