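„Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych ["Garbage in, garbage out". The impact of coder quality on the performance of a neural network classifying social media statements]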

One of the critical decisions when manually coding text data is whether to verify the coders’ work. In the case of supervised models, this leads to a significant dilemma: is it better to provide the model with a large number of cases on which it will learn at the expense of verifying the correctness...
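
The trade-off named in the abstract (one coding per case and a large training set, versus n codings per case and a training set n times smaller) can be illustrated with a small simulation. The sketch below is not the study's method: it assumes binary labels, independent hypothetical coders who are each correct with a fixed probability `accuracy`, a fixed `budget` of coding decisions, and majority voting to resolve repeated codings. All names and parameter values are illustrative assumptions, and no classifier is trained.

```python
import random
from collections import Counter


def noisy_code(true_label, accuracy, rng):
    """One hypothetical coder: returns the true binary label with probability `accuracy`."""
    return true_label if rng.random() < accuracy else 1 - true_label


def build_training_set(budget, n_codings, accuracy, rng):
    """Spend `budget` coding decisions, coding each item `n_codings` times.

    Returns (number of items, share of items whose majority-vote label is wrong).
    Use an odd `n_codings` so that a binary majority vote cannot tie.
    """
    n_items = budget // n_codings
    wrong = 0
    for _ in range(n_items):
        true_label = rng.randint(0, 1)
        votes = [noisy_code(true_label, accuracy, rng) for _ in range(n_codings)]
        final_label = Counter(votes).most_common(1)[0][0]
        wrong += final_label != true_label
    return n_items, wrong / n_items


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for n in (1, 3, 5):
        size, error = build_training_set(budget=30_000, n_codings=n, accuracy=0.8, rng=rng)
        print(f"n={n}: {size} items, label error rate {error:.3f}")
```

Under these assumptions, a larger n shrinks the training set while lowering the share of wrong labels. Whether that exchange helps a classifier is the empirical question the study addresses, and this sketch does not answer it.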

Bibliographic details
Published in: Studia Socjologiczne 2022, Vol.245 (2), p.137-164
Main author: Matuszewski, Paweł
Format: Article
Language: Polish (pol)
Online access: Full text
container_end_page 164
container_issue 2
container_start_page 137
container_title Studia Socjologiczne
container_volume 245
creator Matuszewski, Paweł
description One of the critical decisions when manually coding text data is whether to verify the coders’ work. For supervised models, this poses a significant dilemma: is it better to give the model a large number of cases to learn from, at the expense of verifying the correctness of the data, or to code each case n times, which makes it possible to compare the codes and check their correctness but reduces the training dataset n-fold? Such a decision not only affects the final results of the classifier. From the researchers’ point of view, it is also crucial because, realistically assuming that research has limited funding, it cannot be undone. The study uses a simulation approach and provides conclusions and recommendations based on 100,000 unique, hand-coded tweets.
doi_str_mv 10.24425/sts.2022.141426
format Article
fullrecord <record><control><sourceid>ceeol_proqu</sourceid><recordid>TN_cdi_proquest_journals_2682416687</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ceeol_id>1049195</ceeol_id><sourcerecordid>1049195</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2036-8b966c4a02427b444bad756f56a1eefdba1dd5d9a1e4108e995dd9ba7c63e273</originalsourceid><addsrcrecordid>eNpFkc1KxDAUhYMoOI7u3QgBt3ZM0jSdLkX8A8GN4LKkyS2mnWlqYyl1JQO-hMM8hY-gfRGfxNYKri73cM534R6EDimZMc5ZcOqe3YwRxmaUU87EFpqwgAceC0OyjSaE-JHn-yHdRXvOZYRwwiM6QZvufWlAGVxI3EDWrZWpT3C3_hfbUfx-3czwQ9mt2gZnMreDiHOrofr8aAajfjGyW8nCAHZjGOrKFran4nwhXZuavM6-3lS_N21pGwN9BDd4CdpI9YhdabsVqJfil22bVj3uo51ULhwc_M0pur-8uD-_9m7vrm7Oz249xYgvvHkSCaG4JIyzMOGcJ1KHgUgDISlAqhNJtQ501G-ckjlEUaB1lMhQCR9Y6E_R8YgtK_tUg3uOM1tXRX8xZmLOOBViPrjI6FKVda6CNC4rs5RVG1MS_3YQ9x3EQwfx2EEfOfqLANjFP5UOz48C_wfd2Y_A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2682416687</pqid></control><display><type>article</type><title>Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych</title><source>Central and Eastern European Online Library</source><source>Alma/SFX Local Collection</source><source>Sociological Abstracts</source><creator>Matuszewski, Paweł</creator><creatorcontrib>Matuszewski, Paweł</creatorcontrib><description>One of the critical decisions when manually coding text data is whether to verify the coders’ work. In the case of supervised models, this leads to a significant dilemma: is it better to provide the model with a large number of cases on which it will learn at the expense of verifying the correctness of the data, or whether it is better to code each case n-times, which will allow to compare the codes and check their correctness but at the same time will reduce the training dataset by n-fold. Such a decision not only affect the final results of the classifier. 
From the researchers’ point of view, it is also crucial because, realistically assuming that research has limited funding, it cannot be undone. The study uses a simulation approach and provides conclusions and recommendations based on 100,000 unique and hand-coded tweets.</description><identifier>ISSN: 0039-3371</identifier><identifier>ISSN: 2545-2770</identifier><identifier>EISSN: 2545-2770</identifier><identifier>DOI: 10.24425/sts.2022.141426</identifier><language>pol</language><publisher>Warsaw: Institute of Philosophy and Sociology, Polish Academy of Sciences</publisher><subject>Classifiers ; Neural networks ; Sentiment analysis ; Simulation ; Social media ; Social Sciences ; Wastes</subject><ispartof>Studia Socjologiczne, 2022, Vol.245 (2), p.137-164</ispartof><rights>Copyright Polska Akademia Nauk, Instytut Filozofii i Socjologii 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0003-0069-157X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://www.ceeol.com//api/image/getissuecoverimage?id=picture_2022_68094.png</thumbnail><link.rule.ids>314,776,780,4010,21341,27321,27900,27901,27902,33751</link.rule.ids></links><search><creatorcontrib>Matuszewski, Paweł</creatorcontrib><title>Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych</title><title>Studia Socjologiczne</title><addtitle>Sociological Studies</addtitle><description>One of the critical decisions when manually coding text data is whether to verify the coders’ work. 
In the case of supervised models, this leads to a significant dilemma: is it better to provide the model with a large number of cases on which it will learn at the expense of verifying the correctness of the data, or whether it is better to code each case n-times, which will allow to compare the codes and check their correctness but at the same time will reduce the training dataset by n-fold. Such a decision not only affect the final results of the classifier. From the researchers’ point of view, it is also crucial because, realistically assuming that research has limited funding, it cannot be undone. The study uses a simulation approach and provides conclusions and recommendations based on 100,000 unique and hand-coded tweets.</description><subject>Classifiers</subject><subject>Neural networks</subject><subject>Sentiment analysis</subject><subject>Simulation</subject><subject>Social media</subject><subject>Social Sciences</subject><subject>Wastes</subject><issn>0039-3371</issn><issn>2545-2770</issn><issn>2545-2770</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>REL</sourceid><sourceid>BENPR</sourceid><sourceid>BHHNA</sourceid><recordid>eNpFkc1KxDAUhYMoOI7u3QgBt3ZM0jSdLkX8A8GN4LKkyS2mnWlqYyl1JQO-hMM8hY-gfRGfxNYKri73cM534R6EDimZMc5ZcOqe3YwRxmaUU87EFpqwgAceC0OyjSaE-JHn-yHdRXvOZYRwwiM6QZvufWlAGVxI3EDWrZWpT3C3_hfbUfx-3czwQ9mt2gZnMreDiHOrofr8aAajfjGyW8nCAHZjGOrKFran4nwhXZuavM6-3lS_N21pGwN9BDd4CdpI9YhdabsVqJfil22bVj3uo51ULhwc_M0pur-8uD-_9m7vrm7Oz249xYgvvHkSCaG4JIyzMOGcJ1KHgUgDISlAqhNJtQ501G-ckjlEUaB1lMhQCR9Y6E_R8YgtK_tUg3uOM1tXRX8xZmLOOBViPrjI6FKVda6CNC4rs5RVG1MS_3YQ9x3EQwfx2EEfOfqLANjFP5UOz48C_wfd2Y_A</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Matuszewski, Paweł</creator><general>Institute of Philosophy and Sociology, Polish Academy of Sciences</general><general>Instytut Filozofii i Socjologii Polskiej Akademii Nauk</general><general>Polska Akademia Nauk, 
Instytut Filozofii i Socjologii</general><scope>AE2</scope><scope>BIXPP</scope><scope>REL</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>0-V</scope><scope>3V.</scope><scope>7U4</scope><scope>7XB</scope><scope>88J</scope><scope>8BJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AIMQZ</scope><scope>ALSLI</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BHHNA</scope><scope>CCPQU</scope><scope>DWI</scope><scope>DWQXO</scope><scope>FQK</scope><scope>GNUQQ</scope><scope>HEHIP</scope><scope>JBE</scope><scope>LIQON</scope><scope>M2R</scope><scope>M2S</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>WZK</scope><orcidid>https://orcid.org/0000-0003-0069-157X</orcidid></search><sort><creationdate>2022</creationdate><title>Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych</title><author>Matuszewski, Paweł</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2036-8b966c4a02427b444bad756f56a1eefdba1dd5d9a1e4108e995dd9ba7c63e273</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>pol</language><creationdate>2022</creationdate><topic>Classifiers</topic><topic>Neural networks</topic><topic>Sentiment analysis</topic><topic>Simulation</topic><topic>Social media</topic><topic>Social Sciences</topic><topic>Wastes</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Matuszewski, Paweł</creatorcontrib><collection>Central and Eastern European Online Library (C.E.E.O.L.) 
(DFG Nationallizenzen)</collection><collection>CEEOL: Open Access</collection><collection>Central and Eastern European Online Library</collection><collection>CrossRef</collection><collection>ProQuest Social Sciences Premium Collection</collection><collection>ProQuest Central (Corporate)</collection><collection>Sociological Abstracts (pre-2017)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Social Science Database (Alumni Edition)</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest One Literature</collection><collection>Social Science Premium Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Sociological Abstracts</collection><collection>ProQuest One Community College</collection><collection>Sociological Abstracts</collection><collection>ProQuest Central Korea</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Central Student</collection><collection>Sociology Collection</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest One Literature - U.S. 
Customers Only</collection><collection>Social Science Database</collection><collection>Sociology Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Sociological Abstracts (Ovid)</collection><jtitle>Studia Socjologiczne</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Matuszewski, Paweł</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych</atitle><jtitle>Studia Socjologiczne</jtitle><addtitle>Sociological Studies</addtitle><date>2022</date><risdate>2022</risdate><volume>245</volume><issue>2</issue><spage>137</spage><epage>164</epage><pages>137-164</pages><issn>0039-3371</issn><issn>2545-2770</issn><eissn>2545-2770</eissn><abstract>One of the critical decisions when manually coding text data is whether to verify the coders’ work. In the case of supervised models, this leads to a significant dilemma: is it better to provide the model with a large number of cases on which it will learn at the expense of verifying the correctness of the data, or whether it is better to code each case n-times, which will allow to compare the codes and check their correctness but at the same time will reduce the training dataset by n-fold. Such a decision not only affect the final results of the classifier. From the researchers’ point of view, it is also crucial because, realistically assuming that research has limited funding, it cannot be undone. 
The study uses a simulation approach and provides conclusions and recommendations based on 100,000 unique and hand-coded tweets.</abstract><cop>Warsaw</cop><pub>Institute of Philosophy and Sociology, Polish Academy of Sciences</pub><doi>10.24425/sts.2022.141426</doi><tpages>28</tpages><orcidid>https://orcid.org/0000-0003-0069-157X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0039-3371
ispartof Studia Socjologiczne, 2022, Vol.245 (2), p.137-164
issn 0039-3371
eissn 2545-2770
language pol
recordid cdi_proquest_journals_2682416687
source Central and Eastern European Online Library; Alma/SFX Local Collection; Sociological Abstracts
subjects Classifiers
Neural networks
Sentiment analysis
Simulation
Social media
Social Sciences
Wastes
title „Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T23%3A31%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ceeol_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=%C5%9Amieci%20na%20wej%C5%9Bciu,%20%C5%9Bmieci%20na%20wyj%C5%9Bciu%E2%80%9D.%20Wp%C5%82yw%20jako%C5%9Bci%20koder%C3%B3w%20na%20dzia%C5%82anie%20sieci%20neuronowej%20klasyfikuj%C4%85cej%20wypowiedzi%20w%20mediach%20spo%C5%82eczno%C5%9Bciowych&rft.jtitle=Studia%20Socjologiczne&rft.au=Matuszewski,%20Pawe%C5%82&rft.date=2022&rft.volume=245&rft.issue=2&rft.spage=137&rft.epage=164&rft.pages=137-164&rft.issn=0039-3371&rft.eissn=2545-2770&rft_id=info:doi/10.24425/sts.2022.141426&rft_dat=%3Cceeol_proqu%3E1049195%3C/ceeol_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2682416687&rft_id=info:pmid/&rft_ceeol_id=1049195&rfr_iscdi=true