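„Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych (“Garbage in, garbage out”: the effect of coder quality on the performance of a neural network classifying social media posts)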
One of the critical decisions when manually coding text data is whether to verify the coders’ work. In the case of supervised models, this leads to a significant dilemma: is it better to provide the model with a large number of cases on which it will learn at the expense of verifying the correctness...
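Published in: | Studia Socjologiczne 2022, Vol.245 (2), p.137-164 |
---|---|
Main author: | Matuszewski, Paweł |
Format: | Article |
Language: | Polish (pol) |
Subjects: | Classifiers; Neural networks; Sentiment analysis; Simulation; Social media; Social Sciences; Wastes |
Online access: | Full text |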
container_end_page | 164 |
---|---|
container_issue | 2 |
container_start_page | 137 |
container_title | Studia Socjologiczne |
container_volume | 245 |
creator | Matuszewski, Paweł |
description | One of the critical decisions when manually coding text data is whether to verify the coders’ work. In the case of supervised models, this leads to a significant dilemma: is it better to give the model a large number of cases to learn from, at the expense of verifying the correctness of the data, or to code each case n times, which allows the codes to be compared and checked for correctness but reduces the training dataset n-fold? Such a decision does not only affect the final results of the classifier. From the researchers’ point of view, it is also crucial because, realistically assuming that research funding is limited, it cannot be undone. The study uses a simulation approach and provides conclusions and recommendations based on 100,000 unique, hand-coded tweets. |
doi_str_mv | 10.24425/sts.2022.141426 |
format | Article |
fullrecord | <record><control><sourceid>ceeol_proqu</sourceid><recordid>TN_cdi_proquest_journals_2682416687</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ceeol_id>1049195</ceeol_id><sourcerecordid>1049195</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2036-8b966c4a02427b444bad756f56a1eefdba1dd5d9a1e4108e995dd9ba7c63e273</originalsourceid><addsrcrecordid>eNpFkc1KxDAUhYMoOI7u3QgBt3ZM0jSdLkX8A8GN4LKkyS2mnWlqYyl1JQO-hMM8hY-gfRGfxNYKri73cM534R6EDimZMc5ZcOqe3YwRxmaUU87EFpqwgAceC0OyjSaE-JHn-yHdRXvOZYRwwiM6QZvufWlAGVxI3EDWrZWpT3C3_hfbUfx-3czwQ9mt2gZnMreDiHOrofr8aAajfjGyW8nCAHZjGOrKFran4nwhXZuavM6-3lS_N21pGwN9BDd4CdpI9YhdabsVqJfil22bVj3uo51ULhwc_M0pur-8uD-_9m7vrm7Oz249xYgvvHkSCaG4JIyzMOGcJ1KHgUgDISlAqhNJtQ501G-ckjlEUaB1lMhQCR9Y6E_R8YgtK_tUg3uOM1tXRX8xZmLOOBViPrjI6FKVda6CNC4rs5RVG1MS_3YQ9x3EQwfx2EEfOfqLANjFP5UOz48C_wfd2Y_A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2682416687</pqid></control><display><type>article</type><title>Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych</title><source>Central and Eastern European Online Library</source><source>Alma/SFX Local Collection</source><source>Sociological Abstracts</source><creator>Matuszewski, Paweł</creator><creatorcontrib>Matuszewski, Paweł</creatorcontrib><description>One of the critical decisions when manually coding text data is whether to verify the coders’ work. In the case of supervised models, this leads to a significant dilemma: is it better to provide the model with a large number of cases on which it will learn at the expense of verifying the correctness of the data, or whether it is better to code each case n-times, which will allow to compare the codes and check their correctness but at the same time will reduce the training dataset by n-fold. Such a decision not only affect the final results of the classifier. 
From the researchers’ point of view, it is also crucial because, realistically assuming that research has limited funding, it cannot be undone. The study uses a simulation approach and provides conclusions and recommendations based on 100,000 unique and hand-coded tweets.</description><identifier>ISSN: 0039-3371</identifier><identifier>ISSN: 2545-2770</identifier><identifier>EISSN: 2545-2770</identifier><identifier>DOI: 10.24425/sts.2022.141426</identifier><language>pol</language><publisher>Warsaw: Institute of Philosophy and Sociology, Polish Academy of Sciences</publisher><subject>Classifiers ; Neural networks ; Sentiment analysis ; Simulation ; Social media ; Social Sciences ; Wastes</subject><ispartof>Studia Socjologiczne, 2022, Vol.245 (2), p.137-164</ispartof><rights>Copyright Polska Akademia Nauk, Instytut Filozofii i Socjologii 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0003-0069-157X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://www.ceeol.com//api/image/getissuecoverimage?id=picture_2022_68094.png</thumbnail><link.rule.ids>314,776,780,4010,21341,27321,27900,27901,27902,33751</link.rule.ids></links><search><creatorcontrib>Matuszewski, Paweł</creatorcontrib><title>Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych</title><title>Studia Socjologiczne</title><addtitle>Sociological Studies</addtitle><description>One of the critical decisions when manually coding text data is whether to verify the coders’ work. 
In the case of supervised models, this leads to a significant dilemma: is it better to provide the model with a large number of cases on which it will learn at the expense of verifying the correctness of the data, or whether it is better to code each case n-times, which will allow to compare the codes and check their correctness but at the same time will reduce the training dataset by n-fold. Such a decision not only affect the final results of the classifier. From the researchers’ point of view, it is also crucial because, realistically assuming that research has limited funding, it cannot be undone. The study uses a simulation approach and provides conclusions and recommendations based on 100,000 unique and hand-coded tweets.</description><subject>Classifiers</subject><subject>Neural networks</subject><subject>Sentiment analysis</subject><subject>Simulation</subject><subject>Social media</subject><subject>Social Sciences</subject><subject>Wastes</subject><issn>0039-3371</issn><issn>2545-2770</issn><issn>2545-2770</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>REL</sourceid><sourceid>BENPR</sourceid><sourceid>BHHNA</sourceid><recordid>eNpFkc1KxDAUhYMoOI7u3QgBt3ZM0jSdLkX8A8GN4LKkyS2mnWlqYyl1JQO-hMM8hY-gfRGfxNYKri73cM534R6EDimZMc5ZcOqe3YwRxmaUU87EFpqwgAceC0OyjSaE-JHn-yHdRXvOZYRwwiM6QZvufWlAGVxI3EDWrZWpT3C3_hfbUfx-3czwQ9mt2gZnMreDiHOrofr8aAajfjGyW8nCAHZjGOrKFran4nwhXZuavM6-3lS_N21pGwN9BDd4CdpI9YhdabsVqJfil22bVj3uo51ULhwc_M0pur-8uD-_9m7vrm7Oz249xYgvvHkSCaG4JIyzMOGcJ1KHgUgDISlAqhNJtQ501G-ckjlEUaB1lMhQCR9Y6E_R8YgtK_tUg3uOM1tXRX8xZmLOOBViPrjI6FKVda6CNC4rs5RVG1MS_3YQ9x3EQwfx2EEfOfqLANjFP5UOz48C_wfd2Y_A</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Matuszewski, Paweł</creator><general>Institute of Philosophy and Sociology, Polish Academy of Sciences</general><general>Instytut Filozofii i Socjologii Polskiej Akademii Nauk</general><general>Polska Akademia Nauk, 
Instytut Filozofii i Socjologii</general><scope>AE2</scope><scope>BIXPP</scope><scope>REL</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>0-V</scope><scope>3V.</scope><scope>7U4</scope><scope>7XB</scope><scope>88J</scope><scope>8BJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AIMQZ</scope><scope>ALSLI</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BHHNA</scope><scope>CCPQU</scope><scope>DWI</scope><scope>DWQXO</scope><scope>FQK</scope><scope>GNUQQ</scope><scope>HEHIP</scope><scope>JBE</scope><scope>LIQON</scope><scope>M2R</scope><scope>M2S</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>WZK</scope><orcidid>https://orcid.org/0000-0003-0069-157X</orcidid></search><sort><creationdate>2022</creationdate><title>Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych</title><author>Matuszewski, Paweł</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2036-8b966c4a02427b444bad756f56a1eefdba1dd5d9a1e4108e995dd9ba7c63e273</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>pol</language><creationdate>2022</creationdate><topic>Classifiers</topic><topic>Neural networks</topic><topic>Sentiment analysis</topic><topic>Simulation</topic><topic>Social media</topic><topic>Social Sciences</topic><topic>Wastes</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Matuszewski, Paweł</creatorcontrib><collection>Central and Eastern European Online Library (C.E.E.O.L.) 
(DFG Nationallizenzen)</collection><collection>CEEOL: Open Access</collection><collection>Central and Eastern European Online Library</collection><collection>CrossRef</collection><collection>ProQuest Social Sciences Premium Collection</collection><collection>ProQuest Central (Corporate)</collection><collection>Sociological Abstracts (pre-2017)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Social Science Database (Alumni Edition)</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest One Literature</collection><collection>Social Science Premium Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Sociological Abstracts</collection><collection>ProQuest One Community College</collection><collection>Sociological Abstracts</collection><collection>ProQuest Central Korea</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Central Student</collection><collection>Sociology Collection</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest One Literature - U.S. 
Customers Only</collection><collection>Social Science Database</collection><collection>Sociology Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Sociological Abstracts (Ovid)</collection><jtitle>Studia Socjologiczne</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Matuszewski, Paweł</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych</atitle><jtitle>Studia Socjologiczne</jtitle><addtitle>Sociological Studies</addtitle><date>2022</date><risdate>2022</risdate><volume>245</volume><issue>2</issue><spage>137</spage><epage>164</epage><pages>137-164</pages><issn>0039-3371</issn><issn>2545-2770</issn><eissn>2545-2770</eissn><abstract>One of the critical decisions when manually coding text data is whether to verify the coders’ work. In the case of supervised models, this leads to a significant dilemma: is it better to provide the model with a large number of cases on which it will learn at the expense of verifying the correctness of the data, or whether it is better to code each case n-times, which will allow to compare the codes and check their correctness but at the same time will reduce the training dataset by n-fold. Such a decision not only affect the final results of the classifier. From the researchers’ point of view, it is also crucial because, realistically assuming that research has limited funding, it cannot be undone. 
The study uses a simulation approach and provides conclusions and recommendations based on 100,000 unique and hand-coded tweets.</abstract><cop>Warsaw</cop><pub>Institute of Philosophy and Sociology, Polish Academy of Sciences</pub><doi>10.24425/sts.2022.141426</doi><tpages>28</tpages><orcidid>https://orcid.org/0000-0003-0069-157X</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0039-3371 |
ispartof | Studia Socjologiczne, 2022, Vol.245 (2), p.137-164 |
issn | 0039-3371 2545-2770 2545-2770 |
language | pol |
recordid | cdi_proquest_journals_2682416687 |
source | Central and Eastern European Online Library; Alma/SFX Local Collection; Sociological Abstracts |
subjects | Classifiers; Neural networks; Sentiment analysis; Simulation; Social media; Social Sciences; Wastes |
title | „Śmieci na wejściu, śmieci na wyjściu”. Wpływ jakości koderów na działanie sieci neuronowej klasyfikującej wypowiedzi w mediach społecznościowych |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T23%3A31%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ceeol_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=%C5%9Amieci%20na%20wej%C5%9Bciu,%20%C5%9Bmieci%20na%20wyj%C5%9Bciu%E2%80%9D.%20Wp%C5%82yw%20jako%C5%9Bci%20koder%C3%B3w%20na%20dzia%C5%82anie%20sieci%20neuronowej%20klasyfikuj%C4%85cej%20wypowiedzi%20w%20mediach%20spo%C5%82eczno%C5%9Bciowych&rft.jtitle=Studia%20Socjologiczne&rft.au=Matuszewski,%20Pawe%C5%82&rft.date=2022&rft.volume=245&rft.issue=2&rft.spage=137&rft.epage=164&rft.pages=137-164&rft.issn=0039-3371&rft.eissn=2545-2770&rft_id=info:doi/10.24425/sts.2022.141426&rft_dat=%3Cceeol_proqu%3E1049195%3C/ceeol_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2682416687&rft_id=info:pmid/&rft_ceeol_id=1049195&rfr_iscdi=true |
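
The trade-off described in the abstract (more training cases versus coding each case n times) can be illustrated with a minimal sketch. This is not the study's simulation. It assumes binary labels, coders who err independently with the same accuracy `p`, majority voting over an odd number `n` of codings, and a hypothetical budget of 90,000 codings. The function names `majority_vote_accuracy` and `coding_plan` are invented for this illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n coders picks the correct binary label.

    Assumptions (not from the article): coders err independently, each is
    correct with probability p, and n is odd so that ties cannot occur.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    needed = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1)
    )


def coding_plan(budget: int, n: int, p: float) -> dict:
    """Summarise one option: code each case n times within a fixed budget."""
    return {
        "codings_per_case": n,
        "training_cases": budget // n,  # dataset shrinks n-fold
        "label_accuracy": majority_vote_accuracy(p, n),
    }


if __name__ == "__main__":
    # Hypothetical budget and coder accuracy, chosen only for illustration.
    for n in (1, 3, 5):
        print(coding_plan(budget=90_000, n=n, p=0.8))
```

Under these assumptions, moving from one to three codings per case raises the expected label accuracy from 0.8 to 0.896 while cutting the training set from 90,000 to 30,000 cases. Whether a classifier gains more from the cleaner labels than it loses from the smaller dataset is the empirical question the article addresses.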