Comparing machine and human reviewers to evaluate the risk of bias in randomized controlled trials
Background Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the pro...
Saved in:
Published in: | Research synthesis methods, 2020-05, Vol. 11 (3), p. 484-493 |
---|---|
Main authors: | Armijo‐Olivo, Susan; Craig, Rodger; Campbell, Sandy |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 493 |
---|---|
container_issue | 3 |
container_start_page | 484 |
container_title | Research synthesis methods |
container_volume | 11 |
creator | Armijo‐Olivo, Susan; Craig, Rodger; Campbell, Sandy |
description | Background
Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is limited. We tested the accuracy of RobotReviewer, a semi‐autonomous risk of bias (RoB) assessment tool, and its agreement with human reviewers.
Methods
Two reviewers independently conducted RoB assessments on a sample of randomized controlled trials (RCTs), and their consensus ratings were compared with those generated by RobotReviewer. Agreement with the human reviewers was assessed using percent agreement and weighted kappa (κ). The accuracy of RobotReviewer was also assessed by calculating the sensitivity, specificity, and area under the curve in comparison to the consensus agreement of the human reviewers.
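To make the Methods concrete, the following is a minimal Python sketch (not the authors' code) of the statistics named above: percent agreement, linearly weighted kappa, and sensitivity/specificity/AUC. It assumes RoB ratings are coded as ordered integers (0 = low, 1 = unclear, 2 = high risk of bias) and that accuracy is computed after collapsing ratings to a binary low vs. unclear/high target; the ratings themselves are hypothetical.

```python
# A minimal sketch (not the authors' code) of the agreement and accuracy
# statistics named in the Methods. Assumes RoB ratings are coded as
# ordered integers (0 = low, 1 = unclear, 2 = high risk of bias).
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score

human = np.array([0, 0, 1, 2, 1, 0, 2, 1])  # hypothetical human consensus ratings
robot = np.array([0, 1, 1, 2, 0, 0, 2, 2])  # hypothetical RobotReviewer ratings

# Percent agreement: share of trials where the two raters give the same rating.
percent_agreement = 100 * np.mean(human == robot)

# Weighted kappa: chance-corrected agreement; linear weights penalize a
# low-vs-high disagreement more than a low-vs-unclear one.
kappa = cohen_kappa_score(human, robot, weights="linear")

# Sensitivity, specificity, and AUC need a binary target; one common choice
# (an assumption here) is collapsing to low (0) vs. unclear/high (1).
human_bin = (human > 0).astype(int)
robot_bin = (robot > 0).astype(int)
tn, fp, fn, tp = confusion_matrix(human_bin, robot_bin, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)  # true-positive rate against the human consensus
specificity = tn / (tn + fp)  # true-negative rate against the human consensus
# With hard 0/1 predictions this AUC reflects a single operating point.
auc = roc_auc_score(human_bin, robot_bin)

print(f"agreement={percent_agreement:.1f}%  kappa={kappa:.2f}  "
      f"sens={sensitivity:.2f}  spec={specificity:.2f}  auc={auc:.2f}")
```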
Results
The study included 372 RCTs. Inter‐rater reliability ranged from κ = −0.06 (no agreement) for blinding of participants and personnel to κ = 0.62 (good agreement) for random sequence generation (excluding overall RoB). RobotReviewer was found to use a high percentage of “irrelevant supporting quotations” to complement RoB assessments for blinding of participants and personnel (72.6%), blinding of outcome assessment (70.4%), and allocation concealment (54.3%).
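The verbal labels attached to these kappa values ("no agreement", "good agreement") correspond to a conventional interpretation scale. A hypothetical helper using Altman's commonly cited cut-points (an assumption; the paper's exact bands may differ) illustrates the mapping:

```python
# Hypothetical kappa-interpretation helper; the cut-points follow Altman's
# commonly cited scale and are an assumption, not taken from the paper.
def interpret_kappa(kappa: float) -> str:
    """Map a (weighted) kappa value to a verbal agreement band."""
    if kappa <= 0.0:
        return "no agreement"
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"

print(interpret_kappa(-0.06))  # no agreement (blinding of participants and personnel)
print(interpret_kappa(0.62))   # good (random sequence generation)
```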
Conclusion
RobotReviewer can help with risk of bias assessment of RCTs but cannot replace human evaluations. Thus, reviewers should check and validate RoB assessments from RobotReviewer against the original article whenever RobotReviewer provides irrelevant supporting quotations. This consultation is in line with the developers' own recommendation. |
doi_str_mv | 10.1002/jrsm.1398 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1759-2879 |
ispartof | Research synthesis methods, 2020-05, Vol.11 (3), p.484-493 |
issn | 1759-2879 1759-2887 |
language | eng |
recordid | cdi_proquest_miscellaneous_2356595737 |
source | Wiley Online Library - AutoHoldings Journals |
subjects | Accuracy; Agreements; artificial intelligence; Bias; Clinical trials; Comparative Analysis; Computer Software; Decision Making; Evaluation; Evaluation Methods; Evaluators; Evidence; Health; health technology assessment (HTA); Information Technology; Interrater Reliability; inter‐rater reliability; Literature reviews; Personnel; Policy Formation; Randomization; randomized controlled trial; Randomized Controlled Trials; Risk; Risk assessment; risk of bias; Software; systematic review; Technology assessment |
title | Comparing machine and human reviewers to evaluate the risk of bias in randomized controlled trials |