Comparing machine and human reviewers to evaluate the risk of bias in randomized controlled trials

Bibliographic Details
Published in: Research Synthesis Methods, 2020-05, Vol. 11 (3), p. 484-493
Main authors: Armijo‐Olivo, Susan; Craig, Rodger; Campbell, Sandy
Format: Article
Language: English
Online access: Full text
Description:

Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is limited. We tested the accuracy of RobotReviewer, a semi‐autonomous risk of bias (RoB) assessment tool, and its agreement with human reviewers.

Methods: Two reviewers independently conducted RoB assessments on a sample of randomized controlled trials (RCTs), and their consensus ratings were compared with those generated by RobotReviewer. Agreement with the human reviewers was assessed using percent agreement and weighted kappa (κ). The accuracy of RobotReviewer was also assessed by calculating the sensitivity, specificity, and area under the curve (AUC) against the consensus ratings of the human reviewers.

Results: The study included 372 RCTs. Inter‐rater reliability ranged from κ = −0.06 (no agreement) for blinding of participants and personnel to κ = 0.62 (good agreement) for random sequence generation (excluding overall RoB). RobotReviewer was found to use a high percentage of “irrelevant supporting quotations” to complement its RoB assessments for blinding of participants and personnel (72.6%), blinding of outcome assessment (70.4%), and allocation concealment (54.3%).

Conclusion: RobotReviewer can help with risk of bias assessment of RCTs but cannot replace human evaluations. Reviewers should therefore check and validate RoB assessments from RobotReviewer by consulting the original article whenever the supporting quotations it provides are not relevant. This consultation is in line with the recommendation provided by the developers.
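The abstract names four statistics: percent agreement and weighted kappa for rater agreement, and sensitivity, specificity, and AUC for accuracy against the human consensus. Below is a minimal sketch of how such statistics can be computed with scikit-learn; the toy ratings, the linear kappa weighting (the abstract does not state the scheme used), and the choice of “high” risk of bias as the positive class are illustrative assumptions, not the authors' actual analysis.

```python
# A minimal sketch (not the authors' analysis code) of the statistics named in
# the abstract: percent agreement, weighted kappa, sensitivity, specificity,
# and AUC for a machine rater compared against a human consensus rating.
# The toy ratings below are invented for illustration.
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score

# Cochrane-style risk-of-bias ratings for one domain, one entry per RCT.
human = np.array(["low", "high", "unclear", "high", "low", "low", "unclear", "high"])
robot = np.array(["low", "high", "high", "high", "low", "unclear", "unclear", "low"])

# Percent agreement: the share of trials rated identically by both raters.
percent_agreement = np.mean(human == robot)

# Weighted kappa: chance-corrected agreement in which larger disagreements on
# the ordinal scale (low < unclear < high) are penalized more heavily.
# Linear weights are an assumption here.
order = {"low": 0, "unclear": 1, "high": 2}
kappa = cohen_kappa_score(
    [order[r] for r in human],
    [order[r] for r in robot],
    weights="linear",
)

# Sensitivity/specificity: binarize, treating "high" risk of bias as the
# positive class and the human consensus as the reference standard.
y_true = (human == "high").astype(int)
y_pred = (robot == "high").astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

# AUC computed from the hard binary predictions (a continuous confidence
# score from the tool, if available, would trace a fuller ROC curve).
auc = roc_auc_score(y_true, y_pred)

print(f"percent agreement: {percent_agreement:.2f}")
print(f"weighted kappa:    {kappa:.2f}")
print(f"sensitivity: {sensitivity:.2f}  specificity: {specificity:.2f}  AUC: {auc:.2f}")
```

Note that when the machine output is a hard label rather than a continuous score, the ROC curve has a single operating point and the AUC reduces to (sensitivity + specificity) / 2.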
DOI: 10.1002/jrsm.1398
PMID: 32065732
Publisher: Wiley-Blackwell, England
ISSN: 1759-2879
EISSN: 1759-2887
Source: Wiley Online Library - AutoHoldings Journals
Subjects:
Accuracy
Agreements
artificial intelligence
Bias
Clinical trials
Comparative Analysis
Computer Software
Decision Making
Evaluation
Evaluation Methods
Evaluators
Evidence
Health
health technology assessment (HTA)
Information Technology
Interrater Reliability
inter‐rater reliability
Literature reviews
Personnel
Policy Formation
Randomization
randomized controlled trial
Randomized Controlled Trials
Risk
Risk assessment
risk of bias
Software
systematic review
Technology assessment