Improving the Reliability of Peer Review Without a Gold Standard
Published in: Journal of Imaging Informatics in Medicine, 2024-04, Vol. 37 (2), p. 489-503
Main authors: Äijö, Tarmo; Elgort, Daniel; Becker, Murray; Herzog, Richard; Brown, Richard K. J.; Odry, Benjamin L.; Vianu, Ron
Format: Article
Language: English
Online access: Full text
container_end_page | 503 |
container_issue | 2 |
container_start_page | 489 |
container_title | Journal of Imaging Informatics in Medicine |
container_volume | 37 |
creator | Äijö, Tarmo ; Elgort, Daniel ; Becker, Murray ; Herzog, Richard ; Brown, Richard K. J. ; Odry, Benjamin L. ; Vianu, Ron |
description | Peer review plays a crucial role in accreditation and credentialing processes as it can identify outliers and foster a peer learning approach, facilitating error analysis and knowledge sharing. However, traditional peer review methods may fall short in effectively addressing the interpretive variability among reviewing and primary reading radiologists, hindering scalability and effectiveness. Reducing this variability is key to enhancing the reliability of results and instilling confidence in the review process. In this paper, we propose a novel statistical approach called “Bayesian Inter-Reviewer Agreement Rate” (BIRAR) that integrates radiologist variability. By doing so, BIRAR aims to enhance the accuracy and consistency of peer review assessments, providing physicians involved in quality improvement and peer learning programs with valuable and reliable insights. A computer simulation was designed to assign predefined interpretive error rates to hypothetical interpreting and peer-reviewing radiologists. The Monte Carlo simulation then sampled (100 samples per experiment) the data that would be generated by peer reviews. The performances of BIRAR and four other peer review methods for measuring interpretive error rates were then evaluated, including a method that uses a gold standard diagnosis. Application of the BIRAR method resulted in 93% and 79% higher relative accuracy and 43% and 66% lower relative variability, compared to “Single/Standard” and “Majority Panel” peer review methods, respectively. Accuracy was defined by the median difference of Monte Carlo simulations between measured and pre-defined “actual” interpretive error rates. Variability was defined by the 95% CI around the median difference of Monte Carlo simulations between measured and pre-defined “actual” interpretive error rates. BIRAR is a practical and scalable peer review method that produces more accurate and less variable assessments of interpretive quality by accounting for variability within the group’s radiologists, implicitly applying a standard derived from the level of consensus within the group across various types of interpretive findings. |
doi_str_mv | 10.1007/s10278-024-00971-9 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2948-2933; ISSN: 0897-1889; ISSN: 2948-2925; EISSN: 2948-2933; EISSN: 1618-727X; DOI: 10.1007/s10278-024-00971-9; PMID: 38316666 |
ispartof | Journal of Imaging Informatics in Medicine, 2024-04, Vol.37 (2), p.489-503 |
issn | 2948-2933; 0897-1889; 2948-2925; 2948-2933; 1618-727X |
language | eng |
recordid | cdi_proquest_miscellaneous_2922948590 |
source | PubMed Central; SpringerLink Journals - AutoHoldings |
subjects | Accuracy; Assessments; Bayesian analysis; Computer simulation; Data analysis; Error analysis; Imaging; Learning programs; Measurement methods; Median (statistics); Medicine; Medicine & Public Health; Monte Carlo simulation; Outliers (statistics); Peer review; Quality control; Radiology; Reliability; Reviewing; Reviews; Variability |
title | Improving the Reliability of Peer Review Without a Gold Standard |
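The evaluation design in the description field above is concrete enough to sketch in code. The following is a minimal Monte Carlo mock-up, not the paper's implementation: the case count, the error rates, and the discrepancy rules for the "Single/Standard" and "Majority Panel" baselines are all assumptions, and the BIRAR estimator itself (whose Bayesian details are not given in this record) is omitted. Only the 100-samples-per-experiment setting and the accuracy/variability definitions (median difference, and a 95% interval, between measured and predefined error rates) come from the abstract.

```python
# Hypothetical sketch of the abstract's Monte Carlo evaluation. Parameter
# values and review rules are assumed for illustration only.
import numpy as np

N_CASES = 1000              # cases per experiment (assumed)
N_SIMULATIONS = 100         # "100 samples per experiment", per the abstract
TRUE_ERROR_RATE = 0.05      # predefined "actual" interpretive error rate (assumed)
REVIEWER_ERROR_RATE = 0.05  # reviewers misread at this rate too (assumed)

rng = np.random.default_rng(0)

def simulate_once(rng):
    """Sample one peer-review round; return each method's measured error rate."""
    # Ground truth: which cases the interpreting radiologist actually misread.
    actual_error = rng.random(N_CASES) < TRUE_ERROR_RATE

    # Single/Standard: one fallible reviewer; a discrepancy is flagged when
    # exactly one of the two readings is wrong (XOR), so the measured rate
    # mixes true interpretive errors with reviewer errors.
    reviewer_wrong = rng.random(N_CASES) < REVIEWER_ERROR_RATE
    flagged_single = actual_error ^ reviewer_wrong

    # Majority Panel: three independent fallible reviewers, majority vote.
    panel_wrong = rng.random((3, N_CASES)) < REVIEWER_ERROR_RATE
    flagged_panel = (actual_error[None, :] ^ panel_wrong).sum(axis=0) >= 2

    return flagged_single.mean(), flagged_panel.mean()

measured = np.array([simulate_once(rng) for _ in range(N_SIMULATIONS)])
diff = measured - TRUE_ERROR_RATE   # measured minus predefined error rate

for name, d in zip(["Single/Standard", "Majority Panel"], diff.T):
    median = np.median(d)                    # "accuracy", per the abstract
    lo, hi = np.percentile(d, [2.5, 97.5])   # "variability": 95% interval
    print(f"{name}: median diff {median:+.4f}, 95% interval [{lo:+.4f}, {hi:+.4f}]")
```

As for the Bayesian ingredient, the abstract says only that BIRAR integrates radiologist variability through an inter-reviewer agreement rate. As a loose intuition, a conjugate Beta-Binomial update yields a posterior over an agreement rate from double-read cases; this toy stand-in assumes a Beta(1, 1) prior and hypothetical counts, and is not the paper's model:

```python
# Toy Bayesian agreement-rate estimate (NOT the paper's BIRAR model): with a
# Beta(1, 1) prior and k agreements in n double-read cases, the posterior over
# the inter-reviewer agreement rate is Beta(1 + k, 1 + n - k).
from scipy import stats

n, k = 200, 183  # hypothetical double-read cases and observed agreements
posterior = stats.beta(1 + k, 1 + n - k)

lo, hi = posterior.interval(0.95)
print(f"posterior mean agreement rate: {posterior.mean():.3f}")
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
```

A posterior of this kind lets uncertainty about reviewer reliability propagate into the error-rate estimate, which is plausibly why the abstract reports BIRAR's assessments as both more accurate and less variable than those of a single fallible reviewer or a majority panel.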