Identifying Fraudulent Respondents in Preference Surveys: An Example in Inflammatory Bowel Disease
Saved in:
Published in: The Patient: Patient-Centered Outcomes Research, 2024-11, Vol. 17 (6), p. 719-720
Main authors: , , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract:

Background: Social media and online surveys are commonly used to recruit participants and collect data for health preference studies. Online data fraud (i.e., intentional duplicate responses, straight-lining, bots, and professional survey takers who provide fraudulent responses to meet study eligibility) is increasing and difficult to identify. We developed a fraud identification algorithm and verification process and demonstrate the impact of fraudulent respondents on data and results.

Methods: We administered an online best-worst scaling (BWS) survey on healthcare processes for managing inflammatory bowel disease (IBD) to Canadian IBD patients. Recruitment was done in clinic and online (mailing lists, social media). A gift card was offered for participation, which resulted in an influx of fraudulent respondents. We developed a fraud identification algorithm with 13 binary 'red flag' variables related to respondent age, year of IBD diagnosis, postal code, survey duration, responses to open-text questions, email address, and Qualtrics fraud variables. These variables were used to generate a fraudulent response score (FRS; range: 0 (most likely real (LR)) to 13 (most likely fraudulent (LF))). Respondents with FRS >3 were categorized as LF. Data of respondents with FRS ≤3 were further reviewed to determine categorization (LF, LR, unsure). Next, respondents categorized as LR or unsure underwent age verification; those who correctly verified their age remained categorized as LR. BWS data were analyzed using conditional logit. We explored differences in results by FRS (>3 vs ≤3) and categorization (LF, LR, unsure).

Results: Based on FRS, 75% (n = 3258) of the 4334 respondents were initially categorized as LF, 17% (n = 727) as unsure, and 8% (n = 349) as LR. After age verification, 76% (n = 3297) were categorized as LF, 14% (n = 592) as unsure, 10% (n = 442) as LR, and ...
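The scoring-and-triage step described in the abstract can be sketched in a few lines: sum 13 binary red-flag indicators into an FRS (0 = most likely real, 13 = most likely fraudulent), auto-categorize FRS >3 as likely fraudulent (LF), and send FRS ≤3 to manual review. The specific checks below are hypothetical stand-ins for the authors' actual variables (age, diagnosis year, postal code, duration, open-text answers, email address, Qualtrics fraud variables); this is an illustrative sketch, not the study's code.

```python
import re

def red_flags(r: dict) -> list:
    """Return 13 binary red-flag indicators for one respondent.
    Each check is a hypothetical example of the kinds of variables
    listed in the abstract; unknown fields default to not-flagged."""
    return [
        not (18 <= r.get("age", 0) <= 100),                # implausible age
        r.get("diagnosis_year", 0) < 1950,                 # implausible IBD diagnosis year
        not re.fullmatch(r"[A-Z]\d[A-Z] ?\d[A-Z]\d",
                         r.get("postal_code", "")),        # not a Canadian postal code
        r.get("duration_sec", 0) < 120,                    # suspiciously fast completion
        len(r.get("open_text", "").split()) < 3,           # near-empty open-text answer
        bool(re.search(r"\d{4,}", r.get("email", ""))),    # long digit run in email address
        r.get("duplicate_ip", False),                      # IP shared with another response
        r.get("qualtrics_fraud_score", 0) >= 30,           # platform fraud-score flag
        r.get("qualtrics_duplicate", False),               # platform duplicate flag
        r.get("outside_canada", False),                    # geolocation outside Canada
        r.get("straight_lining", False),                   # identical answers across BWS tasks
        r.get("email_name_mismatch", False),               # email name != reported name
        r.get("age_year_conflict", False),                 # age conflicts with diagnosis year
    ]

def categorize(r: dict) -> str:
    frs = sum(red_flags(r))        # FRS: 0 (most likely real) to 13 (most likely fraudulent)
    return "LF" if frs > 3 else "review"  # FRS <= 3 goes on to manual review

# Example: a bot-like record trips six flags and is auto-categorized as LF.
bot = {"age": 14, "diagnosis_year": 1900, "postal_code": "12345",
       "duration_sec": 45, "open_text": "good", "email": "user123456@x.com"}
print(categorize(bot))  # -> LF
```

In the study, respondents surviving this triage (LR or unsure) then underwent age verification before final categorization.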
ISSN: 1178-1653, 1178-1661