Viewing Study NCT06963957


Ignite Creation Date: 2025-12-24 @ 6:50 PM
Ignite Modification Date: 2025-12-26 @ 7:36 AM
Study NCT ID: NCT06963957
Status: COMPLETED
Last Update Posted: 2025-08-22
First Post: 2025-05-09
Is NOT Gene Therapy: True
Has Adverse Events: False

Brief Title: Automation Bias in Physician-LLM Diagnostic Reasoning
Sponsor: Lahore University of Management Sciences
Organization: Lahore University of Management Sciences

Raw JSON

{'hasResults': False, 'derivedSection': {'miscInfoModule': {'versionHolder': '2025-12-24'}, 'conditionBrowseModule': {'meshes': [{'id': 'D004194', 'term': 'Disease'}], 'ancestors': [{'id': 'D010335', 'term': 'Pathologic Processes'}, {'id': 'D013568', 'term': 'Pathological Conditions, Signs and Symptoms'}]}}, 'documentSection': {'largeDocumentModule': {'largeDocs': [{'date': '2025-05-28', 'size': 168583, 'label': 'Study Protocol and Statistical Analysis Plan', 'hasIcf': False, 'hasSap': True, 'filename': 'Prot_SAP_000.pdf', 'typeAbbrev': 'Prot_SAP', 'uploadDate': '2025-05-28T04:00', 'hasProtocol': True}]}}, 'protocolSection': {'designModule': {'phases': ['NA'], 'studyType': 'INTERVENTIONAL', 'designInfo': {'allocation': 'RANDOMIZED', 'maskingInfo': {'masking': 'SINGLE', 'whoMasked': ['OUTCOMES_ASSESSOR'], 'maskingDescription': 'Single (Outcomes Assessor)'}, 'primaryPurpose': 'DIAGNOSTIC', 'interventionModel': 'PARALLEL', 'interventionModelDescription': 'The trial will be designed as a randomized, two-arm, single-blind parallel group study.'}, 'enrollmentInfo': {'type': 'ACTUAL', 'count': 44}}, 'statusModule': {'overallStatus': 'COMPLETED', 'startDateStruct': {'date': '2025-06-20', 'type': 'ACTUAL'}, 'expandedAccessInfo': {'hasExpandedAccess': False}, 'statusVerifiedDate': '2025-08', 'completionDateStruct': {'date': '2025-08-15', 'type': 'ACTUAL'}, 'lastUpdateSubmitDate': '2025-08-21', 'studyFirstSubmitDate': '2025-04-23', 'studyFirstSubmitQcDate': '2025-04-30', 'lastUpdatePostDateStruct': {'date': '2025-08-22', 'type': 'ACTUAL'}, 'studyFirstPostDateStruct': {'date': '2025-05-09', 'type': 'ACTUAL'}, 'primaryCompletionDateStruct': {'date': '2025-08-15', 'type': 'ACTUAL'}}, 'outcomesModule': {'primaryOutcomes': [{'measure': 'Diagnostic reasoning', 'timeFrame': 'Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place between 0-4 days after participant enrollment.', 'description': 'The primary outcome will be the percent correct for each case, ranging from 0 to 100%, where higher scores indicate better diagnostic performance. For each case, participants will be asked for their three leading diagnoses, findings that support each diagnosis, and findings that oppose each diagnosis. For each plausible diagnosis, participants will receive 1 point. Findings supporting the diagnosis and findings opposing the diagnosis will also be graded based on correctness, with 1 point for each correct response. Participants will then be asked to name their top diagnosis they believe is most likely, earning 9 points for a reasonable response and 18 points for the most accurate response. Finally participants will be asked to name up to 3 next steps to further evaluate the patient with 0.5 point awarded for a partially correct response and 1 point for a completely correct response. The primary outcome will be compared at the case-level between the randomized groups.'}], 'secondaryOutcomes': [{'measure': 'Top choice diagnosis accuracy score', 'timeFrame': 'Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place between 0-4 days after participant enrollment.', 'description': "The secondary outcome will measure participants' performance in identifying the most likely diagnosis for each clinical vignette. After evaluating each case, participants will select their single most likely diagnosis, which will be scored on a pre-specified Three-Tier Diagnostic Accuracy Scale: 18 points for the most accurate diagnosis, 9 points for a clinically reasonable alternative, and 0 points for an incorrect diagnosis. For each participant, a Top Choice Diagnosis Accuracy Score is calculated as (total points earned ÷ maximum possible points) × 100, yielding a 0-100 % range in which higher scores indicate greater diagnostic accuracy. This percentage score will be compared at the case-level between randomized groups to quantify the impact of automation bias on diagnostic decision-making."}]}, 'oversightModule': {'oversightHasDmc': False, 'isFdaRegulatedDrug': False, 'isFdaRegulatedDevice': False}, 'conditionsModule': {'keywords': ['clinical reasoning', 'large language models', 'automation bias', 'computer-assisted diagnosis'], 'conditions': ['Diagnosis']}, 'descriptionModule': {'briefSummary': 'This study aims to systematically measure the extent and patterns of automation bias among physicians when utilizing ChatGPT-4o in clinical decision-making.', 'detailedDescription': "Diagnostic errors represent a significant cause of preventable patient harm in healthcare systems worldwide. Recent advances in Large Language Models (LLMs) have shown promise in enhancing medical decision-making processes.\n\nHowever, there remains a critical gap in our understanding of how automation bias -- the tendency to over-rely on technological suggestions -- influences medical doctors' diagnostic reasoning when incorporating these AI tools into clinical practice.\n\nAutomation bias presents substantial risks in clinical environments, particularly as AI tools become more integrated into healthcare workflows. Although LLMs such as ChatGPT-4o offer potential advantages in reducing errors and improving efficiency, their lack of rigorous medical validation raises concerns about potentially amplifying cognitive biases through the generation of incorrect or misleading information.\n\nMultiple contextual factors can exacerbate automation bias in medical settings: time constraints in high-volume clinical settings, financial incentives that prioritize efficiency over thoroughness, cognitive fatigue during extended shifts, and diminished vigilance when confronting diagnostically challenging cases.\n\nThese factors may interact with psychological mechanisms that include the diffusion of responsibility, overconfidence in technological solutions, and cognitive offloading---collectively increasing the risk of uncritical acceptance of AI-generated recommendations.\n\nThis randomized controlled trial (RCT) aims to systematically measure the extent and patterns of automation bias among physicians when utilizing ChatGPT-4o in clinical decision-making. The investigators will assess how access to LLM-generated information influences diagnostic reasoning through a novel methodology that precisely quantifies automation bias. In this study, participants will be randomly assigned to one of two groups. The treatment group will receive LLM-generated recommendations containing deliberately introduced errors in a subset of cases, while the control group will receive LLM-generated recommendations without such deliberately introduced errors. Participants will evaluate six clinical vignettes randomly sequenced to prevent detection patterns. The flawed vignettes provided to the treatment group will incorporate subtle yet clinically significant errors that should be identifiable by trained doctors. This will enable investigators to quantify the degree of automation bias by measuring the differential in diagnostic accuracy scores between the treatment and control groups.\n\nPrior to participation, all physicians will complete a comprehensive training program covering LLM capabilities, prompt engineering techniques, and output evaluation strategies. Responses will be evaluated by blinded reviewers using a validated assessment rubric specifically designed to detect uncritical acceptance of erroneous information, with greater score disparities indicating stronger automation bias. This naturalistic approach will yield insights directly applicable to real clinical workflows, where mounting cognitive demands may progressively impact diagnostic decision quality."}, 'eligibilityModule': {'sex': 'ALL', 'stdAges': ['CHILD', 'ADULT', 'OLDER_ADULT'], 'healthyVolunteers': True, 'eligibilityCriteria': "Inclusion Criteria:\n\n* Completed Bachelor of Medicine, Bachelor of Surgery (MBBS) Exam. The equivalent degree of MBBS in US and Canada is called Doctor of Medicine (MD).\n* Full or Provisionally Registered Medical Practitioners with the Pakistan Medical and Dental Council (PMDC).\n* Participants must have completed a structured training program on the use of ChatGPT (or a comparable large language model), totaling at least 10 hours of instruction. The program must include hands-on practice related to LLM's aspects, specifically prompt engineering and content evaluation.\n\nExclusion Criteria:\n\n* Any other Registered Medical Practitioners (Full or Provisional) with PMDC (e.g., Professionals with Bachelor of Dental Surgery or BDS)."}, 'identificationModule': {'nctId': 'NCT06963957', 'briefTitle': 'Automation Bias in Physician-LLM Diagnostic Reasoning', 'organization': {'class': 'OTHER', 'fullName': 'Lahore University of Management Sciences'}, 'officialTitle': 'Trust or Verify? Automation Bias in Physician-LLM Diagnostic Reasoning', 'orgStudyIdInfo': {'id': 'IRB-0374'}}, 'armsInterventionsModule': {'armGroups': [{'type': 'ACTIVE_COMPARATOR', 'label': 'ChatGPT-4o Recommendations with Hallucinations', 'description': 'Participants will evaluate six clinical vignettes. During the trial, they will have access to clinical recommendations from a specific, commercially available LLM (ChatGPT-4o) in addition to conventional diagnostic resources. LLM recommendations for three vignettes will contain deliberately flawed diagnostic information and for three vignettes it will contain accurate recommendations. The cases will be presented in random order.', 'interventionNames': ['Other: ChatGPT-4o Recommendations with Hallucinations']}, {'type': 'NO_INTERVENTION', 'label': 'ChatGPT-4o Recommendations without Hallucinations', 'description': 'Participants will evaluate the same six clinical vignettes as in the intervention arm. During the trial, they will have access to clinical recommendations from a specific, commercially available LLM (ChatGPT-4o) in addition to conventional diagnostic resources. However, the LLM-generated recommendations will not contain any deliberately introduced errors. The cases will be presented in random order.'}], 'interventions': [{'name': 'ChatGPT-4o Recommendations with Hallucinations', 'type': 'OTHER', 'description': "ChatGPT-4o's differential diagnoses of six clinical vignettes, three of which will contain deliberately introduced inaccurate information.", 'armGroupLabels': ['ChatGPT-4o Recommendations with Hallucinations']}]}, 'contactsLocationsModule': {'locations': [{'zip': '54000', 'city': 'Lahore', 'state': 'Punjab Province', 'country': 'Pakistan', 'facility': 'Lahore University of Management Sciences', 'geoPoint': {'lat': 31.558, 'lon': 74.35071}}], 'overallOfficials': [{'name': 'Ihsan Ayyub Qazi, PhD', 'role': 'PRINCIPAL_INVESTIGATOR', 'affiliation': 'Lahore University of Management Sciences (LUMS)'}, {'name': 'Ayesha Ali, PhD', 'role': 'PRINCIPAL_INVESTIGATOR', 'affiliation': 'Lahore University of Management Sciences (LUMS)'}, {'name': 'Muhammad Asadullah Khawaja, MBBS', 'role': 'PRINCIPAL_INVESTIGATOR', 'affiliation': 'King Edward Medical University'}, {'name': 'Ali Zafar Sheikh, MBBS', 'role': 'PRINCIPAL_INVESTIGATOR', 'affiliation': 'Lahore General Hospital'}, {'name': 'Muhammad Junaid Akhtar, MBBS', 'role': 'PRINCIPAL_INVESTIGATOR', 'affiliation': "Children's Hospital, Lahore"}]}, 'ipdSharingStatementModule': {'ipdSharing': 'NO'}, 'sponsorCollaboratorsModule': {'leadSponsor': {'name': 'Lahore University of Management Sciences', 'class': 'OTHER'}, 'responsibleParty': {'type': 'PRINCIPAL_INVESTIGATOR', 'investigatorTitle': 'Full Professor, PhD', 'investigatorFullName': 'Ihsan Ayyub Qazi, PhD', 'investigatorAffiliation': 'Lahore University of Management Sciences'}}}}
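Note that the "Raw JSON" above is actually a Python dict repr (single-quoted strings, capitalized True/False), so json.loads() will reject it; ast.literal_eval() parses it safely. A minimal loading sketch, assuming the blob has been saved to a file named raw_record.txt (a placeholder name, not part of this record):

import ast

# Parse the Python-dict-style dump above; literal_eval accepts only
# literals (dicts, lists, strings, numbers, booleans), so it is safe
# to run on untrusted text, unlike eval().
with open("raw_record.txt") as f:
    record = ast.literal_eval(f.read())

design = record["protocolSection"]["designModule"]
print(design["enrollmentInfo"])            # {'type': 'ACTUAL', 'count': 44}
print(design["designInfo"]["allocation"])  # RANDOMIZED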
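To pull a fresh copy of this record directly, the public ClinicalTrials.gov v2 API exposes single studies by NCT ID. A minimal sketch, assuming the v2 endpoint remains available in this shape and omitting error handling:

import json
import urllib.request

# Fetch the live registry record for this study as real JSON.
url = "https://clinicaltrials.gov/api/v2/studies/NCT06963957"
with urllib.request.urlopen(url) as resp:
    study = json.load(resp)

status = study["protocolSection"]["statusModule"]
print(status["overallStatus"])                     # COMPLETED
print(status["lastUpdatePostDateStruct"]["date"])  # 2025-08-22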
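The primary outcome rubric combines several point sources into a per-case percent score. The sketch below restates that arithmetic; the function signature, field names, and the case-specific maximum-points denominator are illustrative assumptions, not the registered statistical analysis plan:

def case_percent_correct(
    plausible_diagnoses: int,  # 0-3 leading diagnoses judged plausible, 1 point each
    correct_findings: int,     # correct supporting + opposing findings, 1 point each
    top_dx_points: int,        # 0 incorrect, 9 reasonable, 18 most accurate
    next_step_points: float,   # up to 3 steps, 0.5 partial / 1.0 complete each
    max_points: float,         # case-specific maximum from the grading key (assumed)
) -> float:
    """Percent correct for one case, 0-100, per the registered rubric."""
    earned = plausible_diagnoses + correct_findings + top_dx_points + next_step_points
    return 100.0 * earned / max_points

As registered, these case-level percentages are then compared between the randomized arms.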
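The secondary outcome is simpler: only the single top-choice diagnosis is scored, on the pre-specified Three-Tier Diagnostic Accuracy Scale (18 / 9 / 0 points), then normalized to 0-100%. A sketch, with the tier labels being illustrative names for the three registered tiers:

# Three-Tier Diagnostic Accuracy Scale from the secondary outcome.
TIER_POINTS = {"most_accurate": 18, "reasonable": 9, "incorrect": 0}

def top_choice_accuracy(tiers: list[str]) -> float:
    """Top Choice Diagnosis Accuracy Score: earned / maximum x 100."""
    earned = sum(TIER_POINTS[t] for t in tiers)
    return 100.0 * earned / (18 * len(tiers))

# Example over the study's six vignettes: (54 + 18 + 0) / 108 x 100
print(top_choice_accuracy(
    ["most_accurate", "reasonable", "incorrect",
     "most_accurate", "most_accurate", "reasonable"]
))  # -> 66.66...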