Validating AI-assisted evaluation of open science practices in brain sciences: ChatGPT, Claude, and human expert comparisons
This talk was presented at the University of Manchester Open Research Conference, 9-10 June 2025.
This study examines the efficacy of two AI chatbots, ChatGPT 4 and Claude 3.5 Sonnet, in assessing open science practices in brain science research, comparing their outputs with human expert evaluations across 100 articles. We applied a 6-item transparency checklist covering code, data, and materials sharing, preregistration, sample size rationales, and the inclusion of data availability statements, and compared the chatbots' ratings with those of three human coders.
Results reveal strong correlations between overall human and AI ratings, with notable item-level concordance for code and materials sharing, preregistration, and sample size rationales. However, both chatbots struggled to accurately identify the inclusion of data availability statements and to verify data sharing, underscoring areas where human oversight remains crucial.
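A minimal sketch of how such a human-versus-AI comparison could be computed, assuming binary (0/1) scoring on the six checklist items, a single consensus human rating per article, and Spearman correlation plus Cohen's kappa as the agreement measures; the item names and synthetic ratings below are illustrative placeholders, not the study's data or analysis code.

```python
# Illustrative sketch (not the authors' analysis code): comparing AI and human
# ratings on a 6-item binary transparency checklist across 100 articles.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

ITEMS = ["code_sharing", "data_sharing", "materials_sharing",
         "preregistration", "sample_size_rationale", "data_availability_statement"]

# Placeholder data: simulate human consensus coding and an AI coder that
# disagrees on roughly 15% of item-level judgments.
rng = np.random.default_rng(0)
n_articles = 100
human = rng.integers(0, 2, size=(n_articles, len(ITEMS)))
ai = human.copy()
flip = rng.random(human.shape) < 0.15
ai[flip] = 1 - ai[flip]

# Overall-rating agreement: correlate total checklist scores per article.
rho, p = spearmanr(human.sum(axis=1), ai.sum(axis=1))
print(f"Overall-score Spearman rho = {rho:.2f} (p = {p:.3g})")

# Item-level concordance: Cohen's kappa for each checklist item.
for j, item in enumerate(ITEMS):
    kappa = cohen_kappa_score(human[:, j], ai[:, j])
    print(f"{item:30s} kappa = {kappa:.2f}")
```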
Our findings suggest that readily accessible AI chatbots can accelerate large-scale assessment of research transparency, allowing institutions to monitor the adoption of open science practices more efficiently. Challenges remain, however, in the automated evaluation of nuanced transparency elements that require contextual understanding. This research contributes to open science by validating assessment methodologies that make transparency evaluation more scalable and inclusive, potentially supporting broader adoption of open research practices across academia.