Publication
STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond
Nils Dycke; Matej Zecevic; Ilia Kuznetsov; Beatrix Suess; Kristian Kersting; Iryna Gurevych
In: Wanxiang Che; Joyce Nabende; Ekaterina Shutova; Mohammad Taher Pilehvar (Eds.). Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, Austria, July 27 - August 1, 2025. Pages 22687-22727, Association for Computational Linguistics, 2025.
Abstract
Critical text assessment is at the core of many expert activities, such as fact-checking, peer review, and essay grading. Yet, existing work treats critical text assessment as a black box problem, limiting interpretability and human-AI collaboration. To close this gap, we introduce Structured Reasoning In Critical Text Assessment (STRICTA), a novel specification framework to model text assessment as an explicit, step-wise reasoning process. STRICTA breaks down the assessment into a graph of interconnected reasoning steps drawing on causality theory (Pearl, 1995). This graph is populated based on expert interaction data and used to study the assessment process and facilitate human-AI collaboration. We formally define STRICTA and apply it in a study on biomedical paper assessment, resulting in a dataset of over 4000 reasoning steps from roughly 40 biomedical experts on more than 20 papers. We use this dataset to empirically study expert reasoning in critical text assessment, and investigate whether LLMs are able to imitate and support experts within these workflows. The resulting tools and datasets pave the way for studying collaborative expert-AI reasoning in text assessment, in peer review and beyond.
