I'm a cognitive scientist working on the evaluation and alignment of large language models as the director of Modulo Research. We recently released a dataset of expert-annotated valid and invalid long-form reasoning solutions, intended to facilitate scalable oversight research, and had two papers related to scalable oversight accepted to AAAI 2026 [1, 2]. You can sign up to be notified when we release future datasets.
I'm also grateful to have had the opportunity to contribute to Anwar et al.'s monumental agenda paper, Foundational Challenges in Assuring Alignment and Safety of Large Language Models, and, through collaborations with Hidden Variable Limited, to contribute to several of a leading frontier lab's Frontier Red Team evaluation/demo projects.
See my Google Scholar profile for a list of my most cited works, and the bottom of this page for recent updates that may not be reflected there.
Recognition
"Are you the same Gabriel Recchia who...?"
In a former life, I did things like:
Recent papers, preprints, and work in progress
- Recchia, G., Mangat, C. S., Nyachhyon, J., Sharma, M., Canavan, C., Epstein-Gross, D., & Abdulbari, M. (2026). Confirmation Bias: A Challenge for Scalable Oversight. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37877-37886. https://doi.org/10.1609/aaai.v40i44.41124. Presents results of two sandwiching-like experiments intended to establish baselines for simple approaches. Link to extended arXiv version.
- Recchia, G., Mangat, C. S., Li, I., & Krishnakumar, G. (2026). FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37867-37876. https://doi.org/10.1609/aaai.v40i44.41123. Link to extended arXiv version.
- Schoenegger, P., Salvi, F., Liu, J., Nan, X., Debnath, R., Fasolo, B., Leivada, E., Recchia, G., Günther, F., et al. (2025). Large language models are more persuasive than incentivized human persuaders. arXiv:2505.09662. Under review at PNAS Nexus. Link. Role: analysis team lead.
- Phan, L., Gatti, A., Han, Z., Li, N., Hu, J., Zhang, H., ... & Verbeken, B. (2025). Humanity's Last Exam. arXiv:2501.14249. Link. Co-author on account of contributing question(s) that were selected for the dataset.
- Anwar, U., Saparov, A., Rando, J., Paleka, D., Turpin, M., Hase, P., ... & Krueger, D. (2024). Foundational challenges in assuring alignment and safety of large language models. Transactions on Machine Learning Research, 2835-8856. Link
- McKenzie, I. R., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A., Prabhu, A., ... & Perez, E. (2023). Inverse scaling: When bigger isn't better. Transactions on Machine Learning Research. Link. Co-author on account of submitting a winning task (i.e., identifying a task on which language model performance decreases with scale).
- Proto, R., Recchia, G., Dryhurst, S., & Freeman, A. L. J. (2023). Do colored cells in risk matrices affect decision‐making and risk perception? Insights from randomized controlled studies. Risk Analysis. Link
- Recchia, G., Lawrence, A. C. E., Capacchione, L., & Freeman, A. L. J. (2022). Making BRCA1 genetic test reports easier to understand through user-centered design: A randomized trial. Genetics in Medicine. Link
- Recchia, G. (2021). Teaching autoregressive language models complex tasks by demonstration. arXiv:2109.02102. Link. Early preprint demonstrating an example of capability elicitation via fine-tuning. Cited by papers out of DeepMind and Google Research.
More at Google Scholar
Collaboration
Contact me!