I'm a cognitive scientist working on the evaluation and alignment of large language models as the director of Modulo Research. We recently released a dataset of expert-annotated valid and invalid long-form reasoning solutions, intended to facilitate scalable oversight research, and had two papers related to scalable oversight accepted to AAAI 2026 [1, 2]. You can sign up to be notified when we release future datasets.
I'm also grateful to have had the opportunity to contribute to Anwar et al.'s monumental agenda paper, Foundational Challenges in Assuring Alignment and Safety of Large Language Models, and, through collaborations with Hidden Variable Limited, to contribute to several of a leading frontier lab's Frontier Red Team evaluation/demo projects.
See my Google Scholar profile for a list of my most cited works, and the bottom of this page for recent updates that may not be reflected there.
Recognition
"Are you the same Gabriel Recchia who...?"
In a former life, I did things like:
Recent papers, preprints, and work in progress
- Recchia, G., Mangat, C. S., Nyachhyon, J., Sharma, M., Canavan, C., Epstein-Gross, D., & Abdulbari, M. (2026). Confirmation Bias: A Challenge for Scalable Oversight. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37877-37886. https://doi.org/10.1609/aaai.v40i44.41124. Presents results of two sandwiching-like experiments intended to establish baselines for simple approaches. Link to extended arXiv version.
- Recchia, G., Mangat, C. S., Li, I., & Krishnakumar, G. (2026). FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37867-37876. https://doi.org/10.1609/aaai.v40i44.41123. Link to extended arXiv version.
- Schoenegger, P., Salvi, F., Liu, J., Nan, X., Debnath, R., Fasolo, B., Leivada, E., Recchia, G., Günther, F., et al. (2025). Large language models are more persuasive than incentivized human persuaders. arXiv:2505.09662. Under review at PNAS Nexus. Link. Role: analysis team lead.
- Phan, L., Gatti, A., Han, Z., Li, N., Hu, J., Zhang, H., ... & Verbeken, B. (2025). Humanity's Last Exam. arXiv:2501.14249. Link. Co-author on account of contributing question(s) that were selected for the dataset.
- Anwar, U., Saparov, A., Rando, J., Paleka, D., Turpin, M., Hase, P., ... & Krueger, D. (2024). Foundational challenges in assuring alignment and safety of large language models. Transactions on Machine Learning Research, 2835-8856. Link
- McKenzie, I. R., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A., Prabhu, A., ... & Perez, E. (2023). Inverse scaling: When bigger isn't better. Transactions on Machine Learning Research. Link. Co-author on account of submitting a winning task (i.e., identifying a task on which language model performance decreases with scale).
- Proto, R., Recchia, G., Dryhurst, S., & Freeman, A. L. J. (2023). Do colored cells in risk matrices affect decision‐making and risk perception? Insights from randomized controlled studies. Risk Analysis. Link
- Recchia, G., Lawrence, A. C. E., Capacchione, L., & Freeman, A. L. J. (2022). Making BRCA1 genetic test reports easier to understand through user-centered design: A randomized trial. Genetics in Medicine. Link
- Recchia, G. (2021). Teaching autoregressive language models complex tasks by demonstration. arXiv:2109.02102. Link. Early preprint demonstrating an example of capability elicitation via fine-tuning. Cited by papers out of DeepMind and Google Research.
More at Google Scholar
Collaboration
Contact me!