I'm a cognitive scientist working on the evaluation and alignment of large language models as the director of Modulo Research. We recently released a dataset of expert-annotated valid and invalid long-form reasoning solutions, intended to facilitate scalable oversight research (preprint here). We're now finalizing a dataset of textual representations of the research processes followed by high-performing participants in an online research task, for use in improving LLM capability elicitation, and we're writing up the results of our associated experiments. You can sign up to be notified when we release future datasets.
I'm also grateful to have had the opportunity to contribute to Anwar et al.'s monumental agenda paper, Foundational Challenges in Assuring Alignment and Safety of Large Language Models, and to contribute to some of Anthropic's Frontier Red Team evaluation/demo projects as part of collaborations with Hidden Variable Limited.
See my Google Scholar profile for a list of my most cited works, and the bottom of this page for recent updates that may not be reflected there.
Recognition
"Are you the same Gabriel Recchia who...?"
In a former life, I did things like:
Recent papers, preprints, and work in progress
- Recchia, G., Mangat, C., Nyachhyon, J., Sharma, M., Canavan, C., Epstein-Gross, D., & Abdulbari, M. (in prep.) Automation bias: A challenge for scalable oversight. Presents results of two sandwiching-like experiments intended to establish baselines for simple approaches.
- Recchia, G., Mangat, C. S., Li, I., & Krishnakumar, G. (2025). FindTheFlaws: Annotated errors for use in scalable oversight research. arXiv:2503.22989. Link
- Phan, L., Gatti, A., Han, Z., Li, N., Hu, J., Zhang, H., ... & Verbeken, B. (2025). Humanity's Last Exam. arXiv:2501.14249. Link. Co-author on account of contributing question(s) that were selected for the dataset.
- Anwar, U., Saparov, A., Rando, J., Paleka, D., Turpin, M., Hase, P., ... & Krueger, D. (2024). Foundational challenges in assuring alignment and safety of large language models. Transactions on Machine Learning Research. Link
- McKenzie, I. R., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A., Prabhu, A., ... & Perez, E. (2023). Inverse scaling: When bigger isn't better. Transactions on Machine Learning Research. Link. Co-author on account of submitting a winning task (i.e., identifying a task on which language model performance decreases with scale).
- Proto, R., Recchia, G., Dryhurst, S., & Freeman, A.L. (2023). Do colored cells in risk matrices affect decision-making and risk perception? Insights from randomized controlled studies. Risk Analysis. Link
- Recchia, G., Lawrence, A. C. E., Capacchione, L., & Freeman, A.L.J. (2022). Making BRCA1 genetic test reports easier to understand through user-centered design: A randomized trial. Genetics in Medicine. Link
- Recchia, G. (2021). Teaching autoregressive language models complex tasks by demonstration. arXiv:2109.02102. Link. Early preprint demonstrating an example of capability elicitation via fine-tuning. Cited by papers out of DeepMind and Google Research.
More at Google Scholar
Collaboration
Contact me!