Lorenzo Pacchiardi


Assistant Research Professor, University of Cambridge

I am an Assistant Research Professor at the Leverhulme Centre for the Future of Intelligence at the University of Cambridge. I lead a research project (funded by Open Philanthropy) on developing a benchmark for measuring the ability of LLMs to perform data science tasks. I am more broadly interested in AI evaluation, particularly in predictability and cognitive evaluation, and I closely collaborate with Prof José Hernández-Orallo and Prof Lucy Cheke. I contribute to the AI evaluation newsletter.

I am deeply familiar with EU AI policy (having been involved in several initiatives) and am a co-founder of the Italian AI policy think tank CePTE. I also collaborate with The Unjournal to make impactful research more rigorous, and I co-founded AcademicJobsItaly.com to make the Italian academic job market more accessible.

I previously worked on detecting lying in large language models with Dr Owain Evans (through the MATS programme) and on technical standards for AI under the EU AI Act at the Future of Life Institute. I have also briefly advised RAND on AI evaluation.

I obtained a PhD in Statistics and Machine Learning from the University of Oxford, during which I worked on Bayesian simulation-based inference, generative models, and probabilistic forecasting (with applications to meteorology). My supervisors were Prof Ritabrata Dutta (University of Warwick) and Prof Geoff Nicholls (University of Oxford).

Before my PhD studies, I obtained a Bachelor’s degree in Physical Engineering from Politecnico di Torino (Italy) and an MSc in Physics of Complex Systems from Politecnico di Torino and Université Paris-Sud (France). I did my MSc thesis at LightOn, a machine learning startup in Paris.

news

May 16, 2025 Our survey on AI evaluation was accepted to the IJCAI 2025 survey track, and our PredictaBoard benchmark was accepted to ACL 2025 Findings. 🎉
Mar 11, 2025 Our new preprint shows how to extract the most predictive and explanatory power from AI benchmarks by automatically annotating the demands posed by each question. Check it out!
Feb 21, 2025 Two new arXiv preprints: one surveying AI evaluation and identifying six main paradigms, the other introducing a benchmark for jointly evaluating the performance of LLMs and its predictability on individual instances.
Jan 15, 2025 Our survey paper “Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents” has been accepted and published in Transactions on Machine Learning Research! 🎉
Oct 15, 2024 We have two new preprints on arXiv! One on predicting the performance of LLMs on individual instances, the other on predicting the answers of LLM benchmarks from simple features.

selected publications

  1. arXiv
    Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies
    Miles Brundage, Noemi Dreksler, Aidan Homewood, Sean McGregor, Patricia Paskov, Conrad Stosz, Girish Sastry, A. Feder Cooper, George Balston, Steven Adler, Stephen Casper, Markus Anderljung, Grace Werner, Soren Mindermann, Vasilios Mavroudis, Ben Bucknall, Charlotte Stix, Jonas Freund, Lorenzo Pacchiardi, Jose Hernandez-Orallo, Matteo Pistillo, Michael Chen, Chris Painter, Dean W. Ball, Cullen O’Keefe, Gabriel Weil, Ben Harack, Graeme Finley, Ryan Hassan, Scott Emmons, Charles Foster, Anka Reuel, Bri Treece, Yoshua Bengio, Daniel Reti, Rishi Bommasani, Cristian Trout, Ali Shahin Shamsabadi, Rajiv Dattani, Adrian Weller, Robert Trager, Jaime Sevilla, Lauren Wagner, Lisa Soder, Ketan Ramakrishnan, Henry Papadatos, Malcolm Murray, and Ryan Tovcimak
    2026
  2. NeurIPS Workshop
    A Framework for the Categorisation of General-Purpose AI Models under the EU AI Act
    Lorenzo Pacchiardi, John Burden, Fernando Martínez-Plumed, José Hernández-Orallo, Emilia Gomez, and David Fernández-Llorca
    In NeurIPS 2025 Workshop on Regulatable ML, 2025
  3. TMLR
    Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents
    Irene Testini*, José Hernández-Orallo, and Lorenzo Pacchiardi*
    Transactions on Machine Learning Research, 2025
  4. arXiv
    General Scales Unlock AI Evaluation with Explanatory and Predictive Power
    Lexin Zhou, Lorenzo Pacchiardi, Fernando Martínez-Plumed, Katherine M. Collins, Yael Moros-Daval, Seraphina Zhang, Qinlin Zhao, Yitian Huang, Luning Sun, Jonathan E. Prunty, Zongqian Li, Pablo Sánchez-García, Kexin Jiang Chen, Pablo A. M. Casares, Jiyun Zu, John Burden, Behzad Mehrbakhsh, David Stillwell, Manuel Cebrian, Jindong Wang, Peter Henderson, Sherry Tongshuang Wu, Patrick C. Kyllonen, Lucy Cheke, Xing Xie, and José Hernández-Orallo
    arXiv preprint arXiv:2503.06378, 2025
  5. ACL Findings
    PredictaBoard: Benchmarking LLM Score Predictability
    Lorenzo Pacchiardi, Konstantinos Voudouris, Ben Slater, Fernando Martínez-Plumed, José Hernández-Orallo, Lexin Zhou, and Wout Schellaert
    Findings of the Association for Computational Linguistics: ACL 2025, 2025
  6. IJCAI
    Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
    John Burden*, Marko Tešić*, Lorenzo Pacchiardi*, and José Hernández-Orallo
    IJCAI 2025 Survey Track, 2025
  7. ICLR 2024
    How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
    Lorenzo Pacchiardi*, Alex J Chan*, Sören Mindermann, Ilan Moscovitz, Alexa Y Pan, Yarin Gal, Owain Evans, and Jan Brauner
    The Twelfth International Conference on Learning Representations, 2024