Announcement_21

February 21, 2025

Two new arXiv preprints:
one surveys AI evaluation and identifies six main paradigms; the other introduces a benchmark for jointly evaluating the performance of LLMs and the predictability of that performance on individual instances.