Evaluation of OpenAI O1: Opportunities and Challenges of AGI arxiv.org 3 points by nopinsight 5 hours ago
nopinsight 5 hours ago
The paper introduces AGI-Benchmark 1.0.
"AGI-Benchmark 1.0 is designed to assess a model’s ability to tackle intricate, multi-step reasoning problems across a diverse set of domains."
See pp. 13-14 for the list of tasks in 27 categories. It's diverse indeed.