Benchmarks AI

WAKE UP!!!! (LinkedIn) Post by Marcel Mutsaarts, 21.12.2024

- A wonderful post that I fully agree with.
- For the 'AI nerds' among us:
  - 🎯 OpenAI's o3 model scores:
    - 87.5% on ARC-AGI (the human threshold is 85%)
    - 25.2% of EpochAI's Frontier Math problems (when no other model breaks 2%)
    - 96.7% on AIME 2024 (missed one question)
    - 71.7% on software engineering (o1 scored 48.9%)
    - 87.7% on PhD-level science (above human expert scores)
https://sites.google.com/view/eurekajohn/onderwerpen/ai-benchmarks

Comprehensive Report: OpenAI Achieves AGI with the O3 Model – A Historic Milestone in Artificial Intelligence 20.12.2024

🚀 The AI Arms Race: Breaking Down the Battle for Top Large Language Models (LLMs) in 2024 21.09.2024
- Useful Links:
  - Learn more about Elo ratings and Chatbot Arena: https://lmsys.org
  - OpenAI vs Google – Who's Winning?: https://huggingface.co/.../lmsys/chatbot-arena-leaderboard
  - Google’s AI architecture innovations: https://artificialanalysis.ai
  - Global AI market growth statistics: https://huggingface.co/collections/open-llm-leaderboard
  - Details on xAI’s rise and Gemini’s impact: https://lmsys.org

https://lifearchitect.ai/iq-testing-ai/ - Dr Alan D. Thompson *** the best, most up-to-date overview++ for me ***

Articles

AI Inference Competition Heats Up: First MLPerf benchmarks for Nvidia Blackwell, AMD, Google, Untether AI. Dina Genkina, 28 Aug 2024
Claude.ai benchmarks, 21 Jun 2024

AI Benchmarks

https://datatunnel.io/product/phi-3-mini-128k-2/

Chatbot Arena Leaderboard - a Hugging Face Space by lmsys

2024-09-17

| Rank* (UB) | Model | Arena Score | 95% CI | Votes | Organization | License | Knowledge Cutoff |
|---|---|---|---|---|---|---|---|
| 1 | | 1355 | +12/-11 | 2991 | OpenAI | Proprietary | 2023/10 |

AI Model & API Providers Analysis | Artificial Analysis: Comparison and analysis of AI models and API hosting providers. Independent benchmarks across key performance metrics including quality, price, output speed & latency.

Update 22.12.2024
