A new state-of-the-art LLM (at least for creative writing and basic reasoning), but what lies behind the numbers that were put out? Is it for real, and are AI agents about to grab your mouse and shake your cursor?

Weights and Biases' Weave: https://wandb.me/ai_explained

Plus, results on my own Simple Bench, and new tools from Runway (Act-One), HeyGen (Zoom Calls) and an updated NotebookLM. AI, without the hype.

AI Insiders: https://www.patreon.com/AIExplained

Chapters:
00:00 – Introduction
00:57 – Claude 3.5 Sonnet (New) Paper
02:06 – Demo
02:58 – OSWorld
04:29 – Benchmarks compared + OpenAI Response
08:30 – Tau-Bench
13:09 – SimpleBench Results
17:05 – Yellowstone Detour
17:29 – Runway Act-One
18:44 – HeyGen Interactive Avatars + Demo
21:06 – NotebookLM Update

New Claude: https://www.anthropic.com/news/3-5-models-and-computer-use
https://www.anthropic.com/research/developing-computer-use

Paper: https://assets.anthropic.com/m/1cd9d098ac3e6467/original/Claude-3-Model-Card-October-Addendum.pdf

Demo Diversion: https://x.com/AnthropicAI/status/1848742761278611504
https://www.youtube.com/watch?v=jqx18KgIzAE

o1 Comparison: https://openai.com/index/learning-to-reason-with-llms/
https://www.swebench.com/

Tau Bench: https://arxiv.org/pdf/2406.12045

OSWorld: https://arxiv.org/pdf/2404.07972

GSM Reasoning: https://arxiv.org/pdf/2410.05229

Sierra Valuation: https://www.theinformation.com/articles/bret-taylors-ai-agent-startup-nears-deal-that-could-value-it-at-over-4-billion?rc=sy0ihq

Claude Impressions: https://x.com/skirano/status/1848750867245133982

o1 System Card: https://assets.ctfassets.net/kftzwdyauwt9/67qJD51Aur3eIc96iOfeOP/71551c3d223cd97e591aa89567306912/o1_system_card.pdf

NotebookLM: https://notebooklm.google/

Runway Act-One: https://runwayml.com/research/introducing-act-one

HeyGen Zoom: https://labs.heygen.com/interactive-avatar/vicky

Ministral Comparison: https://x.com/armandjoulin/status/1846581336909230255

My Coursera Course - The 8 Most Controversial Terms in AI: https://imp.i384100.net/m57g3M

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

I use Descript to edit my videos (no pauses or filler words!): https://get.descript.com/ldgxfuj2bhnb

Many people expense AI Insiders for work. Feel free to use the Template in the 'About Section' of my Patreon: https://www.patreon.com/AIExplained