What's Missing From LLM Chatbots: A Sense of Purpose
The capabilities of LLM-based chatbots have been advancing month after month. These improvements are mostly measured by benchmarks such as MMLU, HumanEval, and MATH (see, e.g., the results reported for Claude 3.5 Sonnet and GPT-4o). However, as these benchmarks become increasingly saturated, is the user experience improving in proportion to the scores? If we envision a future…
This article was originally published at The Gradient. For the full piece, read the original article.