
The AI Arms Race: When Every Chatbot Looks the Same

By Matt Seitz

February 23, 2025

TL;DR: Last week Elon Musk’s xAI launched a new version of its Grok model, which shot to #1 on the user-ranking leaderboard. The model was trained on xAI’s new “Colossus” supercomputer, the largest GPU cluster in the world. US labs continue to rely on brute-force compute to develop better models, with the four biggest tech giants committing over $315B in CapEx for 2025. Model quality is converging, shifting differentiation to user experience and real-world applications. Commoditization is also raising monetization questions as firms explore advertising and bundling alongside subscription revenue models.

Key Dynamics to Watch

  • Benchmarks Busted: Most standard AI performance benchmarks have been effectively maxed out, making it harder to quantify model advances. As competition intensifies, core models are becoming commoditized, with chatbots offering increasingly similar features. User “bake-off” comparisons, industry-specific benchmarks, and cost/performance metrics are emerging as the new measures of success.
  • Massive Investments: Amazon, Google, Meta, and Microsoft have announced CapEx commitments of more than $315 billion for 2025, larger than the individual GDPs of 139 countries, including Finland, Portugal, and Hungary, and that figure excludes spending from OpenAI’s “Project Stargate” and xAI.
  • Revenue Questions: Subscriptions are under pressure: this week OpenAI announced free versions of its “Deep Research” tools, which were unprofitable even at the $200/month level. Companies are exploring alternative revenue models such as advertising and embedding AI into existing products. Advertising is an obvious approach, but it’s unclear how to insert ads into a conversation flow without eroding user trust.
  • Telecom Comparison: Analysts are drawing parallels to the late-1990s telecom bubble and its subsequent bust, when valuations for firms like WorldCom, Global Crossing, and Lucent crashed because they couldn’t recoup investments in overbuilt networks. Firms with strong financial management, like Cisco, have since rebounded.
Four big tech companies, Amazon, Alphabet, Microsoft, and Meta, plan to spend more than $315 billion on CapEx this year.
Source: Chartr

My Take

Christmas 2022 was the ChatGPT moment in my family. Gatherings were buzzing with questions about ChatGPT, which had launched just a month earlier. So many questions: “What’s going to happen to homework?” “Will writing jobs disappear?” We had fun playing with it, “Summarize this article,” “Write a limerick about Dad!”, alternating between amazement at its responses and laughter at its hallucinations.

Now, two years later, ChatGPT has grown to 400M weekly active users, but the fundamental question has shifted: how will OpenAI make money? As competition escalates, platform firms like Google, Meta, Amazon, and Microsoft can deploy and monetize AI across their existing products and user bases. They also hold proprietary datasets they can harness as legal battles over content access intensify.

AI models are improving every year
Source: Chartr

The path forward for OpenAI is straightforward: maintain your lead in user activity through branding, technical advancement, and competitive pricing. Launch an advertising product to monetize user activity without forcing a subscription model. Oh, and transition to a for-profit company, navigate an increasingly dicey relationship with Microsoft, and secure access to content for training. Simple, right?

Other independent firms like Anthropic and Perplexity need to don their thinking caps: how do they carve out a profitable niche? Claude’s conversational style is preferred by many in Silicon Valley; is that a path to durable subscription revenue? Can Perplexity’s citation-based answers challenge Google’s search dominance?

The next 18 months will be key for independent AI companies to establish sustainable business models before their capital runs out. The winners won’t be decided by who has the best technology, but by who finds a way to make money. Some may join larger tech platforms, while others will need to find niches where customers will pay premium prices. What’s clear is that technical excellence alone won’t be enough.

Animated graph showing changes in Elo scores for the top 9 AI chatbot models from August 2024 to February 2025.
Source: Flourish 
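A quick aside on how those leaderboard scores work: rankings like LM Arena’s come from blind, head-to-head user “bake-offs” that are converted into Elo-style ratings, the same scheme used to rank chess players. The sketch below shows the basic Elo update rule under that assumption; it is an illustration, not LM Arena’s actual implementation, and the model names and K-factor are hypothetical.

```python
# Minimal sketch of an Elo-style rating update from pairwise chatbot "bake-offs".
# Illustrative only: the model names and K-factor are hypothetical, and real
# leaderboards layer extra statistics (confidence intervals, tie handling) on top.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return both models' updated ratings after one head-to-head user vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Example: two models start level at 1000; "model_x" wins one user comparison.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
ratings["model_x"], ratings["model_y"] = update(ratings["model_x"], ratings["model_y"], a_won=True)
print(ratings)  # model_x gains 16 points, model_y loses 16
```

Aggregated over many thousands of user votes, small per-match adjustments like these are what separate the tightly clustered models at the top of the chart above.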

Articles

Elon Musk’s xAI Unveils Grok 3 to Challenge Rivals

AI Business

“According to Musk, Grok 3 boasts 10 times the processing power of its predecessor, Grok 2. This substantial increase in computational capacity enables Grok 3 to perform tasks more efficiently and handle a broader range of applications.  xAI is building a massive data center in Memphis, a ‘gigafactory of compute’ with more than 200,000 GPUs dedicated to training Grok. The release of Grok 3 marks a significant milestone for Musk’s xAI, which was created to compete with established AI companies after Musk parted ways with OpenAI in 2023.  

‘Perhaps the most interesting detail here is that xAI has gone from a standing start to being competitive very quickly. But so has DeepSeek,’ commented Alexander Harrowell, principal analyst, advanced computing for AI at Omdia.  ‘To be clear, xAI took on the considerably bigger challenge of building a new foundation LLM rather than fine-tuning an existing one like R1. The lesson, though, is that there are going to be many, many more models and once a major new feature appears – such as test-time reasoning – it will be replicated everywhere else.'”

X Post on Grok 3

Andrej Karpathy

“Summary. As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI’s strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago, this timescale to state of the art territory is unprecedented. Do also keep in mind the caveats – the models are stochastic and may give slightly different answers each time, and it is very early, so we’ll have to wait for a lot more evaluations over a period of the next few days/weeks. The early LM arena results look quite encouraging indeed. For now, big congrats to the xAI team, they clearly have huge velocity and momentum and I am excited to add Grok 3 to my ‘LLM council’ and hear what it thinks going forward.”

Grok 3 and an accelerating AI roadmap

Interconnects

“If these AI models, and the industry writ large, are accelerating, it is important to wonder where they are accelerating toward. Most of the evals we use now to launch leading models are not that representative, in many cases they’re actually 100% out of distribution to normal life. What is the value in solving a competition math problem like AIME or so-called ‘Google Proof’ questions? Time will tell, but the case for usefulness to average users is definitely stretched.  In fact, in the case of some of the latest evaluations from the research community, it seems like evaluations are being designed more around being hard than being useful. It is a natural response to models being super powerful to try and find something to challenge them with, but it makes tracking progress and communication far harder.”