<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[James Peterson's Newsletter]]></title><description><![CDATA[My personal Substack]]></description><link>https://www.jamespeterson.blog</link><image><url>https://substackcdn.com/image/fetch/$s_!bgAw!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f7eb810-ae59-453e-a17f-7c4281b0de02_400x400.png</url><title>James Peterson&apos;s Newsletter</title><link>https://www.jamespeterson.blog</link></image><generator>Substack</generator><lastBuildDate>Tue, 21 Apr 2026 20:44:23 GMT</lastBuildDate><atom:link href="https://www.jamespeterson.blog/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[James Peterson]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[jamesnotes@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[jamesnotes@substack.com]]></itunes:email><itunes:name><![CDATA[James Peterson]]></itunes:name></itunes:owner><itunes:author><![CDATA[James Peterson]]></itunes:author><googleplay:owner><![CDATA[jamesnotes@substack.com]]></googleplay:owner><googleplay:email><![CDATA[jamesnotes@substack.com]]></googleplay:email><googleplay:author><![CDATA[James Peterson]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Irony of the LLM treadmill]]></title><description><![CDATA[A strange new burden has crept into software teams: the LLM treadmill. 
Many models retire within months, so developers now continuously migrate features they only just shipped.]]></description><link>https://www.jamespeterson.blog/p/the-irony-of-the-llm-treadmill</link><guid isPermaLink="false">https://www.jamespeterson.blog/p/the-irony-of-the-llm-treadmill</guid><dc:creator><![CDATA[James Peterson]]></dc:creator><pubDate>Wed, 29 Oct 2025 14:08:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bgAw!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f7eb810-ae59-453e-a17f-7c4281b0de02_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A strange new burden has crept into software teams: the <em>LLM treadmill</em>. Many models retire within months, so developers now continuously migrate features they only just shipped.</p><p>Our team feels this new pain sharply. We support many LLM-based features, which together process massive token volumes each month. I suspect even teams with small LLM dependencies feel this frustration, though.</p><p><strong>Example: A migration affected by the &#8220;jagged frontier&#8221;</strong></p><p>You <strong>can</strong> treat these migrations like any other software version bump. But users dislike adapting to change that is only <em>mostly</em> better. And since LLMs are weird and <a href="https://youtu.be/b6Doq2fz81U?si=nFgWBAei-ymkw4Zi&amp;t=1129">their upgrades have lately been jagged</a>, simply bumping the version can be quite messy.</p><p>Consider a common scenario: a feature in your product is powered by a clever, &#8220;vibe-based&#8221; prompt. It worked surprisingly well on a popular model, so you shipped it and iterated on it when users gave feedback.</p><p>Then came the model&#8217;s deprecation notice. Time to migrate. When you migrated the same feature last year, the version bump was a clear and easy win. Hopefully again!</p><p>Only this time the new model makes this feature feel different.
It&#8217;s sometimes better, sometimes worse. The prior model had a special knack for the task. You worry about forcing your users to adapt.</p><p>This pushes you to graduate. You formalize the task, annotate high-quality examples, and fine-tune a replacement model. You now have a more robust solution with much-improved quality, all because the treadmill forced you to build it right.</p><p><strong>Was all that necessary?</strong></p><p>Migrations are risky opportunities.</p><p>A recent example is ChatGPT&#8217;s move to GPT-5. Chat became smarter, but lacked 4o&#8217;s personality. Many users were unhappy and wanted it rolled back<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. It&#8217;ll take another migration to fix properly<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><p>So what should you do when your vibe-prompted LLM is slated for sunset?</p><ul><li><p>If a newer model makes a mediocre feature feel great, take it quickly.</p></li><li><p>Otherwise, move beyond feel. Really break down what people like in your feature.</p></li></ul><p>And this takes serious effort &#8230; just to migrate.</p><p>But it pays for itself. Once the nuances of &#8220;good&#8221; are measurable, you can make your feature even better. In the above scenario, the new <strong>smarter</strong> model is often also <strong>10x cheaper</strong> and <strong>2x faster</strong>. And it&#8217;ll be easier to migrate next time, as you already have your nicely annotated dataset.</p><p>My team does this often. We ship a v1 with prompting. A model gets deprecated. We nail down &#8220;good&#8221; &#8594; measure it &#8594; kick off an optimization loop.</p><p>We try new prompts, alternative models, and sometimes go tune our own.
We usually end up faster, cheaper, sturdier, and <strong>consistently</strong> higher quality than the vendor&#8217;s version bump that forced the whole process.</p><p>Awkwardly, that often churns spend away from the vendor.</p><p>And that&#8217;s the <em>irony</em> of the LLM treadmill: short model lifespans force even happy API customers to keep reconsidering. And the better customer they are (the more features they&#8217;ve built and maintain), the stronger this push away becomes.</p><p>Seems like a hard way to do business.</p><h3>OpenAI&#8217;s push is gentler</h3><p>Each big AI lab<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> forces a different &#8220;treadmill&#8221;, with OpenAI&#8217;s so far offering the most self-determination.</p><p>Google&#8217;s Gemini models retire <a href="https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions">one year from release</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. New releases are often priced quite differently (&#8597;)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>.</p><p>Anthropic&#8217;s retirements can occur <a href="https://docs.claude.com/en/docs/about-claude/model-deprecations">with just 60 days&#8217; notice</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>, starting a year after release.
Pricing has been flat since Claude 3&#8217;s release (except for Haiku &#8593;).</p><p>And OpenAI&#8217;s treadmill is more developer-friendly:</p><ul><li><p>Models are supported for longer (still supporting even <a href="https://platform.openai.com/docs/deprecations">GPT-3.5-Turbo and Davinci</a>).</p></li><li><p>Upgrades often arrive with a <strong>lower</strong> price &#8595;.</p></li></ul><h3>Labs&#8217; diverging focus</h3><p>The contrast between OpenAI and Anthropic becomes clearer when you look at how they position their models.</p><p>At their recent DevDay, OpenAI showcased <a href="https://x.com/deedydas/status/1975582169126019223">a long list of top customers</a>. What struck me is how varied that list is. All kinds of unicorns &#8212; consumer, business platforms, productivity, developer tools, and more &#8212; seem to be heavily using OpenAI&#8217;s API.</p><p>This was consistent with how OpenAI <a href="https://openai.com/index/introducing-gpt-5/">positioned GPT-5</a> upon release: as a model intended to tackle a broad range of tasks.</p><p>Anthropic, in contrast, appears to be specializing. In their Claude 3 <a href="https://www.anthropic.com/news/claude-3-family">announcement</a>, Anthropic touted a wide array of uses. By their Claude Sonnet 4.5 <a href="https://www.anthropic.com/news/claude-sonnet-4-5">release</a>, they more narrowly positioned Claude as the best &#8220;coding model&#8221;. And code tools are reportedly <a href="https://www.theinformation.com/articles/anthropic-revenue-pace-nears-5-billion-run-mega-round?utm_source=chatgpt.com">an increasingly large part of their revenue</a>.</p><p>I think it&#8217;s not a coincidence that the friendlier-treadmill vendor has kept a wider base of software built on it. I also wonder if this is self-reinforcing in how the big labs iterate on their product-market fit.</p><h3>Where I think this is headed</h3><p>I think software teams will keep following their incentives.
If model migrations cost more than they deliver, those teams will grow tired. They&#8217;ll reclaim control over quality and their roadmap prioritization by either self-hosting models or moving to labs with friendlier policies.</p><p>That said, I&#8217;m also optimistic that the big AI labs will see this and fix the underlying driver. I hope they&#8217;ll commit to long-term support of their models. The pain today might just be a growing pain of a new industry. The LLM treadmill may, in time, disappear.</p><p>The one exception is code tools. That crowd is seeing pure upside from each new model, so Anthropic&#8217;s focused bet on code will likely continue to compound. But for the rest of us building AI-powered app features, navigating the treadmill has become a very real and pressing problem.</p><p><em><strong>Find this work interesting? <a href="https://jobs.ashbyhq.com/fathom.video">We&#8217;re hiring</a>!</strong></em></p><p><em><strong>You can also <a href="https://x.com/hellofromjames">follow me on X</a>.</strong></em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.jamespeterson.blog/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading my post!
Let me know if you&#8217;d like to receive new posts:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>As covered by e.g. <a href="https://www.nytimes.com/2025/08/19/business/chatgpt-gpt-5-backlash-openai.html">the NYT</a> </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://x.com/sama/status/1978129344598827128">OpenAI has alluded to a plan to fix this</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Not limited to the closed labs. E.g. <a href="https://inference-docs.cerebras.ai/support/deprecation">last week</a> a unicorn open-model inference provider gave just weeks notice for three deprecations. One of those was a &#8220;migrate to&#8221; suggestion just 85 days prior. 
Open models can at least be moved between vendors, though.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>We&#8217;ve experienced capacity issues in the months ahead of model retirement, so out of an abundance of caution we now treat these as 10-month releases rather than 12-month.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>E.g. the leap from Gemini Pro 1.0 &#8594; 1.5, or the leap from Flash 1.5 &#8594; 2 &#8594; 2.5.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>As experienced with their popular Sonnet 3.5 models.</p></div></div>]]></content:encoded></item></channel></rss>