Large Language Models: A Series
Building products on LLMs and AI generally.
The Rise of Transparency
Finding signal in the firehose.
Link: Maggie Appleton on Gas Town and Coding Agent Orchestration
Maggie was already perhaps the best writer on the intersection of engineering and design, but now that she’s joined GitHub Next, she’s also extremely keyed in to where tools for coding are going. Her piece on Gas Town and orchestrating coding agents is sharp and worth reading in full.
As the pace of software development speeds up, we’ll feel the pressure intensify in other parts of the pipeline: thoughtful design, critical thinking, user research, planning and coordination within teams, deciding what to build, and whether it’s been built well.
The most valuable tools in this new world won’t be the ones that generate the most code fastest. They’ll be the ones that help us think more clearly, plan more carefully, and keep the quality bar high while everything accelerates around us.
We’ve known for a couple of years now that faster coding would make non-coding work an increasing bottleneck, and now it’s happening. Deciding what to build – and whether it’s been built well – was already one of the most important tasks on a software team.
But in the face of tools that can add anything to your product, desirable or not, this judgement becomes the core of the work.
A Box of Many Inputs
On browsers, local classifiers, and Roger Rabbit.
Link: Why is ChatGPT for Mac So… Bad?
Last week I wrote an exploration of Ben Thompson’s recent question, “Why is the ChatGPT Mac app so good?” A lot of people on the internet, it turns out, do not agree with this premise!
Many folks have been having problems with ⌘C not copying text. Hacker News sees the app as “not good at all”, to the point that my post about it being better than the alternatives was flagged off the site. X doesn’t like it either.
Beyond the bugs I mentioned in last week’s post, I’ve recently been plagued by a ChatGPT Mac bug of my own: every time I start a new chat, the app pre-fills the text field with the first input from my previous new chat.
All of this led me to an informative post by one of OpenAI’s Mac developers, Stephan Casas:
nearly everyone who works on the ChatGPT macOS app has been stretched thin, and hard at work building Atlas.
[…]
i’m thankful that our users appreciate our decision to develop a native app just as much as i’m thankful for the heightened expectations they hold because we did so
He apparently merged a fix this week for the copy-paste bug that has been troubling many folks, which is promising.
Something implied in last week’s article that’s worth saying explicitly: although many good Mac apps are native, being native is neither necessary nor sufficient for being a great app.
While OpenAI is investing more in desktop apps than any other model lab, it has much to do before it can transcend “better than the alternatives” and achieve “great.”
Why is ChatGPT for Mac So Good?
Claude, Copilot, and making a good desktop app.
Spending Too Much Money on a Coding Agent
On making use of large thinking models.
Post-Chat UI
How LLMs are making traditional apps feel broken.
The Era of Tab Continuation
Press tab to complete your work.
It’s Good for Apple, and Okay for You
Apple Intelligence, so far.
Testing the Untestable
The four phases of automated evals for LLM-powered features.
Link: Infer, an AI Eng Meetup in Vancouver
Next week, we’ll be kicking off a new speaker series in Vancouver called Infer. The goal of the meetup is to bring together folks who are doing great AI engineering work, so we can learn from one another.
The format will be familiar to folks who have attended my previous meetups: two speakers, often one visiting from out of town, with time to chat afterward. Events will happen roughly every two months, whenever we have compelling topics lined up.
If you’re building LLM-powered apps in Vancouver, you can subscribe to our event on Luma. There are still a few spots open for our first “beta” event on October 9th, and we’ll be hosting another during NeurIPS in December.
There’s something electric about getting smart people who are working in a rapidly-changing field in a room together. I recommend it.
Starting Forestwalk
A wild startup appears.
Pushing the Frontier
If – and when – GPT-5 might eat your lunch
LLMs Aren’t Just “Trained On the Internet” Anymore
A path to continued model improvement.
From Chatbot to Everything Engine
A curious design constraint signals an ambitious future.
Going Way Beyond ChatGPT
Techniques for building products on LLMs today.
32K of Context in Your Pocket
A wild large-context LLM appears.
A 175-Billion-Parameter Goldfish
The problem and opportunity of language model context.