In the initial Anthropic-DoD/W standoff, both sides had good reasons to set hard lines: for Anthropic, its own position; for the DoD, a refusal to let a third party dictate its terms of use. Sam Altman's position, on the other hand, and the timing of the OpenAI-DoD deal right after Anthropic was designated a supply chain risk, don't come with good reasons, and neither does the weaponization of the supply chain risk designation (Just Security).

So I cancelled my OpenAI subscription. I already have Claude, but I decided to give Gemini a chance since I'm also building on top of Pi: shittycodingagent.ai.

Google's product problem

My first attempt at getting Gemini through Google One was unsuccessful. It turns out my Google account is considered to be in a different country and I just can't change the damn country. I followed the official documentation, waited for the changes to propagate, and made sure I only had a payment method in the correct country.

The solution? A brand new Google account, great. At that point I figured I could just use login in Pi and everything would be easy, as it is for Claude Code and Codex. Wrong. You need to set up a GOOGLE_CLOUD_PROJECT environment variable. So now it's time to create a new project, set the correct IAM permissions, and wait for those changes to propagate too. I'm still waiting. I'll report back.
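For what it's worth, a tiny pre-flight check saves a confusing failure later. This is only a sketch: `GOOGLE_CLOUD_PROJECT` is the variable mentioned above, while the `missing_env` helper is a made-up name for illustration and not part of Pi or any Google tooling.

```python
import os


def missing_env(required, env=None):
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the real process environment; passing a dict makes
    the check easy to try out without touching your shell.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Usage with an explicit dict (hypothetical values):
print(missing_env(["GOOGLE_CLOUD_PROJECT"], {"GOOGLE_CLOUD_PROJECT": ""}))
```

Calling `missing_env(["GOOGLE_CLOUD_PROJECT"])` with no second argument checks the actual environment before the agent is launched.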

This isn't complicated, it's just very frustrating. It all feels like unnecessary friction on the user side. It's even more frustrating because other providers have a much more streamlined process. I've heard a lot of good feedback on Gemini from people I trust, so I do want to give it a proper shot. I also need a second frontier provider, so my hands are somewhat tied.

The promise of Pi

Life has been a bit bizarre lately. It feels like we're at the edge of something radically different. Coding agents are already good enough to fundamentally change the way we work. I've been able to build some incredibly useful harnesses on top of Claude Code to help with a lot of my tasks. The big worry is walled gardens. The ability to have a centralized coding assistant that can selectively use frontier models across providers is very important. You can do that in Cursor, but the terminal is comfier.

Enter Pi. Its promise is to provide a minimal harness you can build on top of. That's a very convincing pitch coming from an opinionated dev (Mario Zechner). It reminds me of suckless. These days I don't really want to spend time maintaining my own fork of a window manager or terminal. The promise of building my own coding agent, on the other hand, seems like a much better long-term investment. It's more akin to learning vim in college and spending hours building the best dotfiles known to man.

Driving Pi

Pi is opinionated and doesn't come with all the batteries included. The first thing I need is a set of extensions for URL fetching, downloading PDFs, and extracting text and images from them. Pi can build those extensions itself, which is the point.

What drives me to Pi is the freedom to switch between models and have much tighter control over context management. Context bloat is real, and MCPs are often the worst offenders. A skill-first, CLI-first approach is already what I do in Claude Code, but now I have much better control over the hierarchy of context and how to keep it at a manageable level.

Context bloat, and the performance degradation that comes with it, is one of the biggest blockers in my daily workflow. Boris says here that he's been using the 1M context exclusively for months. My experience has been a slow but inevitable and irreversible degradation of capability as context grows. YMMV, but context management, especially long-term memory management, still seems like one of the most important unsolved problems. So instead of suffering through that degradation, let's build around it.
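To make "build around it" a bit more concrete, here is a minimal sketch of one common approach: keep the system message and only as many of the newest messages as fit a token budget. Everything here is an assumption for illustration. The `trim_context` and `estimate_tokens` names are made up, the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and this is not how Pi manages context internally.

```python
def estimate_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def trim_context(messages, budget):
    """Keep the first (system) message plus the newest messages that fit.

    `messages` is a list of {"role": ..., "content": ...} dicts, oldest first.
    Older messages are dropped once the running estimate would exceed `budget`.
    """
    if not messages:
        return []
    system, rest = messages[0], messages[1:]
    used = estimate_tokens(system["content"])
    kept = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + kept[::-1]  # restore chronological order
```

Dropping old turns outright is the bluntest option. Summarizing them into a short note before dropping them is the obvious next step, and that is where the long-term memory problem starts.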

First impressions

My litmus test is a summarization service I built for myself. I still read papers and blog posts and listen to podcasts, but I leave information extraction to LLMs and then review the outputs in my inbox, categorizing and tagging them accordingly.

I've been thinking about second brains for a while now, and this is the closest I've gotten to a system I'm actually happy with. A summarization job is a perfect task for subagents, and Pi's example for this is great: https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent/examples/extensions/subagent. On light coding, information retrieval, and summarization so far, I can't tell a meaningful difference between Pi and Codex or Claude Code. My context doesn't grow as much, and the output has been excellent.
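The reason subagents fit this job is isolation: each document is processed in its own throwaway context, and only the short summary comes back to the parent. A rough sketch of that shape follows. It is not Pi's extension API. `run_subagent`, `summarize_all`, and the `first_sentence` stand-in for a model call are all hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(document, summarize):
    # Each subagent starts from a fresh, private context holding one document.
    context = [document]
    # Only the summary leaves this function; the full text never reaches the parent.
    return summarize(context)


def summarize_all(documents, summarize, max_workers=4):
    """Fan documents out to subagents and collect summaries in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda doc: run_subagent(doc, summarize), documents))


def first_sentence(context):
    # Stand-in for a real model call (assumption): return the first sentence.
    return context[0].split(".")[0].strip() + "."
```

In a real setup `first_sentence` would be replaced by a call to whichever model the subagent is configured to use. The parent context then grows by one summary per document instead of one full document per document.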

In the next episode

  • Will Gemini finally work?
  • Can I get Pi to replace Claude Code even at work?