Blog

Writing

Thoughts on AI tooling, token economics, and local-first development.

How we cut token costs by 60% without changing models

A breakdown of the compression techniques we use internally, and why context pruning beats prompt engineering for cost reduction.

Why every serious development team should be running AI infrastructure on their own hardware, and what trade-offs to expect.

A deep dive into how embedding-based caching works, when it helps, and how we tune the similarity threshold for precision.

We analyzed 100 anonymized codebases to understand where token spend goes. The results were surprising.

After six months in private beta, woozcode is now available to everyone. Here is what we built and where we are going next.