How we cut token costs by 40-60% without changing models

When we started building woozcode, the goal was simple: reduce what developers pay to run AI without changing anything about their workflow. No model swaps, no prompt rewriting, no cloud dependency.

The number we kept coming back to was context size. Most AI requests in development environments carry enormous amounts of redundant content: repeated imports, commented-out blocks, full file trees when only a function is relevant. Because usage is billed per token, that redundancy translates directly into cost.

What we actually compress

The compression pipeline works in three stages.

**Stage 1: Structural pruning.** We parse the input as code and remove elements that carry no semantic weight in the current context. Commented-out lines, duplicate import declarations, and empty blocks go first. This alone accounts for 20-30% of the reduction in typical TypeScript projects.
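
To make the idea concrete, here is a minimal sketch of what pruning can look like. It is not woozcode's actual implementation: the function name `pruneStructure` is hypothetical, and it uses line-based regex heuristics where a real pruner would work on a parsed syntax tree. It also skips empty-block removal, which needs real parsing to do safely.

```typescript
// Hypothetical sketch: line-based heuristics standing in for a real parser.
function pruneStructure(source: string): string {
  const seenImports = new Set<string>();
  const kept: string[] = [];
  let lastWasBlank = false;

  for (const line of source.split("\n")) {
    const trimmed = line.trim();

    // Drop repeated import declarations, keeping the first occurrence.
    if (/^import\s/.test(trimmed)) {
      if (seenImports.has(trimmed)) continue;
      seenImports.add(trimmed);
    }

    // Heuristic: a `//` comment ending in `;`, `{` or `}` is probably commented-out code.
    if (/^\/\/.*[;{}]\s*$/.test(trimmed)) continue;

    // Collapse runs of blank lines into a single blank line.
    const isBlank = trimmed === "";
    if (isBlank && lastWasBlank) continue;
    lastWasBlank = isBlank;

    kept.push(line);
  }
  return kept.join("\n");
}
```

Note that the comment heuristic deliberately leaves prose comments alone; only lines that look like disabled code are dropped.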

**Stage 2: Context windowing.** Long files get trimmed to the relevant scope. If the query concerns a specific function, we keep that function, its direct dependencies, and a narrow surrounding window. The rest is summarized with a token-efficient placeholder.
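
A rough sketch of the windowing step, under simplifying assumptions: the target is a plain `function name(` declaration, braces inside strings or comments are ignored, and dependency tracking is left out entirely. The name `windowAroundFunction`, the `margin` parameter, and the `// [N lines omitted]` placeholder format are all illustrative, not woozcode's real output.

```typescript
// Hypothetical sketch: keep one function plus a small margin, summarize the rest.
function windowAroundFunction(source: string, functionName: string, margin = 2): string {
  const lines = source.split("\n");
  const start = lines.findIndex((l) => l.includes(`function ${functionName}(`));
  if (start === -1) return source; // Unknown target: leave the file untouched.

  // Walk forward tracking brace depth to find the line that closes the function body.
  // Naive: braces inside strings or comments would confuse this count.
  let depth = 0;
  let opened = false;
  let end = start;
  for (let i = start; i < lines.length; i++) {
    const line = lines[i] ?? "";
    const opens = (line.match(/\{/g) || []).length;
    const closes = (line.match(/\}/g) || []).length;
    if (opens > 0) opened = true;
    depth += opens - closes;
    end = i;
    if (opened && depth === 0) break;
  }

  const from = Math.max(0, start - margin);
  const to = Math.min(lines.length - 1, end + margin);
  const tail = lines.length - 1 - to;

  const out: string[] = [];
  if (from > 0) out.push(`// [${from} lines omitted]`);
  out.push(...lines.slice(from, to + 1));
  if (tail > 0) out.push(`// [${tail} lines omitted]`);
  return out.join("\n");
}
```

The placeholder keeps a line count so the model still knows that more code exists, without paying for it.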

**Stage 3: Deduplication.** In multi-turn conversations, the same boilerplate often gets resent with every message. We diff against the previous turn and strip unchanged segments, replacing them with a compact reference.
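
As an illustration, the sketch below treats blank-line-separated blocks as segments and swaps any segment that appeared verbatim in the previous turn for a short reference. The function name, the `minLength` threshold, and the `[unchanged: prev#N]` reference format are assumptions made for the example; a real diff would be finer-grained than exact segment matching.

```typescript
// Hypothetical sketch: exact-match deduplication against the previous turn.
function dedupeAgainstPrevious(previous: string, current: string, minLength = 20): string {
  const splitSegments = (text: string): string[] => text.split(/\n\s*\n/);

  // Index each segment of the previous turn by its first position.
  const previousIndex = new Map<string, number>();
  splitSegments(previous).forEach((segment, i) => {
    if (!previousIndex.has(segment)) previousIndex.set(segment, i);
  });

  return splitSegments(current)
    .map((segment) => {
      const ref = previousIndex.get(segment);
      // Keep new segments, and short ones that are cheaper to resend than to reference.
      if (ref === undefined || segment.length < minLength) return segment;
      return `[unchanged: prev#${ref}]`;
    })
    .join("\n\n");
}
```

New text, such as the latest instruction, never matches a previous segment and therefore passes through untouched.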

Combined, these stages produce a 40-60% reduction on typical developer prompts.
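
For readers who want to reproduce a reduction figure on their own prompts, here is one way to chain stages and measure the result. Everything in it is illustrative: `Stage`, `approxTokens`, and `runPipeline` are hypothetical names, and the whitespace-based token count is only a crude stand-in for a real tokenizer, so absolute numbers will differ from what a provider bills.

```typescript
// Hypothetical sketch: compose text-to-text stages and report the reduction.
type Stage = (text: string) => string;

// Crude stand-in for a tokenizer: counts whitespace-separated chunks.
function approxTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function runPipeline(text: string, stages: Stage[]): { output: string; reductionPct: number } {
  // Apply each stage to the output of the previous one.
  const output = stages.reduce((acc, stage) => stage(acc), text);
  const before = approxTokens(text);
  const after = approxTokens(output);
  const reductionPct = before === 0 ? 0 : ((before - after) * 100) / before;
  return { output, reductionPct };
}
```

Because each stage only sees the previous stage's output, the per-stage savings do not simply add up; the combined figure has to be measured end to end.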

What we do not touch

We never compress the user's explicit instruction. The directive stays verbatim. We also preserve anything the model would need to reason structurally, such as type signatures, interface definitions, and error messages.
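
One way to picture this guarantee is as a guard that runs before any compression. The sketch below is an assumption-heavy illustration rather than our real rule set: `PromptRequest`, `PROTECTED_PATTERNS`, and `compressContext` are hypothetical names, and the regexes only recognize single-line declarations and `SomeError: ...` style messages.

```typescript
// Hypothetical sketch: the instruction is passed through; protected lines bypass compression.
interface PromptRequest {
  instruction: string; // the user's directive, never modified
  context: string; // code and logs, eligible for compression
}

const PROTECTED_PATTERNS: RegExp[] = [
  /^\s*(export\s+)?(interface|type)\s+\w+/, // interface and type alias declarations
  /^\s*(export\s+)?(async\s+)?function\s+\w+\s*\(.*\)\s*:\s*\S+/, // typed function signatures
  /\b\w*Error:\s/, // lines that look like error messages, e.g. "TypeError: ..."
];

function isProtected(line: string): boolean {
  return PROTECTED_PATTERNS.some((pattern) => pattern.test(line));
}

// `compressLine` returns a rewritten line, or null to drop the line entirely.
function compressContext(
  request: PromptRequest,
  compressLine: (line: string) => string | null,
): PromptRequest {
  const context = request.context
    .split("\n")
    .map((line) => (isProtected(line) ? line : compressLine(line)))
    .filter((line): line is string => line !== null)
    .join("\n");
  return { instruction: request.instruction, context };
}
```

Even with a compressor that drops every line it sees, the instruction and the protected lines survive.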

Results in the wild

Across our beta cohort, median token reduction was 54%. The range was wide: projects with lots of documentation comments saw 68% reductions; lean, well-factored codebases with few comments saw around 35%. Either way, the bill went down.

The one consistent surprise: teams using woozcode with GPT-4o reported no measurable quality degradation in balanced compression mode. In aggressive mode, about 8% of teams reported occasional context gaps, which they resolved by switching specific tasks back to balanced mode.

What comes next

We are working on a smarter context selection model that learns which parts of your codebase are relevant to which kinds of tasks. Early results on our internal dataset are promising. Expect a post when we have more to share.