Documentation
Everything you need to install, configure, and extend woozcode in your development workflow.
Installation
Woozcode installs globally as a CLI tool via npm, yarn, or pnpm. Node.js 18 or later is required.
npm
$ npm install -g woozcode
yarn
$ yarn global add woozcode
pnpm
$ pnpm add -g woozcode
Verify the installation:
$ wooz --version
1.0.0
Setup Guide
Once installed, initialize woozcode in your project directory. This creates a local config file that you can version-control.
$ cd your-project
$ wooz init
woozcode initialized
Config written to .woozcode.json
Run `wooz start` to begin compression
Configuration file
The generated .woozcode.json controls compression behavior:
{
  "model": "gpt-4o",
  "compression": {
    "level": "balanced",
    "stripComments": true,
    "deduplicateImports": true,
    "contextWindow": 8000
  },
  "cache": {
    "enabled": false,
    "ttl": 3600
  }
}

| Option | Values | Description |
|---|---|---|
| compression.level | light / balanced / aggressive | Controls how aggressively tokens are trimmed. |
| compression.stripComments | true / false | Removes inline and block comments from code. |
| compression.deduplicateImports | true / false | Collapses duplicate import statements. |
| compression.contextWindow | number | Target token limit for compressed output. |
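As a rough illustration of how the options in the table fit together, the sketch below types the sample .woozcode.json and checks the compression section before it is used. The WoozConfig interface and the validateConfig helper are hypothetical names inferred from the example file; they are not part of woozcode, and the real tool may validate its config differently.

```typescript
// Hypothetical types mirroring the sample .woozcode.json shown above.
// Shapes are inferred from the example, not taken from a published schema.
type CompressionLevel = "light" | "balanced" | "aggressive";

interface WoozConfig {
  model: string;
  compression: {
    level: CompressionLevel;
    stripComments: boolean;
    deduplicateImports: boolean;
    contextWindow: number;
  };
  cache: { enabled: boolean; ttl: number };
}

const LEVELS: readonly string[] = ["light", "balanced", "aggressive"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of problems found in the compression section.
// An empty list means every documented option has a plausible value.
function validateConfig(raw: unknown): string[] {
  if (!isRecord(raw)) return ["config must be a JSON object"];
  const compression = raw.compression;
  if (!isRecord(compression)) return ["compression must be an object"];

  const problems: string[] = [];
  const level = compression.level;
  if (typeof level !== "string" || !LEVELS.includes(level)) {
    problems.push("compression.level must be light, balanced, or aggressive");
  }
  if (typeof compression.stripComments !== "boolean") {
    problems.push("compression.stripComments must be true or false");
  }
  if (typeof compression.deduplicateImports !== "boolean") {
    problems.push("compression.deduplicateImports must be true or false");
  }
  const contextWindow = compression.contextWindow;
  if (typeof contextWindow !== "number" || !Number.isInteger(contextWindow) || contextWindow <= 0) {
    problems.push("compression.contextWindow must be a positive integer");
  }
  return problems;
}
```

A check like this could run in CI against the version-controlled config file, so that a typo in compression.level is caught before the daemon starts.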
Concepts
Compression pipeline
Woozcode operates as a proxy layer between your code and any LLM API. When you send a request, the compression pipeline strips tokens that carry no semantic weight: duplicate imports, verbose docstrings, commented-out code, and repeated context. The resulting prompt is functionally identical but 40-60% smaller.
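To make the idea of a staged pipeline concrete, here is a toy sketch that chains three text transforms: comment stripping, import deduplication, and blank-line collapsing. It is a deliberately naive, regex-based illustration and not woozcode's actual implementation; all function names are hypothetical, and a real compressor would need a proper parser to avoid touching string literals.

```typescript
// Toy illustration only: each stage is a plain string -> string function.

// Removes /* block */ comments and whole-line // comments.
// Trailing comments after code are left alone to avoid breaking URLs in strings.
function stripComments(source: string): string {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/^[ \t]*\/\/.*$/gm, "");
}

// Keeps the first occurrence of each identical import line and drops repeats.
function deduplicateImports(source: string): string {
  const seen = new Set<string>();
  return source
    .split("\n")
    .filter((line) => {
      const key = line.trim();
      if (!key.startsWith("import ")) return true;
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    })
    .join("\n");
}

// Collapses runs of blank lines left behind by the earlier stages.
function collapseBlankLines(source: string): string {
  return source.replace(/\n\s*\n+/g, "\n").trim();
}

// Runs the stages in order, feeding each stage's output into the next.
function compressSketch(source: string): string {
  const stages = [stripComments, deduplicateImports, collapseBlankLines];
  return stages.reduce((text, stage) => stage(text), source);
}
```

Keeping each stage as an independent function means a stage can be switched off individually, which is roughly what the stripComments and deduplicateImports config flags suggest.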
Local-first architecture
All processing runs in a local daemon (wooz start). No payload ever reaches a woozcode server. Your prompts, source files, and API keys remain entirely on your machine. The daemon communicates with your AI provider directly using your own credentials.
Semantic cache
When enabled, the cache stores embeddings of previous requests and returns cached responses for semantically similar queries. This can eliminate API calls entirely for repeated or near-repeated questions. Cache entries expire after the configured TTL.
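The lookup described above can be sketched as a nearest-neighbor search over stored embeddings with an expiry check. The class below is a minimal in-memory illustration under several assumptions: embeddings are plain number arrays supplied by the caller, similarity is cosine similarity against a fixed threshold, and the TTL is given in milliseconds. The names SemanticCacheSketch, ttlMs, and threshold are hypothetical; the source does not describe how woozcode computes similarity or stores entries.

```typescript
interface CacheEntry {
  embedding: number[];
  response: string;
  expiresAt: number; // timestamp in milliseconds
}

// Cosine similarity between two vectors; returns 0 for mismatched or zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCacheSketch {
  private entries: CacheEntry[] = [];
  private ttlMs: number;
  private threshold: number;

  constructor(ttlMs: number, threshold: number) {
    this.ttlMs = ttlMs;
    this.threshold = threshold;
  }

  // Stores a response keyed by the embedding of the request that produced it.
  set(embedding: number[], response: string, now: number = Date.now()): void {
    this.entries.push({ embedding, response, expiresAt: now + this.ttlMs });
  }

  // Returns the most similar unexpired response at or above the threshold, if any.
  get(embedding: number[], now: number = Date.now()): string | undefined {
    this.entries = this.entries.filter((entry) => entry.expiresAt > now);
    let best: CacheEntry | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best ? best.response : undefined;
  }
}
```

The threshold controls the trade-off: a higher value returns fewer cached answers but lowers the risk of serving a response to a question that only looks similar.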
API Usage
Woozcode exposes a local REST API on port 7350 when the daemon is running. You can also use the Node.js SDK directly.
SDK (Node.js)
import { Woozcode } from 'woozcode'

const wooz = new Woozcode()
const result = await wooz.compress({
  prompt: longPromptString,
  level: 'balanced',
})
console.log(result.compressed)  // compressed string
console.log(result.tokensSaved) // number
console.log(result.ratio)       // 0.61
REST API
POST http://localhost:7350/v1/compress
Content-Type: application/json
{
  "prompt": "...",
  "level": "balanced"
}
// Response
{
  "compressed": "...",
  "tokensSaved": 760,
  "ratio": 0.61,
  "latencyMs": 2
}
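For callers that prefer the REST endpoint over the SDK, the request above could be wrapped in a small typed client. This is a sketch under assumptions: it uses the global fetch available in Node.js 18 and later, the response fields are taken from the sample response, and the level values other than "balanced" are borrowed from the config table rather than confirmed for this endpoint. The helper names buildCompressRequest and compressViaRest are hypothetical, and the error handling is a guess since the source does not document error responses.

```typescript
interface CompressRequest {
  prompt: string;
  level: "light" | "balanced" | "aggressive"; // assumed to match the config levels
}

interface CompressResponse {
  compressed: string;
  tokensSaved: number;
  ratio: number;
  latencyMs: number;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Port and path as given in the REST example above.
const BASE_URL = "http://localhost:7350";

// Builds the URL and fetch options without sending anything,
// which keeps the request shape easy to inspect and test.
function buildCompressRequest(body: CompressRequest): PreparedRequest {
  return {
    url: `${BASE_URL}/v1/compress`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request to the local daemon; requires `wooz start` to be running.
async function compressViaRest(body: CompressRequest): Promise<CompressResponse> {
  const { url, init } = buildCompressRequest(body);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`compress request failed with HTTP ${res.status}`);
  }
  return (await res.json()) as CompressResponse;
}
```

A caller would then write something like `const { compressed } = await compressViaRest({ prompt, level: "balanced" })` and pass the compressed string on to its AI provider.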