Claude
Long context and strong instruction-following, where that gap actually matters.
Claude is the model I reach for when the task requires holding a lot of context coherently — multi-document reasoning, structured extraction over long inputs, or anything where instruction fidelity across a complex system prompt is load-bearing. For simpler classification or short-form tasks, lighter models are often the right call.

My Claude integrations are built around three recurring failure modes I've seen in production: prompt brittleness (output format breaks on edge-case inputs), context mismanagement (relevant information gets pushed out of the effective window), and silent degradation (quality drifts when inputs shift and nobody notices). I address these concretely: system prompts are structured with a strict role/task/output-schema separation, and output is always parsed against a Zod schema at the application boundary so malformed responses fail loudly rather than corrupting downstream state. Context budget is managed explicitly — I track what's in the window and summarize or prune older turns rather than hoping the model handles it. For prompt regressions, I maintain a golden set of input/output pairs in version control and run evals on every prompt change, treating prompt updates with the same review discipline as code changes.
Context Architecture
I design system prompts around an explicit context budget: what the model needs to know at all times goes in the system prompt, dynamic content is injected into user turns with a defined token ceiling, and anything beyond that gets summarized or windowed out. Output schemas are defined in Zod first, then described in the prompt — the schema is the contract, not the prose.
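The windowing half of that budget can be sketched as a pure function — a minimal illustration, not my production code. Token counts are assumed to be precomputed per turn, and the summarizer (a model call in practice) is stubbed out:

```typescript
// Illustrative sketch: keep the newest turns under an explicit token
// ceiling; everything older gets summarized rather than silently dropped.
interface Turn {
  role: "user" | "assistant";
  text: string;
  tokens: number; // precomputed token count for this turn
}

interface BudgetedContext {
  summary: string | null; // stand-in for a summarization model call
  turns: Turn[];          // most recent turns that fit the ceiling
}

function fitToBudget(history: Turn[], ceiling: number): BudgetedContext {
  const kept: Turn[] = [];
  let used = 0;
  // Walk newest-to-oldest, keeping turns until the ceiling is hit.
  for (let i = history.length - 1; i >= 0; i--) {
    if (used + history[i].tokens > ceiling) break;
    used += history[i].tokens;
    kept.unshift(history[i]);
  }
  const dropped = history.slice(0, history.length - kept.length);
  const summary =
    dropped.length > 0
      ? `Summary of ${dropped.length} earlier turn(s).` // model call in practice
      : null;
  return { summary, turns: kept };
}
```

The point of making this explicit is that truncation becomes a decision you made, with a summary artifact you can inspect, rather than something the API did to you.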
Eval Pipeline
I maintain a golden eval set alongside the prompt in the same repo. On every prompt change, I run the set through the model and diff outputs against expected responses — both exact-match where format is deterministic and LLM-graded scoring where it isn't. Cost monitoring is a separate CloudWatch dashboard tracking token counts per feature, with alerts before spend hits a configurable threshold.
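The exact-match half of that runner is simple enough to sketch; the LLM-graded scorer is a second path, not shown. The model call is stubbed as a synchronous function for illustration:

```typescript
// Minimal golden-set runner: exact-match diff of actual vs. expected.
interface GoldenCase {
  input: string;
  expected: string;
}

interface EvalResult {
  passed: number;
  failed: { input: string; expected: string; actual: string }[];
}

function runGoldenSet(
  cases: GoldenCase[],
  model: (input: string) => string, // model call stubbed as a sync fn
): EvalResult {
  const result: EvalResult = { passed: 0, failed: [] };
  for (const c of cases) {
    const actual = model(c.input);
    if (actual === c.expected) result.passed++;
    else result.failed.push({ input: c.input, expected: c.expected, actual });
  }
  return result;
}
```

In CI, a non-empty `failed` array blocks the merge, which is the whole discipline: a prompt change that shifts any golden output has to be reviewed, not discovered later.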
Prompt Version Control
Prompts live in Git as plain text files with a structured frontmatter block (model, temperature, max tokens, schema version). Regression suites run in CI on pull requests touching prompt files. Model upgrades go through the same suite before any traffic is cut over — no assumptions about backwards compatibility.
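A prompt file in this shape can be loaded with a small parser — a sketch, with illustrative field names (`model`, `temperature`, `max_tokens`, `schema_version`) matching the frontmatter fields described above:

```typescript
// Parse a prompt file: `---`-delimited frontmatter, then the prompt body.
interface PromptMeta {
  model: string;
  temperature: number;
  maxTokens: number;
  schemaVersion: string;
}

function parsePromptFile(raw: string): { meta: PromptMeta; body: string } {
  const m = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!m) throw new Error("prompt file missing frontmatter block");
  const fields: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return {
    meta: {
      model: fields["model"],
      temperature: Number(fields["temperature"]),
      maxTokens: Number(fields["max_tokens"]),
      schemaVersion: fields["schema_version"],
    },
    body: m[2],
  };
}
```

Because the metadata travels with the prompt text in the same Git blob, a model or temperature change shows up in the same diff as the wording change it accompanies.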
Document Processing
Long-form document extraction is where Claude's context window earns its keep. The pattern is: load the full document into context, define a strict output schema in Zod, parse the response at the boundary. No regex, no brittle string matching — field-level validation catches extraction failures before they propagate.
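The boundary parse looks roughly like this. In my stack this is a Zod schema with `safeParse`; the validator is hand-rolled here to keep the sketch dependency-free, and the extraction fields are hypothetical:

```typescript
// Field-level validation at the application boundary: malformed model
// output fails loudly instead of flowing into downstream state.
interface Extraction {
  invoiceNumber: string; // hypothetical fields, for illustration only
  total: number;
  currency: string;
}

type ParseResult =
  | { ok: true; value: Extraction }
  | { ok: false; errors: string[] };

function parseExtraction(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["response is not valid JSON"] };
  }
  const errors: string[] = [];
  const d = data as Record<string, unknown>;
  if (typeof d?.invoiceNumber !== "string") errors.push("invoiceNumber: expected string");
  if (typeof d?.total !== "number") errors.push("total: expected number");
  if (typeof d?.currency !== "string") errors.push("currency: expected string");
  return errors.length === 0
    ? { ok: true, value: d as unknown as Extraction }
    : { ok: false, errors }; // never pass partial data downstream
}
```

The per-field error list matters: when an extraction fails, you learn which field the model got wrong, which feeds directly back into the golden eval set.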
Structured Assistants
Conversational features fail in production when the model drifts off-task or returns unstructured prose when structured data was expected. The fix is enforcing output format on every turn via schema validation and explicit fallback behavior when confidence signals are absent from the response.
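Per-turn enforcement can be sketched as a single accept-or-fallback gate. Field names (`intent`, `reply`, `confidence`) and the fallback text are illustrative:

```typescript
// Every assistant turn is validated; anything malformed or missing the
// confidence field is replaced by an explicit fallback, never raw prose.
interface AssistantTurn {
  intent: string;
  reply: string;
  confidence: number; // 0..1, required on every turn
}

const FALLBACK: AssistantTurn = {
  intent: "clarify",
  reply: "Could you rephrase that?",
  confidence: 0,
};

function acceptTurn(raw: string): AssistantTurn {
  try {
    const d = JSON.parse(raw) as Partial<AssistantTurn>;
    if (
      typeof d.intent === "string" &&
      typeof d.reply === "string" &&
      typeof d.confidence === "number"
    ) {
      return d as AssistantTurn;
    }
  } catch {
    // not JSON at all — fall through to the fallback
  }
  return FALLBACK; // explicit, typed fallback instead of unstructured prose
}
```

The fallback path is the point: the system's behavior on a bad turn is a design decision made up front, not whatever the UI does with unexpected prose.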
Private Knowledge Retrieval
RAG over private corpora requires keeping embeddings and retrieval fully self-hosted when the data can't leave the network. Claude handles synthesis over retrieved chunks cleanly, but the retrieval layer — chunking strategy, embedding model, similarity threshold — is where most quality problems originate.
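The retrieval layer's two tunable knobs — similarity threshold and top-k — can be sketched directly. Embeddings are assumed to come precomputed from a self-hosted model; this is an in-memory illustration, not a vector store:

```typescript
// Retrieval sketch: cosine similarity over precomputed chunk embeddings,
// with a similarity floor so weak matches never reach the synthesis prompt.
interface Chunk {
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(
  query: number[],
  chunks: Chunk[],
  threshold: number,
  topK: number,
): Chunk[] {
  return chunks
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .filter((s) => s.score >= threshold) // drop weak matches outright
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((s) => s.c);
}
```

The threshold is doing the quality work here: a top-k query with no floor will happily hand the model k irrelevant chunks, and the synthesis step can't recover from that.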
Let's talk Claude.
No pitch. Just a technical conversation about the problem you're working on.