Gemini
Google's multimodal foundation model.
Gemini is the model I reach for when the input isn't just text: PDFs, images, and video frames go into a single API call alongside a prompt, and structured output comes back. Handling those inputs natively is what separates it from text-only pipelines, where you'd otherwise need a separate vision model or an OCR step.

Most of my Gemini work falls into two buckets: document extraction pipelines and Workspace integrations. For extraction (pulling structured data from invoices, contracts, or mixed-content PDFs), Gemini's ability to reason across layout, tables, and embedded images in one pass consistently outperforms the OCR-then-LLM approaches I've tried. Where I still reach for Claude instead is long-form reasoning, or tasks where output consistency under varied prompting matters more than input modality. Vision calls on Gemini have higher latency than a text-only request, and structured output (JSON mode) can drift on complex schemas. Both are worth benchmarking before committing to a production pattern.

For Google Workspace integrations, Gemini is the obvious fit: the API plugs directly into Apps Script and Workspace Add-ons, and the authentication story is simpler than routing a third-party model through a Workspace context.
Multimodal Feature Design
I design extraction and analysis features around Gemini's vision and document input — deciding what goes into the prompt context, how to structure the output schema, and where to add validation layers when the model's JSON output drifts on complex documents.
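A validation layer of this kind might look like the following minimal sketch: parse the model's raw text as JSON, then check it against a small field specification before anything downstream consumes it. The invoice fields and the `validate_extraction` helper are hypothetical names chosen for illustration; they are not part of any Gemini SDK.

```python
import json

# Hypothetical schema for an invoice extraction: field name -> accepted type(s).
INVOICE_FIELDS = {
    "invoice_number": str,
    "total": (int, float),
    "currency": str,
    "line_items": list,
}


def validate_extraction(raw_text, fields=INVOICE_FIELDS):
    """Return (data, errors). data is None when the text is not a JSON object."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not an object"]

    errors = []
    for name, expected in fields.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly
        # (no field in this illustrative schema expects a bool).
        elif isinstance(data[name], bool) or not isinstance(data[name], expected):
            errors.append(f"wrong type for {name}: {type(data[name]).__name__}")
    return data, errors
```

A non-empty `errors` list is the signal to act on: retry with a tighter prompt, fall back to a simpler schema, or route the document to manual review, depending on what the pipeline can tolerate.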
LLM Model Selection
I benchmark candidate models on representative samples of the actual task before committing to one. That means evaluating accuracy, output consistency, per-token cost at expected volume, and any data-residency or compliance constraints that rule out certain providers.
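As a rough illustration of that kind of benchmark, the sketch below scores one candidate on labelled samples for accuracy, run-to-run consistency, and estimated cost. Everything here is an assumption made for the example: `model_fn` stands in for whatever client call a given provider needs, answers are assumed to be plain strings, and the flat per-1k-token price ignores separate input and output rates.

```python
from collections import Counter


def benchmark(model_fn, samples, runs=3, usd_per_1k_tokens=0.0):
    """Score one candidate model on (prompt, expected_answer) samples.

    model_fn(prompt) must return (answer, tokens_used). It is a placeholder
    for a real client call, not an actual SDK function.
    """
    if not samples:
        raise ValueError("need at least one sample")

    correct = 0
    stable = 0
    total_tokens = 0
    for prompt, expected in samples:
        answers = []
        for _ in range(runs):
            answer, tokens = model_fn(prompt)
            answers.append(answer)
            total_tokens += tokens
        majority, count = Counter(answers).most_common(1)[0]
        correct += majority == expected  # accuracy: majority answer matches the label
        stable += count == runs          # consistency: every run gave the same answer

    n = len(samples)
    return {
        "accuracy": correct / n,
        "consistency": stable / n,
        "cost_usd": total_tokens / 1000 * usd_per_1k_tokens,
    }
```

Running the same `samples` through each candidate's `model_fn` gives directly comparable numbers. Data-residency and compliance constraints sit outside a harness like this and are usually checked first, since they can rule a provider out before any scoring happens.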
Google Workspace AI Integration
I build Workspace integrations using the Gemini API with Apps Script or Workspace Add-ons, handling OAuth scopes, service account auth, and the quota limits that surface once a Workspace integration moves beyond a single user.
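Quota limits are typically handled by retrying with exponential backoff. The sketch below shows the pattern in Python for brevity; an Apps Script version would follow the same shape. `QuotaExceeded` is a hypothetical exception that a client wrapper would raise on a rate-limit response, and the delays are illustrative defaults, not values taken from Google's documentation.

```python
import random
import time


class QuotaExceeded(Exception):
    """Hypothetical error raised by a client wrapper on a rate-limit response."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on QuotaExceeded with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The base delay doubles on each attempt; random jitter keeps
            # many users from retrying at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the retry logic can be tested without waiting, and so a host environment can supply its own delay function.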
Document Intelligence
Extract, classify, and analyze information from PDFs, invoices, contracts, and forms at scale using Gemini's document understanding.
Image and Video Analysis
Build quality control, content moderation, and visual search features on Gemini's vision capabilities, without custom model training.
Google Workspace Automation
Summarize meeting notes, draft responses, and automate report generation using Gemini triggered from Apps Script or a Workspace Add-on sidebar.
Let's talk Gemini.
No pitch. Just a technical conversation about the problem you're working on.