The journal

Writing on AI agents and automation

Longer reads on things that are hard to cover in a tweet thread. Reproducible where possible, honest about limitations, no vendor-sponsored angles.

Agents · May 2025 · 8 min

Tool-calling in production: where LLM agents actually break

A breakdown of failure modes we encountered running tool-calling agents on real business data — retries, hallucinated function names, and context window pressure.

James Whitfield

Automation · April 2025 · 11 min

Mapping business processes before you touch an LLM

Most automation projects fail before a single API call is made. We walk through the process audit we do before any AI integration engagement.

Clara Novak

Prompt Engineering · March 2025 · 9 min

Prompt chaining vs. single-shot: a practical comparison

When does breaking a task into a prompt chain actually outperform one carefully constructed prompt? We tested across five task categories.

Daniel Reeves

Architecture · February 2025 · 13 min

Memory architectures for long-running AI agents

How different memory approaches — short-term, episodic, and semantic — affect agent performance across multi-session tasks. A design-level overview with implementation notes.

Amara Osei

Evaluation · January 2025 · 10 min

Evaluating LLM outputs: beyond vibes-based QA

A practical introduction to structured evaluation frameworks for LLM outputs — what metrics actually matter, and how to automate quality checks at scale.

Tomasz Kowalski